Datadog APM and distributed tracing provide teams with an end-to-end view of requests across services, uncovering dependencies and performance bottlenecks to enable real-time troubleshooting and optimization. However, traditional manual instrumentation, while customizable, is often time consuming, error prone, and resource intensive, requiring developers to configure each service individually and closely collaborate with SRE teams. In complex, dynamic environments, this approach can lead to gaps in visibility and increased challenges in maintaining comprehensive observability.
Recognizing the need for a faster, more reliable way to achieve comprehensive visibility, Datadog is introducing Single Step Instrumentation, a new APM configuration mechanism that activates distributed tracing across all critical services with just one command. Where manually instrumenting all your critical services may have taken hours, Single Step Instrumentation reduces the process to minutes. It eliminates the need to change application code or manually add the APM SDK to individual applications, so you get observability right away.
In this blog post, we’ll dive into how Single Step Instrumentation provides you with a fast, efficient way to enable distributed tracing.
Send traces from your services in minutes
Single Step Instrumentation is now generally available for services running on Linux operating systems and Docker containers hosted on Linux. This new mechanism simplifies the process of enabling distributed tracing, whether you’re installing the Datadog Agent on a new host for the first time or adding tracing to a host that already has the Agent running.
Let’s consider Shopist, a rapidly growing e-commerce platform serving millions of customers that operates on AWS with applications running on Amazon Linux. Its microservices architecture is divided into Java-based services for critical business logic, such as login and checkout, and Python-based services for data science tasks, including personalized recommendations and dynamic pricing.
While this architecture provides scalability and flexibility, it also introduces complexity, especially as user traffic grows and complaints about slow checkouts and delayed recommendations increase. Logs and metrics offer only fragmented snapshots of individual services, leaving the team without the visibility needed to trace requests end-to-end or identify bottlenecks. Given these limitations, the team realizes they need distributed tracing.
Enable distributed tracing with a single command
Single Step Instrumentation provides exactly what Shopist’s team needs to enable distributed tracing seamlessly across their Linux-based services. With no SDK integrations or code changes required, the SRE team can instrument all critical services on their Linux hosts with a single command.
To get started with Single Step Instrumentation, the team decides to enable APM instrumentation while configuring the Datadog Agent on their Linux hosts. They first navigate to the Datadog Agent installation page in the Datadog UI and enable APM Instrumentation with a simple toggle. Before running the installation command in the terminal of their host, the SRE team also specifies that only the Java and Python tracers are necessary.
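To make the installation command described above more concrete, here is a minimal Python sketch that assembles a one-line command of that shape. The environment variable names (DD_APM_INSTRUMENTATION_ENABLED, DD_APM_INSTRUMENTATION_LIBRARIES), the installer URL, and the pinned tracer versions are assumptions drawn from our understanding of the Agent install script, not from this post, and build_install_command is a hypothetical helper. Always copy the real command from the Agent installation page.

```python
import shlex
from typing import Mapping

# Assumed installer location; the Agent installation page shows the real one.
INSTALL_SCRIPT_URL = "https://install.datadoghq.com/scripts/install_script_agent7.sh"


def build_install_command(api_key: str, site: str, libraries: Mapping[str, str]) -> str:
    """Return a one-line shell command that installs the Agent with APM instrumentation on."""
    # Turn {"java": "1", "python": "2"} into "java:1,python:2" (tracer:major-version pairs).
    library_spec = ",".join(f"{name}:{version}" for name, version in libraries.items())
    env = {
        "DD_API_KEY": api_key,
        "DD_SITE": site,
        # Assumed variable names: enable host-wide injection, limited to the listed tracers.
        "DD_APM_INSTRUMENTATION_ENABLED": "host",
        "DD_APM_INSTRUMENTATION_LIBRARIES": library_spec,
    }
    # Quote each value so the assignments are safe to paste into a shell.
    assignments = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f'{assignments} bash -c "$(curl -L {INSTALL_SCRIPT_URL})"'


if __name__ == "__main__":
    # Placeholder API key and illustrative version pins for a Java + Python setup like Shopist's.
    print(build_install_command("<YOUR_DD_API_KEY>", "datadoghq.com", {"java": "1", "python": "2"}))
```

Restricting the library list to Java and Python mirrors the team's choice to install only the tracers their services need.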
The Single Step APM SDK injector is then loaded alongside the main Datadog Agent modules, where it automatically detects and instruments the relevant processes, in this case the Java and Python services. This auto-instrumentation enables the SRE team at Shopist to begin receiving detailed traces for all their service calls in Datadog APM within minutes. As the traces populate, the team starts analyzing the data to identify potential bottlenecks and issues impacting their platform’s performance.
One trace immediately highlights a persistent error during calls to the POST /add_purchases endpoint of the Product-Recommendation service, where latency spikes and request failures are observed. By examining the flame graph and stack trace, the team identifies the issue within the RecommendationController.addPurchases method, which is making a query to the Cassandra database. The query relies on ALLOW FILTERING, a known inefficiency that bypasses indexed lookups, leading to high latency and occasional failures when no nodes are available to execute the query.
With Datadog’s insights, the team rewrites the query to use indexed keys instead of filtering, significantly reducing the processing burden on the database. To further address the problem, they add more nodes to the Cassandra cluster, ensuring adequate capacity during peak traffic. Finally, they configure alerts in Datadog to monitor query latency and detect regressions, providing a safeguard against similar issues in the future.
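To illustrate why replacing filtering with a keyed lookup reduces the load on the database, here is a small in-memory analogy in Python. It is not Cassandra driver code, and the purchase rows, the user_id key, and the function names are all hypothetical. It only contrasts scanning every row with going straight to the rows stored under one key.

```python
from collections import defaultdict

# Hypothetical purchase rows as (user_id, product_id); the schema is illustrative only.
PURCHASES = [(f"user-{i % 1000}", f"product-{i}") for i in range(10_000)]


def find_with_filtering(rows, user_id):
    """Mimic a filtering query: examine every row and keep the ones that match."""
    examined = 0
    matches = []
    for row_user, product in rows:
        examined += 1
        if row_user == user_id:
            matches.append(product)
    return matches, examined


def build_key_index(rows):
    """Mimic a table keyed by user_id, so each user's rows are stored together."""
    index = defaultdict(list)
    for row_user, product in rows:
        index[row_user].append(product)
    return index


def find_with_key(index, user_id):
    """Mimic a keyed lookup: touch only the rows stored under the requested key."""
    matches = index.get(user_id, [])
    return list(matches), len(matches)


if __name__ == "__main__":
    index = build_key_index(PURCHASES)
    _, scanned = find_with_filtering(PURCHASES, "user-42")
    _, touched = find_with_key(index, "user-42")
    print(f"filtering examined {scanned} rows; keyed lookup touched {touched}")
```

For user-42, both functions return the same ten products, but the filtering version examines all 10,000 rows while the keyed version touches only ten. The gap grows with the size of the table, which is the kind of cost the team removes by querying on keys.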
Once the fixes are deployed, the team observes an immediate drop in error rates for the POST /add_purchases endpoint, with latency improving significantly. The Product-Recommendation service stabilizes, enabling seamless purchase updates and ensuring a smoother user experience. Because of Single Step Instrumentation and Datadog APM, the team resolves a critical bottleneck efficiently, demonstrating the power of real-time observability for optimizing complex systems.
Get started with Single Step Instrumentation and APM today
Single Step Instrumentation allows you to start monitoring your applications end to end in minutes with one command, with no application code changes required. This instrumentation method is generally available for services running on Linux or as Docker containers hosted on Linux, specifically for Java, .NET, Python, and Node.js applications.
Check out our documentation to learn more about Single Step Instrumentation, the environments it currently supports, and distributed tracing. And if you aren’t already a Datadog customer, get started with a 14-day free trial.