Get Deeper Visibility Into Your AWS Serverless Apps With Enhanced Distributed Tracing | Datadog

Author: Sumedha Mehta

Published: November 22, 2024

Serverless or event-driven applications can comprise many different distributed components, including serverless compute services such as AWS Lambda and AWS Fargate for Amazon ECS, as well as managed data streams, data stores, workflow orchestration tools, queues, and more. Having full end-to-end visibility into requests as they propagate across all of these parts of your application is crucial to monitoring performance, locating affected up- or downstream services, and troubleshooting issues.

With Datadog APM, you can instrument your serverless workloads to get deep insight into requests as they flow across your functions, containers, and other infrastructure components. But due to the decoupled nature of serverless architectures, there can still be challenges in getting comprehensive visibility into each part of your application, leading to observability gaps.

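We are excited to announce enhancements to Datadog’s distributed tracing for serverless applications that provide visibility into additional serverless patterns. In this post, we’ll look at how you can use Datadog Serverless Monitoring to link traces across Amazon S3 and DynamoDB change events with Span Auto-linking, and to propagate trace context through AWS Step Functions workflows.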

Span Auto-linking for Amazon S3 and DynamoDB change events

A common pattern in serverless applications is to have a state change in, for example, an S3 bucket or a DynamoDB table trigger a Lambda function that in turn initiates a downstream set of events. For instance, you can use S3 Event Notifications to start a machine learning pipeline that processes image files as they are uploaded to S3. Or you can use a DynamoDB stream to load personalized recommendations when a user updates their profile in your application.
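As a rough illustration of this pattern, the sketch below shows a Lambda handler in Python that reads the change records delivered by S3 Event Notifications and DynamoDB streams. The record fields follow the event shapes AWS delivers to Lambda; the function names `summarize_records` and `handler` and the returned structure are assumptions made for this example, not part of any Datadog or AWS API.

```python
from urllib.parse import unquote_plus


def summarize_records(event):
    """Return (source, identifier) pairs for S3 and DynamoDB stream records."""
    changes = []
    for record in event.get("Records", []):
        source = record.get("eventSource")
        if source == "aws:s3":
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            changes.append(("s3", f"{bucket}/{key}"))
        elif source == "aws:dynamodb":
            # eventName is INSERT, MODIFY, or REMOVE.
            keys = record["dynamodb"].get("Keys", {})
            changes.append(("dynamodb", f"{record['eventName']}:{sorted(keys)}"))
    return changes


def handler(event, context):
    # Hypothetical entry point: kick off downstream work per change here.
    return {"changes": summarize_records(event)}
```

In a real function, the loop body is where you would start the image-processing pipeline or refresh recommendations for the affected item.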

Because Amazon S3 and DynamoDB are cloud-managed data stores, installing a tracing library within these services is not currently possible. This can make it difficult to understand the cause and effect of change events, such as why an upstream payload or business logic may have caused a downstream change event to fail or produce undesirable results.

With Span Auto-linking, Datadog can now automatically identify traces that are related to S3 Event Notifications or DynamoDB stream events and connect them to provide more complete end-to-end visibility. For each relevant span within a trace flame graph, Datadog lists the linked traces along with context, such as whether the linked span occurs before or after the current span. This way, you can easily navigate your distributed trace through related spans that could be causing downstream failures.
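To make the idea of a linked span with before/after context concrete, here is a minimal conceptual sketch in Python. It is not Datadog's implementation: the `SpanRef` type, its fields, and the timestamp comparison are all assumptions chosen only to illustrate how a link between two separate traces might be labeled relative to the current span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanRef:
    """Hypothetical reference to a span in some trace."""
    trace_id: str
    span_id: str
    start_ns: int  # span start time in nanoseconds


def describe_links(current, linked):
    """Label each linked span as occurring before or after the current span."""
    described = []
    for other in linked:
        relation = "before" if other.start_ns < current.start_ns else "after"
        described.append((other.trace_id, other.span_id, relation))
    return described
```

For example, a span for a Lambda function triggered by an S3 upload would list the uploading service's span as "before", and any span triggered by that function's own writes as "after".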

Span Auto-linking detects related upstream and downstream events

Enhanced AWS Step Functions trace propagation

AWS Step Functions is often one part of a broader serverless application that also uses other managed services, most frequently AWS Lambda functions. Being able to follow the path of a request from an upstream service all the way to each individual state within a workflow execution is crucial for getting visibility into errors or latency in your service.

AWS recently announced that customers of AWS Step Functions can now define state machine payload arguments with JSONata, as well as use workflow variables. JSONata is a declarative open source query and transformation language for JSON data. Compared to JSONPath, which was previously the only way developers could pass payloads to each state in a state machine, JSONata provides more flexibility to transform variables in-line without having to do computation within explicit states. Datadog instrumentation fully supports JSONata-format serverless payloads for standard tracing. Additionally, you can take advantage of JSONata to add deeper trace context to your AWS Step Functions workflows, meaning that you can link up- and downstream Lambda function spans to your state machines.

To achieve this, modify any state machine that invokes an AWS Lambda function to use the following JSONata-based definition:

{
  "StartAt": "Invoke Lambda",
  "QueryLanguage": "JSONata",
  "States": {
    "Invoke Lambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Arguments": {
        "Payload": "{% ($execInput := $states.context.Execution.Input; $hasDatadogTraceId := $exists($execInput._datadog.`x-datadog-trace-id`); $hasDatadogRootExecutionId := $exists($execInput._datadog.RootExecutionId); $ddTraceContext := $hasDatadogTraceId ? {'x-datadog-trace-id': $execInput._datadog.`x-datadog-trace-id`, 'x-datadog-tags': $execInput._datadog.`x-datadog-tags`} : {'RootExecutionId': $hasDatadogRootExecutionId ? $execInput._datadog.RootExecutionId : $states.context.Execution.Id}; $sfnContext := $merge([$states.context, {'Execution': $sift($states.context.Execution, function($v, $k) { $k != 'Input' })}]); $merge([{'_datadog': $merge([$sfnContext, $ddTraceContext, {'serverless-version': 'v2'}])}, $sift($states.input, function($v, $k) { $k != '_datadog' })])) %}",
        "FunctionName": "..."
      },
      "End": true
    }
  }
}
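The JSONata expression in the Payload field is dense, so it can help to read it as ordinary code. The Python sketch below approximates what the expression computes, assuming the Step Functions context object and the state input are plain dictionaries. The function name `build_lambda_payload` is hypothetical, and edge cases of JSONata semantics (for example, how `$sift` treats empty results) are not reproduced exactly.

```python
def build_lambda_payload(states_context, states_input):
    """Approximate the payload the JSONata expression hands to Lambda."""
    execution = states_context.get("Execution", {})
    exec_input = execution.get("Input") or {}
    upstream = exec_input.get("_datadog") or {}

    if "x-datadog-trace-id" in upstream:
        # An upstream service already started a trace: carry its IDs forward.
        dd_trace_context = {
            key: upstream[key]
            for key in ("x-datadog-trace-id", "x-datadog-tags")
            if key in upstream
        }
    else:
        # No upstream trace: reuse a parent's RootExecutionId if one was
        # passed in, otherwise treat this execution as the root.
        dd_trace_context = {
            "RootExecutionId": upstream.get("RootExecutionId", execution.get("Id"))
        }

    # Copy the Step Functions context, dropping the execution input so the
    # payload does not carry a second copy of it.
    sfn_context = {
        **states_context,
        "Execution": {k: v for k, v in execution.items() if k != "Input"},
    }

    datadog = {**sfn_context, **dd_trace_context, "serverless-version": "v2"}
    # Pass the state's own input through, minus any stale _datadog key.
    passthrough = {k: v for k, v in states_input.items() if k != "_datadog"}
    return {"_datadog": datadog, **passthrough}
```

Read this way, the expression does three things: it picks a trace context (an upstream trace ID or a root execution ID), attaches the Step Functions context without the execution input, and forwards the state's original input alongside a fresh `_datadog` key.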

With the additional flexibility provided by JSONata, you can now see an upstream Lambda function linked to your state machines and other nested Lambda traces, all visualized via a flame graph or waterfall view. In the example below, you can see an AWS Lambda function (Lambda1) trace propagating through several state machines, including StateMachine2. StateMachine2 invokes two separate Lambda functions, both of which have been instrumented with the Datadog Lambda Extension, showing operations such as making a call to S3.

Add trace context to connect function traces to Step Functions executions.

Having full context about the services upstream of an AWS Step Functions workflow can shorten your investigations. You can see how those upstream services affect the execution of your state machine, and you can spot business logic issues within the state machine that cause anomalous behavior.

Get started today

Serverless applications depend on requests propagating across a variety of decoupled components, which introduces unique monitoring challenges. Datadog now provides enriched support for connecting trace data between parts of your serverless environment. Combined with out-of-the-box automatic instrumentation of popular serverless technologies, cold start tracing, trace propagation across Datadog RUM, LLM Observability, and Data Streams Monitoring, and more, this gives you deeper end-to-end insight into application performance and helps you troubleshoot issues faster.

To get started, see our documentation for Span Auto-linking and instrumenting AWS Step Functions.