AWS Lambda enables engineering teams to build modern, scalable services without the need to provision underlying infrastructure resources. But monitoring Lambda functions requires visibility into performance indicators that differ from those of traditional architectures—and cold starts are a key example. Cold starts occur when a serverless compute platform like Lambda needs to create a new execution environment in order to serve a request, which increases the application’s overall response time.
Datadog Serverless Monitoring already detects cold starts in Lambda functions, visualizes their impact on services via distributed tracing, and lets users set alerts based on the rates at which cold starts occur. Today, we’re proud to announce our partnership with AWS to provide support for AWS Lambda SnapStart, a new feature for functions running the Amazon Corretto Java 11 runtime. Lambda SnapStart improves startup performance for latency-sensitive Java applications by up to 10 times at no extra cost, and typically without modification to function code. Now teams can efficiently mitigate cold starts by enabling SnapStart for their functions and using Datadog Serverless Monitoring to monitor how the new AWS Lambda feature affects application performance.
In this post, we’ll look at how teams can:
- Use Datadog to detect cold starts and understand their performance impact
- Reduce cold start latency with Lambda SnapStart and monitor these changes in Datadog
Detect cold starts and understand their effect on performance
Datadog Serverless Monitoring provides curated insights into the performance of serverless applications, including the rate of cold starts. One key insight indicates when cold starts are affecting more than 1 percent of a function’s invocations.
Teams can also use the Datadog Lambda Extension, which leverages the new Lambda Telemetry API, to view cold starts in context with other application activity within Datadog APM. This visibility enables them to easily see how cold starts from individual functions affect the performance of downstream dependencies, such as API endpoints or other functions.
Datadog takes this visibility a step further by enabling teams to easily create monitors based on their service level objectives and specific performance indicators, such as the rate at which cold starts occur for a particular function. This ensures that they are notified of significant shifts in cold start activity across any of their functions. For example, the following screenshot shows a monitor that triggers an alert when more than 20 percent of a function’s invocations are cold starts.
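To make the two thresholds mentioned above concrete, here is a minimal sketch of the underlying arithmetic. It is not how Datadog evaluates insights or monitors internally; the `InvocationWindow` type and the `evaluate` function are hypothetical names used only for illustration, and the 1 percent and 20 percent values are simply the figures quoted in this post.

```python
from dataclasses import dataclass


@dataclass
class InvocationWindow:
    """Counts for one function over one evaluation window (hypothetical shape)."""
    invocations: int
    cold_starts: int


def cold_start_rate(window: InvocationWindow) -> float:
    """Return the percentage of invocations that were cold starts."""
    if window.invocations == 0:
        return 0.0  # no traffic, so no cold start rate to report
    return 100.0 * window.cold_starts / window.invocations


def evaluate(window: InvocationWindow,
             insight_pct: float = 1.0,
             alert_pct: float = 20.0) -> str:
    """Classify a window against the insight and alert thresholds.

    Both comparisons are strict ("more than"), matching the wording in the post.
    """
    rate = cold_start_rate(window)
    if rate > alert_pct:
        return "alert"
    if rate > insight_pct:
        return "insight"
    return "ok"
```

For instance, a window with 100 invocations and 25 cold starts has a 25 percent cold start rate and would be classified as `"alert"`, while 1,000 invocations with 50 cold starts (5 percent) would only surface as an `"insight"`.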
Reduce cold start latency with Lambda SnapStart
AWS Lambda SnapStart delivers performance improvements by initializing a function’s code once when a version is published, saving a snapshot of the execution environment’s memory and disk state, and then caching that snapshot so new execution environments can resume from it instead of initializing from scratch. To enable SnapStart for Java functions, teams can leverage their existing infrastructure-as-code or other deployment tooling, such as AWS CDK, AWS SAM, and AWS CloudFormation. Check out AWS’s post for more details about Lambda SnapStart configurations and technical considerations, such as maintaining unique identifiers.
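Beyond infrastructure-as-code templates, SnapStart can also be enabled through the AWS SDK. The following is a minimal sketch, assuming a boto3 Lambda client is passed in; the `enable_snapstart` helper and the function name in the usage comment are hypothetical. Because snapshots are taken when a version is published, the sketch updates the configuration and then publishes a new version.

```python
def enable_snapstart(lambda_client, function_name: str) -> str:
    """Turn on SnapStart for a function and publish a version that uses it.

    `lambda_client` is assumed to be a boto3 Lambda client, e.g.
    boto3.client("lambda"). Returns the published version number.
    """
    # SnapStart applies to published versions, not to $LATEST.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        SnapStart={"ApplyOn": "PublishedVersions"},
    )

    # Wait for the configuration update to finish before publishing.
    waiter = lambda_client.get_waiter("function_updated")
    waiter.wait(FunctionName=function_name)

    # Publishing a version triggers initialization and the snapshot.
    response = lambda_client.publish_version(FunctionName=function_name)
    return response["Version"]


# Usage (requires boto3 and AWS credentials; the name is a placeholder):
#   import boto3
#   version = enable_snapstart(boto3.client("lambda"), "my-java-function")
```

Invocations benefit from the snapshot only when they target the published version, or an alias pointing to it, rather than the unpublished `$LATEST` version.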
Whatever method users choose for enabling SnapStart, they can also instrument their newly deployed functions with the Datadog Lambda Extension. Datadog will automatically detect new functions and overlay metric time-series charts with the relevant deployment events, so teams can get complete visibility into how their serverless architecture changes over time.
If Datadog identifies cold starts in a particular function, teams can use the Serverless view to inspect the function’s current configuration and determine if it is running a supported runtime for SnapStart. They can then pivot directly to the AWS Console from this view to update the function as needed.
Complete visibility into the impact of cold starts on Lambda functions
Lambda SnapStart helps developers optimize their Java functions and significantly improve application performance. And with Datadog Serverless Monitoring, teams can track how SnapStart optimizes function performance over time, in addition to monitoring cold starts across the rest of their services. These capabilities help teams ensure that their serverless applications deliver a consistent user experience for their customers. Check out our documentation to learn more, or sign up for a 14-day free trial to start monitoring your functions with Datadog today.