Unify visibility into changes to your services and dependencies with Datadog Change Tracking

Aaron Weber

Evan Pandya

In modern application development, changes happen constantly: Deployments are pushed, feature flags are toggled, and Kubernetes events reshape infrastructure, to name just a few. While these practices drive innovation and scalability, they also introduce complexity, especially during incidents. Fragmented tools and workflows across teams and organizations make it difficult to pinpoint the root causes of issues, leading to longer resolution times.

To address these challenges, Datadog Change Tracking offers real-time visibility into a wide array of system changes by surfacing relevant changes directly within Datadog's monitors, dashboards, and service pages. Change Tracking provides an immediate and unified view of changes such as application deployments, feature toggle updates, Kubernetes events, and database modifications. Teams can promptly identify relevant changes and take informed action, accelerating root cause analysis and reducing time to resolution.

In this post, we'll walk you through an example of how Change Tracking can help you:

View changes to your services and dependencies directly in context
Correlate recent changes with shifts in health and performance metrics
Analyze detailed change insights and take next steps, all within the Datadog platform

View changes to your services and dependencies directly in context

Imagine that you receive an alert that the error rate for a critical API is spiking. Your team’s first step is to investigate potential root causes. Without a unified view of changes, the investigation process is fragmented. Your team must manually gather data from various logs, monitoring tools, and pipelines to try to understand what might have changed. This manual work delays incident resolution and increases the risk of focusing on the wrong area.

With Change Tracking, changes made to the API and its dependencies are immediately accessible in context across the Datadog platform, including on:

Monitor status pages: When a monitor is in a warning or alerting state, use Change Tracking to review recent changes related to the affected service directly on the status page. This functionality helps you quickly identify if a recent deployment, feature flag, or other change might have contributed to the monitor’s change in state.
Service pages: On the Service Summary page in APM, view the timeline of changes to the service and its dependencies alongside metrics such as latency, error rate, and throughput. This functionality helps you see how recent updates align with fluctuations in performance or health metrics.
Dashboards: Use the Show Overlays button to display tracked changes as interactive overlays on timeseries widgets, or view them in the change timeline, for clear visual correlation between changes and metric trends.

The following screenshot shows a monitor status page for our example API that is experiencing an increase in error rate. Here, we see that Change Tracking is surfacing a deployment change and a feature flag change as potential root causes for you to investigate.

A monitor status page shows the deployment change and feature flag change, along with an event timeline and event details.

Correlate recent changes with disturbances in service health and performance

With relevant changes automatically surfaced, you can use Change Tracking to correlate these events with the spike in error rate. Hovering over individual change events, such as the feature flag change, reveals details to streamline your analysis. Among these details are the associated service name and the timestamp of the change.

A monitor status page shows the name, timestamp, and associated service for a feature flag change. You can access details about the feature flag change by hovering over the change.
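To make the idea of correlating changes with an alert concrete, here is a minimal Python sketch of the underlying logic: given a list of change events, keep the ones that touched the affected service (or its dependencies) shortly before the alert fired, newest first. This is only an illustration and not how Change Tracking is implemented; the `ChangeEvent` record, its field names, the `candidate_changes` helper, the 30-minute lookback, and the sample service names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical record; these fields are illustrative, not Datadog's schema.
    kind: str            # for example "deployment" or "feature_flag"
    service: str         # service the change applies to
    timestamp: datetime  # when the change took effect


def candidate_changes(events, services, alert_start, lookback=timedelta(minutes=30)):
    """Return changes to the given services within the lookback window, newest first."""
    window_start = alert_start - lookback
    matches = [
        event
        for event in events
        if event.service in services and window_start <= event.timestamp <= alert_start
    ]
    return sorted(matches, key=lambda event: event.timestamp, reverse=True)


# Sample data: the alert fires at 12:00 for a hypothetical "example-api" service.
alert_start = datetime(2025, 1, 15, 12, 0)
events = [
    ChangeEvent("deployment", "example-api", datetime(2025, 1, 15, 11, 40)),
    ChangeEvent("feature_flag", "example-api", datetime(2025, 1, 15, 11, 55)),
    ChangeEvent("deployment", "billing", datetime(2025, 1, 15, 11, 50)),
    ChangeEvent("deployment", "example-api", datetime(2025, 1, 14, 9, 0)),
]

for change in candidate_changes(events, {"example-api"}, alert_start):
    print(change.kind, change.timestamp.isoformat())
```

With this sample data, the sketch keeps the feature flag change and the deployment made in the half hour before the alert, and it drops both the unrelated service's deployment and the deployment from the previous day. Passing a larger set of service names would extend the same filter to dependencies.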

By choosing View Details on the feature flag change, you open the change details side panel. Here, you discover that the feature flag was toggled on and introduced a configuration that directs the API to a different data store. While this change could have contributed to the increased error rate, you don’t know for sure.

The side panel shows more details about the feature flag change, including the user who made the change, an identifier, and a description. — Details of the feature flag change that directed the API to a different data store.

You can then shift your investigation to the deployment. When you review the latest commit linked to the deployment, you uncover code changes that didn’t properly account for the increased load conditions on the new data store. The code changes caused the spike in errors when the feature flag was enabled.

The side panel shows more details about the deployment change, including the environment and a timestamp. The panel also shows request rate, error rate, and latency. — Details of the deployment change, including the corresponding increase in error rate.

Access detailed change insights and take next steps within Datadog

Now that you have identified the root cause as a combination of the faulty deployment and the feature flag, you can focus on stabilizing the system. You determine that the most efficient way to address the issue is to toggle off the feature flag, redirecting traffic back to the original data store. If you have configured the LaunchDarkly integration in Datadog, you can toggle off the feature flag directly in the Change Tracking side panel by using the LaunchDarkly remediation workflow powered by Datadog Workflow Automation. Alternatively, you can set up a custom workflow or manage the flag externally, depending on your internal processes and preferences.

A configuration screen provides the option to connect to LaunchDarkly and toggle off the feature flag. The screen also presents options to toggle on Slack notifications and action approvals. — After you identify the root cause of the issue, you can toggle off the feature flag in Datadog.
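If you manage the flag externally rather than through the built-in workflow, the equivalent action is a single API call to LaunchDarkly. The Python sketch below only builds that request; it does not send it. It assumes LaunchDarkly's REST API accepts a semantic patch with a `turnFlagOff` instruction on the flag endpoint, so check the endpoint, headers, and payload against the current LaunchDarkly API documentation before relying on it. The `build_flag_off_request` helper and the project, flag, environment, and token values are hypothetical placeholders, and this is not a description of how the Datadog remediation workflow works internally.

```python
import json
import urllib.request

# Assumed base URL for the LaunchDarkly REST API.
LD_API_BASE = "https://app.launchdarkly.com/api/v2"


def build_flag_off_request(project_key, flag_key, environment_key, api_token):
    """Build (but do not send) a PATCH request that turns a flag off in one environment."""
    body = {
        "environmentKey": environment_key,
        # Assumed semantic patch instruction for disabling flag targeting.
        "instructions": [{"kind": "turnFlagOff"}],
    }
    return urllib.request.Request(
        url=f"{LD_API_BASE}/flags/{project_key}/{flag_key}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": api_token,
            "Content-Type": "application/json; domain-model=launchdarkly.semanticpatch",
        },
    )


# Placeholder values for illustration only.
request = build_flag_off_request(
    project_key="my-project",
    flag_key="use-new-data-store",
    environment_key="production",
    api_token="api-token-placeholder",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easier to put an approval step or a notification in between, similar to the options shown in the configuration screen.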

The change details and available actions in Change Tracking are dynamic and tailored to the type of change. For example, application deployment changes include rollout details, version differences, and links to CI/CD pipelines. Kubernetes deployment changes, by contrast, provide diffs, cluster metadata, and quick links to logs and pods. Change Tracking gives you the right context and tools for different types of tracked changes, enabling in-depth analysis and efficient responses across a wide range of use cases.

Get started with Change Tracking today

Datadog Change Tracking provides comprehensive visibility into changes across your services and dependencies, integrating insights into monitor status pages, dashboards, and service pages. With Change Tracking, you can view changes, correlate the changes with performance data, and take next steps to remediate issues. As a result, you can improve efficiency and reliability by identifying root causes and resolving incidents faster.

You can find a comprehensive list of supported change types and tracking requirements in the Change Tracking documentation. If you don’t already have a Datadog account, sign up for a 14-day free trial today.
