Kickstart Your Investigations and Reduce Alert Noise With Doctor Droid’s Offering in the Datadog Marketplace | Datadog

Kickstart your investigations and reduce alert noise with Doctor Droid’s offering in the Datadog Marketplace

Authors: Addie Beach, Erica Ho, Alex Guo

Published: January 8, 2025

Being an on-call engineer is often overwhelming, requiring you to pivot between tickets, dashboards, runbooks, and different data sources as you try to separate legitimate incidents from unnecessary noise. Not only does the process of investigating irrelevant alerts take time away from remediating important issues, but it also compounds alert fatigue.

To help you focus your response activity, Doctor Droid automates your investigations by running triage workflows on incoming alerts and intelligently organizing the findings from these workflows into overviews of impacted resources and potential root causes. This enables you to quickly evaluate the severity of the alert and decide on next steps. With Datadog’s Doctor Droid integration, you can easily enrich your Datadog monitors with insights from these summaries. This helps you determine the potential source of an issue as soon as you’re notified of it and immediately start remediation.

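In this post, we’ll explore how Doctor Droid and Datadog help you enrich your monitors and prioritize your alerts with critical context, and access alert details for granular troubleshooting.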

Enrich your monitors and prioritize your alerts with critical context

Doctor Droid conducts pattern analysis to group related alerts together and determine their legitimacy. By analyzing the frequency and severity of your alerts, Doctor Droid assigns a quality score that helps you identify noisy alerts and evaluate aspects of your alerting strategy, such as your thresholds.
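Doctor Droid’s actual scoring model isn’t described here, but a minimal sketch shows how frequency and severity could be folded into a single number. The weights, the severity levels, and the `alert_quality_score` function below are hypothetical assumptions chosen for illustration. Lower scores point to monitors that fire often at low severity, which are typical candidates for threshold tuning.

```python
# Hypothetical severity weights; illustrative assumptions, not Doctor Droid's model.
SEVERITY_WEIGHT = {"critical": 1.0, "high": 0.75, "warning": 0.5, "info": 0.25}


def alert_quality_score(alerts_per_day: float, severity: str,
                        max_daily_alerts: float = 20.0) -> float:
    """Return a 0-100 score for one monitor; lower values suggest a noisier monitor."""
    # Frequency term: 1.0 for a monitor that never fires, 0.0 at or above max_daily_alerts.
    capped = min(max(alerts_per_day, 0.0), max_daily_alerts)
    frequency_term = 1.0 - capped / max_daily_alerts
    # Severity term: unknown severities are treated like "info".
    severity_term = SEVERITY_WEIGHT.get(severity, SEVERITY_WEIGHT["info"])
    # Weighted blend; frequent, low-severity alerts end up with the lowest scores.
    return round(100 * (0.6 * frequency_term + 0.4 * severity_term), 1)


if __name__ == "__main__":
    print(alert_quality_score(18, "info"))     # flapping informational monitor
    print(alert_quality_score(1, "critical"))  # rare, high-severity monitor
```

The 0.6/0.4 split simply favors frequency over severity; any real scoring would need to be calibrated against your own alert history.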

To help you quickly triage incoming alerts, you can also configure custom playbooks within Doctor Droid to perform initial investigative actions, such as fetching metrics from monitoring platforms like Datadog or querying relevant databases. Additionally, you can have Doctor Droid draw on organization-specific information in these investigative steps by uploading your past incident reports and postmortems. Doctor Droid then summarizes the findings from these playbooks into key contextual insights, such as recent releases that may have introduced a bug or related services experiencing similar problems, to help you determine whether they are connected to the alert. From there, you can quickly decide how to respond.
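As an illustration of the kind of step a playbook might run, the sketch below builds a request for Datadog’s v1 timeseries query endpoint (`GET /api/v1/query`, authenticated with the `DD-API-KEY` and `DD-APPLICATION-KEY` headers). It is not Doctor Droid’s implementation: the `build_metric_query` name, the example query, and the `env:prod` scope are hypothetical, and the `site` parameter assumes the US1 domain by default.

```python
import urllib.parse
import urllib.request


def build_metric_query(query: str, start: int, end: int, api_key: str,
                       app_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build (but do not send) a GET request for Datadog's v1 timeseries query API.

    start and end are Unix timestamps in seconds; site defaults to the US1 domain.
    """
    params = urllib.parse.urlencode({"from": start, "to": end, "query": query})
    return urllib.request.Request(
        f"https://api.{site}/api/v1/query?{params}",
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
        method="GET",
    )


if __name__ == "__main__":
    # Hypothetical playbook step: usable memory per host over one hour.
    req = build_metric_query(
        "avg:system.mem.pct_usable{env:prod} by {host}",
        start=1_700_000_000, end=1_700_003_600,
        api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>",
    )
    print(req.full_url)
```

To actually run the query, you could pass the request to `urllib.request.urlopen` and parse the response with `json.load`; keep real keys in a secrets store rather than in code.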

With the Datadog Doctor Droid integration, you can view these summaries as Datadog events (shown below).

A Doctor Droid event, with the deployment history and analysis of impacted metrics displayed.
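To make the event format more concrete, the following sketch shapes a list of triage findings into a body for Datadog’s Events API (`POST /api/v1/events`). The integration creates these events for you, so this is only an assumption about what such a payload could look like; `build_summary_event`, the findings, and the tags are hypothetical.

```python
import json


def build_summary_event(monitor_name: str, findings: list[str], tags: list[str]) -> dict:
    """Shape triage findings as a body for Datadog's Events API (POST /api/v1/events)."""
    return {
        "title": f"Triage summary: {monitor_name}",
        # One finding per line, each prefixed with a dash.
        "text": "\n".join(f"- {finding}" for finding in findings),
        "alert_type": "info",
        "tags": tags,
    }


if __name__ == "__main__":
    event = build_summary_event(
        "High memory on web hosts",
        ["Deployment web@v2.3.1 finished 10 minutes before the alert",
         "Security alerts fired on the auth service in the same window"],
        ["service:web", "source:doctor-droid"],
    )
    print(json.dumps(event, indent=2))
```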

You can configure these events by adding a Doctor Droid webhook to your Datadog monitors, which establishes Doctor Droid as a destination for your alerts. When these monitors are triggered, Doctor Droid automatically sends findings from any connected playbooks to the alerting platform of your choice. This lets you access the vital monitoring details provided by Datadog alerts (including timeseries graphs, details about the affected services, and contact information for the relevant teams) alongside the high-level findings generated by Doctor Droid. Additionally, Datadog’s integrations with collaboration and incident management platforms like PagerDuty, Slack, and Microsoft Teams let you receive these enriched alerts within the tools that are already part of your workflows. With a single click, you can then pivot from these notifications to troubleshooting within Datadog.
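In Datadog, a monitor notifies a webhook when its message mentions `@webhook-<NAME>`. The sketch below builds a monitor definition (a body you could send to `POST /api/v1/monitor`) for a low-usable-memory alert. The webhook name `doctor-droid`, the Slack handle, and the `env:prod` scope are hypothetical placeholders; use the names configured in your own Doctor Droid and Slack integrations.

```python
import json


def build_memory_monitor(webhook_name: str = "doctor-droid",
                         slack_handle: str = "slack-oncall") -> dict:
    """Return a monitor definition (body for POST /api/v1/monitor) that notifies a webhook."""
    message = (
        # Plain string: the {{...}} template variables are rendered by Datadog, not Python.
        "{{#is_alert}}Usable memory is below 10% on {{host.name}}.{{/is_alert}}\n"
        # @-mentions route the notification to the webhook and to Slack.
        + f"@webhook-{webhook_name} @{slack_handle}"
    )
    return {
        "name": "Low usable memory on prod hosts",
        "type": "metric alert",
        # Multi-alert by host: fires when a host's usable memory fraction drops below 0.1.
        "query": "avg(last_5m):avg:system.mem.pct_usable{env:prod} by {host} < 0.1",
        "message": message,
        "options": {"thresholds": {"critical": 0.1}},
    }


if __name__ == "__main__":
    print(json.dumps(build_memory_monitor(), indent=2))
```

Keeping the template variables in a plain string (rather than an f-string) avoids Python collapsing `{{` into `{` before Datadog ever sees the message.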

Let’s say you receive a Slack alert about high memory usage on a few of your hosts. In the alert details, you can review an evaluation of the issue side by side with a visualization of past memory usage for these hosts. The Datadog timeseries graph included in the alert shows that this is a sudden, unusual spike in memory usage, as opposed to the predictable cycles or gradual increase in usage that might result from underprovisioning, for instance. At the same time, the Doctor Droid playbook summary indicates that security alerts fired on related services around the same time the memory spike was detected. Together, these insights from Datadog and Doctor Droid help you determine that the memory problem is likely the result of an active attack.
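Telling a sudden spike apart from a steady climb can also be reasoned about numerically. The hypothetical heuristic below compares the newest data point with a trailing window: a large z-score suggests a spike, while a clearly positive slope across the window suggests gradual growth. It does not detect periodic cycles, and it is not how Datadog or Doctor Droid classify anomalies; the window size and both thresholds are arbitrary assumptions.

```python
from statistics import mean, stdev


def classify_latest_point(values: list[float], window: int = 10,
                          z_threshold: float = 3.0,
                          slope_threshold: float = 0.5) -> str:
    """Label the newest value as 'spike', 'gradual_increase', or 'normal'.

    Hypothetical heuristic: compares the last point with the `window` points
    before it and does not model periodic (for example, daily) cycles.
    """
    if window < 2 or len(values) < window + 1:
        raise ValueError("need window >= 2 and at least window + 1 values")
    baseline, latest = values[-(window + 1):-1], values[-1]
    mu, sigma = mean(baseline), stdev(baseline)
    # Sudden spike: the newest point sits far above the recent baseline.
    if (sigma > 0 and (latest - mu) / sigma >= z_threshold) or (sigma == 0 and latest > mu):
        return "spike"
    # Gradual increase: the baseline itself climbs steadily from start to end.
    slope = (baseline[-1] - baseline[0]) / (window - 1)
    if slope >= slope_threshold:
        return "gradual_increase"
    return "normal"


if __name__ == "__main__":
    steady = [50, 51, 50, 49, 50, 51, 50, 49, 50, 51]  # memory usage in percent
    print(classify_latest_point(steady + [90]))
    print(classify_latest_point([50 + i for i in range(11)]))
```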

Access alert details for granular troubleshooting

As you continue your investigation, the Doctor Droid integration helps you conduct more in-depth root cause analysis. You can start by opening the out-of-the-box Doctor Droid dashboard in Datadog to access additional insights about your alerts. This dashboard includes a collection of Doctor Droid’s analyses of resources impacted by recent incidents. As with alerts, Doctor Droid draws on a number of sources to collect this data, including metrics from Datadog Infrastructure Monitoring, traces from Datadog APM, and logs from Datadog Log Management. You can also view alerting trends over time, which helps you spot unusual activity or track improvements to your alerting strategy.

The Doctor Droid dashboard showing a list of impacted resource analysis events.
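A simple way to look at alerting trends yourself is to bucket alert timestamps by day and flag days that sit well above the average. The helpers below are a hypothetical sketch (the `factor` threshold is an arbitrary assumption), not the logic behind the Doctor Droid dashboard.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_alert_counts(timestamps: list[float]) -> dict[str, int]:
    """Count alerts per UTC calendar day, given Unix timestamps in seconds."""
    days = (datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat() for ts in timestamps)
    return dict(sorted(Counter(days).items()))


def unusual_days(counts: dict[str, int], factor: float = 2.0) -> list[str]:
    """Return days whose alert count exceeds `factor` times the mean daily count."""
    if not counts:
        return []
    # Only days with at least one alert appear in counts, so the mean ignores quiet days.
    average = sum(counts.values()) / len(counts)
    return [day for day, n in counts.items() if n > factor * average]
```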

Continuing the example from above, you decide to view the list of impacted resources on the Doctor Droid dashboard in Datadog. By helping you identify which services are being targeted and how, these summaries give you a starting point for further troubleshooting. You can then easily investigate further by correlating your findings with other observability data you’re already collecting within Datadog. In this instance, you might view security signals in Cloud Security Management. By filtering the signals to the impacted services identified by Doctor Droid, you can identify security data related to the incident that you’re investigating. For each of these signals, Datadog provides the severity of the issue, details the type of threat detected, and suggests next steps, helping you quickly identify and remediate the attack. To organize your response, you can also create a case, declare an incident, or run an automated workflow directly from the signal details.

The details panel for a critical Cloud Security event.
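Filtering signals to the services that Doctor Droid flagged can also be scripted. The sketch below builds a request body for Datadog’s security signals search endpoint (`POST /api/v2/security_monitoring/signals/search`). The `build_signal_search` name, the service names, and the time range are hypothetical, and the field names should be checked against the Datadog API reference before you rely on them.

```python
import json


def build_signal_search(services: list[str], start: str, end: str,
                        limit: int = 50) -> dict:
    """Build a body for POST /api/v2/security_monitoring/signals/search.

    start and end are ISO 8601 timestamps; services are the impacted services
    (for example, the ones listed on the Doctor Droid dashboard).
    """
    if not services:
        raise ValueError("at least one service is required")
    return {
        "filter": {
            # Datadog search syntax: match signals tagged with any of the services.
            "query": "service:(" + " OR ".join(services) + ")",
            "from": start,
            "to": end,
        },
        "page": {"limit": limit},
        "sort": "-timestamp",  # newest signals first
    }


if __name__ == "__main__":
    body = build_signal_search(["auth", "web"], "2025-01-08T10:00:00Z", "2025-01-08T11:00:00Z")
    print(json.dumps(body, indent=2))
```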

Start enhancing your alerts with Doctor Droid and Datadog today

With automated playbooks and detailed evaluations, Doctor Droid helps you quickly assess critical issues and reduce alerting noise. When combined with Datadog’s troubleshooting and incident response features, you’re able to kickstart your investigations even faster. Integrating Doctor Droid with your Datadog monitors helps you easily access enriched context, while also giving you the ability to quickly pivot from this context directly to related metrics and signals within Datadog.

You can learn more about the Doctor Droid integration in our documentation. Or, if you’re new to Datadog, you can sign up for a free trial.

The ability to promote branded monitoring tools in the Datadog Marketplace is one of the benefits of membership in the Datadog Partner Network. You can learn more about the Datadog Marketplace in our blog post, and you can contact us at marketplace@datadog.com if you’re interested in developing an integration or application.