Stay Ahead of Service Disruptions With Watchdog Cloud & API Outage Detection | Datadog


Authors: Hugo Pucéat and Maya Perry

Published: January 17, 2025

Even with the best monitoring in place, outages are unavoidable. Complex, modern IT environments rely on multiple third-party services, including critical cloud and API providers. When any one of those goes down, it can trigger a domino effect of increased error rates and latency spikes across your system. And because you have less visibility into external services, it can be difficult to tell whether a problem stems from an outside outage or a disrupted service. The result is slower incident response, unnecessary guesswork, and longer service disruptions, all of which can hurt business continuity and customer trust.

Cloud & API Outage Detection, powered by Watchdog, is a new AI-driven capability that automatically identifies external cloud and API provider degradations impacting your services, helping you save time on investigations.

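In this post, we'll show you how Watchdog Cloud & API Outage Detection helps you detect outages proactively and speed up root cause analysis, gives you clear visibility into their impact on your services, and keeps your teams informed with actionable notifications.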

Proactive outage detection and faster root cause analysis

When your system’s performance starts to deteriorate, how can you know whether it’s due to your internal infrastructure or an external provider outage? Datadog continuously monitors requests to external providers, such as AWS, OpenAI, Slack, Stripe, and more, for elevated error rates so that it can detect service degradation as soon as it occurs. This proactive detection gives you a head start in identifying and mitigating issues before they escalate, significantly reducing time spent on root cause analysis and improving response times.
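To make the idea of "elevated error rates in requests to external providers" concrete, here is a minimal sketch that flags a provider when its error rate in the current time window is well above its usual baseline. This is not Watchdog's detection algorithm (Watchdog uses its own AI-driven analysis); it is a simple threshold stand-in. The `Request` record, the `baseline` dictionary, and the `factor`, `min_requests`, and `floor` parameters are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    provider: str   # hypothetical: external provider the call went to, e.g. "stripe"
    is_error: bool  # True if the call failed (5xx, timeout, and so on)


def error_rates(requests):
    """Return {provider: (error_rate, request_count)} for one time window."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for r in requests:
        totals[r.provider] += 1
        if r.is_error:
            errors[r.provider] += 1
    return {p: (errors[p] / totals[p], totals[p]) for p in totals}


def degraded_providers(requests, baseline, factor=3.0, min_requests=50, floor=0.01):
    """Flag providers whose error rate in this window is at least `factor`
    times their baseline rate. `floor` keeps near-zero baselines from
    triggering on a single error; `min_requests` skips low-traffic providers."""
    flagged = []
    for provider, (rate, count) in error_rates(requests).items():
        if count < min_requests:
            continue  # too little traffic to judge reliably
        expected = max(baseline.get(provider, 0.0), floor)
        if rate >= factor * expected:
            flagged.append(provider)
    return sorted(flagged)


window = (
    [Request("stripe", i % 4 == 0) for i in range(100)]     # 25% of calls fail
    + [Request("slack", i % 100 == 0) for i in range(100)]  # 1% of calls fail
)
print(degraded_providers(window, {"stripe": 0.01, "slack": 0.01}))  # ['stripe']
```

Comparing each provider against its own baseline, rather than using one global threshold, avoids alerting on providers that are always somewhat noisy; a production system would also need seasonality handling and smarter baselines.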

For example, during a recent incident at the analytics service provider Mixpanel, Watchdog detected an issue just minutes after it was publicly announced on the provider’s status page. Customers whose applications made calls to Mixpanel APIs were seeing increasing latency and error rates. Watchdog’s Cloud & API Outage Detection revealed that this was a result of an incident Mixpanel was experiencing rather than a problem with the applications themselves. Instead of wasting hours investigating internal systems, customers were able to focus on mitigating the impact and keep critical services running smoothly.

Clear impact visibility and accelerated troubleshooting

Knowing that an external service is down is just the first step. The real challenge lies in understanding how that outage is affecting your own environment. What’s the blast radius? Are all your services impacted, or just a subset? Can you quickly pinpoint the affected parts of your infrastructure, or will you have to dig through layers of monitoring data to figure that out?

Datadog knows which parts of your application are making calls to third-party providers, so when it does detect a cloud or API degradation, it includes clear, actionable insights into which services are impacted by the problem and the extent of the disruption, allowing you to differentiate between external and internal issues quickly.
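Conceptually, answering "what's the blast radius?" comes down to knowing which of your services call which providers. The sketch below builds that provider-to-service map from a list of observed calls and, for a degraded provider, returns the impacted services and the share of calling services affected. It is an illustration only, not how Datadog computes impact; the `CALLS` data, the service and provider names, and the `build_dependency_map` and `blast_radius` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical (calling service, external provider) pairs, e.g. as observed in traces.
CALLS = [
    ("checkout", "stripe"),
    ("checkout", "aws-s3"),
    ("notifications", "slack"),
    ("search", "openai"),
    ("billing", "stripe"),
]


def build_dependency_map(calls):
    """Map each external provider to the set of services that call it."""
    deps = defaultdict(set)
    for service, provider in calls:
        deps[provider].add(service)
    return deps


def blast_radius(calls, degraded_provider):
    """Return (sorted impacted services, fraction of calling services impacted)."""
    deps = build_dependency_map(calls)
    all_services = {service for service, _ in calls}
    impacted = sorted(deps.get(degraded_provider, set()))
    fraction = len(impacted) / len(all_services) if all_services else 0.0
    return impacted, fraction


print(blast_radius(CALLS, "stripe"))  # (['billing', 'checkout'], 0.5)
```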

Watchdog story indicating a problem caused by an external provider and identifying the affected service

Datadog also provides direct links to the provider’s status page and support channels, making it easy to inform the service provider of the issue and escalate if necessary.

Datadog provides next steps, including links to the provider's status page and support channels

Actionable insights with notifications

Detecting outages is important, but making sure the right teams know about them and have clear next steps is critical. You can create Watchdog monitors that automatically notify your teams when third-party service degradations impact your services.

To configure your monitor, click New Monitor in the Watchdog Explorer and set the alert category to Third Party to be notified of detected problems with external services. You can also specify which supported providers you want to watch.

Creating a Watchdog alert for third-party outages

Get started with Datadog Cloud & API Outage Detection

Watchdog Cloud & API Outage Detection helps you stay ahead of service disruptions by proactively identifying external service degradations, detecting impacted services, reducing time spent on troubleshooting, and enabling faster incident response. With clear impact visibility and actionable notifications, your team can minimize downtime and maintain business continuity, even in the face of external outages. See our documentation to get started. Or, if you’re not a customer, start a free trial.