When your frontend applications are experiencing degraded performance, you need to investigate and resolve issues as quickly as possible to minimize user frustration, churn, lost revenue, and other critical consequences. Watchdog Insights for Datadog Real User Monitoring (RUM) is an AI-powered engine that augments investigations by intelligently surfacing outlier attributes in the errors and latency affecting your applications.
Watchdog Insights for Datadog RUM watches your RUM events in real time, analyzing them to detect tagged outliers associated with higher-than-usual errors, latency, and Core Web Vitals values in your frontend views. This autonomously surfaced context gives your teams more clarity with less effort, helping them more efficiently investigate ongoing issues and pointing them towards potential problems that may worsen in the future. In this post, we’ll show you how to leverage RUM Watchdog Insights to investigate frontend incidents and track application performance more efficiently.
Quickly build context for frontend incident response
Watchdog generates Insights in RUM from user session events associated with higher-than-usual error rates or latency. When you enter a query, Watchdog automatically surfaces these outliers based on tagged facets such as user location, browser version, or code release. An anomalous pattern of latency or errors associated with one of these facets can often point to potential root causes when your application is experiencing degraded performance, helping you find the right path to investigate.
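To make the idea of a facet-level outlier more concrete, here is a minimal Python sketch that compares each facet value's share of errors with its share of view loads. This is only an illustration of the concept, not Watchdog's actual detection logic, which is not described here. The event shape, the `error_count` and `browser_version` field names, the `facet_error_outliers` function, and the `min_lift` cutoff are all hypothetical.

```python
from collections import Counter


def facet_error_outliers(events, facet, min_lift=1.5):
    """Return facet values whose share of errors is at least `min_lift`
    times their share of view loads (hypothetical heuristic)."""
    views = Counter()
    errors = Counter()
    for event in events:
        value = event.get(facet, "unknown")
        views[value] += 1
        errors[value] += event.get("error_count", 0)

    total_views = sum(views.values())
    total_errors = sum(errors.values())
    if total_views == 0 or total_errors == 0:
        return []

    outliers = []
    for value, view_count in views.items():
        view_share = view_count / total_views
        error_share = errors[value] / total_errors
        if error_share / view_share >= min_lift:
            outliers.append(
                {"value": value, "view_share": view_share, "error_share": error_share}
            )
    # Strongest over-representation first.
    return sorted(
        outliers, key=lambda o: o["error_share"] / o["view_share"], reverse=True
    )


# Hypothetical sample: 4 of 10 views come from one browser version,
# but those views carry 8 of the 10 errors.
sample_events = (
    [{"browser_version": "118", "error_count": 2}] * 4
    + [{"browser_version": "119", "error_count": 1}] * 2
    + [{"browser_version": "119", "error_count": 0}] * 4
)

for outlier in facet_error_outliers(sample_events, "browser_version"):
    print(outlier)
```

In this sample, browser version 118 accounts for 40 percent of view loads but 80 percent of errors, so it is the only value reported. A real detector would also need to account for sample size and historical baselines, which this sketch ignores.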
After identifying an issue surfaced as a Watchdog error outlier, you can quickly drill into relevant view and error events to discover possible root causes. In this way, Watchdog Insights can supplement Datadog RUM’s alerting and incident response workflows. For example, let’s say your team receives an alert indicating a high error rate in one of your app’s views. The responder can take the associated query for that monitor to the RUM Explorer and use it to scope down to the monitor’s associated RUM events. After the query is entered, Watchdog will automatically hunt for error outliers and display them in the Insights carousel.
In the preceding screenshot, Watchdog has revealed an error outlier showing an unusual correlation where a particular screen size represents less than half of total view loads but accounts for over 75 percent of errors. From the sidepanel for this outlier, shown below, the responder can quickly gather key context, including:
- The total customer impact (the number of users affected)
- A breakdown of other related tags, indicating further correlations hinting at the root cause
- Highlighted errors from the insight and related Error Tracking issues
By investigating individual view load events from the outlier’s list of impacted views, you can see a full breakdown of the timing of each load, including frozen frames, content fetches, client-side JavaScript payloads, and other potential speed bottlenecks and error sources. You can also pivot directly to related traces and logs for any of the individual error events on the view in question. Instead of leaving you to click through individual view events in search of a correlation, Watchdog brings the investigation closer to remediation by surfacing all the relevant views, pinpointing what they have in common, and providing key context for root cause analysis.
Efficiently analyze your application performance and surface warning signs
In addition to helping your team respond to ongoing issues, Watchdog Insights can also help you characterize your frontend’s performance during broader investigations. By automatically finding UX performance bottlenecks scoped to a specific segment of your application or user base, Watchdog helps your team spot regressions before they become problematic. Along with error outliers, Watchdog can find latency outliers in initial page load times as well as in Core Web Vitals and Mobile Vitals metrics, helping you recognize potential problems that affect your views’ load performance, interactivity, and visual stability.
For example, let’s say that during a canary deployment of a new piece of your UI, you’ve created a RUM views query that filters your events for the relevant view and new code version. Watchdog will automatically analyze the views to surface outliers in page load time, largest contentful paint, first input delay, and cumulative layout shift. This enables you to quickly discover whether the new release has introduced any UX regressions scoped to a particular facet. If Watchdog surfaces an interesting insight (such as high cumulative layout shift for a particular browser), you can click the “View in Analytics” button in the outlier sidepanel to immediately graph it in the RUM Analytics tool.
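As a rough illustration of what a facet-scoped Core Web Vitals regression looks like, the Python sketch below groups view events by a facet and flags values whose 75th percentile exceeds Google's published "good" thresholds (2.5 seconds for largest contentful paint, 100 milliseconds for first input delay, and 0.1 for cumulative layout shift). It is not how Watchdog computes its outliers. The view dictionaries, their field names, and the `flag_regressions` helper are assumptions made up for the example.

```python
import math
from collections import defaultdict

# Google's published "good" thresholds for Core Web Vitals, assessed at p75.
# The dictionary keys are hypothetical field names for this sketch.
GOOD_THRESHOLDS = {
    "largest_contentful_paint_s": 2.5,
    "first_input_delay_ms": 100,
    "cumulative_layout_shift": 0.1,
}


def p75(values):
    """Nearest-rank 75th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def flag_regressions(views, facet, metric):
    """Return {facet value: p75} for facet values whose p75 of `metric`
    exceeds the 'good' threshold. Views missing the metric are skipped."""
    grouped = defaultdict(list)
    for view in views:
        if metric in view:
            grouped[view.get(facet, "unknown")].append(view[metric])

    threshold = GOOD_THRESHOLDS[metric]
    flagged = {}
    for value, samples in grouped.items():
        percentile = p75(samples)
        if percentile > threshold:
            flagged[value] = percentile
    return flagged


# Hypothetical views from a canary release, tagged by browser.
canary_views = [
    {"browser": "Safari", "cumulative_layout_shift": cls}
    for cls in (0.05, 0.2, 0.3, 0.25)
] + [
    {"browser": "Chrome", "cumulative_layout_shift": cls}
    for cls in (0.01, 0.02, 0.05, 0.08)
]

print(flag_regressions(canary_views, "browser", "cumulative_layout_shift"))
```

With this sample data, only Safari is flagged, because its p75 cumulative layout shift of 0.25 is above the 0.1 threshold while Chrome's p75 of 0.05 is not.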
RUM Analytics enables you to look at the behavior of this outlier metric over time, so you can continually track it as your UX team mobilizes to resolve the underlying issue. You can use the tool to quickly create a new monitor that will notify you when the metric returns to an acceptable threshold, or when it worsens. You can also combine it with other telemetry from your frontend by saving it in a dashboard, to form a holistic view of your application that you can easily share around your organization.
Get started with intelligent RUM insights
By intelligently surfacing outliers in your views’ errors and latency, Watchdog Insights for RUM automatically highlights parts of your application that show an outsize proportion of warning signs. This gives your team more clarity with less required effort, helping them both investigate ongoing issues and spot potential new ones more efficiently. Watchdog Insights is generally available in Datadog query tools, including the APM Trace Explorer, Log Explorer, Live Containers view, and now the RUM Explorer. See the preceding docs links for more information about Watchdog. Or, if you’re brand new to Datadog, sign up for a 14-day free trial to get started.