Complex systems require many different monitors to assess the health of their infrastructure and applications, creating a wealth of alerts that can be hard to track. Due to a lack of effective triage processes, many organizations page engineers for every alert that comes in, making it difficult to separate false positives from issues that actually require immediate attention. In an ideal system, on-call engineers would be paged only for major incidents, and issues that don’t have any urgent customer impact could be left to longer-term investigations. However, your system still needs procedures for handling these back-burner issues—they can easily fall through the cracks and, if left unaddressed, grow to cause bottlenecks or a cascade of impacts.
Datadog Case Management, now generally available (GA), provides a centralized ticketing system for tracking, triaging, and troubleshooting these types of issues. Across the Datadog platform, you can create cases directly from the telemetry data you want to investigate, including events, alerts, security signals, Watchdog Alerts, Sensitive Data Scanner findings, and cloud cost recommendations. You can even use Datadog Workflow Automation to create cases from issues automatically. Alternatively, you can use Case Management to keep track of operational tasks outside of the Datadog platform, such as organizing gamedays or building out infrastructure. Then, in a single view, you can track all of your cases, helping ensure that every issue is properly addressed. You can assign cases to users, establishing clear lines of ownership that persist throughout the lifespan of the case. Additionally, Case Management is built for teams of any size, from network or security operations center groups (NOCs and SOCs) dedicated to triaging tickets to DevOps engineers handling individual issues, helping streamline collaboration across your organization.
In this post, we’ll explore how Datadog Case Management helps you:
- Prioritize and delegate all cases from within one view
- Organize investigations using an observability-enhanced source of truth
Prioritize and delegate all cases from within one view
With Datadog Case Management, you have centralized access to alerts, security signals, and Error Tracking issues that haven’t yet escalated into customer-impacting incidents. From the Case Management overview page, you can view crucial context for each case, including associated environments, services, incidents, and teams. You can also quickly determine whether a case is already being worked on—this helps you delineate ownership, identify points of contact, and avoid duplicating assignments. You’re even able to create cases for work items without any linked alerts or signals.
The overview page enables you to sort your cases using filters—such as status, team, or priority—to find the issues you’re looking for (or discover ones you didn’t know existed). You can also create projects to organize related cases. Projects enable you to group and manage cases based on team, service, or initiative, providing distinct workspaces to triage from. For instance, your frontend, backend, and security teams may want to create their own projects to keep their cases separate from one another.
If you want to further narrow the cases displayed, you can save individual queries as views within a project. You can even configure notifications based on these customized views to Slack, Microsoft Teams, PagerDuty, email, or other third-party platforms via webhooks.
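To make the idea of projects, filters, and saved views concrete, here is a minimal in-memory sketch in Python. It is an illustration only and does not call any Datadog API: the `Case` fields, the `make_view` and `apply_view` helpers, and the sample data are all hypothetical names chosen for this example. A view is modeled as a saved filter scoped to one project.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Case:
    # Hypothetical case record; field names are assumptions for illustration.
    key: str
    title: str
    project: str
    team: str
    status: str    # e.g. "open", "in_progress", "closed"
    priority: str  # e.g. "P1" through "P5"
    assignee: Optional[str] = None


# A view is modeled as a reusable predicate over cases.
View = Callable[[Case], bool]


def make_view(project: str, **filters: str) -> View:
    """Build a saved filter scoped to one project.

    Each keyword argument must name a Case field, e.g. status="open".
    """
    def matches(case: Case) -> bool:
        return case.project == project and all(
            getattr(case, name) == value for name, value in filters.items()
        )
    return matches


def apply_view(view: View, cases: List[Case]) -> List[Case]:
    """Return the cases that the view selects, preserving input order."""
    return [case for case in cases if view(case)]


# Sample data, invented for this sketch.
CASES = [
    Case("BE-1", "High throughput on checkout-api", "backend", "payments", "open", "P2"),
    Case("BE-2", "Slow queries on orders-db", "backend", "storage", "in_progress", "P3", "dana"),
    Case("SEC-1", "Outdated TLS configuration", "security", "secops", "open", "P2"),
    Case("FE-1", "Elevated JavaScript errors", "frontend", "web", "open", "P4"),
]

# A saved view: open P2 cases in the backend project.
open_p2_backend = make_view("backend", status="open", priority="P2")

if __name__ == "__main__":
    for case in apply_view(open_p2_backend, CASES):
        print(case.key, case.title)
```

Scoping every view to a project mirrors the way the post describes views living inside a project, so each team's saved queries only ever select from its own workspace.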
The features on the overview page help you organize your queue of ongoing alerts, enabling your teams to act quickly and prevent critical customer impact. For example, one of the primary responsibilities of central support teams is to ensure that every issue is being handled by the appropriate team or engineer. In modern cloud environments, however, the sheer amount of telemetry data makes this an enormous task requiring constant vigilance, decision making, and context switching. Using Datadog Case Management, these teams can view all open cases, determine the appropriate assignees and priority level, and assign investigators without ever leaving the overview page.
Let’s say a case comes in for an alert showing high throughput on one of your services. The tags on the Case Management overview page help you determine which service is experiencing the issue and which team is responsible for managing it. You can then assign an engineer to troubleshoot the issue by selecting a team member from the drop-down menu. That designated investigator is added to the case overview so that other engineers and support team members can see that the issue is being handled.
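The triage decision in this example (read the service tag, find the owning team, pick an investigator) can be sketched in a few lines of Python. This is a hypothetical model of the reasoning rather than Datadog functionality: the `SERVICE_OWNERS` and `ON_DUTY` lookups, the `Case` fields, and the `triage` function are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical lookups: which team owns a service, and who is triaging for it.
SERVICE_OWNERS: Dict[str, str] = {"checkout-api": "payments", "orders-db": "storage"}
ON_DUTY: Dict[str, str] = {"payments": "alex", "storage": "dana"}


@dataclass
class Case:
    # Hypothetical case record; field names are assumptions for illustration.
    title: str
    tags: Dict[str, str]
    team: Optional[str] = None
    assignee: Optional[str] = None
    timeline: List[str] = field(default_factory=list)


def triage(case: Case) -> Case:
    """Route a case to its owning team and investigator using its service tag.

    Cases whose service or team cannot be resolved are left unassigned and
    annotated, so that a person can pick them up instead of losing them.
    """
    service = case.tags.get("service")
    team = SERVICE_OWNERS.get(service) if service else None
    if team is None:
        case.timeline.append(f"no owning team found for service={service!r}")
        return case

    case.team = team
    investigator = ON_DUTY.get(team)
    if investigator is None:
        case.timeline.append(f"routed to team {team}; no investigator available")
        return case

    case.assignee = investigator
    case.timeline.append(f"assigned to {investigator} on team {team}")
    return case


if __name__ == "__main__":
    incoming = Case("High throughput on checkout-api",
                    {"service": "checkout-api", "env": "prod"})
    print(triage(incoming))
```

Recording each routing decision on the case's timeline keeps the assignment visible to everyone else looking at the case, which is the point of adding the investigator to the case overview.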
Organize investigations using an observability-enhanced source of truth
By integrating ticketing with observability data, Datadog Case Management streamlines investigation so you can quickly resolve issues of any size with minimal context switching. For example, many security teams need a place to consolidate troubleshooting efforts that are unrelated to active threats but are still necessary for remedying vulnerabilities. By creating cases for these issues, security teams can organize any contextual data and information related to them (including relevant graphs, logs, alerts, and notebooks) as they continue their ongoing investigations. The case then acts as a single source of truth, with timestamps for key events, activities, and comments.
Additionally, cases are automatically linked to their related Datadog Cloud SIEM signals or monitors, so that anyone viewing those resources can see that a security engineer is already working on the problem. And if that engineer decides after further investigation that a case should be an incident, they can escalate it directly within Case Management by declaring a Datadog incident or by using our one-click integration with third-party ticketing systems.
Once established, a case also becomes the central hub for all external context and communication related to an issue. You can easily create ServiceNow incidents directly from the case. You can also sync your Jira account to Datadog via our bidirectional integration—this enables you to automatically create, update, and close Jira tickets simply by working with Datadog cases and vice versa. These integrations give you easy access to relevant information about your case investigation across platforms, as well as enable you to retain ticketing records in multiple places for audit purposes.
Start using Datadog Case Management today
Datadog combs through metrics, traces, and logs to surface unusual behavior and concerning trends. Without a centralized location for processing and addressing these findings, key issues can go unnoticed. By using Datadog Case Management, you can organize your investigations around alerts, security signals, and Error Tracking issues in the same platform you already use to troubleshoot. You’re able to easily pivot from your cases to observability data during investigations, and you can enrich your cases with context from Datadog.
Case Management is available to Datadog customers at no additional charge—you can use our documentation to get started. Or, if you’re not yet a Datadog user, you can sign up for a 14-day free trial today.