Increase visibility into network incidents using moovingon.ai and Datadog

Author Lauren Lowe
Author Erica Ho
Author Alex Guo

Published: December 11, 2024

moovingon.ai is a platform that consolidates alerts, incidents, audits, runbooks, and other resources for 24/7 network operations center (NOC) engineering teams. These teams often have to work collaboratively to maintain uptime for mission-critical cloud infrastructure and applications and need specialized resources to facilitate investigations in the event of an issue.

Datadog now partners with MoovingON, the makers of moovingon.ai, to enable NOC engineers to monitor metrics, logs, and alerts from their moovingon.ai environment directly in Datadog, alongside telemetry from across the stack. Additionally, moovingon.ai pulls metrics, logs, and events from Datadog into its platform, analyzes them to identify similar incidents, and helps teams remediate the issue through troubleshooting steps that are pushed back to Datadog as events.

moovingon.ai’s offering in the Datadog Marketplace comes with an out-of-the-box (OOTB) integration that enables you to send incidents and alerts from Datadog to moovingon.ai, and audits and postmortem data from moovingon.ai to Datadog. You can monitor this data in an OOTB dashboard in Datadog, giving you a single pane of glass for triaging and managing cloud platform incidents.

In this post, we’ll show you how NOC engineers can use moovingon.ai and Datadog to manage live cloud platform incidents and conduct detailed postmortem analysis.

Manage live cloud platform incidents

Once you’ve set up the moovingon.ai integration, alerts from Datadog will start streaming into moovingon.ai to alert your NOC teams of critical issues. From there, NOC engineers can perform their troubleshooting and other analysis steps in moovingon.ai, which also provides no-code conditional runbooks to help specialized NOC teams quickly understand how best to respond to an incident.

Likewise, actions your NOC team records in moovingon.ai (e.g., routine maintenance, service checkups, and investigations into security breaches or configuration drift) are pushed to Datadog as events. These events populate the OOTB moovingon.ai dashboard in Datadog. This allows NOC and CloudOps teams to see the full audit history of any incident in Datadog while using specialized runbooks and workflows from moovingon.ai.
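To make the idea of an action arriving in Datadog as an event more concrete, here is a minimal sketch of what posting one such event could look like. It is not a description of how the moovingon.ai integration is implemented, which this post does not cover. The payload fields (title, text, alert_type, aggregation_key, tags) and the DD-API-KEY header come from Datadog's public v1 Events API; the function names, the incident ID, and the tag names are hypothetical.

```python
import json
import urllib.request

# Public Datadog v1 Events endpoint for the US1 site; other sites use a different host.
DATADOG_EVENTS_URL = "https://api.datadoghq.com/api/v1/events"


def build_audit_event(incident_id, action, actor, tier):
    """Build an event payload for one recorded NOC action.

    The tag names (incident_id, tier, actor) are hypothetical, chosen for illustration.
    """
    return {
        "title": f"[{incident_id}] {action}",
        "text": f"{actor} performed: {action}",
        "alert_type": "info",
        "aggregation_key": incident_id,  # lets Datadog group events for one incident
        "tags": [f"incident_id:{incident_id}", f"tier:{tier}", f"actor:{actor}"],
    }


def send_event(payload, api_key):
    """POST the payload to Datadog. Needs a valid API key and network access."""
    request = urllib.request.Request(
        DATADOG_EVENTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build (but do not send) an example event and inspect it.
event = build_audit_event("INC-1042", "Restarted connection pooler", "noc-oncall", 1)
print(json.dumps(event, indent=2))
```

Tagging each event with an incident identifier is what would let a dashboard reassemble the full history of a single incident from many separate events.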

For example, let’s say Datadog triggers an alert that database CPU is high. moovingon.ai will have tier 0 and tier 1 teams, such as NOC and CloudOps, troubleshoot the issue using relevant runbooks. If the issue remains unresolved, higher tiers can take over troubleshooting.

Alert generated from Datadog in the moovingon.ai UI

During their investigation, tier 0 and tier 1 teams can take advantage of moovingon.ai’s runbooks, which offer specialized how-to guidance on resolving specific types of incidents, including historical context from previously resolved issues. Alternatively, the team may decide they need to escalate the issue for higher-tier analysis and resolution. Since all troubleshooting actions in moovingon.ai stream into Datadog as events, the team handling this next step can see existing troubleshooting history for this incident in the OOTB dashboard in Datadog, helping speed up resolution.
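The escalation flow described above can be pictured as a small state machine in which an incident carries its troubleshooting history with it from tier to tier. The sketch below is purely illustrative: the `Incident` class, the `record_step` and `escalate` helpers, and the maximum tier of 3 are all assumptions made for this example, not part of moovingon.ai or Datadog.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    title: str
    tier: int = 0  # tier currently handling the incident
    history: list = field(default_factory=list)  # (tier, step) pairs so far
    resolved: bool = False


def record_step(incident, step, fixed):
    """Log a troubleshooting step and mark the incident resolved if it worked."""
    incident.history.append((incident.tier, step))
    incident.resolved = fixed


def escalate(incident, max_tier=3):
    """Hand an unresolved incident to the next tier, keeping its history."""
    if not incident.resolved and incident.tier < max_tier:
        incident.tier += 1
    return incident.tier


incident = Incident("Database CPU is high")
record_step(incident, "Checked slow query log", fixed=False)
escalate(incident)  # tier 0 could not resolve it
record_step(incident, "Reviewed active connections", fixed=False)
escalate(incident)  # tier 1 could not resolve it either
print(incident.tier, incident.history)
```

Because the history stays attached to the incident, the team that picks it up at the next tier starts from what has already been tried instead of repeating it.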

Conduct detailed postmortem analysis

moovingon.ai also generates audits for incidents recorded in the platform, capturing all remediation actions to help NOC teams streamline reporting. This allows for more efficient root cause analysis and helps teams prevent similar issues from recurring in the future. If you have the moovingon.ai integration set up, these audits will be sent to Datadog as events. You can explore the logs associated with these audits directly in the moovingon.ai dashboard in Datadog and use Datadog’s unified tagging system to easily filter, aggregate, and compare logs. This enables you to analyze performance across different data types and refine the results by specific elements.

Events from moovingon.ai in Datadog
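As a rough illustration of the tag-based filtering and aggregation mentioned above, the following sketch works on a handful of in-memory events whose tags follow Datadog's `key:value` convention. The sample events, tag values, and helper names are invented for this example; in practice this slicing happens in the Datadog UI rather than in your own code.

```python
from collections import Counter


def tag_value(event, key):
    """Return the value of the first `key:value` tag on an event, or None."""
    prefix = key + ":"
    for tag in event.get("tags", []):
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None


def filter_by_tag(events, key, value):
    """Keep only events carrying the tag `key:value`."""
    return [e for e in events if tag_value(e, key) == value]


def count_by_tag(events, key):
    """Count events per value of the tag `key`, skipping events without it."""
    values = (tag_value(e, key) for e in events)
    return Counter(v for v in values if v is not None)


# Sample audit events; tag names and values are made up for illustration.
events = [
    {"title": "Checked slow query log", "tags": ["service:orders-db", "tier:0"]},
    {"title": "Reviewed active connections", "tags": ["service:orders-db", "tier:1"]},
    {"title": "Rotated TLS certificate", "tags": ["service:gateway", "tier:0"]},
]

db_events = filter_by_tag(events, "service", "orders-db")
print(len(db_events), count_by_tag(events, "tier"))
```

Consistent tags are what make this kind of comparison possible: the same `service` key can narrow audit events, logs, and metrics to a single component.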

These insights help speed up postmortem analysis by making audit information easily available to NOC engineers and CloudOps teams. To continue our example from earlier, let’s say your team has identified that the cause of the increase in database CPU was a spike in active connections. Your NOC team can review the troubleshooting and audit history for this incident in Datadog to understand what steps were taken in the investigation, so they can automate certain actions for similar issues in the future and save time.

Get started

The integration between moovingon.ai and Datadog helps NOC engineers and 24/7 on-call teams ensure all incidents are captured and remedied in a timely fashion. Joint users now have a single platform for operations, with every incident that originates in Datadog automatically fed back along with its audits and status updates.

If you’d like to try moovingon.ai with Datadog, purchase and install the integration and the moovingon.ai software license from the Datadog Marketplace page. If you’re new to Datadog, sign up for a free trial.

The ability to promote branded marketing tools is a membership benefit offered through the Datadog Partner Network. If you’re interested in developing an integration or application that you’d like to promote, you can contact us at marketplace@datadog.com.