Machine Learning | Datadog

Optimize LLM application performance with Datadog's vLLM integration

Learn how to use our vLLM integration to monitor the performance and resource usage of your LLM workloads.

Troubleshooting RAG-based LLM applications

Learn how to mitigate some of the common challenges of building RAG-based LLM applications.

ML platform monitoring: Best practices

Learn what to monitor at each step of an ML workflow.

Stay up to date on the latest incidents with Bits AI

Learn how Bits AI can enhance your incident responses with quick summaries and natural language queries.

Monitor Ray applications and clusters with Datadog

Learn how to monitor your AI workloads and their resource consumption as you scale them with Ray.

Monitor Amazon Bedrock with Datadog

Learn how to monitor your foundation models' usage, API performance, error rate, and more with Datadog's ...

Monitoring Amazon SageMaker with Datadog

Learn how Datadog's integration with Amazon SageMaker can help you monitor resource utilization and identify ...

Monitor your NVIDIA GPUs with Datadog

Learn how our NVIDIA DCGM integration provides visibility into all of your NVIDIA GPUs in a single platform.

Monitor machine learning models with Fiddler's offering in the Datadog Marketplace

Learn how to centralize monitoring of your machine learning–based applications, proactively maintain model ...

Understand the scope of user impact with Watchdog Impact Analysis

See how many users are affected by service performance issues so that you can troubleshoot more effectively.

Augmented troubleshooting with Watchdog Insights

Watchdog Insights surfaces clues and helps reduce MTTR—and now supports Log Management.

Automated root cause analysis with Watchdog RCA

Learn how Watchdog can automatically identify the root cause of performance issues across your stack.

Datadog APM gains 3 superpowers: App Analytics, Service Map & Watchdog

With three major new features and support for numerous languages and frameworks, Datadog APM is more powerful ...

Auto-smooth noisy metrics to reveal trends

Datadog's new Auto Smoother function makes it simple to smooth out noisy metrics without losing sight of the ...

Watchdog: Auto-detect performance anomalies without setting alerts

Watchdog uses machine learning to sniff out potential performance problems without any setup or configuration.

Robust statistical distances for machine learning

Designing powerful outlier and anomaly detection algorithms requires using the right tools. Discover how ...

Introducing new scaled algorithms for improved outlier detection

Our new outlier detection algorithms take magnitude and dispersion into account for better alerting.

Introducing anomaly detection in Datadog

Anomaly detection analyzes recent metric patterns to identify abnormalities.

Introducing outlier detection in Datadog

Datadog's new outlier detection feature allows you to automatically identify any host (or group of hosts) that ...

...
...