Anthropic is an AI research and development company focused on building reliable and safe artificial intelligence systems. Its flagship product is Claude, an advanced language model and conversational AI assistant known for its strong capabilities in natural language processing, reasoning, and task completion. Anthropic places particular emphasis on AI safety and ethics, and its models and APIs are used by organizations across industries to build powerful, safe, and performant AI applications.
We are pleased to announce Datadog LLM Observability’s native integration with Anthropic, which you can use to monitor, troubleshoot, and secure your Anthropic-powered LLM applications. The integration enables Anthropic customers to use Datadog for:
- Enhanced visibility and control with real-time metrics that provide insights into Anthropic models’ performance and usage
- Streamlined troubleshooting and debugging with granular visibility into LLM chains via distributed traces
- Quality and safety assurance with out-of-the-box evaluation checks
In this post, we will discuss how these features within Datadog LLM Observability can help AI engineers and software developers build accurate, cost-efficient, safe, and secure Anthropic-powered LLM applications at scale.
Track Anthropic usage patterns
Cost efficiency and performance are two of the most important concerns of modern LLM applications. As AI application teams rapidly scale up their usage of Anthropic APIs to tackle more complex use cases, it becomes increasingly crucial to monitor requests, latencies, and token consumption effectively.
Token usage can fluctuate depending on the models employed, each of which comes with its own pricing structure. LLM Observability’s built-in metrics can help organizations manage and understand these complexities. You can monitor many of these metrics with LLM Observability’s out-of-the-box dashboard, which provides a comprehensive view of application performance and usage trends across your organization. The dashboard surfaces detailed operational data, including trace- and span-level errors, latency, token consumption, model usage statistics, and any triggered monitors.
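For reference, the token counts behind these metrics are reported on every Anthropic Messages API response, which the integration captures automatically. The sketch below, using the official `anthropic` Python SDK and a placeholder model name, shows where those figures come from:

```python
import anthropic

# Assumes ANTHROPIC_API_KEY is set in the environment; the model name is a placeholder.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize this incident report in three bullets."}],
)

# Each response includes the per-request token usage that LLM Observability
# aggregates into its consumption and cost metrics.
print(response.usage.input_tokens, response.usage.output_tokens)
```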
Troubleshoot your Anthropic application faster with end-to-end tracing
Rapid advancements in generative AI have made LLMs faster, cheaper, and equipped with larger context windows, enabling developers to create increasingly specialized applications. Anthropic’s Claude models have demonstrated strong performance on reasoning tasks, making them well suited for complex chain-of-thought applications and for LLM agents that tackle more sophisticated, nuanced tasks.
As customers build complex LLM chains where an initial request can trigger a series of distributed system calls, they also introduce multiple points of failure. An LLM application request can fail not only due to “hard” errors, such as timeouts or bad API calls, but also due to “soft” errors, where the request executes successfully but returns an incorrect or low-quality response. These soft errors are particularly important to track, and also particularly difficult and time-consuming to find and diagnose.
LLM Observability’s traces help you identify and solve these errors by providing detailed information about each step in your LLM chain’s execution and highlighting errors and latency bottlenecks. LLM Observability’s deep integration with Anthropic automatically captures Anthropic API requests without requiring any manual instrumentation. This allows you to focus on instrumenting other parts of your LLM application.
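As a minimal sketch of that setup with the `ddtrace` Python SDK (the application name and model are placeholders; see the Datadog documentation for the full configuration options):

```python
import anthropic
from ddtrace.llmobs import LLMObs

# Enable LLM Observability in code; it can also be configured through environment
# variables (e.g., DD_LLMOBS_ENABLED and DD_LLMOBS_ML_APP) with ddtrace-run.
LLMObs.enable(ml_app="claude-support-bot")  # placeholder application name

client = anthropic.Anthropic()

# This Anthropic call is captured automatically as an LLM span; no manual
# instrumentation is required.
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=256,
    messages=[{"role": "user", "content": "Why might this deploy have failed?"}],
)
```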
When tracing an Anthropic API call, you can carefully inspect the input prompt and observe each of the steps your application took to form the final response. By looking at these intermediate steps, you can quickly discover the root cause of unexpected responses.
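Non-LLM steps in your chain, such as retrieval or post-processing, can be wrapped with the SDK’s tracing decorators so they appear as intermediate spans alongside the auto-captured Anthropic calls. A hedged sketch, with hypothetical helper functions:

```python
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import task, workflow


@task
def retrieve_context(question: str) -> str:
    # Hypothetical retrieval step; traced as its own span within the chain.
    docs = search_knowledge_base(question)  # hypothetical helper
    LLMObs.annotate(input_data=question, output_data=docs)
    return docs


@workflow
def answer_question(question: str) -> str:
    # Top-level workflow span that ties the retrieval and LLM steps together.
    context = retrieve_context(question)
    answer = call_claude(question, context)  # hypothetical helper wrapping the Anthropic call above
    LLMObs.annotate(input_data=question, output_data=answer)
    return answer
```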
Evaluate your LLM application inputs and outputs for quality and safety issues
LLM applications must be rigorously evaluated for quality and safety, particularly because of their non-deterministic nature, and they are susceptible to attack techniques such as prompt injection that pose significant risks. Datadog LLM Observability provides out-of-the-box quality and safety checks to help you monitor the quality of your application’s output and detect prompt injections and toxic content in your application’s LLM responses. These features enable you to maintain high standards of performance and ethical AI usage, aligning with Anthropic’s commitment to developing safe and effective AI technologies.
The trace side panel lets you view these checks, which include evaluations like “Failure to answer” and “Topic relevancy” that assess the quality of responses, as well as “Toxicity” and “Negative sentiment” checks that flag potentially poor user experiences. By leveraging these tools, you can ensure your LLM applications operate reliably and ethically, addressing both performance and safety concerns.
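If you also run your own evaluations, the LLM Observability SDK lets you attach those results to a traced span so that custom quality signals appear alongside the managed checks. A minimal sketch, assuming a hypothetical categorical evaluation labeled “relevance”:

```python
from ddtrace.llmobs import LLMObs

# Within an instrumented span, export the active span's context and attach a
# custom evaluation result to it.
span_context = LLMObs.export_span(span=None)  # None refers to the currently active span

LLMObs.submit_evaluation(
    span_context=span_context,
    label="relevance",           # hypothetical evaluation label
    metric_type="categorical",
    value="on_topic",
)
```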
LLM Observability integrates with Sensitive Data Scanner to scrub personally identifiable information (PII) from traced prompts by default, helping you detect when customer PII is inadvertently passed in an LLM call or shown to the user.
Monitor your Anthropic applications with Datadog
LLM-based applications are incredibly powerful and unlock many new product opportunities, but there remains a pressing need for granular visibility into their behavior. By monitoring your LLM applications with Datadog LLM Observability, you can gain actionable insights into their health, performance, and security from a consolidated view.
LLM Observability is now generally available for all Datadog customers—see our documentation for more information about how to get started. If you’re brand new to Datadog, sign up for a free trial.