DASH 2024: Guide to Datadog's Newest Announcements for Infrastructure

Understanding the total cost to power your services requires having visibility into costs beyond just the cloud infrastructure they are hosted on. Datadog Cloud Cost Management now has native cost integrations with key SaaS providers, including Snowflake, Databricks, MongoDB, Confluent Cloud, OpenAI, Fastly, Elastic Cloud, and Twilio. These out-of-the-box integrations enable granular cost allocation and provide instant visibility into the total cost to run your service. Additionally, Cloud Cost Management can ingest costs from any source with Custom Costs. By uploading Custom Costs in FinOps FOCUS format or via API, customers can connect every one of their cost sources to Datadog.
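To give a rough idea of what a FOCUS-formatted upload might contain, here is a minimal Python sketch that serializes cost line items to CSV. The column names come from the FinOps FOCUS specification. The exact set of columns Datadog requires, the JSON-encoded Tags column, and the ExampleVendor line item are assumptions, so check the Custom Costs documentation for the schema Datadog actually accepts.

```python
import csv
import io
import json

# Column names follow the FinOps FOCUS specification. Which columns Datadog
# requires, and the "Tags" column, are assumptions for this sketch.
FOCUS_COLUMNS = [
    "ProviderName",
    "ChargeDescription",
    "ChargePeriodStart",
    "ChargePeriodEnd",
    "BilledCost",
    "BillingCurrency",
    "Tags",
]

def to_focus_csv(rows):
    """Serialize cost line items (dicts) into a FOCUS-style CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FOCUS_COLUMNS)
    writer.writeheader()
    for row in rows:
        record = dict(row)
        # Encode tags as a JSON object so key/value pairs survive the CSV format.
        record["Tags"] = json.dumps(row.get("Tags", {}), sort_keys=True)
        writer.writerow(record)
    return buf.getvalue()

# Hypothetical line item for a vendor without a native integration.
sample = [{
    "ProviderName": "ExampleVendor",
    "ChargeDescription": "Monthly API usage",
    "ChargePeriodStart": "2024-06-01",
    "ChargePeriodEnd": "2024-06-30",
    "BilledCost": 1250.00,
    "BillingCurrency": "USD",
    "Tags": {"team": "payments", "service": "checkout"},
}]

print(to_focus_csv(sample))
```

Tagging each line item (here with team and service) is what would let uploaded costs be allocated the same way as native cloud costs.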

With your cloud, SaaS, and Custom Costs in Datadog, you can understand and report on the total cost to run your services, put these costs in front of your engineering teams, and get alerted to any unexpected cost changes. See our documentation to learn more about the integrations we support and set up Cloud Cost Management today.

Ingest cost data from any source with Custom Costs.

Monitor and manage your Datadog costs in Cloud Cost Management

Your teams rely on Datadog’s suite of observability and security products to help deliver a reliable end-user experience. With teams increasingly focused on costs as a KPI, it’s key for SREs and FinOps practitioners to understand their Datadog usage and associated costs with daily granularity.

Now, with Datadog Cloud Cost Management, you can visualize and create alerts on daily Datadog costs, scoped to your teams and services, so that you and your engineering teams can take ownership of your costs and optimize them within your day-to-day workflows. Users can see Datadog costs in dashboards, notebooks, cost monitors, the Service Catalog, Tag Pipelines, and more.

Datadog costs are available in private beta. To sign up, please fill out this form.

Monitor Datadog costs alongside cost data from the rest of your stack.

Optimize your cloud resources with Cloud Cost Recommendations for AWS

By combining observability data with underlying AWS billing data, Datadog Cloud Cost Management automatically generates recommendations that provide engineers and FinOps practitioners with a clear direction on how they can optimize their AWS resources.

Datadog currently generates recommendations for opportunities to:

Terminate orphaned resources such as unused RDS instances
Rightsize overprovisioned resources like EC2 instances
Migrate legacy resources like gp2 EBS volumes
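To make the idea of combining observability data with billing data concrete, here is a minimal Python sketch of a rule-based recommender. The thresholds, field names, and sample fleet are all hypothetical, and Datadog's actual recommendation logic is not described here. The sketch only shows how utilization signals and cost data can be joined to produce a ranked list of actions.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    resource_id: str
    kind: str            # "ec2" or "rds" in this sketch
    monthly_cost: float  # from billing data (hypothetical values below)
    max_cpu_pct: float   # peak CPU utilization over the lookback window
    connections: int     # connections observed over the lookback window

# Hypothetical thresholds chosen for illustration only.
IDLE_CONNECTIONS = 0      # at or below this, an RDS instance is treated as unused
RIGHTSIZE_CPU_PCT = 20.0  # below this peak CPU, an EC2 instance is treated as overprovisioned

def recommend(resources):
    """Return (resource_id, action, monthly_cost) tuples, costliest first."""
    recs = []
    for r in resources:
        if r.kind == "rds" and r.connections <= IDLE_CONNECTIONS:
            recs.append((r.resource_id, "terminate", r.monthly_cost))
        elif r.kind == "ec2" and r.max_cpu_pct < RIGHTSIZE_CPU_PCT:
            recs.append((r.resource_id, "rightsize", r.monthly_cost))
    # Rank by cost so the most expensive opportunities surface first.
    return sorted(recs, key=lambda rec: rec[2], reverse=True)

fleet = [
    Resource("db-legacy", "rds", 310.0, 3.0, 0),
    Resource("i-web-1", "ec2", 140.0, 12.5, 0),
    Resource("i-batch-1", "ec2", 560.0, 85.0, 0),
]
for rec in recommend(fleet):
    print(rec)
```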

When Datadog generates a recommendation for your environment, you can easily create a case or Jira issue from it, putting recommendations directly in front of the teams who need to see them and empowering engineers to take action. See our documentation to learn more about Datadog Cloud Cost Recommendations for AWS.

Monitor your Twilio resources and usage costs with Datadog

We’re excited to announce Datadog’s new Twilio integration, to help your organization monitor all of its Twilio resources. Simply connect your Twilio account to collect different log types—including Alerts, Messages, Call Summaries, and Events—to analyze performance issues. The out-of-the-box dashboard will help you aggregate alerts, troubleshoot business-impacting errors, and collect comprehensive event-logging data for your Twilio resources. Additionally, gain deeper insights into your Twilio usage costs with Datadog Cloud Cost Management, available in beta, and detect potential security threats in your Twilio environment with Cloud SIEM.

See our documentation for more information and to get started.

Serverless Monitoring

Instrument your AWS Lambda fleet with remote bulk instrumentation

In today’s rapidly evolving cloud environments, it’s more important than ever to have end-to-end visibility into transaction-level data for your production serverless applications, so you can resolve issues quickly and keep your services efficient. However, ensuring that all critical serverless functions are adequately instrumented for performance and security monitoring can be time-consuming, especially during critical incidents. Datadog’s remote bulk instrumentation for AWS Lambda enables users to seamlessly add observability, such as enhanced metrics, to multiple functions at once, directly from the Datadog UI. Datadog instruments the selected functions with the Datadog Lambda extension in minutes and ensures that they stay instrumented through any subsequent deployments of those functions. For more information, see our documentation, and use this form to request access to the private beta.

Instrument your Lambda function fleet remotely from the Datadog UI

Get full visibility into your AWS Step Functions state machines with Datadog

AWS Step Functions is a service that abstracts distributed applications into state machines, with each state representing a component of an application. Whether your states are AWS Lambda functions, Elastic Container Service tasks, or AI/ML models hosted on Amazon Bedrock, you can use AWS Step Functions to seamlessly orchestrate your workloads. However, debugging these complex workflows can be challenging, and finding issues can often feel like finding a needle in a haystack of logs.

Datadog Serverless Monitoring now fully supports AWS Step Functions. You can get full visibility into the health and performance of your Step Functions through visualizations of key execution metrics, as well as distributed traces that provide transaction-level visibility into executions across an entire state machine. Customers can see how long each state ran and whether any errors occurred while executing the workflow.
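The per-state timing described above can be sketched as follows. This is a minimal Python illustration that assumes a simplified, hypothetical event format of (state, event kind, timestamp) tuples. Real Step Functions execution history uses different event type names and carries more fields, and this is not how Datadog builds its traces.

```python
from datetime import datetime

# Simplified, hypothetical execution-history events: (state name, event kind, timestamp).
# Real Step Functions history events use different type names and carry more fields.
events = [
    ("ValidateOrder", "entered", "2024-06-25T10:00:00.000"),
    ("ValidateOrder", "exited", "2024-06-25T10:00:00.350"),
    ("ChargeCard", "entered", "2024-06-25T10:00:00.360"),
    ("ChargeCard", "failed", "2024-06-25T10:00:02.860"),
]

def state_durations(history):
    """Map each state to (seconds it ran, whether it ended in an error)."""
    entered_at = {}
    result = {}
    for state, kind, ts in history:
        t = datetime.fromisoformat(ts)
        if kind == "entered":
            entered_at[state] = t
        elif kind in ("exited", "failed") and state in entered_at:
            # Pair the exit or failure with the matching entry to get the run time.
            elapsed = (t - entered_at.pop(state)).total_seconds()
            result[state] = (elapsed, kind == "failed")
    return result

for state, (seconds, errored) in state_durations(events).items():
    print(f"{state}: {seconds:.3f}s{' (error)' if errored else ''}")
```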

Datadog Serverless Monitoring for AWS Step Functions is now generally available. To enable Step Functions monitoring, add it to your Datadog account. To learn more, check out our blog.

Get deep visibility into Step Functions with Datadog's Serverless view

Automatically instrument your Azure App Service Linux Web Apps with the Datadog sidecar

Datadog, in collaboration with Azure, now offers an automatically instrumented Datadog sidecar that you can deploy with your App Service Linux Web Apps to seamlessly capture metrics, traces, and logs, helping you lower deployment and maintenance overhead and focus on app development. This new functionality takes advantage of Azure App Service’s sidecar pattern and enables you to run the Datadog Agent alongside your main application container, offering one-step instrumentation of your workloads and immediate access to Datadog’s suite of observability solutions. To get started, use this form to request access to this feature.

Auto-instrument your Azure App Service Linux Web Apps with the Datadog sidecar

Automatically instrument your Google Cloud Run services with the Datadog sidecar

Datadog, in collaboration with Google, now offers an automatically instrumented Datadog sidecar that you can deploy with your Cloud Run services to seamlessly capture metrics, traces, and logs, helping you lower deployment and maintenance overhead and focus on app development. This new functionality takes advantage of Google Cloud Run’s sidecar support and enables you to run the Datadog Agent alongside your main application container, offering one-step instrumentation of your workloads and immediate access to Datadog’s suite of observability solutions. To get started, use this form to request access to this feature.

Auto-instrument your Google Cloud Run services with the Datadog sidecar

Log Management

Redact sensitive data on-prem with the Datadog Agent

Datadog Sensitive Data Scanner (SDS) helps you redact sensitive information in your telemetry and triage ongoing security issues to expedite remediation. Now, you can redact sensitive data on-prem by enabling SDS for the Datadog Agent. As the Agent collects logs, it automatically applies your configured scanning rules to redact the data before it’s ingested into Datadog. Redacting sensitive data within your local environment provides an additional layer of security. It can also help organizations in strictly regulated industries meet their obligations under regulations such as HIPAA and GDPR, as well as internal policies that prevent sensitive data from leaving their premises. You can learn more about using SDS with the Datadog Agent in our dedicated blog post, and access the private beta with this form.
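Conceptually, on-host redaction means applying pattern-based rules to each log line before it is sent anywhere. The minimal Python sketch below shows that idea with two hypothetical rules, an email pattern and a US Social Security number pattern. It is not the Agent's configuration format or SDS's built-in rule library.

```python
import re

# Hypothetical scanning rules: (rule name, compiled pattern, replacement text).
# These only mimic the idea of SDS rules; they are not the Agent's config format.
RULES = [
    ("email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
    ("us_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]

def redact(line):
    """Apply every rule to a log line before it leaves the host.

    Returns the redacted line and the names of the rules that matched.
    """
    matched = []
    for name, pattern, replacement in RULES:
        line, count = pattern.subn(replacement, line)
        if count:
            matched.append(name)
    return line, matched

raw = "user=jane.doe@example.com ssn=123-45-6789 action=login"
clean, hits = redact(raw)
print(clean)
print(hits)
```

Returning the names of matched rules alongside the redacted line is a small design choice that makes it possible to count or alert on redactions without keeping the sensitive values.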

Configure Agent scanning rules to redact sensitive data on-prem.

Store and analyze high-volume logs efficiently with Flex Logs

To help organizations cost-effectively store, access, and analyze ever-growing volumes of logs, Datadog’s Flex Logs is now generally available. Logging Without Limits™ already decouples log ingestion from storage, enabling Datadog customers to enrich, parse, and archive 100 percent of their logs while storing only what they choose to. Building on that flexibility, Flex Logs decouples the costs of log storage from the costs of querying. It provides both short- and long-term log retention for a nominal monthly fee without sacrificing visibility, eliminating the need for self-maintained databases and enabling seamless correlation between all of your logs, metrics, and traces. With Flex Logs, Datadog covers all of your logging use cases within one platform. See our blog post for more information and our documentation to get started.

Network monitoring

Pinpoint network issues with Network Path

Datadog Network Performance Monitoring (NPM) provides visibility into TCP and DNS network traffic being sent across your on-premises, cloud, and hybrid environments. Now with NPM Network Path, you can visualize the individual hops taken by traffic between your network’s sources and destinations. Network Path enables you to identify potential network misrouting, pinpoint the root cause of network latency problems, and see packet loss over time to narrow down the scope of incidents. See our documentation and use this form to request access to the private beta.
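Network Path's probing method is not detailed here, so the following is only a minimal Python sketch of a generic traceroute-style summary. It takes hypothetical per-hop round-trip times (with None for a lost probe), computes packet loss and median latency per hop, and reports the hop where latency jumps the most. That is roughly how a latency problem gets narrowed down to one segment of a path.

```python
from statistics import median

# Hypothetical traceroute-style probe results: hop number -> RTTs in ms,
# with None marking a probe that got no reply.
probes = {
    1: [0.8, 0.9, 0.7, 0.8],
    2: [4.1, 4.3, None, 4.0],
    3: [38.5, 41.2, 39.9, 40.3],
    4: [40.1, 40.8, 41.0, 39.7],
}

def summarize(probes):
    """Return per-hop (hop, loss_pct, median_rtt_ms), ordered by hop."""
    summary = []
    for hop in sorted(probes):
        rtts = probes[hop]
        replies = [r for r in rtts if r is not None]
        loss_pct = 100.0 * (len(rtts) - len(replies)) / len(rtts)
        summary.append((hop, loss_pct, median(replies) if replies else None))
    return summary

def largest_latency_jump(summary):
    """Find the hop whose median RTT rises the most over the previous hop."""
    best_hop, best_delta = None, 0.0
    for (_, _, prev), (hop, _, cur) in zip(summary, summary[1:]):
        if prev is not None and cur is not None and cur - prev > best_delta:
            best_hop, best_delta = hop, cur - prev
    return best_hop, best_delta

hops = summarize(probes)
print(hops)
print(largest_latency_jump(hops))
```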

Visualize individual network hops with Network Path

Get context around and understand IP addresses with the IP pill

Knowing what entity an IP address represents can be difficult and involve multiple steps. The IP pill—available in Datadog Network Device Monitoring (NDM)’s NetFlow Monitoring, Network Performance Monitoring (NPM), and Container Monitoring—allows you to hover over an IP address and immediately understand what resource the address is tied to, such as a host, pod, or device, as well as additional information about the IP address, including geolocation or cloud provider. The IP pill also makes it easy to dig deeper into the network connections related to the IP address by allowing you to seamlessly pivot into NPM.
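The lookup behind this kind of feature can be sketched as a two-step resolution: first an exact match against an inventory of known entities, then a fallback match against labeled CIDR ranges. The inventory, labels, and addresses below are hypothetical, and this is not how Datadog implements the IP pill.

```python
import ipaddress

# Hypothetical inventory of known addresses and the entities they belong to.
KNOWN_IPS = {
    "10.0.1.5": ("host", "web-1"),
    "10.244.3.17": ("pod", "checkout-7d9f"),
    "10.0.9.1": ("device", "core-switch-01"),
}

# Hypothetical labeled CIDR blocks for addresses missing from the inventory.
NETWORKS = [
    (ipaddress.ip_network("10.0.0.0/8"), "private network"),
    (ipaddress.ip_network("203.0.113.0/24"), "hypothetical cloud provider range"),
]

def describe(ip):
    """Return a short description of what an IP address is tied to."""
    addr = ipaddress.ip_address(ip)  # validates the input and normalizes its form
    entity = KNOWN_IPS.get(str(addr))
    if entity is not None:
        kind, name = entity
        return f"{kind}: {name}"
    for network, label in NETWORKS:
        if addr in network:
            return f"unknown address in {label}"
    return "unknown address"

for ip in ["10.244.3.17", "10.1.2.3", "203.0.113.9", "192.0.2.1"]:
    print(ip, "->", describe(ip))
```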

Get alerted when network metrics pass a threshold with NPM Monitors

Staying ahead of network incidents requires knowing when metrics like network latency cross a threshold and responding before problems arise. Datadog Network Performance Monitoring (NPM) allows you to create monitors on a range of network metrics, such as TCP latency, TCP retransmits, and DNS failures, and get alerted when a metric crosses a threshold that you’ve set. Stay on top of any abnormalities or changes in your network by leveraging NPM monitors. See our documentation on NPM monitors to get started today.
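A threshold monitor of this kind boils down to aggregating recent data points and comparing the result against warning and critical levels. The minimal Python sketch below assumes a hypothetical TCP latency metric in milliseconds, a five-point averaging window, and made-up thresholds. Actual NPM monitor options are configured in Datadog and may differ.

```python
from statistics import mean

# Hypothetical thresholds for a TCP latency monitor, in milliseconds.
WARNING = 150.0
CRITICAL = 250.0

def evaluate(points, window=5):
    """Average the last `window` points and map the result to a monitor state."""
    if len(points) < window:
        return "NO DATA"  # not enough recent points to evaluate
    value = mean(points[-window:])
    if value >= CRITICAL:
        return "ALERT"
    if value >= WARNING:
        return "WARN"
    return "OK"

print(evaluate([90, 110, 100, 120, 95]))
print(evaluate([140, 150, 160, 170, 180]))
print(evaluate([180, 200, 260, 300, 310]))
```

Averaging over a window rather than reacting to single points is a common way to keep one noisy sample from triggering an alert.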

Troubleshoot your network faster with failed connection count

TCP failed connections can be a common source of network-related problems. Datadog Network Performance Monitoring (NPM) can now surface failed connection counts for individual and aggregated traffic flows in your network. By understanding how many TCP connections are failing and why, be it timeout, refusal, or reset, you can more easily troubleshoot and find the root cause of your potential network issues. Learn more about how NPM can help you monitor failed connection counts and other network analytics in our documentation, and join the private beta with this form.
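Breaking failures down by reason, both per flow and in aggregate, can be sketched with Python's collections.Counter. The connection records below, including the service and destination names, are hypothetical, and the sketch says nothing about how NPM actually collects this data.

```python
from collections import Counter, defaultdict

# Hypothetical connection records: (source service, destination, outcome).
# Outcomes mirror the failure reasons named above; "ok" marks a success.
connections = [
    ("checkout", "payments-db:5432", "ok"),
    ("checkout", "payments-db:5432", "timeout"),
    ("checkout", "payments-db:5432", "timeout"),
    ("checkout", "inventory:8080", "refused"),
    ("search", "inventory:8080", "reset"),
    ("search", "inventory:8080", "ok"),
]

def failures_by_flow(records):
    """Count failed connections per (source, destination) flow, broken down by reason."""
    flows = defaultdict(Counter)
    for src, dst, outcome in records:
        if outcome != "ok":
            flows[(src, dst)][outcome] += 1
    return flows

def failures_by_reason(records):
    """Aggregate failures across all flows."""
    return Counter(outcome for _, _, outcome in records if outcome != "ok")

for flow, reasons in failures_by_flow(connections).items():
    print(flow, dict(reasons))
print(failures_by_reason(connections))
```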

Customize metrics and tags from any on-prem device with Network Device Monitoring

Datadog Network Device Monitoring (NDM) already supports leading network vendors such as Cisco, F5, Palo Alto, Arista, Juniper, Fortinet, and many others with our out-of-the-box profiles. Network teams can also use Autodiscovery to automatically discover all SNMP-configured devices on their network. Now, NDM is equipped with a guided, GUI-based experience for customizing the metrics and tags collected from any of the network devices that make up your unique on-premises setup. This makes it even easier to onboard your network devices to Datadog, eliminating the need to manually configure profile YAML files. Learn more in our documentation and fill out this form to request access to the new onboarding view, currently available in private beta.

Monitor Cisco SD-WAN with Datadog

Software-defined wide area networks (SD-WANs) are a software-driven approach to centrally controlling and managing WANs that connect branch, enterprise, and cloud locations. This is an increasingly popular architecture for organizations because SD-WANs are more cost-effective, flexible, and scalable than traditional WANs. You can now monitor your Cisco SD-WAN environment with Datadog Network Device Monitoring. This means your network teams now have a single tool to get visibility into your Cisco SD-WAN control plane and data plane to quickly diagnose issues related to edge device performance.

Datadog collects key metrics such as reboot and crash counts; CPU, memory, and disk usage over time; and health metrics for your SD-WAN tunnels, including latency, jitter, and packet loss. This helps you streamline operations and reduce the operating costs of your SD-WAN infrastructure. Datadog’s support for monitoring Cisco SD-WAN helps you keep up with the dynamic needs of your network teams as you move from MPLS and traditional WANs to SD-WAN during your cloud migration journey. See our documentation to get started.
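The three tunnel health metrics are related but distinct, which a small calculation makes clear. This minimal Python sketch assumes hypothetical per-probe latency samples for one tunnel. It computes jitter as the mean absolute difference between consecutive received samples, which is a simplification of the smoothed interarrival-jitter estimator in RFC 3550 and not necessarily how Datadog or Cisco compute it.

```python
from statistics import mean

def tunnel_health(samples):
    """Summarize probe samples for one SD-WAN tunnel.

    `samples` holds per-probe latency in ms, or None for a lost probe.
    Jitter is the mean absolute difference between consecutive received
    samples, a simplification of the RFC 3550 estimator.
    """
    received = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(received)) / len(samples)
    if not received:
        # Every probe was lost, so latency and jitter are undefined.
        return {"latency_ms": None, "jitter_ms": None, "loss_pct": loss_pct}
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "latency_ms": mean(received),
        "jitter_ms": mean(diffs) if diffs else 0.0,
        "loss_pct": loss_pct,
    }

# Hypothetical latency samples (ms) for one tunnel; None is a lost probe.
health = tunnel_health([20, 22, None, 21, 25, 20])
print(health)
```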

Set threshold-based monitors on NetFlow traffic

Datadog NetFlow Monitoring helps network engineers identify the top contributors to their network traffic (i.e., top talkers) to understand what applications are using the available bandwidth and causing congestion on the network. You can now proactively set threshold-based monitors on NetFlow traffic so your network team can be alerted on abnormal spikes before device interfaces become congested. You can also alert on traffic going to unexpected or unsanctioned destinations and proactively take action, such as by blocking traffic before volumes increase substantially and impact the network performance of other services. Fill out this form to request access to this feature, currently available in private beta.
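Both ideas here, ranking top talkers and alerting on volume or unsanctioned destinations, can be sketched over a handful of flow records. The records, the byte threshold, and the allowlist below are hypothetical (the IPs come from documentation-reserved ranges), and the sketch is not how NetFlow Monitoring evaluates its monitors.

```python
from collections import Counter

# Hypothetical flow records exported by a device: (source IP, destination IP, bytes).
flows = [
    ("10.0.1.5", "203.0.113.10", 120_000_000),
    ("10.0.1.5", "203.0.113.10", 95_000_000),
    ("10.0.2.8", "198.51.100.7", 40_000_000),
    ("10.0.3.9", "192.0.2.44", 5_000_000),
]

# Hypothetical per-destination byte threshold and allowlist of sanctioned destinations.
BYTES_THRESHOLD = 100_000_000
SANCTIONED = {"203.0.113.10", "198.51.100.7"}

def top_talkers(records, n=3):
    """Rank source IPs by total bytes sent."""
    totals = Counter()
    for src, _, nbytes in records:
        totals[src] += nbytes
    return totals.most_common(n)

def alerts(records):
    """Flag destinations that exceed the byte threshold or are not sanctioned."""
    per_dst = Counter()
    for _, dst, nbytes in records:
        per_dst[dst] += nbytes
    found = []
    for dst, total in sorted(per_dst.items()):
        if total > BYTES_THRESHOLD:
            found.append((dst, "volume", total))
        if dst not in SANCTIONED:
            found.append((dst, "unsanctioned", total))
    return found

print(top_talkers(flows))
print(alerts(flows))
```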

Set threshold monitors on NetFlow traffic

Investigate and troubleshoot

View your infrastructure data through SQL queries with DDSQL Editor

DDSQL Editor lets you access all of your infrastructure data through SQL queries. By joining tables like hosts, containers, or Kubernetes clusters, you can write queries that answer complex questions about your environment. For example, you can write queries to list all of the Java libraries used across your services, or to count the number of hosts per Agent version and region. Thanks to Bits AI, if you’re not a SQL expert, you can write queries in natural language and have them translated into SQL.
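To illustrate the hosts-per-Agent-version-and-region question, the sketch below runs that kind of GROUP BY query against an in-memory SQLite table using Python's built-in sqlite3 module. The table name, column names, and rows are hypothetical stand-ins, and DDSQL's actual schema and SQL dialect may differ.

```python
import sqlite3

# In-memory stand-in for an infrastructure "hosts" table. The table and column
# names are hypothetical and may not match DDSQL's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (hostname TEXT, agent_version TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [
        ("web-1", "7.54.0", "us-east-1"),
        ("web-2", "7.54.0", "us-east-1"),
        ("api-1", "7.52.1", "us-east-1"),
        ("api-2", "7.54.0", "eu-west-1"),
    ],
)

# Count hosts per Agent version and region.
QUERY = """
SELECT agent_version, region, COUNT(*) AS host_count
FROM hosts
GROUP BY agent_version, region
ORDER BY host_count DESC, agent_version, region
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```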

Join our private beta by filling out this form.

Access infrastructure data using SQL with DDSQL Editor

Home in on root causes faster with Watchdog Explains on graphs

In Datadog, troubleshooting typically starts with checking graphs before branching out into investigating individual assets. Watchdog Explains is a powerful new investigation assistant that instantly guides you to the root cause of anomalies on any timeseries graph. Watchdog Explains automatically scans every graph on a dashboard for anomalies. It then breaks down the same timeseries data by each applicable tag group and compares the results against the source graph to identify which groups account for the anomalous behavior. By automatically showing which individual tags account for a given spike, Watchdog Explains makes investigation more efficient and lets you narrow in directly on problematic areas of your infrastructure or software stack. Read more about Watchdog’s capabilities in our documentation.
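Watchdog's actual algorithm is not described here. The minimal Python sketch below only illustrates the general idea of attributing a spike to tag values: it compares each tag value's average during an anomaly window against its own baseline and ranks the increases. The tag, the data, and the known anomaly start index are all hypothetical.

```python
from statistics import mean

# Hypothetical per-tag breakdown of one timeseries (e.g. errors per minute),
# grouped by an availability-zone tag. Points from index 4 on are the anomaly.
series_by_tag = {
    "us-east-1a": [10, 11, 9, 10, 10, 11, 10],
    "us-east-1b": [12, 12, 13, 11, 60, 64, 58],
    "us-east-1c": [8, 9, 8, 9, 9, 8, 9],
}
ANOMALY_START = 4  # assumed known here; a real system would detect it

def explain(series, start):
    """Rank tag values by how much they rose in the anomaly window versus their baseline.

    Returns (tag value, increase, share of the total positive increase), largest first.
    """
    deltas = {
        tag: mean(points[start:]) - mean(points[:start])
        for tag, points in series.items()
    }
    total = sum(d for d in deltas.values() if d > 0)
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return [(tag, d, d / total if total and d > 0 else 0.0) for tag, d in ranked]

for tag, increase, share in explain(series_by_tag, ANOMALY_START):
    print(f"{tag}: +{increase:.2f} ({share:.0%} of the spike)")
```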

Watchdog Explains identifies anomaly root causes in your time series graphs

Investigate alerts faster with the revamped monitor status page

Engineers often have to do a lot of manual work when investigating an alert. With the broad set of products Datadog offers, engineers can navigate through multiple views and troubleshoot an alert from different angles, which can become overwhelming and time-consuming. We have revamped the monitor status page to better support responder workflows. The monitor status page now surfaces more key information that provides context around the alert, and gives you direct access to troubleshooting tools for faster investigation and identification of the root cause. To access this revamped page and give responders a supportive troubleshooting experience, fill out this form and join the private beta.

Troubleshoot infrastructure problems faster with Recent Changes

Infrastructure changes often trigger incidents, but troubleshooting these incidents is challenging when responders have to navigate through multiple tools to correlate telemetry with configuration changes.

Datadog now streamlines incident troubleshooting by making critical infrastructure information accessible from within dashboards and monitor status pages, the starting points of a typical investigative workflow. From these locations, you can now open a Resource side panel by clicking any timeseries graph widget, giving you access to detailed telemetry, monitor insights, and, through a new Recent Changes tab, configuration changes. Seeing all of this information consolidated in the contexts where you typically troubleshoot infrastructure makes it much easier to identify probable root causes of infrastructure issues and remediate them sooner.

For more information, see our blog post.

Perform more complex analysis of your metrics data with nested queries

We’re excited to announce that you can now flexibly create more robust analyses on any metric in Datadog with nested queries. Nested queries enable users to reuse the aggregated results of an existing metric query as input to a subsequent one. With nested queries, you can perform more complex metrics data analysis, such as:

Multilayer aggregation, or the ability to add additional aggregations on top of your metrics in both time and space
Percentiles and standard deviation calculations on aggregated count, rate, and gauge metrics
Higher-resolution queries over long, historical time frames
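Conceptually, a nested query feeds the output of one aggregation into another. The minimal Python sketch below shows two-layer aggregation on hypothetical CPU data: the inner layer averages each host over time, and the outer layer computes the max, mean, and population standard deviation across those per-host results. It illustrates the idea only and does not use Datadog's query syntax.

```python
from statistics import mean, pstdev

# Hypothetical raw CPU percentages per host at four successive timestamps.
raw = {
    "host-a": [40, 42, 44, 46],
    "host-b": [70, 72, 74, 76],
    "host-c": [10, 12, 14, 16],
}

def inner_query(points_by_host):
    """First layer: aggregate each host's points over time (average per host)."""
    return {host: mean(points) for host, points in points_by_host.items()}

def outer_query(per_host):
    """Second layer: aggregate the first layer's results across hosts."""
    values = list(per_host.values())
    return {"max": max(values), "mean": mean(values), "stdev": pstdev(values)}

per_host = inner_query(raw)
summary = outer_query(per_host)
print(per_host)
print(summary)
```

Computing a standard deviation across per-host averages, rather than across every raw point, is the kind of second-layer statistic a single aggregation cannot express.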

Nested queries are now in technical preview. Please fill out this private beta form for access to this feature in the future.

Datadog platform

Meet your observability availability and business continuity goals using Datadog Disaster Recovery

As customers migrate to the cloud and digitally transform their businesses, visibility into their infrastructure and applications using Datadog has become integral to their business continuity. Customers have deeply embedded Datadog in their workflows, from building software to delivering it to supporting their users. This means they might have to temporarily pause their code deployment pipelines and infrastructure troubleshooting activities if a rare, unexpected outage were to impact a cloud service provider region or the Datadog services running within that region.

Datadog Disaster Recovery (DDR) provides customers with observability continuity in such rare outage events, enabling them to meet their observability, availability, and business continuity goals. Using DDR, customers can typically recover live observability at an alternate, functional Datadog site in under an hour. With DDR, customers can also periodically conduct disaster recovery drills, not only to test their ability to recover from outage events but also to meet their business and regulatory compliance needs. This new capability complements Datadog’s existing high-availability platform capabilities, which provide customers with continuous observability coverage within a single Datadog site. This new capability is available now in private beta. To request access, fill out this form.

Ensure observability continuity in rare outage events with Datadog Disaster Recovery.

Upgrade your Datadog Agents from Fleet Automation

Ensuring you have upgraded your fleet of Agents to the latest version can be a time-consuming task that often requires working with multiple deployment tools and coordinating among multiple internal service teams. With Fleet Automation, Datadog makes it easier for you to stay up to date with the latest Agent version with a few simple clicks, to ensure that you have access to the latest Agent features and performance improvements. Fleet Automation provides one central platform to upgrade all of your Agents, and provides complete visibility into Agent upgrade status and Agent versions deployed across your fleet. To easily upgrade your Agents, you can request access to the private beta here.

Upgrade your Datadog Agents from one location with Fleet Automation

Enable Datadog products from Fleet Automation

Ensuring consistent observability coverage requires standardized deployment of Datadog Agent configurations across your fleet of Agents. Fleet Automation lets you enable and modify product configurations from one central platform with just a few clicks, without needing to redeploy your infrastructure. Enabling products from Fleet Automation also gives you complete visibility into Datadog product deployment, to ensure your internal teams are getting the most out of Datadog. To get set up with Datadog products across your fleet of Agents, you can request access to the private beta here.

Enable Datadog products from one location with Fleet Automation

Monitor services running in AWS GovCloud (US) with Datadog Network Device Monitoring, Database Monitoring, and Cloud Security Management

We recently added Network Device Monitoring (NDM), Database Monitoring (DBM), and Cloud Security Management (CSM) to our FedRAMP Moderate Impact authorized region, making these products available to public-sector organizations operating in AWS GovCloud (US). The ability to use NDM, DBM, and CSM alongside other Datadog solutions helps government agencies monitor and secure their IT infrastructure in a unified platform. Learn more about using Datadog to monitor AWS GovCloud (US) infrastructure in our blog post.