Optimize and Troubleshoot Cloud Storage at Scale With Storage Monitoring | Datadog

Optimize and troubleshoot cloud storage at scale with Storage Monitoring

Author Mahashree Rajendran

Published: December 4, 2024

Organizations today rely on cloud object storage to power diverse workloads, from data analytics and machine learning pipelines to content delivery platforms. But as data volumes explode and storage patterns become more complex, teams often struggle to understand and proactively optimize their storage utilization. When issues arise—such as unexpected costs or performance bottlenecks—these teams frequently lack the visibility needed to quickly identify and resolve root causes.

To address these issues and provide critical visibility into your cloud storage infrastructure, we’re introducing Datadog Storage Monitoring. By providing both bucket- and prefix-level analytics for Amazon S3—with support for more providers to come—Storage Monitoring helps you understand exactly how your cloud storage is being used, detect potential issues before they impact operations, and make data-driven decisions about storage optimization.

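In this post, we’ll discuss how you can use Datadog Storage Monitoring to:

  • Comprehensively monitor your cloud storage with bucket-level metrics
  • Get granular, prefix-level insights into the datasets powering your most important workloads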

Comprehensively monitor your cloud storage with bucket-level metrics

Bucket-level metrics enable high-level analysis of your S3 usage, performance, and costs. When you access Datadog Storage Monitoring from the Monitoring tab of the Resource Catalog, you’ll find a breakdown of your S3 resources by bucket (Storage Monitoring metrics are grouped by bucket by default), including metrics for storage consumption, object count distribution, latency patterns, request volume analysis, and more. This page also surfaces and summarizes a range of bucket-level issues:

  • Stale prefixes, which indicate unused data that could be inflating storage costs.
  • Sharp increases in prefix size, which may indicate unexpected application behavior or security issues.
  • Elevated latencies and error counts, which could disrupt application performance and have a negative impact on user experience.

These issue summaries can help you expedite troubleshooting by quickly zeroing in on the affected buckets. You can also select any bucket from this page for a detailed overview of its contents (based on data prefixes), usage, and performance. Next, we’ll look at how you can use Storage Monitoring for more granular analysis of your S3 utilization and performance.

Get granular, prefix-level insights into the datasets powering your most important workloads

In S3, prefixes are used to organize data objects within buckets. With prefix-level analytics, Storage Monitoring enables you to understand the S3 utilization and performance associated with each of the subsets of data stored in your buckets. This type of visibility can be essential for analyzing your S3 usage and optimizing the health, performance, and costs of the datasets underpinning your most important workloads. Using prefix-level storage metrics to track prefix growth rates, write patterns, and object update frequencies can help DevOps and other teams stay ahead of a range of issues, from setbacks in application performance to cost overruns.
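To make the idea of a prefix-level rollup concrete, here is a minimal local sketch in Python. It is not how Storage Monitoring computes its metrics; it only illustrates grouping object keys by their leading path segments and tallying an object count and total size per prefix. The function name `summarize_prefixes`, the `depth` parameter, and the sample keys and sizes are all hypothetical.

```python
from collections import defaultdict


def summarize_prefixes(objects, depth=1, delimiter="/"):
    """Roll up (key, size_in_bytes) pairs into per-prefix counts and sizes.

    Hypothetical helper for illustration only. `depth` is the maximum
    number of leading key segments to treat as the prefix.
    """
    summary = defaultdict(lambda: {"object_count": 0, "total_size": 0})
    for key, size in objects:
        parts = key.split(delimiter)
        # The last segment is the object name, so it never counts as prefix.
        levels = min(depth, len(parts) - 1)
        if levels > 0:
            prefix = delimiter.join(parts[:levels]) + delimiter
        else:
            prefix = ""  # object sits at the bucket root
        summary[prefix]["object_count"] += 1
        summary[prefix]["total_size"] += size
    return dict(summary)


# Made-up sample data: (object key, size in bytes).
objects = [
    ("logs/2024/12/01/app.log", 1200),
    ("logs/2024/12/02/app.log", 800),
    ("models/v1/weights.bin", 50000),
    ("README.txt", 100),
]

report = summarize_prefixes(objects)
for prefix, stats in sorted(report.items()):
    print(repr(prefix), stats)
```

Raising `depth` (for example, `summarize_prefixes(objects, depth=2)`) splits the same data into finer-grained prefixes such as `logs/2024/`, which mirrors the trade-off between coarse and granular views of a bucket.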

For example, you can use prefix-level metrics to:

  • Manage costs: To stay ahead of rapid prefix growth and preempt spikes in storage costs, you can track the aws.s3.inventory.total_prefix_size metric. For example, you might want to use a change alert monitor to ensure that you’re notified of any prefix size increases of more than 50 percent within a 24-hour period.
  • Monitor data pipeline health: To detect delays in data delivery that might compromise downstream processing deadlines, you can compare the aws.s3.inventory.prefix_object_count and aws.s3.inventory.total_prefix_size metrics. By using a composite monitor on these metrics, you can ensure that data is flowing as expected in your pipelines by automatically checking for new (non-empty) files on a regular basis.
  • Track growth patterns: Abnormal data accumulation may indicate application issues. To help ensure a quick response to these issues, you can track prefix growth rates against historical patterns via anomaly monitors on the aws.s3.inventory.total_prefix_size and aws.s3.inventory.prefix_object_count metrics.
  • Optimize data organization: To ensure that your data organization is optimized to your workload-specific access patterns, you can use aws.s3.inventory.prefix_object_count to analyze how different file types and storage tiers are distributed across your prefixes.
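The cost and pipeline checks in the list above boil down to simple comparisons between two samples of a metric. The Python sketch below shows that arithmetic under stated assumptions: it is not Datadog's monitor evaluation logic and does not call any Datadog API. The function names are hypothetical, and the inputs stand in for values of `aws.s3.inventory.total_prefix_size` and `aws.s3.inventory.prefix_object_count` sampled at two points in time.

```python
def percent_change(previous, current):
    """Return the percent change from previous to current.

    Returns None when previous is zero, because the change is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


def exceeds_growth_threshold(size_24h_ago, size_now, threshold_pct=50.0):
    """True when prefix size grew by more than threshold_pct.

    Mirrors the "more than 50 percent within 24 hours" example; the two
    sizes are assumed to be samples taken 24 hours apart.
    """
    change = percent_change(size_24h_ago, size_now)
    return change is not None and change > threshold_pct


def received_new_nonempty_data(count_before, count_now, size_before, size_now):
    """True when both object count and total size increased between checks.

    A rising count with a flat size suggests that new files arrived empty.
    """
    return count_now > count_before and size_now > size_before


# Made-up sample values for one prefix (sizes in bytes).
print(exceeds_growth_threshold(1_000_000, 1_600_000))          # 60% growth
print(received_new_nonempty_data(10, 12, 1_000_000, 1_000_000))  # empty files
```

In practice you would configure these conditions as monitors in Datadog rather than run them yourself; the sketch is only meant to clarify what each condition checks.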

Getting started

Storage Monitoring provides bucket- and prefix-level metrics that give you a comprehensive picture of your Amazon S3 usage, helping you proactively identify and quickly troubleshoot performance issues, investigate access patterns, and optimize costs. Storage Monitoring is currently in Preview, with more capabilities and support for other cloud storage providers coming soon. You can sign up here to receive updates and check out our documentation to learn more. And if you’re not yet a Datadog user, you can get started with a 14-day free trial.