Key Metrics for Monitoring Snowflake Cost and Data Quality | Datadog

Key metrics for monitoring Snowflake cost and data quality

Author Nicholas Thomson

Published: January 7, 2025

Snowflake is a self-managed data platform that enables users to easily store, process, analyze, and share high volumes of structured and semi-structured data. One of the most popular data platforms on the market, Snowflake has gained widespread adoption because it addresses a range of data challenges with a unified, scalable, and high-performance platform. Snowflake’s flexibility enables users to handle diverse workloads, such as data lake and data warehouse integration. Built-in dashboards and integration with business intelligence (BI) tools like Tableau and Power BI make Snowflake an ideal solution for business analytics, reporting, and other use cases. Data scientists use Snowflake’s integrations with tools like Python and TensorFlow to train machine learning (ML) models, and media platforms can use real-time analytics from Snowflake to track user engagement and optimize content delivery.

Because teams use Snowflake for a wide variety of business-critical use cases, it’s important to ensure that their data in Snowflake is high-quality and processed efficiently. Additionally, Snowflake costs can add up, particularly if you are using Snowflake for multiple purposes or processing large datasets. For these reasons, monitoring Snowflake is important for many different kinds of teams. This series of blog posts will guide you through monitoring Snowflake. In this post, we’ll show you the key metrics to monitor across three areas: compute usage and cost, storage, and data quality and freshness.

What is Snowflake?

Snowflake is a comprehensive platform for storing and processing data in the cloud, and one of its main differentiating features is its architecture, which decouples its compute layer from its storage layer. Snowflake is built on top of the major cloud platforms (AWS, Azure, and Google Cloud), but it abstracts the underlying complexity of tasks like provisioning, configuring, and managing storage infrastructure so users do not need to manually interact with the underlying cloud storage. Snowflake leverages the cloud providers’ native object storage so that persisted data is accessible across the entirety of your Snowflake environment.

Snowflake’s decoupled architecture combines the data-management simplicity of a shared-disk architecture with the performance and scale-out benefits of a shared-nothing architecture. Its centralized cloud storage layer—accessible by all compute resources (virtual warehouses) within the platform—acts similarly to a shared-disk architecture. Since all compute clusters access the same underlying data, there’s no need to replicate or move data across nodes or clusters. This makes data sharing, backups, and disaster recovery much simpler and more efficient.

Meanwhile, Snowflake’s compute layer uses a shared-nothing architecture. In this type of system, each compute node operates independently, with its own local memory and compute resources. Because there is no sharing of memory or CPU between nodes, Snowflake can handle large datasets with fast, high-performance query processing. It can also handle concurrent queries from different users simultaneously without performance degradation.

Diagram of Snowflake's decoupled architecture: independent virtual warehouses in the compute layer share a centralized cloud storage layer.

The nodes that comprise Snowflake’s compute layer are called virtual warehouses. These are compute clusters composed of multiple virtual machines (VMs) that perform tasks such as querying, processing, and analyzing the data stored in the storage layer. This layer of Snowflake is an example of massively parallel processing (MPP), a computational model that allows for the simultaneous processing of large datasets by distributing load and processing across nodes.

Snowflake’s architecture provides a number of benefits that contribute to its popularity. Teams can optimize costs by scaling storage and compute independently. Additionally, because multiple compute clusters can access the same data storage independently, users can run queries concurrently without resource contention. This separate architecture also makes it easy to isolate workloads, which means teams can assign dedicated compute resources to workloads without affecting others, ensuring predictable performance for critical tasks and resource-intensive processes.

However, Snowflake’s complexity results in unique monitoring challenges to ensure the platform runs efficiently and cost-effectively. These include keeping compute costs under control, managing storage usage, and maintaining data quality and freshness.

While Snowflake generates a vast number of metrics, we’ve highlighted the essential ones to monitor so you can stay on top of these three specific areas. We’ll explain the key metrics for each category in more detail in the following sections.

Monitor Snowflake metrics to optimize compute

A major concern of Snowflake SysAdmins is usage and cost. They will likely want to monitor Snowflake usage metrics to understand their organization’s spending, including what is driving costs, what’s changed month over month, and whose usage is driving up the bill. To gain these insights, it’s essential to monitor warehouse query performance, because virtual warehouses consume billable compute (measured in Snowflake credits) as they execute jobs, load data, and perform other Data Manipulation Language (DML) operations.

Schemas and views

To find metrics on warehouse query performance, it’s first important to understand where Snowflake metrics come from. You can explore Snowflake system metrics using Snowsight (Snowflake’s built-in web interface) or by writing SQL queries against views in various schemas. Data on your account’s Snowflake usage is maintained in databases, each of which consists of one or more schemas, which are logical groupings of database objects (tables, views, etc.). Users can write custom views in SQL or use out-of-the-box Snowflake views to query metrics on usage.

The ACCOUNT_USAGE schema can be found in the Snowflake-provided database called SNOWFLAKE, which contains views with usage and performance-related information for your account. This schema can be queried with views that surface execution times of queries and tasks, as well as views that surface the load on warehouses as they execute queries and tasks. It’s important to note that views and tables in ACCOUNT_USAGE lag one to three hours behind real time.
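For example, a minimal sketch of what querying an ACCOUNT_USAGE view looks like (this example is ours, and it assumes a role with access to the SNOWFLAKE database, such as ACCOUNTADMIN) lists the ten longest-running queries from the past 24 hours:

```sql
-- Run with a role that can read the SNOWFLAKE database (e.g., ACCOUNTADMIN).
-- TOTAL_ELAPSED_TIME is reported in milliseconds.
SELECT query_id,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_seconds
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 10;
```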

Let’s say you’re a SysAdmin who wants to determine if a warehouse of inappropriate size is being used. In Snowflake, a virtual warehouse’s compute costs are determined by its size, with larger warehouses consuming more credits per hour, meaning a higher cost for running queries. Improperly sized warehouses can add unnecessary cost to your bill. For example, using a large warehouse to run a small query is a waste of compute resources. Conversely, running a large query on a smaller warehouse could cause failures or timeouts (which also means wasted compute). Right-sizing virtual warehouses is usually the single most impactful way you can reduce your Snowflake costs and often results in performance gains that outweigh the cost of increased warehouse size.

Query history for compute

The QUERY_HISTORY view, which can be queried from the ACCOUNT_USAGE schema, surfaces metrics that can help you right-size your virtual warehouses. There are three metrics in this view that are particularly important.

Metrics to watch: EXECUTION_TIME, BYTES_WRITTEN, BYTES_SCANNED

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| EXECUTION_TIME | The time it takes each query to execute | Work: Performance | Snowsight |
| BYTES_WRITTEN | The number of bytes written by a query (e.g., when loading data into a table) | Work: Throughput | Snowsight |
| BYTES_SCANNED | The number of bytes scanned by a query | Work: Throughput | Snowsight |

A low value for EXECUTION_TIME, BYTES_WRITTEN, or BYTES_SCANNED could indicate that the workload is more appropriate for a smaller warehouse. Conversely, if these values are high, you might consider moving those queries to a larger warehouse. If you see a heterogeneous mix of differently sized queries running on a single warehouse, you could consider redistributing these workloads into appropriately sized warehouses.
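As a sketch of how you might use these metrics together, the following example query (our own, not from Snowflake’s documentation) aggregates them per warehouse over the past seven days so you can compare workloads against warehouse sizes:

```sql
-- Average execution time and data volume per warehouse over the last 7 days.
-- Small averages on a large warehouse (or vice versa) are right-sizing candidates.
SELECT warehouse_name,
       warehouse_size,
       COUNT(*)                   AS query_count,
       AVG(execution_time) / 1000 AS avg_execution_seconds,
       AVG(bytes_scanned)         AS avg_bytes_scanned,
       AVG(bytes_written)         AS avg_bytes_written
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, warehouse_size
ORDER BY avg_execution_seconds DESC;
```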

Warehouse load history

Warehouse load history metrics show you resource utilization for a single warehouse within a specified date range. It’s important to measure these metrics because they indicate what you are spending your compute credits on. To surface these metrics, you can run the WAREHOUSE_LOAD_HISTORY function on any warehouse. (It’s important to note that this function only surfaces the last 14 days’ worth of data. You can view this data bucketed over a longer time range via the Admin tab in Snowsight.) The following are important virtual warehouse load metrics to monitor.

Metrics to watch: AVG_RUNNING, AVG_QUEUED_LOAD, AVG_BLOCKED

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| AVG_RUNNING | The average number of queries actively running on the virtual warehouse at any given time | Work: Throughput | Snowsight |
| AVG_QUEUED_LOAD | The average number of queries queued because the warehouse was overloaded | Work: Performance | Snowsight |
| AVG_BLOCKED | The average number of queries blocked and unable to execute due to transaction locks or conflicting operations | Work: Error | Snowsight |

AVG_RUNNING can provide valuable insights into the resource utilization of your virtual warehouses. For example, when the AVG_RUNNING metric shows a high value at a regular time of day, it can provide visibility into resource contention during peak times and help in planning resource allocation—for example, you might enable auto-scaling for virtual warehouses or reschedule non-essential workloads to off-peak times. Additionally, AVG_RUNNING can help you determine whether to scale up or down virtual warehouses based on usage patterns.

High AVG_QUEUED_LOAD values relative to a historical baseline can indicate bottlenecks in your system. To resolve them, you might increase the minimum and maximum cluster counts on a multi-cluster warehouse. High values in AVG_BLOCKED suggest that too many queries are contending for the same resources, such as compute power or disk I/O, or are blocked by transaction locks on the same tables. To handle excessive concurrent queries on the same virtual warehouse, you could enable auto-scaling to dynamically add clusters during high-concurrency periods.
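For reference, here is a minimal example of calling the WAREHOUSE_LOAD_HISTORY table function from a database’s INFORMATION_SCHEMA; the warehouse name is a placeholder:

```sql
-- Load metrics for one warehouse over the last 7 days
-- (the function only retains 14 days of data).
SELECT start_time,
       end_time,
       avg_running,
       avg_queued_load,
       avg_blocked
FROM TABLE(information_schema.warehouse_load_history(
       date_range_start => DATEADD('day', -7, CURRENT_TIMESTAMP()),
       warehouse_name   => 'MY_WAREHOUSE'))  -- placeholder warehouse name
ORDER BY start_time;
```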

Warehouse metering history

The WAREHOUSE_METERING_HISTORY view, which can be queried from the ACCOUNT_USAGE schema, shows you hourly credit usage for a single warehouse (or all the warehouses in your account) within the last 365 days. You can use this view to surface metrics about how your virtual warehouses directly contribute to your Snowflake spend. We’ll highlight a few key metrics to keep track of here.

Metrics to watch: CREDITS_USED, CREDITS_USED_COMPUTE, CREDITS_BILLED, CREDITS_ADJUSTMENT_CLOUD_SERVICES

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| CREDITS_USED | The total number of credits used by the warehouse in the hour | Resource: Utilization | Snowsight |
| CREDITS_USED_COMPUTE | The compute credits spent on queries, insertions, deletions, and all other DML operations in the hour | Resource: Utilization | Snowsight |
| CREDITS_BILLED | The total credits billed to your account, including compute, cloud services, and cloud services adjustments (available in the METERING_DAILY_HISTORY view) | Resource: Utilization | Snowsight |
| CREDITS_ADJUSTMENT_CLOUD_SERVICES | Credit refunds (adjustments) applied to cloud services usage | Resource: Utilization | Snowsight |

CREDITS_USED is the sum of CREDITS_USED_COMPUTE and CREDITS_USED_CLOUD_SERVICES, and it directly correlates to the amount you are spending on a virtual warehouse. A high value of CREDITS_USED_COMPUTE relative to a historical baseline can indicate that there are opportunities to optimize query performance, adjust warehouse sizes, or refine task schedules to reduce cost. CREDITS_BILLED can be found in the METERING_DAILY_HISTORY view; it is the sum of CREDITS_USED_COMPUTE, CREDITS_USED_CLOUD_SERVICES, and CREDITS_ADJUSTMENT_CLOUD_SERVICES. Cloud services credits should be relatively predictable and small in proportion to total credit usage, and ideally CREDITS_USED_CLOUD_SERVICES will be entirely offset by CREDITS_ADJUSTMENT_CLOUD_SERVICES. If you see a significantly higher value for CREDITS_BILLED than for CREDITS_USED, you may be performing a high volume of potentially unnecessary operations that use cloud services, and reducing that volume should lower your bill.
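To see which warehouses are driving credit consumption, a query along these lines (our own sketch) sums the metering view by warehouse for the last 30 days:

```sql
-- Credit consumption per warehouse over the last 30 days.
SELECT warehouse_name,
       SUM(credits_used)                AS total_credits_used,
       SUM(credits_used_compute)        AS total_credits_compute,
       SUM(credits_used_cloud_services) AS total_credits_cloud_services
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits_used DESC;
```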

Task history

Snowflake tasks are scheduled processes that automate the execution of SQL statements, such as queries or data transformations, at specified intervals or in response to certain conditions. Tasks are commonly used to automate data pipeline workflows, ETL/ELT processes, and other repetitive operations in Snowflake. The ACCOUNT_USAGE schema can be queried with the TASK_HISTORY view to surface metrics on the history of task usage in the past year. Tasks can be a big source of compute spend for your virtual warehouses, so they are important to monitor to get a full picture of where your costs are coming from. Here are some of the important task-related metrics to stay aware of.

Metrics to watch: STATE, ERROR_MESSAGE

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| STATE | The status of a completed task | Event: Alert | Snowsight |
| ERROR_MESSAGE | An explanation of why a task failed | Event: Alert | Snowsight |

It’s important to monitor STATE to ensure that your tasks are running as expected. An elevated number of FAILED or CANCELLED tasks can indicate bottlenecks in your system, which can degrade performance and add to your compute spend. Monitoring ERROR_MESSAGE can provide additional context into why tasks failed, helping you troubleshoot. For example, an error message like "Syntax error at or near..." suggests there is a mistake in the SQL code of the task, and a review or correction of the syntax is necessary.
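As an example of how you might watch for failing tasks, this sketch counts failed and cancelled task runs over the last seven days, along with a sample error message for each:

```sql
-- Failed and cancelled task runs in the last 7 days, with a sample error message.
SELECT name                     AS task_name,
       database_name,
       state,
       COUNT(*)                 AS run_count,
       ANY_VALUE(error_message) AS sample_error_message
FROM snowflake.account_usage.task_history
WHERE state IN ('FAILED', 'CANCELLED')
  AND scheduled_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY name, database_name, state
ORDER BY run_count DESC;
```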

Clustering history

Clustering in Snowflake helps maintain data organization based on clustering keys without manual intervention, but it uses compute resources to reorganize data in micro-partitions as data is inserted, updated, or deleted. Monitoring the AUTOMATIC_CLUSTERING_HISTORY view in Snowflake is essential for managing compute costs associated with automatic clustering, especially in environments with large, frequently updated tables. Here are the most important clustering metrics to monitor.

Metrics to watch: CREDITS_USED, NUM_ROWS_RECLUSTERED

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| CREDITS_USED | The total credits consumed by automatic clustering operations for each entry | Resource: Utilization | Snowsight |
| NUM_ROWS_RECLUSTERED | The total number of rows reclustered | Work: Throughput | Snowsight |

High CREDITS_USED indicates tables or operations that are particularly resource-intensive. For example, if a specific table is showing high CREDITS_USED, consider reviewing its clustering key to see if it’s optimal (e.g., a table storing sales data with frequent queries filtering on sale_date would benefit from clustering on this column), or whether the table truly needs automatic clustering enabled. If a table consistently shows high values for NUM_ROWS_RECLUSTERED, it means significant compute resources are being used to recluster this data. This may suggest the need to reassess clustering strategies or consider alternatives if clustering is not yielding substantial performance benefits.
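As a sketch, the following query surfaces which tables consume the most automatic clustering credits over the last 30 days:

```sql
-- Automatic clustering credits and reclustered rows per table, last 30 days.
SELECT database_name,
       schema_name,
       table_name,
       SUM(credits_used)         AS total_credits_used,
       SUM(num_rows_reclustered) AS total_rows_reclustered
FROM snowflake.account_usage.automatic_clustering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY database_name, schema_name, table_name
ORDER BY total_credits_used DESC;
```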

Monitor Snowflake metrics to optimize storage

In addition to monitoring your virtual warehouses’ contribution to compute usage, it’s also important to monitor your storage to gain full visibility into your Snowflake spend. Storage is typically a smaller portion of your bill than compute (Snowflake charges a flat monthly rate per terabyte of storage), but monitoring it is still necessary to get a full picture of your spend.

Database usage history

Snowflake data is mainly stored in databases. One of the main ways you can view metrics on your database storage is with the DATABASE_STORAGE_USAGE_HISTORY view, which can be queried from the ACCOUNT_USAGE schema. This view shows you the average daily storage usage for databases.

It’s important to note that Snowflake gives you metrics on table size for the current point in time, not historically. This means you can estimate how much a table will cost—for example, this month—based on how large it is now, but you can’t see how much it cost you in the past. One way around this is to keep your own history of system tables by regularly extracting and storing relevant data from Snowflake’s metadata views or system tables into your own database or data storage solution, as shown in the sketch below.
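As a minimal sketch of this approach (the MONITORING database and SNAPSHOTS schema are placeholder names you would create yourself), you could snapshot the ACCOUNT_USAGE TABLE_STORAGE_METRICS view into your own history table and schedule the INSERT to run on a regular cadence, for example with a Snowflake task:

```sql
-- Create an empty history table with the same shape as TABLE_STORAGE_METRICS,
-- plus a snapshot timestamp (placeholder database/schema names).
CREATE TABLE IF NOT EXISTS monitoring.snapshots.table_storage_history AS
SELECT CURRENT_TIMESTAMP() AS snapshot_at, *
FROM snowflake.account_usage.table_storage_metrics
WHERE 1 = 0;

-- Append the current table sizes; run this on a schedule (e.g., daily)
-- to build up the storage history that Snowflake doesn't keep for you.
INSERT INTO monitoring.snapshots.table_storage_history
SELECT CURRENT_TIMESTAMP(), *
FROM snowflake.account_usage.table_storage_metrics;
```

Here are some of the most important database metrics to monitor.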

Metrics to watch: USAGE_DATE, AVERAGE_DATABASE_BYTES

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| USAGE_DATE | The date for which database storage usage is reported | Resource: Utilization | Snowsight |
| AVERAGE_DATABASE_BYTES | The average number of bytes of database storage used on that date | Resource: Utilization | Snowsight |

Monitoring storage usage by USAGE_DATE allows you to observe storage trends in each database over time. It helps in identifying gradual or sudden increases in storage usage, which can signal growing datasets or an increase in historical data retention. AVERAGE_DATABASE_BYTES provides insights into cost optimization, capacity planning, and data growth trends. A consistent increase in either of these metrics could indicate that archiving or data pruning strategies need to be implemented to avoid excessive storage costs.
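As a rough example of how you might track these metrics, the following query (our own sketch) surfaces average daily storage per database over the last 30 days:

```sql
-- Average daily storage per database for the last 30 days, converted to GB.
SELECT usage_date,
       database_name,
       average_database_bytes / POWER(1024, 3) AS avg_database_gb
FROM snowflake.account_usage.database_storage_usage_history
WHERE usage_date >= DATEADD('day', -30, CURRENT_DATE())
ORDER BY usage_date, database_name;
```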

Stage storage history

A Snowflake stage is a location where data files are stored temporarily before they are loaded into Snowflake tables or unloaded from Snowflake to external storage. It’s important to monitor stages, as they can accumulate large volumes of data, especially in environments with frequent or continuous data ingestion, and lead to increased storage costs. Monitoring stages helps prevent them from being overwhelmed with unused or old files.

You can monitor your stages using the STAGE_STORAGE_USAGE_HISTORY view, which can be queried from the ACCOUNT_USAGE schema. There are two metrics in this view that are the most important.

Metrics to watch: USAGE_DATE, AVERAGE_STAGE_BYTES

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| USAGE_DATE | The date for which stage storage usage is reported | Resource: Utilization | Snowsight |
| AVERAGE_STAGE_BYTES | The average number of bytes of stage storage used on that date | Resource: Utilization | Snowsight |

USAGE_DATE provides valuable insights into storage usage of stages over time. For example, if stage storage remains consistently high across usage dates for a stage that only supports periodic loading jobs, this may signal that files aren’t being cleared after use, leading to ongoing storage costs. Note that this metric has the same name as the metric in the previous section, but the two appear in different views (DATABASE_STORAGE_USAGE_HISTORY versus STAGE_STORAGE_USAGE_HISTORY) and represent different values—database storage usage date and stage storage usage date, respectively.

Monitoring AVERAGE_STAGE_BYTES helps identify trends in stage storage usage, revealing whether storage requirements are increasing or stable. For example, a steady increase in AVERAGE_STAGE_BYTES may indicate that data loads are growing over time, signaling a need for either storage expansion or data archiving strategies.
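For reference, a similar sketch against the STAGE_STORAGE_USAGE_HISTORY view (again, our own example) surfaces daily stage storage for the last 30 days:

```sql
-- Average daily stage storage for the last 30 days, converted to GB.
SELECT usage_date,
       average_stage_bytes / POWER(1024, 3) AS avg_stage_gb
FROM snowflake.account_usage.stage_storage_usage_history
WHERE usage_date >= DATEADD('day', -30, CURRENT_DATE())
ORDER BY usage_date;
```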

Monitor Snowflake metrics to optimize data quality and freshness

Data analysts produce datasets, create tables, and write queries in Snowflake. As the dataset owners, they want to ensure that query performance is high, particularly for frequently used query patterns; that data load times are optimized to avoid bottlenecks; and that data quality is high, so that insights derived from the data are reliable and accurate. This means that data analysts need to monitor Snowflake at the dataset level to answer questions like: How many rows did the last insert write? Which tables are being joined together? What are the query patterns on a table? And are queries against a particular table slow?

Query history for data quality

One way to track DML operations (e.g., inserts and joins) in your databases is via the QUERY_HISTORY view, which can be used to surface Snowflake query history by various dimensions (time range, session, user, warehouse, etc.) within the last year. The view is available in both the ACCOUNT_USAGE and READER_ACCOUNT_USAGE schemas, and it contains a number of important metrics to monitor.

Metrics to watch: ROWS_INSERTED, ROWS_PRODUCED, EXECUTION_TIME, QUERY_RETRY_TIME, QUERY_RETRY_CAUSE

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| ROWS_INSERTED | The number of rows inserted by a query | Work: Throughput | Snowsight |
| ROWS_PRODUCED | The number of rows that have been returned by a query or operation (e.g., SELECT) | Work: Throughput | Snowsight |
| EXECUTION_TIME | The time it takes a query to execute | Work: Performance | Snowsight |
| QUERY_RETRY_TIME | How long Snowflake spends attempting to re-execute a query, either due to transient issues (e.g., network interruptions, resource contention) or system-level failures | Work: Performance | Snowsight |
| QUERY_RETRY_CAUSE | The reason for a query being retried | Work: Performance | Snowsight |

Unusual fluctuations in ROWS_INSERTED could signal issues with data quality or unexpected changes in data sources, such as duplicates, errors, or missing records. For example, if the number of rows inserted drops significantly or varies drastically from the norm, it may indicate a problem in data source reliability or an extract, transform, and load (ETL) error, prompting a review to ensure data quality.

Monitoring ROWS_PRODUCED helps you identify resource-intensive queries and ensure data quality. For example, if a query returns an unexpectedly high number of rows, it may indicate underlying issues with the integrity or accuracy of your data (e.g., the query is missing filters or otherwise incorrectly scoped, leading to unnecessary data being retrieved).

Slow-running queries in ETL processes can delay data updates, leading to outdated or incomplete data being available to users. Monitoring EXECUTION_TIME helps detect bottlenecks that could compromise data freshness. For example, if queries that load or transform data into critical tables are taking longer than expected, you may be delivering stale data, which impacts data accuracy for time-sensitive analytics.

A high QUERY_RETRY_TIME could indicate partial or incomplete data being written to tables, which can lead to data inconsistencies, especially if retries do not fully resolve the issue. For example, a high retry time on a query that populates a staging table could mean that some rows were not loaded on time, creating gaps or duplicates if retries don’t successfully reprocess all records.

Understanding the cause of query retries is essential for diagnosing and addressing issues that affect query execution and overall performance. QUERY_RETRY_CAUSE helps you understand why a query failed and had to be retried. It provides insights into whether the failure was due to a transient system issue, resource contention, or other factors.
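To illustrate, here is one way you might track these metrics for a data pipeline. This is a sketch, and the ETL_SERVICE_USER name is a placeholder for whatever user runs your loads:

```sql
-- Daily load volume and retry behavior for an example ETL user over the last 30 days.
SELECT DATE_TRUNC('day', start_time)           AS load_date,
       SUM(rows_inserted)                      AS total_rows_inserted,
       AVG(execution_time) / 1000              AS avg_execution_seconds,
       SUM(query_retry_time) / 1000            AS total_retry_seconds,
       COUNT_IF(query_retry_cause IS NOT NULL) AS retried_queries
FROM snowflake.account_usage.query_history
WHERE user_name = 'ETL_SERVICE_USER'  -- placeholder: your pipeline's service user
  AND start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY load_date
ORDER BY load_date;
```

A sudden drop in total_rows_inserted or a spike in retried_queries for a given day is the kind of anomaly described above.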

Query hashes

While monitoring individual queries is vital to maintaining data quality, teams may also want to monitor patterns in their queries. To do so, it can be useful to include query hashes as a dimension of your inquiries into Snowflake data. QUERY_HASH is available as a column in the QUERY_HISTORY view in the ACCOUNT_USAGE schema.

QUERY_HASH is a unique identifier computed from a query’s canonicalized text: queries with identical text share the same hash, while queries that follow the same structure but use different parameters or literal values produce different hashes (the parameterized hash, discussed below, groups those together). QUERY_HASH can be a valuable additional dimension to use when looking at other Snowflake metrics. You may gain insights from looking at average EXECUTION_TIME or a total COUNT of queries by QUERY_HASH. This can help you find patterns of workloads with high usage that would most benefit from optimization (at the query, warehouse, or dataset level).

Metrics to watch: QUERY_PARAMETERIZED_HASH

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| QUERY_PARAMETERIZED_HASH | A hash that abstracts away the literal values (e.g., dates, IDs, or filter conditions), grouping together queries that follow the same pattern but with different parameters | Work: Throughput | Snowsight |

Monitoring QUERY_PARAMETERIZED_HASH in conjunction with other Snowflake metrics can help you pinpoint queries that are resource-intensive or underperforming, and prioritize optimization efforts. For example, you could identify query hashes with the highest values for BYTES_SCANNED, and then optimize their SQL logic or data structures (e.g., clustering, partitioning) to reduce compute spend.
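As a sketch of this kind of analysis, the following query (our own example) ranks query patterns by the data they scan over the last seven days:

```sql
-- Most expensive query patterns over the last 7 days, grouped by parameterized hash
-- so that queries differing only in literal values roll up together.
SELECT query_parameterized_hash,
       ANY_VALUE(query_text)       AS sample_query_text,
       COUNT(*)                    AS executions,
       AVG(execution_time) / 1000  AS avg_execution_seconds,
       SUM(bytes_scanned)          AS total_bytes_scanned
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND query_parameterized_hash IS NOT NULL
GROUP BY query_parameterized_hash
ORDER BY total_bytes_scanned DESC
LIMIT 20;
```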

Query operators

Query operators are symbols or keywords (e.g., + and -, UNION, ANY, etc.) used within SQL queries to perform various operations on data. Just as it can be useful to monitor query patterns with the QUERY_HASH metric, it can be helpful to monitor query operators to gain more granular visibility into these query patterns. The GET_QUERY_OPERATOR_STATS() table function can be used to return statistics about individual query operators within queries executed in the past 14 days. This information is also available in the query profile tab in Snowsight.

Metrics to watch: OPERATOR_STATISTICS, EXECUTION_TIME_BREAKDOWN

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| OPERATOR_STATISTICS | Statistics on the performance of individual operators within a query execution plan | Work: Performance | Snowsight |
| EXECUTION_TIME_BREAKDOWN | The time spent on individual operators during query execution | Work: Performance | Snowsight |

OPERATOR_STATISTICS can reveal imbalances in join operations, where one table is significantly larger than the other. Data skew—an uneven distribution of data across nodes or partitions—can cause incomplete joins, incorrect aggregations, or performance issues, which all impact data accuracy. For example, if a join operator shows an unusually high processing time or resource usage, it may indicate that data skew is causing inefficient processing, potentially leading to missing or duplicated rows in the output.

Long EXECUTION_TIME_BREAKDOWN values can suggest inefficient query logic or data structures, which might yield partial or incorrect results. For example, if scan operators are showing high disk I/O times, it might indicate inefficient data access patterns (e.g., full table scans) that could be optimized with better indexing, partitioning, or more selective filters.
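For reference, a minimal example of calling the table function for the most recent query in your session (you can also pass an explicit query ID copied from query history):

```sql
-- Operator-level statistics for the last query run in this session.
SELECT operator_id,
       operator_type,
       execution_time_breakdown,
       operator_statistics
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()))
ORDER BY operator_id;
```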

Tables

A Snowflake table is a structured object in Snowflake that stores data in a relational format—this is where data is stored and organized for querying, analysis, and processing. It can be useful to monitor Snowflake tables to understand where clustering might help improve data quality and query performance.

Snowflake uses clustering keys to organize data into micro-partitions, which are smaller chunks of data stored on disk. Clustering large tables can optimize query performance by minimizing the data scanned during execution. The TABLES view, which can be queried from the ACCOUNT_USAGE schema, shows you important metrics, which we will highlight here.

Metrics to watch: CLUSTERING_KEY, AUTO_CLUSTERING_ON

| Name | Description | Metric Type | Availability |
| --- | --- | --- | --- |
| CLUSTERING_KEY | The columns or expressions that comprise the clustering key | Event: Change | Snowsight |
| AUTO_CLUSTERING_ON | Indicates whether Automatic Clustering is enabled for a table | Event: Alert | Snowsight |

Monitoring CLUSTERING_KEY is crucial for data quality and performance because clustering keys help Snowflake organize data within micro-partitions, which optimizes query efficiency and ensures that data retrieval is accurate and timely. For example, if a clustered table shows high clustering depth (a measure of how poorly micro-partitions are clustered, which you can check with the SYSTEM$CLUSTERING_DEPTH function), it may indicate that data is poorly organized within partitions, which can lead to slower query performance and potentially incomplete results due to inefficient data retrieval.

Over time, as data is inserted, updated, or deleted, the table’s structure can become less optimal, potentially degrading query performance. Auto-clustering solves this by automatically re-clustering data in the background without manual intervention, ensuring that data is stored efficiently. Enabling AUTO_CLUSTERING_ON is generally a good idea for large tables; tables with frequent updates, inserts, or deletes; tables with frequent queries on specific columns; and tables with irregular data distribution.
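As a starting point, a sketch like the following (the 100 GB threshold is just an example) surfaces large tables that have no clustering key defined, which may be worth reviewing:

```sql
-- Large, non-deleted tables without a clustering key; candidates for a clustering review.
SELECT table_catalog,
       table_schema,
       table_name,
       row_count,
       bytes / POWER(1024, 3) AS size_gb,
       clustering_key,
       auto_clustering_on
FROM snowflake.account_usage.tables
WHERE deleted IS NULL
  AND clustering_key IS NULL
  AND bytes > 100 * POWER(1024, 3)  -- example threshold: ~100 GB
ORDER BY bytes DESC;
```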

Monitor Snowflake to ensure the performance of queries and virtual warehouses

Snowflake is a popular data platform, and in this post, we’ve explored the key metrics you can monitor to gain visibility into virtual warehouse performance, database storage, and dataset-level operations. In Part 2, we’ll look at the tools available to collect and visualize key Snowflake metrics. And in Part 3, we’ll show you how you can use Datadog to monitor the health and performance of your Snowflake virtual warehouses, databases, and data quality.