Snowflake is a fully managed data platform that enables users to easily store, process, analyze, and share high volumes of structured and semi-structured data. One of the most popular data platforms on the market, Snowflake has gained widespread adoption because it addresses a range of data challenges with a unified, scalable, and high-performance platform. Snowflake’s flexibility enables users to handle diverse workloads, such as data lake and data warehouse integration. Built-in dashboards and integration with business intelligence (BI) tools like Tableau and Power BI make Snowflake an ideal solution for business analytics, reporting, and other use cases. Data scientists use Snowflake’s integrations with tools like Python and TensorFlow to train machine learning (ML) models, and media platforms can use real-time analytics from Snowflake to track user engagement and optimize content delivery.
Because teams use Snowflake for a wide variety of business-critical use cases, it’s important to ensure that their data in Snowflake is high-quality and processed efficiently. Additionally, Snowflake costs can add up, particularly if you are using Snowflake for multiple purposes or processing large datasets. For these reasons, monitoring Snowflake is important for many different kinds of teams. This series of blog posts will guide you through monitoring Snowflake. In this post, we’ll show you:
- What Snowflake is
- Metrics to monitor to optimize compute
- Metrics to monitor to optimize storage
- Metrics to monitor at the dataset level
What is Snowflake?
Snowflake is a comprehensive platform for storing and processing data in the cloud, and one of its main differentiating features is its architecture, which decouples its compute layer from its storage layer. Snowflake is built on top of the major cloud platforms (AWS, Azure, and Google Cloud), but it abstracts the underlying complexity of tasks like provisioning, configuring, and managing storage infrastructure so users do not need to manually interact with the underlying cloud storage. Snowflake leverages the cloud providers’ native object storage so that persisted data is accessible across the entirety of your Snowflake environment.
Snowflake’s decoupled architecture combines the data-management simplicity of a shared-disk architecture with the performance and scale-out benefits of a shared-nothing architecture. Its centralized cloud storage layer—accessible by all compute resources (virtual warehouses) within the platform—acts similarly to a shared-disk architecture. Since all compute clusters access the same underlying data, there’s no need to replicate or move data across nodes or clusters. This makes data sharing, backups, and disaster recovery much simpler and more efficient.
Meanwhile, Snowflake’s compute layer uses a shared-nothing architecture. In this type of system, each compute node operates independently, with its own local memory and compute resources. Because there is no sharing of memory or CPU between nodes, Snowflake can handle large datasets with fast, high-performance query processing. It can also handle concurrent queries from different users simultaneously without performance degradation.
The nodes that comprise Snowflake’s compute layer are called virtual warehouses. These are compute clusters composed of multiple virtual machines (VMs) that perform tasks such as querying, processing, and analyzing the data stored in the storage layer. This layer of Snowflake is an example of massively parallel processing (MPP), a computational model that allows for the simultaneous processing of large datasets by distributing load and processing across nodes.
Snowflake’s architecture provides a number of benefits that contribute to its popularity. Teams can optimize costs by scaling storage and compute independently. Additionally, because multiple compute clusters can access the same data storage independently, users can run queries concurrently without resource contention. This separate architecture also makes it easy to isolate workloads, which means teams can assign dedicated compute resources to workloads without affecting others, ensuring predictable performance for critical tasks and resource-intensive processes.
However, Snowflake’s complexity results in unique monitoring challenges to ensure the platform runs efficiently and cost-effectively. These include:
- Compute monitoring: tracking the utilization, scaling, and performance of virtual warehouses
- Storage monitoring: monitoring data storage and retention policies
- Dataset-level monitoring: ensuring efficient query execution across compute nodes
While Snowflake generates a vast number of metrics, we’ve highlighted the essential ones to monitor so you can stay on top of these three specific areas. We’ll explain the key metrics for each category in more detail in the following sections.
Monitor Snowflake metrics to optimize compute
A major concern for Snowflake SysAdmins is usage and cost. They will likely want to monitor Snowflake usage metrics to understand their organization’s spending, including what is driving costs, what’s changed month over month, and whose usage is driving up the bill. To gain these insights, it’s essential to monitor warehouse query performance, because virtual warehouses consume billable compute, measured in Snowflake credits, as they execute queries, load data, and perform other Data Manipulation Language (DML) operations.
Schemas and views
To find metrics on warehouse query performance, it’s first important to understand where Snowflake metrics come from. You can explore Snowflake system metrics using Snowsight (Snowflake’s built-in web interface) or by writing SQL queries against views in various schemas. Data on your account’s Snowflake usage is maintained in databases, each of which consists of one or more schemas, which are logical groupings of database objects (tables, views, etc.). Users can write custom views in SQL or use out-of-the-box Snowflake views to query metrics on usage.
The ACCOUNT_USAGE schema can be found in the Snowflake-provided database called SNOWFLAKE, which contains views with usage and performance-related information for your account. This schema can be queried with views that surface execution times of queries and tasks, as well as views that surface the load on warehouses as they execute queries and tasks. It’s important to note that views and tables in ACCOUNT_USAGE lag one to three hours behind real time.
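For illustration, here is a minimal sketch of querying an ACCOUNT_USAGE view by its fully qualified name. The columns selected and the LIMIT are arbitrary examples, and it assumes your role can read the SNOWFLAKE database (for example, ACCOUNTADMIN, or a role granted IMPORTED PRIVILEGES on it):

```sql
-- Minimal sketch: querying an ACCOUNT_USAGE view by its fully qualified name.
-- Assumes your role can read the SNOWFLAKE database (e.g., ACCOUNTADMIN, or a role
-- granted IMPORTED PRIVILEGES on it). Results lag one to three hours behind real time.
SELECT query_id,
       warehouse_name,
       start_time,
       execution_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
ORDER BY start_time DESC
LIMIT 10;
```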
Let’s say you’re a SysAdmin who wants to determine if a warehouse of inappropriate size is being used. In Snowflake, a virtual warehouse’s compute costs are determined by its size, with larger warehouses consuming more credits per hour, meaning a higher cost for running queries. Improperly sized warehouses can add unnecessary cost to your bill. For example, using a large warehouse to run a small query is a waste of compute resources. Conversely, running a large query on a smaller warehouse could cause failures or timeouts (which also means wasted compute). Right-sizing virtual warehouses is usually the single most impactful way you can reduce your Snowflake costs and often results in performance gains that outweigh the cost of increased warehouse size.
Query history for compute
The QUERY_HISTORY view, which can be queried from the ACCOUNT_USAGE schema, surfaces metrics that can help you right-size your virtual warehouses. There are three metrics in this view that are particularly important.
Metrics to watch: EXECUTION_TIME, BYTES_WRITTEN, BYTES_SCANNED
Name | Description | Metric Type | Availability |
---|---|---|---|
EXECUTION_TIME | The time it takes each query to execute, in milliseconds | Work: Performance | Snowsight |
BYTES_WRITTEN | The number of bytes written by a query (e.g., when loading data into a table) | Work: Throughput | Snowsight |
BYTES_SCANNED | The number of bytes scanned by a query | Work: Throughput | Snowsight |
A low value for EXECUTION_TIME, BYTES_WRITTEN, or BYTES_SCANNED could indicate that the workload is more appropriate for a smaller warehouse. Conversely, if these values are high, you might consider moving those queries to a larger warehouse. If you see a heterogeneous mix of differently sized queries running on a single warehouse, you could consider redistributing these workloads into appropriately sized warehouses.
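For example, a sketch like the following aggregates these columns per warehouse over the past week to surface right-sizing candidates (the seven-day window is an arbitrary choice for illustration):

```sql
-- Sketch: summarize query size per warehouse to spot right-sizing candidates.
-- The 7-day window is an arbitrary example; adjust it to your needs.
SELECT warehouse_name,
       warehouse_size,
       COUNT(*)            AS query_count,
       AVG(execution_time) AS avg_execution_time_ms,
       AVG(bytes_scanned)  AS avg_bytes_scanned,
       AVG(bytes_written)  AS avg_bytes_written
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name IS NOT NULL
GROUP BY warehouse_name, warehouse_size
ORDER BY avg_bytes_scanned DESC;
```

Warehouses whose queries consistently scan and write little data are candidates for downsizing, while warehouses with very large averages may warrant a larger size or a dedicated warehouse.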
Warehouse load history
Warehouse load history metrics show you resource utilization for a single warehouse within a specified date range. It’s important to measure these metrics because they indicate what you are spending your compute credits on. To surface these metrics, you can run the WAREHOUSE_LOAD_HISTORY function on any warehouse. (It’s important to note that this function only surfaces the last 14 days’ worth of data. You can view this data bucketed over a longer time range via the Admin tab in Snowsight.) The following are important virtual warehouse load metrics to monitor.
Metrics to watch: AVG_RUNNING, AVG_QUEUED_LOAD, AVG_BLOCKED
Name | Description | Metric Type | Availability |
---|---|---|---|
AVG_RUNNING | The average number of queries actively running on a virtual warehouse at any given time | Work: Throughput | Snowsight |
AVG_QUEUED_LOAD | The average number of queries queued because the warehouse was overloaded | Work: Performance | Snowsight |
AVG_BLOCKED | The average number of queries blocked and unable to execute due to resource contention or conflicting operations (such as transaction locks) | Work: Error | Snowsight |
AVG_RUNNING can provide valuable insights into the resource utilization of your virtual warehouses. For example, when the AVG_RUNNING metric shows a high value at a regular time of day, it can provide visibility into resource contention during peak times and help in planning resource allocation—for example, you might enable auto-scaling for virtual warehouses or reschedule non-essential workloads to off-peak times. Additionally, AVG_RUNNING can help you determine whether to scale up or down virtual warehouses based on usage patterns.
High AVG_QUEUED_LOAD values relative to a historical baseline can indicate bottlenecks in your system. To resolve bottlenecks, you might increase your min/max cluster counts. High values in AVG_BLOCKED suggest that too many queries are contending for the same resources, such as compute power or disk I/O, or are blocked by transaction locks on the same data. To handle excessive concurrent queries on the same virtual warehouse, you could enable auto-scaling to dynamically add clusters during high-concurrency periods.
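As a sketch, you could pull the last 24 hours of load history for one warehouse with the WAREHOUSE_LOAD_HISTORY table function. The warehouse name and time window below are placeholders:

```sql
-- Sketch: hourly load history for one warehouse over the past 24 hours.
-- MY_WAREHOUSE is a placeholder; the function only covers the last 14 days,
-- and it needs to be run with a database in context (it uses INFORMATION_SCHEMA).
SELECT start_time,
       end_time,
       avg_running,
       avg_queued_load,
       avg_blocked
FROM TABLE(INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY(
       DATE_RANGE_START => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
       WAREHOUSE_NAME   => 'MY_WAREHOUSE'))
ORDER BY start_time;
```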
Warehouse metering history
The WAREHOUSE_METERING_HISTORY view, which can be queried from the ACCOUNT_USAGE schema, shows you hourly credit usage for a single warehouse (or all the warehouses in your account) within the last 365 days. You can use this view to surface metrics about how your virtual warehouses directly contribute to your Snowflake spend. We’ll highlight a few key metrics to keep track of here.
Metrics to watch: CREDITS_USED, CREDITS_USED_COMPUTE, CREDITS_BILLED, CREDITS_ADJUSTMENT_CLOUD_SERVICES
Name | Description | Metric Type | Availability |
---|---|---|---|
CREDITS_USED | The total number of credits used for the warehouse in the hour | Resource: Utilization | Snowsight |
CREDITS_USED_COMPUTE | The credits used for compute in the hour, across queries, insertions, deletions, and all other DML operations | Resource: Utilization | Snowsight |
CREDITS_BILLED | The total credits billed: the sum of compute credits, cloud services credits, and the cloud services adjustment (available in the METERING_DAILY_HISTORY view) | Resource: Utilization | Snowsight |
CREDITS_ADJUSTMENT_CLOUD_SERVICES | Credit refunds applied to cloud services | Resource: Utilization | Snowsight |
CREDITS_USED is the sum of CREDITS_USED_COMPUTE and CREDITS_USED_CLOUD_SERVICES. This metric directly correlates to the amount you are spending on a virtual warehouse. A high value of CREDITS_USED_COMPUTE relative to a historical baseline can indicate that there are opportunities to optimize query performance, adjust warehouse sizes, or refine task schedules to reduce cost. CREDITS_BILLED can be found in the METERING_DAILY_HISTORY view. This metric is the sum of CREDITS_USED_COMPUTE, CREDITS_USED_CLOUD_SERVICES, and CREDITS_ADJUSTMENT_CLOUD_SERVICES. Cloud services credits cover background operations and system services that support your Snowflake environment (e.g., query optimization, caching, and auto-scaling), and they should be relatively predictable and minimal in proportion to total credit usage. Ideally, CREDITS_USED_CLOUD_SERVICES will be entirely offset by CREDITS_ADJUSTMENT_CLOUD_SERVICES. If you see a significantly higher value for CREDITS_BILLED than for CREDITS_USED_COMPUTE, you may be performing a high volume of potentially unnecessary operations that use cloud services, and reducing that volume should lower your bill.
Task history
Snowflake tasks are scheduled processes that automate the execution of SQL statements, such as queries or data transformations, at specified intervals or in response to certain conditions. Tasks are commonly used to automate data pipeline workflows, ETL/ELT processes, and other repetitive operations in Snowflake. The ACCOUNT_USAGE schema can be queried with the TASK_HISTORY view to surface metrics on the history of task usage in the past year. Tasks can be a big source of compute spend for your virtual warehouses, so they are important to monitor to get a full picture of where your costs are coming from. Here are some of the important task-related metrics to stay aware of.
Metrics to watch: STATE, ERROR_MESSAGE
Name | Description | Metric Type | Availability |
---|---|---|---|
STATE | The status of a task run (e.g., succeeded, failed, or canceled) | Event: Alert | Snowsight |
ERROR_MESSAGE | An explanation of why a task failed | Event: Alert | Snowsight |
It’s important to monitor STATE to ensure that your tasks are running as expected. An elevated number of FAILED or CANCELED tasks can indicate bottlenecks in your system, which can degrade performance and add to your compute spend. Monitoring ERROR_MESSAGE can help provide additional context into why tasks failed, helping you troubleshoot. For example, an error message like "Syntax error at or near..." suggests there is a mistake in the SQL code of the task, and a review or correction of the syntax is necessary.
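For example, a sketch like this one lists recent failed task runs and their error messages (the seven-day window is illustrative):

```sql
-- Sketch: recent failed task runs and their error messages.
SELECT scheduled_time,
       database_name,
       schema_name,
       name  AS task_name,
       state,
       error_message
FROM SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY
WHERE state = 'FAILED'
  AND scheduled_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY scheduled_time DESC;
```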
Clustering history
Clustering in Snowflake helps maintain data organization based on clustering keys without manual intervention, but it uses compute resources to reorganize data in micro-partitions as data is inserted, updated, or deleted. Monitoring the AUTOMATIC_CLUSTERING_HISTORY view in Snowflake is essential for managing compute costs associated with automatic clustering, especially in environments with large, frequently updated tables. Here are the most important clustering metrics to monitor.
Metrics to watch: CREDITS_USED, NUM_ROWS_RECLUSTERED
Name | Description | Metric Type | Availability |
---|---|---|---|
CREDITS_USED | The total credits consumed by automatic clustering operations for each entry | Resource: Utilization | Snowsight |
NUM_ROWS_RECLUSTERED | The total number of rows reclustered | Work: Throughput | Snowsight |
High CREDITS_USED values indicate tables or operations that are particularly resource-intensive. For example, if a specific table is showing high CREDITS_USED, consider reviewing its clustering key to see if it’s optimal (e.g., a table storing sales data with frequent queries filtering on sale_date would benefit from clustering on this column), or if the table truly needs automatic clustering enabled. If a table consistently shows high values for NUM_ROWS_RECLUSTERED, it means significant compute resources are being used to recluster this data. This may suggest the need to reassess clustering strategies or consider alternatives if clustering is not yielding substantial performance benefits.
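A sketch like the following ranks tables by automatic clustering credits and rows reclustered over the past 30 days (the window is an example):

```sql
-- Sketch: which tables consume the most automatic clustering credits.
SELECT database_name,
       schema_name,
       table_name,
       SUM(credits_used)         AS total_credits_used,
       SUM(num_rows_reclustered) AS total_rows_reclustered
FROM SNOWFLAKE.ACCOUNT_USAGE.AUTOMATIC_CLUSTERING_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY database_name, schema_name, table_name
ORDER BY total_credits_used DESC;
```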
Monitor Snowflake metrics to optimize storage
In addition to monitoring your virtual warehouses’ contribution to compute usage, it’s also important to monitor your storage to gain full visibility into your Snowflake spend. Storage is billed at a flat, monthly rate per terabyte and typically represents a smaller portion of your bill than compute, but monitoring it is still necessary to get a full picture of your costs.
Database usage history
Snowflake data is mainly stored in databases. One of the main ways you can view metrics on your database storage is with the DATABASE_STORAGE_USAGE_HISTORY view, which can be queried from the ACCOUNT_USAGE schema. This view shows you the average daily storage usage for databases.
It’s important to note that Snowflake gives you metrics on table size for the current point in time, not historically. This means you can estimate how much a table will cost—for example, this month—based on how large it is now, but you can’t see how much it cost you in the past. One way around this is to keep your own history of system tables by regularly extracting and storing relevant data from Snowflake’s metadata views or system tables into your own database or data storage solution. Here are some of the most important database metrics to monitor.
Metrics to watch: USAGE_DATE, AVERAGE_DATABASE_BYTES
Name | Description | Metric Type | Availability |
---|---|---|---|
USAGE_DATE | The date of the database storage usage record | Resource: Utilization | Snowsight |
AVERAGE_DATABASE_BYTES | The average number of bytes of database storage used on that date | Resource: Utilization | Snowsight |
Monitoring storage by USAGE_DATE over time allows you to observe storage trends in each database and identify gradual or sudden increases in usage, which can signal growing datasets or longer historical data retention. AVERAGE_DATABASE_BYTES provides insights into cost optimization, capacity planning, and data growth trends. A consistent increase in AVERAGE_DATABASE_BYTES across usage dates could indicate that archiving or data pruning strategies need to be implemented to avoid excessive storage costs.
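For example, this sketch tracks average storage per database per day over the past 90 days, converted to gigabytes (both the window and the unit conversion are illustrative):

```sql
-- Sketch: daily average storage per database, in GB, over the last 90 days.
SELECT usage_date,
       database_name,
       average_database_bytes / POWER(1024, 3) AS avg_database_gb
FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
WHERE usage_date >= DATEADD('day', -90, CURRENT_DATE())
ORDER BY database_name, usage_date;
```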
Stage storage history
A Snowflake stage is a location where data files are stored temporarily before they are loaded into Snowflake tables or unloaded from Snowflake to external storage. It’s important to monitor stages, as they can accumulate large volumes of data, especially in environments with frequent or continuous data ingestion, and lead to increased storage costs. Monitoring stages helps prevent them from being overwhelmed with unused or old files.
You can monitor your stages using the STAGE_STORAGE_USAGE_HISTORY view, which can be queried from the ACCOUNT_USAGE schema. There are two metrics in this view that are the most important.
Metrics to watch: USAGE_DATE, AVERAGE_STAGE_BYTES
Name | Description | Metric Type | Availability |
---|---|---|---|
USAGE_DATE | The date of the stage storage usage record | Resource: Utilization | Snowsight |
AVERAGE_STAGE_BYTES | The average number of bytes of stage storage used | Resource: Utilization | Snowsight |
USAGE_DATE provides valuable insight into storage usage of stages over time. For example, if storage tracked by USAGE_DATE remains consistently high for a stage that only supports periodic loading jobs, this may signal that files aren’t being cleared after use, leading to ongoing storage costs. Note that this metric has the same name as the metric in the previous section, but the two appear in different views (DATABASE_STORAGE_USAGE_HISTORY versus STAGE_STORAGE_USAGE_HISTORY) and represent different values: database storage usage date and stage storage usage date, respectively.
Monitoring AVERAGE_STAGE_BYTES helps identify trends in stage storage usage, revealing whether storage requirements are increasing or stable. For example, a steady increase in AVERAGE_STAGE_BYTES may indicate that data loads are growing over time, signaling a need for either storage expansion or data archiving strategies.
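A minimal sketch for tracking stage storage growth over the past 90 days (the window and unit conversion are illustrative):

```sql
-- Sketch: daily average stage storage, in GB, over the last 90 days.
SELECT usage_date,
       average_stage_bytes / POWER(1024, 3) AS avg_stage_gb
FROM SNOWFLAKE.ACCOUNT_USAGE.STAGE_STORAGE_USAGE_HISTORY
WHERE usage_date >= DATEADD('day', -90, CURRENT_DATE())
ORDER BY usage_date;
```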
Monitor Snowflake metrics to optimize data quality and freshness
Data analysts produce datasets, create tables, and write queries to Snowflake. As the dataset owners, they want to ensure that query performance is high, particularly for frequently used query patterns; that data load times are optimized to avoid bottlenecks; and that data quality is high, so that insights derived from the data are reliable and accurate. This means that data analysts need to monitor Snowflake at the dataset level to answer questions like: How many rows did the last insert write? Which tables are being joined together? What are the query patterns on a table? And are queries against a particular table slow?
Query history for data quality
One way to track DML operations (e.g., inserts and joins) in your databases is via the Query History view, which can be used to surface Snowflake query history by various dimensions (time range, session, user, warehouse, etc.) within the last year. The view is available in both the ACCOUNT_USAGE
and READER_ACCOUNT_USAGE
schemas, and contains a number of important metrics to monitor.
Metrics to watch: ROWS_INSERTED, ROWS_PRODUCED, EXECUTION_TIME, QUERY_RETRY_TIME, QUERY_RETRY_CAUSE
Name | Description | Metric Type | Availability |
---|---|---|---|
ROWS_INSERTED | The number of rows inserted by a query | Work: Throughput | Snowsight |
ROWS_PRODUCED | The number of rows that have been returned by a query or operation (e.g., SELECT) | Work: Throughput | Snowsight |
EXECUTION_TIME | The time it takes a query to execute, in milliseconds | Work: Performance | Snowsight |
QUERY_RETRY_TIME | How long Snowflake spends attempting to re-execute a query, either due to transient issues (e.g., network interruptions, resource contention) or system-level failures | Work: Performance | Snowsight |
QUERY_RETRY_CAUSE | The reason for a query being retried | Work: Performance | Snowsight |
Unusual fluctuations in ROWS_INSERTED could signal issues with data quality or unexpected changes in data sources, such as duplicates, errors, or missing records. For example, if the number of rows inserted drops significantly or varies drastically from the norm, it may indicate a problem in data source reliability or an extract, transform, and load (ETL) error, prompting a review to ensure data quality.
Monitoring ROWS_PRODUCED helps you identify resource-intensive queries and ensure data quality. For example, if a query returns an unexpectedly high number of rows, it may indicate underlying issues with the integrity or accuracy of your data (e.g., the query is missing filters or otherwise incorrectly scoped, leading to unnecessary data being retrieved).
Slow-running queries in ETL processes can delay data updates, leading to outdated or incomplete data being available to users. Monitoring EXECUTION_TIME helps detect bottlenecks that could compromise data freshness. For example, if queries that load or transform data into critical tables are taking longer than expected, you may be delivering stale data, which impacts data accuracy for time-sensitive analytics.
A high QUERY_RETRY_TIME could indicate partial or incomplete data being written to tables, which can lead to data inconsistencies, especially if retries do not fully resolve the issue. For example, a high retry time on a query that populates a staging table could mean that some rows were not loaded on time, creating gaps or duplicates if retries don’t successfully reprocess all records.
Understanding the cause of query retries is essential for diagnosing and addressing issues that affect query execution and overall performance. QUERY_RETRY_CAUSE helps you understand why a query failed and had to be retried. It provides insights into whether the failure was due to a transient system issue, resource contention, or other factors.
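As an example, the following sketch flags recent queries in QUERY_HISTORY that spent time in retries, alongside their row counts and execution times. The 24-hour window is arbitrary, and the retry columns may be empty for queries that never retried:

```sql
-- Sketch: recent queries that were retried, with data-volume and timing context.
SELECT query_id,
       warehouse_name,
       rows_inserted,
       rows_produced,
       execution_time   AS execution_time_ms,
       query_retry_time AS query_retry_time_ms,
       query_retry_cause
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
  AND query_retry_time > 0
ORDER BY query_retry_time DESC;
```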
Query hashes
While monitoring individual queries is vital to maintaining data quality, teams may also want to monitor patterns in their queries. To do so, it can be useful to include query hashes as a dimension of your inquiries into Snowflake data. The QUERY_HASH column is available in the QUERY_HISTORY view in the ACCOUNT_USAGE schema.
The QUERY_HASH metric in Snowflake is a unique identifier computed from a query’s text. The hash remains the same for queries that share the same structure, though it may vary when parameters or literal values differ. QUERY_HASH can be a valuable additional dimension to use when looking at other Snowflake metrics. You may gain insights from looking at average EXECUTION_TIME or a total COUNT of queries by QUERY_HASH. This can help you find patterns of workloads with high usage that would most benefit from optimization (at the query, warehouse, or dataset level).
Metrics to watch: QUERY_PARAMETERIZED_HASH
Name | Description | Metric Type | Availability |
---|---|---|---|
QUERY_PARAMETERIZED_HASH | Abstracts away the literal values (e.g., dates, IDs, or filter conditions), grouping together queries that follow the same pattern but with different parameters | Work: Throughput | Snowsight |
Monitoring QUERY_PARAMETERIZED_HASH in conjunction with other Snowflake metrics can help you pinpoint queries that are resource-intensive or underperforming, and prioritize optimization efforts. For example, you could identify query hashes with the highest values for ROWS_READ, and then optimize their SQL logic or data structures (e.g., clustering, partitioning) to reduce compute spend.
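For instance, a sketch like this groups the last day of queries by QUERY_PARAMETERIZED_HASH to find the most expensive query patterns (the window, ordering, and LIMIT are examples):

```sql
-- Sketch: the heaviest query patterns over the last 24 hours, by parameterized hash.
SELECT query_parameterized_hash,
       COUNT(*)              AS execution_count,
       AVG(execution_time)   AS avg_execution_time_ms,
       SUM(bytes_scanned)    AS total_bytes_scanned,
       ANY_VALUE(query_text) AS sample_query_text
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
  AND query_parameterized_hash IS NOT NULL
GROUP BY query_parameterized_hash
ORDER BY SUM(execution_time) DESC
LIMIT 20;
```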
Query operators
Query operators are symbols or keywords (e.g., + and -, UNION
, ANY
, etc.) used within SQL queries to perform various operations on data. Just as it can be useful to monitor query patterns with the QUERY_HASH
metric, it can be helpful to monitor query operators to gain more granular visibility into these query patterns. The GET_QUERY_OPERATOR_STATS()
table function can be used to return statistics about individual query operators within queries executed in the past 14 days. This information is also available in the query profile tab in Snowsight.
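For example, you can pass a query ID to the function to inspect per-operator statistics for that query. Here, LAST_QUERY_ID() is just a convenient way to reference the most recent query in your session; any query ID from the last 14 days works:

```sql
-- Sketch: per-operator statistics for the most recent query in this session.
SELECT operator_id,
       operator_type,
       operator_statistics,
       execution_time_breakdown
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()))
ORDER BY operator_id;
```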
Metrics to watch: OPERATOR_STATISTICS, EXECUTION_TIME_BREAKDOWN
Name | Description | Metric Type | Availability |
---|---|---|---|
OPERATOR_STATISTICS | Statistics on the performance of individual operators within a query execution plan | Work: Performance | Snowsight |
EXECUTION_TIME_BREAKDOWN | The time spent on individual operators during query execution | Work: Performance | Snowsight |
OPERATOR_STATISTICS can reveal imbalances in join operations, where one table is significantly larger than the other. Data skew—an uneven distribution of data across nodes or partitions—can cause incomplete joins, incorrect aggregations, or performance issues, which all impact data accuracy. For example, if a join operator shows an unusually high processing time or resource usage, it may indicate that data skew is causing inefficient processing, potentially leading to missing or duplicated rows in the output.
Long EXECUTION_TIME_BREAKDOWN values can suggest inefficient query logic or data structures, which might yield partial or incorrect results. For example, if scan operators are showing high disk I/O times, it might indicate inefficient data access patterns (e.g., full table scans) that could be optimized with better clustering, partitioning, or more selective filters.
Tables
A Snowflake table is a structured object in Snowflake that stores data in a relational format—this is where data is stored and organized for querying, analysis, and processing. It can be useful to monitor Snowflake tables to understand where clustering might help improve data quality and query performance.
Snowflake automatically stores table data in micro-partitions, which are smaller chunks of data stored on disk, and clustering keys determine how that data is organized across micro-partitions. Clustering large tables can optimize query performance by minimizing the data scanned during execution. The TABLES view, which can be queried from the ACCOUNT_USAGE schema, shows you important metrics, which we will highlight here.
Metrics to watch: CLUSTERING_KEY, AUTO_CLUSTERING_ON
Name | Description | Metric Type | Availability |
---|---|---|---|
CLUSTERING_KEY | The columns or expressions that comprise the clustering key | Event: Change | Snowsight |
AUTO_CLUSTERING_ON | Indicates whether Automatic Clustering is enabled for a table | Event: Alert | Snowsight |
Monitoring CLUSTERING_KEY is crucial for data quality and performance because clustering keys help Snowflake organize data within micro-partitions, which optimizes query efficiency and ensures that data retrieval is accurate and timely. For example, if a clustered table shows high clustering depth (a measure of clustering inefficiency), it may indicate that data is poorly organized within partitions, which can lead to slower query performance and potentially incomplete results due to inefficient data retrieval.
Over time, as data is inserted, updated, or deleted, the table’s structure can become less optimal, potentially degrading query performance. Auto-clustering solves this by automatically re-clustering data in the background without manual intervention, ensuring that data is stored efficiently. Enabling automatic clustering (indicated by AUTO_CLUSTERING_ON) is generally a good idea for large tables; tables with frequent updates, inserts, or deletes; tables with frequent queries on specific columns; and tables with irregular data distribution.
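To see where clustering decisions might matter most, a sketch like the following lists the largest active tables along with their clustering key and whether automatic clustering is enabled (the 100 GB size threshold is arbitrary):

```sql
-- Sketch: large active tables with their clustering configuration.
SELECT table_catalog AS database_name,
       table_schema,
       table_name,
       row_count,
       bytes / POWER(1024, 3) AS size_gb,
       clustering_key,
       auto_clustering_on
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE deleted IS NULL
  AND bytes > 100 * POWER(1024, 3)   -- tables larger than ~100 GB (arbitrary threshold)
ORDER BY bytes DESC;
```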
Monitor Snowflake to ensure the performance of queries and virtual warehouses
Snowflake is a popular data platform, and in this post, we’ve explored the key metrics you can monitor to gain visibility into virtual warehouse performance, database storage, and dataset-level operations. In Part 2, we’ll look at the tools available to collect and visualize key Snowflake metrics. And in Part 3, we’ll show you how you can use Datadog to monitor the health and performance of your Snowflake virtual warehouses, databases, and data quality.