Snowflake is a fully managed data platform that enables users to easily store, process, analyze, and share high volumes of structured and semi-structured data. One of the most popular data platforms on the market, Snowflake has gained widespread adoption because it addresses a range of data challenges with a unified, scalable, and high-performance platform. Snowflake’s flexibility enables users to handle diverse workloads, such as data lake and data warehouse integration. Built-in dashboards and integration with business intelligence (BI) tools like Tableau and Power BI make Snowflake an ideal solution for business analytics, reporting, and other use cases. Data scientists use Snowflake’s integrations with tools like Python and TensorFlow to train machine learning (ML) models, and media platforms can use real-time analytics from Snowflake to track user engagement and optimize content delivery.
Because teams use Snowflake for a wide variety of business-critical use cases, it’s important to ensure that their data in Snowflake is high-quality and processed efficiently. Additionally, Snowflake costs can add up, particularly if you are using Snowflake for multiple purposes or processing large datasets. For these reasons, monitoring Snowflake is important for many different kinds of teams. This series of blog posts will guide you through monitoring Snowflake. In this post, we’ll show you:
- What Snowflake is
- Metrics to monitor to optimize compute
- Metrics to monitor to optimize storage
- Metrics to monitor at the dataset level
What is Snowflake?
Snowflake is a comprehensive platform for storing and processing data in the cloud, and one of its main differentiating features is its architecture, which decouples its compute layer from its storage layer. Snowflake is built on top of the major cloud platforms (AWS, Azure, and Google Cloud), but it abstracts the underlying complexity of tasks like provisioning, configuring, and managing storage infrastructure so users do not need to manually interact with the underlying cloud storage. Snowflake leverages the cloud providers’ native object storage so that persisted data is accessible across the entirety of your Snowflake environment.
Snowflake’s decoupled architecture combines the data-management simplicity of a shared-disk architecture with the performance and scale-out benefits of a shared-nothing architecture. Its centralized cloud storage layer—accessible by all compute resources (virtual warehouses) within the platform—acts similarly to a shared-disk architecture. Since all compute clusters access the same underlying data, there’s no need to replicate or move data across nodes or clusters. This makes data sharing, backups, and disaster recovery much simpler and more efficient.
Meanwhile, Snowflake’s compute layer uses a shared-nothing architecture. In this type of system, each compute node operates independently, with its own local memory and compute resources. Because there is no sharing of memory or CPU between nodes, Snowflake can handle large datasets with fast, high-performance query processing. It can also handle concurrent queries from different users simultaneously without performance degradation.
The nodes that comprise Snowflake’s compute layer are called virtual warehouses. These are compute clusters composed of multiple virtual machines (VMs) that perform tasks such as querying, processing, and analyzing the data stored in the storage layer. This layer of Snowflake is an example of massively parallel processing (MPP), a computational model that allows for the simultaneous processing of large datasets by distributing load and processing across nodes.
Snowflake’s architecture provides a number of benefits that contribute to its popularity. Teams can optimize costs by scaling storage and compute independently. Additionally, because multiple compute clusters can access the same data storage independently, users can run queries concurrently without resource contention. This separate architecture also makes it easy to isolate workloads, which means teams can assign dedicated compute resources to workloads without affecting others, ensuring predictable performance for critical tasks and resource-intensive processes.
However, Snowflake’s complexity results in unique monitoring challenges to ensure the platform runs efficiently and cost-effectively. These include:
- Compute monitoring: tracking the utilization, scaling, and performance of virtual warehouses
- Storage monitoring: monitoring data storage and retention policies
- Dataset-level monitoring: ensuring efficient query execution across compute nodes
While Snowflake generates a vast number of metrics, we’ve highlighted the essential ones to monitor so you can stay on top of these three specific areas. We’ll explain the key metrics for each category in more detail in the following sections.
Monitor Snowflake metrics to optimize compute
A major concern for Snowflake SysAdmins is usage and cost. They will likely want to monitor Snowflake usage metrics to understand their organization’s spending, including what is driving costs, what’s changed month over month, and whose usage is driving up the bill. To gain these insights, it’s essential to monitor warehouse query performance, because virtual warehouses consume billable compute, measured in Snowflake credits, as they execute queries, load data, and perform other Data Manipulation Language (DML) operations.
Schemas and views
To find metrics on warehouse query performance, it’s first important to understand where Snowflake metrics come from. You can explore Snowflake system metrics using Snowsight (Snowflake’s built-in web interface) or by writing SQL queries against views in various schemas. Data on your account’s Snowflake usage is maintained in databases, each of which consists of one or more schemas, which are logical groupings of database objects (tables, views, etc.). Users can write custom views in SQL or use out-of-the-box Snowflake views to query metrics on usage.
The ACCOUNT_USAGE schema can be found in the Snowflake-provided database called SNOWFLAKE, which contains views with usage and performance-related information for your account. This schema can be queried with views that surface execution times of queries and tasks, as well as views that surface the load on warehouses as they execute queries and tasks. It’s important to note that views and tables in ACCOUNT_USAGE lag one to three hours behind real time.
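For illustration, here is a minimal sketch of querying an ACCOUNT_USAGE view by its fully qualified name. The columns selected and the LIMIT are arbitrary examples, and it assumes your role can read the SNOWFLAKE database (for example, ACCOUNTADMIN, or a role granted IMPORTED PRIVILEGES on it):

```sql
-- Minimal sketch: querying an ACCOUNT_USAGE view by its fully qualified name.
-- Assumes your role can read the SNOWFLAKE database (e.g., ACCOUNTADMIN, or a role
-- granted IMPORTED PRIVILEGES on it). Results lag one to three hours behind real time.
SELECT query_id,
       warehouse_name,
       start_time,
       execution_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
ORDER BY start_time DESC
LIMIT 10;
```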
Let’s say you’re a SysAdmin who wants to determine if a warehouse of inappropriate size is being used. In Snowflake, a virtual warehouse’s compute costs are determined by its size, with larger warehouses consuming more credits per hour, meaning a higher cost for running queries. Improperly sized warehouses can add unnecessary cost to your bill. For example, using a large warehouse to run a small query is a waste of compute resources. Conversely, running a large query on a smaller warehouse could cause failures or timeouts (which also means wasted compute). Right-sizing virtual warehouses is usually the single most impactful way you can reduce your Snowflake costs and often results in performance gains that outweigh the cost of increased warehouse size.
Query history for compute
The QUERY_HISTORY view, which can be queried from the ACCOUNT_USAGE schema, surfaces metrics that can help you right-size your virtual warehouses. There are three metrics in this view that are particularly important.
Metrics to watch: EXECUTION_TIME, BYTES_WRITTEN, BYTES_SCANNED
Name | Description | Metric Type | Availability |
---|---|---|---|
EXECUTION_TIME | The time it takes each query to execute, in milliseconds | Work: Performance | Snowsight |
BYTES_WRITTEN | The number of bytes written by a query (e.g., when loading data into a table) | Work: Throughput | Snowsight |
BYTES_SCANNED | The number of bytes scanned by a query | Work: Throughput | Snowsight |
A low value for EXECUTION_TIME, BYTES_WRITTEN, or BYTES_SCANNED could indicate that the workload is more appropriate for a smaller warehouse. Conversely, if these values are high, you might consider moving those queries to a larger warehouse. If you see a heterogeneous mix of differently sized queries running on a single warehouse, you could consider redistributing these workloads into appropriately sized warehouses.
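For example, a sketch like the following aggregates these columns per warehouse over the past week to surface right-sizing candidates (the seven-day window is an arbitrary choice for illustration):

```sql
-- Sketch: summarize query size per warehouse to spot right-sizing candidates.
-- The 7-day window is an arbitrary example; adjust it to your needs.
SELECT warehouse_name,
       warehouse_size,
       COUNT(*)            AS query_count,
       AVG(execution_time) AS avg_execution_time_ms,
       AVG(bytes_scanned)  AS avg_bytes_scanned,
       AVG(bytes_written)  AS avg_bytes_written
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name IS NOT NULL
GROUP BY warehouse_name, warehouse_size
ORDER BY avg_bytes_scanned DESC;
```

Warehouses whose queries consistently scan and write little data are candidates for downsizing, while warehouses with very large averages may warrant a larger size or a dedicated warehouse.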
Warehouse load history
Warehouse load history metrics show you resource utilization for a single warehouse within a specified date range. It’s important to measure these metrics because they indicate what you are spending your compute credits on. To surface these metrics, you can run the WAREHOUSE_LOAD_HISTORY function on any warehouse. (It’s important to note that this function only surfaces the last 14 days’ worth of data. You can view this data bucketed over a longer time range via the Admin tab in Snowsight.) The following are important virtual warehouse load metrics to monitor.
Metrics to watch: AVG_RUNNING, AVG_QUEUED_LOAD, AVG_BLOCKED
Name | Description | Metric Type | Availability |
---|---|---|---|
AVG_RUNNING | The average number of queries actively running on a virtual warehouse at any given time | Work: Throughput | Snowsight |
AVG_QUEUED_LOAD | The average number of queries queued because the warehouse was overloaded | Work: Performance | Snowsight |
AVG_BLOCKED | The average number of queries blocked and unable to execute due to resource contention or conflicting operations (such as transaction locks) | Work: Error | Snowsight |
AVG_RUNNING can provide valuable insights into the resource utilization of your virtual warehouses. For example, when the AVG_RUNNING metric shows a high value at a regular time of day, it can provide visibility into resource contention during peak times and help in planning resource allocation—for example, you might enable auto-scaling for virtual warehouses or reschedule non-essential workloads to off-peak times. Additionally, AVG_RUNNING can help you determine whether to scale up or down virtual warehouses based on usage patterns.
High AVG_QUEUED_LOAD values relative to a historical baseline can indicate bottlenecks in your system. To resolve bottlenecks, you might increase your min/max cluster counts. High values in AVG_BLOCKED suggest that too many queries are contending for the same resources, such as compute power or disk I/O, or are blocked by transaction locks on the same data. To handle excessive concurrent queries on the same virtual warehouse, you could enable auto-scaling to dynamically add clusters during high-concurrency periods.
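As a sketch, you could pull the last 24 hours of load history for one warehouse with the WAREHOUSE_LOAD_HISTORY table function. The warehouse name and time window below are placeholders:

```sql
-- Sketch: hourly load history for one warehouse over the past 24 hours.
-- MY_WAREHOUSE is a placeholder; the function only covers the last 14 days,
-- and it needs to be run with a database in context (it uses INFORMATION_SCHEMA).
SELECT start_time,
       end_time,
       avg_running,
       avg_queued_load,
       avg_blocked
FROM TABLE(INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY(
       DATE_RANGE_START => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
       WAREHOUSE_NAME   => 'MY_WAREHOUSE'))
ORDER BY start_time;
```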
Warehouse metering history
The WAREHOUSE_METERING_HISTORY view, which can be queried from the ACCOUNT_USAGE schema, shows you hourly credit usage for a single warehouse (or all the warehouses in your account) within the last 365 days. You can use this view to surface metrics about how your virtual warehouses directly contribute to your Snowflake spend. We’ll highlight a few key metrics to keep track of here.
Metrics to watch: CREDITS_USED, CREDITS_USED_COMPUTE, CREDITS_BILLED, CREDITS_ADJUSTMENT_CLOUD_SERVICES
Name | Description | Metric Type | Availability |
---|---|---|---|
CREDITS_USED | The total number of credits used for the warehouse in the hour | Resource: Utilization | Snowsight |
CREDITS_USED_COMPUTE | The credits used for compute in the hour, across queries, insertions, deletions, and all other DML operations | Resource: Utilization | Snowsight |
CREDITS_BILLED | The total credits billed: the sum of compute credits, cloud services credits, and the cloud services adjustment (available in the METERING_DAILY_HISTORY view) | Resource: Utilization | Snowsight |
CREDITS_ADJUSTMENT_CLOUD_SERVICES | Credit refunds applied to cloud services | Resource: Utilization | Snowsight |
CREDITS_USED is the sum of CREDITS_USED_COMPUTE and CREDITS_USED_CLOUD_SERVICES. This metric directly correlates to the amount you are spending on a virtual warehouse. A high value of CREDITS_USED_COMPUTE relative to a historical baseline can indicate that there are opportunities to optimize query performance, adjust warehouse sizes, or refine task schedules to reduce cost. CREDITS_BILLED can be found in the METERING_DAILY_HISTORY view. This metric is the sum of CREDITS_USED_COMPUTE, CREDITS_USED_CLOUD_SERVICES, and CREDITS_ADJUSTMENT_CLOUD_SERVICES. Cloud services credits cover background operations and system services that support your Snowflake environment (e.g., query optimization, caching, and auto-scaling), and they should be relatively predictable and minimal in proportion to total credit usage. Ideally, CREDITS_USED_CLOUD_SERVICES will be entirely offset by CREDITS_ADJUSTMENT_CLOUD_SERVICES. If you see a significantly higher value for CREDITS_BILLED than for CREDITS_USED_COMPUTE, you may be performing a high volume of potentially unnecessary operations that use cloud services, and reducing that volume should lower your bill.
Task history
Snowflake tasks are scheduled processes that automate the execution of SQL statements, such as queries or data transformations, at specified intervals or in response to certain conditions. Tasks are commonly used to automate data pipeline workflows, ETL/ELT processes, and other repetitive operations in Snowflake. The ACCOUNT_USAGE schema can be queried with the TASK_HISTORY view to surface metrics on the history of task usage in the past year. Tasks can be a big source of compute spend for your virtual warehouses, so they are important to monitor to get a full picture of where your costs are coming from. Here are some of the important task-related metrics to stay aware of.
Metrics to watch: STATE, ERROR_MESSAGE
Name | Description | Metric Type | Availability |
---|---|---|---|
STATE | The status of a task run (e.g., succeeded, failed, or canceled) | Event: Alert | Snowsight |
ERROR_MESSAGE | An explanation of why a task failed | Event: Alert | Snowsight |
It’s important to monitor STATE to ensure that your tasks are running as expected. An elevated number of FAILED or CANCELED tasks can indicate bottlenecks in your system, which can degrade performance and add to your compute spend. Monitoring ERROR_MESSAGE can help provide additional context into why tasks failed, helping you troubleshoot. For example, an error message like "Syntax error at or near..." suggests there is a mistake in the SQL code of the task, and a review or correction of the syntax is necessary.
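For example, a sketch like this one lists recent failed task runs and their error messages (the seven-day window is illustrative):

```sql
-- Sketch: recent failed task runs and their error messages.
SELECT scheduled_time,
       database_name,
       schema_name,
       name  AS task_name,
       state,
       error_message
FROM SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY
WHERE state = 'FAILED'
  AND scheduled_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY scheduled_time DESC;
```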
Clustering history
Clustering in Snowflake helps maintain data organization based on clustering keys without manual intervention, but it uses compute resources to reorganize data in micro-partitions as data is inserted, updated, or deleted. Monitoring the AUTOMATIC_CLUSTERING_HISTORY view in Snowflake is essential for managing compute costs associated with automatic clustering, especially in environments with large, frequently updated tables. Here are the most important clustering metrics to monitor.
Metrics to watch: CREDITS_USED, NUM_ROWS_RECLUSTERED
Name | Description | Metric Type | Availability |
---|---|---|---|
CREDITS_USED | The total credits consumed by automatic clustering operations for each entry | Resource: Utilization | Snowsight |
NUM_ROWS_RECLUSTERED | The total number of rows reclustered | Work: Throughput | Snowsight |
High CREDITS_USED values indicate tables or operations that are particularly resource-intensive. For example, if a specific table is showing high CREDITS_USED, consider reviewing its clustering key to see if it’s optimal (e.g., a table storing sales data with frequent queries filtering on sale_date would benefit from clustering on this column), or if the table truly needs automatic clustering enabled. If a table consistently shows high values for NUM_ROWS_RECLUSTERED, it means significant compute resources are being used to recluster this data. This may suggest the need to reassess clustering strategies or consider alternatives if clustering is not yielding substantial performance benefits.
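A sketch like the following ranks tables by automatic clustering credits and rows reclustered over the past 30 days (the window is an example):

```sql
-- Sketch: which tables consume the most automatic clustering credits.
SELECT database_name,
       schema_name,
       table_name,
       SUM(credits_used)         AS total_credits_used,
       SUM(num_rows_reclustered) AS total_rows_reclustered
FROM SNOWFLAKE.ACCOUNT_USAGE.AUTOMATIC_CLUSTERING_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY database_name, schema_name, table_name
ORDER BY total_credits_used DESC;
```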
Monitor Snowflake metrics to optimize storage
In addition to monitoring your virtual warehouses’ contribution to compute usage, it’s also important to monitor your storage to gain full visibility into your Snowflake spend. Storage is billed at a flat, monthly rate per terabyte and typically represents a smaller portion of your bill than compute, but monitoring it is still necessary to get a full picture of your costs.
Database usage history
Snowflake data is mainly stored in databases. One of the main ways you can view metrics on your database storage is with the DATABASE_STORAGE_USAGE_HISTORY view, which can be queried from the ACCOUNT_USAGE schema. This view shows you the average daily storage usage for databases.
It’s important to note that Snowflake gives you metrics on table size for the current point in time, not historically. This means you can estimate how much a table will cost—for example, this month—based on how large it is now, but you can’t see how much it cost you in the past. One way around this is to keep your own history of system tables by regularly extracting and storing relevant data from Snowflake’s metadata views or system tables into your own database or data storage solution. Here are some of the most important database metrics to monitor.
Metrics to watch: USAGE_DATE, AVERAGE_DATABASE_BYTES
Name | Description | Metric Type | Availability |
---|---|---|---|
USAGE_DATE | The date of the database storage usage record | Resource: Utilization | Snowsight |
AVERAGE_DATABASE_BYTES | The average number of bytes of database storage used on that date | Resource: Utilization | Snowsight |
Monitoring storage by USAGE_DATE over time allows you to observe storage trends in each database and identify gradual or sudden increases in usage, which can signal growing datasets or longer historical data retention. AVERAGE_DATABASE_BYTES provides insights into cost optimization, capacity planning, and data growth trends. A consistent increase in AVERAGE_DATABASE_BYTES across usage dates could indicate that archiving or data pruning strategies need to be implemented to avoid excessive storage costs.
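For example, this sketch tracks average storage per database per day over the past 90 days, converted to gigabytes (both the window and the unit conversion are illustrative):

```sql
-- Sketch: daily average storage per database, in GB, over the last 90 days.
SELECT usage_date,
       database_name,
       average_database_bytes / POWER(1024, 3) AS avg_database_gb
FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
WHERE usage_date >= DATEADD('day', -90, CURRENT_DATE())
ORDER BY database_name, usage_date;
```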
Stage storage history
A Snowflake stage is a location where data files are stored temporarily before they are loaded into Snowflake tables or unloaded from Snowflake to external storage. It’s important to monitor stages, as they can accumulate large volumes of data, especially in environments with frequent or continuous data ingestion, and lead to increased storage costs. Monitoring stages helps prevent them from being overwhelmed with unused or old files.
You can monitor your stages using the STAGE_STORAGE_USAGE_HISTORY view, which can be queried from the ACCOUNT_USAGE schema. There are two metrics in this view that are the most important.
Metrics to watch: USAGE_DATE, AVERAGE_STAGE_BYTES
Name | Description | Metric Type | Availability |
---|---|---|---|
USAGE_DATE | The date of the stage storage usage record | Resource: Utilization | Snowsight |
AVERAGE_STAGE_BYTES | The average number of bytes of stage storage used | Resource: Utilization | Snowsight |
USAGE_DATE provides valuable insight into storage usage of stages over time. For example, if storage tracked by USAGE_DATE remains consistently high for a stage that only supports periodic loading jobs, this may signal that files aren’t being cleared after use, leading to ongoing storage costs. Note that this metric has the same name as the metric in the previous section, but the two appear in different views (DATABASE_STORAGE_USAGE_HISTORY versus STAGE_STORAGE_USAGE_HISTORY) and represent different values: database storage usage date and stage storage usage date, respectively.
Monitoring AVERAGE_STAGE_BYTES helps identify trends in stage storage usage, revealing whether storage requirements are increasing or stable. For example, a steady increase in AVERAGE_STAGE_BYTES may indicate that data loads are growing over time, signaling a need for either storage expansion or data archiving strategies.
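A minimal sketch for tracking stage storage growth over the past 90 days (the window and unit conversion are illustrative):

```sql
-- Sketch: daily average stage storage, in GB, over the last 90 days.
SELECT usage_date,
       average_stage_bytes / POWER(1024, 3) AS avg_stage_gb
FROM SNOWFLAKE.ACCOUNT_USAGE.STAGE_STORAGE_USAGE_HISTORY
WHERE usage_date >= DATEADD('day', -90, CURRENT_DATE())
ORDER BY usage_date;
```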
Monitor Snowflake metrics to optimize data quality and freshness
Data analysts produce datasets, create tables, and write queries to Snowflake. As the dataset owners, they want to ensure that query performance is high, particularly for frequently used query patterns; that data load times are optimized to avoid bottlenecks; and that data quality is high, so that insights derived from the data are reliable and accurate. This means that data analysts need to monitor Snowflake at the dataset level to answer questions like: How many rows did the last insert write? Which tables are being joined together? What are the query patterns on a table? And are queries against a particular table slow?
Query history for data quality
One way to track DML operations (e.g., inserts and joins) in your databases is via the Query History view, which can be used to surface Snowflake query history by various dimensions (time range, session, user, warehouse, etc.) within the last year. The view is available in both the ACCOUNT_USAGE
and READER_ACCOUNT_USAGE
schemas, and contains a number of important metrics to monitor.
Metrics to watch: ROWS_INSERTED, ROWS_PRODUCED, EXECUTION_TIME, QUERY_RETRY_TIME, QUERY_RETRY_CAUSE
Name | Description | Metric Type | Availability |
---|---|---|---|
ROWS_INSERTED | The number of rows inserted by a query | Work: Throughput | Snowsight |
ROWS_PRODUCED | The number of rows that have been returned by a query or operation (e.g., SELECT) | Work: Throughput | Snowsight |
EXECUTION_TIME | The time it takes a query to execute, in milliseconds | Work: Performance | Snowsight |
QUERY_RETRY_TIME | How long Snowflake spends attempting to re-execute a query, either due to transient issues (e.g., network interruptions, resource contention) or system-level failures | Work: Performance | Snowsight |
QUERY_RETRY_CAUSE | The reason for a query being retried | Work: Performance | Snowsight |
Unusual fluctuations in ROWS_INSERTED could signal issues with data quality or unexpected changes in data sources, such as duplicates, errors, or missing records. For example, if the number of rows inserted drops significantly or varies drastically from the norm, it may indicate a problem in data source reliability or an extract, transform, and load (ETL) error, prompting a review to ensure data quality.
Monitoring ROWS_PRODUCED helps you identify resource-intensive queries and ensure data quality. For example, if a query returns an unexpectedly high number of rows, it may indicate underlying issues with the integrity or accuracy of your data (e.g., the query is missing filters or otherwise incorrectly scoped, leading to unnecessary data being retrieved).
Slow-running queries in ETL processes can delay data updates, leading to outdated or incomplete data being available to users. Monitoring EXECUTION_TIME helps detect bottlenecks that could compromise data freshness. For example, if queries that load or transform data into critical tables are taking longer than expected, you may be delivering stale data, which impacts data accuracy for time-sensitive analytics.
A high QUERY_RETRY_TIME could indicate partial or incomplete data being written to tables, which can lead to data inconsistencies, especially if retries do not fully resolve the issue. For example, a high retry time on a query that populates a staging table could mean that some rows were not loaded on time, creating gaps or duplicates if retries don’t successfully reprocess all records.
Understanding the cause of query retries is essential for diagnosing and addressing issues that affect query execution and overall performance. QUERY_RETRY_CAUSE helps you understand why a query failed and had to be retried. It provides insights into whether the failure was due to a transient system issue, resource contention, or other factors.
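As an example, the following sketch flags recent queries in QUERY_HISTORY that spent time in retries, alongside their row counts and execution times. The 24-hour window is arbitrary, and the retry columns may be empty for queries that never retried:

```sql
-- Sketch: recent queries that were retried, with data-volume and timing context.
SELECT query_id,
       warehouse_name,
       rows_inserted,
       rows_produced,
       execution_time   AS execution_time_ms,
       query_retry_time AS query_retry_time_ms,
       query_retry_cause
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
  AND query_retry_time > 0
ORDER BY query_retry_time DESC;
```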
Query hashes
While monitoring individual queries is vital to maintaining data quality, teams may also want to monitor patterns in their queries. To do so, it can be useful to include query hashes as a dimension of your inquiries into Snowflake data. The QUERY_HASH column is available in the QUERY_HISTORY view in the ACCOUNT_USAGE schema.
The QUERY_HASH metric in Snowflake is a unique identifier computed from a query’s text. The hash remains the same for queries that share the same structure, though it may vary when parameters or literal values differ. QUERY_HASH can be a valuable additional dimension to use when looking at other Snowflake metrics. You may gain insights from looking at average EXECUTION_TIME or a total COUNT of queries by QUERY_HASH. This can help you find patterns of workloads with high usage that would most benefit from optimization (at the query, warehouse, or dataset level).
Metrics to watch: QUERY_PARAMETERIZED_HASH
Name | Description | Metric Type | Availability |
---|---|---|---|
QUERY_PARAMETERIZED_HASH | Abstracts away the literal values (e.g., dates, IDs, or filter conditions), grouping together queries that follow the same pattern but with different parameters | Work: Throughput | Snowsight |
Monitoring QUERY_PARAMETERIZED_HASH in conjunction with other Snowflake metrics can help you pinpoint queries that are resource-intensive or underperforming, and prioritize optimization efforts. For example, you could identify query hashes with the highest values for ROWS_READ, and then optimize their SQL logic or data structures (e.g., clustering, partitioning) to reduce compute spend.
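For instance, a sketch like this groups the last day of queries by QUERY_PARAMETERIZED_HASH to find the most expensive query patterns (the window, ordering, and LIMIT are examples):

```sql
-- Sketch: the heaviest query patterns over the last 24 hours, by parameterized hash.
SELECT query_parameterized_hash,
       COUNT(*)              AS execution_count,
       AVG(execution_time)   AS avg_execution_time_ms,
       SUM(bytes_scanned)    AS total_bytes_scanned,
       ANY_VALUE(query_text) AS sample_query_text
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
  AND query_parameterized_hash IS NOT NULL
GROUP BY query_parameterized_hash
ORDER BY SUM(execution_time) DESC
LIMIT 20;
```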
Query operators
Query operators are symbols or keywords (e.g., + and -, UNION
, ANY
, etc.) used within SQL queries to perform various operations on data. Just as it can be useful to monitor query patterns with the QUERY_HASH
metric, it can be helpful to monitor query operators to gain more granular visibility into these query patterns. The GET_QUERY_OPERATOR_STATS()
table function can be used to return statistics about individual query operators within queries executed in the past 14 days. This information is also available in the query profile tab in Snowsight.
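For example, you can pass a query ID to the function to inspect per-operator statistics for that query. Here, LAST_QUERY_ID() is just a convenient way to reference the most recent query in your session; any query ID from the last 14 days works:

```sql
-- Sketch: per-operator statistics for the most recent query in this session.
SELECT operator_id,
       operator_type,
       operator_statistics,
       execution_time_breakdown
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()))
ORDER BY operator_id;
```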
Metrics to watch: OPERATOR_STATISTICS, EXECUTION_TIME_BREAKDOWN
Name | Description | Metric Type | Availability |
---|---|---|---|
OPERATOR_STATISTICS | Statistics on the performance of individual operators within a query execution plan | Work: Performance | Snowsight |
EXECUTION_TIME_BREAKDOWN | The time spent on individual operators during query execution | Work: Performance | Snowsight |
OPERATOR_STATISTICS can reveal imbalances in join operations, where one table is significantly larger than the other. Data skew—an uneven distribution of data across nodes or partitions—can cause incomplete joins, incorrect aggregations, or performance issues, which all impact data accuracy. For example, if a join operator shows an unusually high processing time or resource usage, it may indicate that data skew is causing inefficient processing, potentially leading to missing or duplicated rows in the output.
Long EXECUTION_TIME_BREAKDOWN values can suggest inefficient query logic or data structures, which might yield partial or incorrect results. For example, if scan operators are showing high disk I/O times, it might indicate inefficient data access patterns (e.g., full table scans) that could be optimized with better clustering, partitioning, or more selective filters.
Tables
A Snowflake table is a structured object in Snowflake that stores data in a relational format—this is where data is stored and organized for querying, analysis, and processing. It can be useful to monitor Snowflake tables to understand where clustering might help improve data quality and query performance.
Snowflake automatically stores table data in micro-partitions, which are smaller chunks of data stored on disk, and clustering keys determine how that data is organized across micro-partitions. Clustering large tables can optimize query performance by minimizing the data scanned during execution. The TABLES view, which can be queried from the ACCOUNT_USAGE schema, shows you important metrics, which we will highlight here.
Metrics to watch: CLUSTERING_KEY, AUTO_CLUSTERING_ON
Name | Description | Metric Type | Availability |
---|---|---|---|
CLUSTERING_KEY | The columns or expressions that comprise the clustering key | Event: Change | Snowsight |
AUTO_CLUSTERING_ON | Indicates whether Automatic Clustering is enabled for a table | Event: Alert | Snowsight |
Monitoring CLUSTERING_KEY is crucial for data quality and performance because clustering keys help Snowflake organize data within micro-partitions, which optimizes query efficiency and ensures that data retrieval is accurate and timely. For example, if a clustered table shows high clustering depth (a measure of clustering inefficiency), it may indicate that data is poorly organized within partitions, which can lead to slower query performance and potentially incomplete results due to inefficient data retrieval.
Over time, as data is inserted, updated, or deleted, the table’s structure can become less optimal, potentially degrading query performance. Auto-clustering solves this by automatically re-clustering data in the background without manual intervention, ensuring that data is stored efficiently. Enabling automatic clustering (indicated by AUTO_CLUSTERING_ON) is generally a good idea for large tables; tables with frequent updates, inserts, or deletes; tables with frequent queries on specific columns; and tables with irregular data distribution.
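To see where clustering decisions might matter most, a sketch like the following lists the largest active tables along with their clustering key and whether automatic clustering is enabled (the 100 GB size threshold is arbitrary):

```sql
-- Sketch: large active tables with their clustering configuration.
SELECT table_catalog AS database_name,
       table_schema,
       table_name,
       row_count,
       bytes / POWER(1024, 3) AS size_gb,
       clustering_key,
       auto_clustering_on
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE deleted IS NULL
  AND bytes > 100 * POWER(1024, 3)   -- tables larger than ~100 GB (arbitrary threshold)
ORDER BY bytes DESC;
```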
Monitor Snowflake to ensure the performance of queries and virtual warehouses
Snowflake is a popular data platform, and in this post, we’ve explored the key metrics you can monitor to gain visibility into virtual warehouse performance, database storage, and dataset-level operations. In Part 2, we’ll look at the tools available to collect and visualize key Snowflake metrics. And in Part 3, we’ll show you how you can use Datadog to monitor the health and performance of your Snowflake virtual warehouses, databases, and data quality.