Datadog Database Monitoring (DBM) Recommendations help you proactively optimize performance throughout your database fleet. DBM draws on a wide range of data sources in order to detect and provide actionable guidance on issues such as blocking queries, low disk space, and missing indexes.
In this post, we’ll show you how DBM formulates targeted indexing recommendations to help you optimize database performance. We’ll explain how we use static data sources, such as explain plans and table schemas, as well as dynamic sources, such as query metrics, to help you stay ahead of issues and strategically prioritize your optimizations. You will learn:
- How we identify indexing opportunities
- How we prioritize our recommendations using query metrics
- How we make our recommendations easy to implement by providing cohesive, context-rich workflows
How we identify indexing opportunities
Analyzing explain plans
The first step in our index recommendation process is to identify the queries that are the strongest candidates for indexing. To do so, we rely on explain plans: the data structures generated by database management systems that detail the step-by-step operations involved in executing specific queries. In an explain plan, the execution of a query is represented as a tree of operation nodes.
In order to formulate recommendations, Database Monitoring continuously collects explain plans for your databases and analyzes them for particular nodes and node patterns. For example, sequential scans with filter predicates are among the most common indicators of strong indexing opportunities. A sequential scan is exactly what it sounds like: a scan of a table that reads every row, one after another. Filter predicates are the pieces of query syntax that conditionally limit the rows a query scans or returns, such as `WHERE` clauses in SQL queries.
Testing every row in a table can be an efficient strategy for small tables, but as the number of rows a query must scan increases, so does the load these operations impose on an instance. Another common indicator of indexing opportunities is the pairing of sequential scans with both sort and limit operations, such as a SQL query with `ORDER BY` and `LIMIT` clauses.
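The node patterns described above can be detected with a straightforward recursive walk of the plan tree. As a minimal sketch (not Datadog's actual code), the following assumes a plan dictionary in the shape produced by Postgres's `EXPLAIN (FORMAT JSON)`, where each node carries a `"Node Type"` key and children live under `"Plans"`:

```python
def find_seq_scans_with_filters(node, matches=None):
    """Recursively walk an EXPLAIN (FORMAT JSON)-style plan tree and
    collect sequential-scan nodes that carry a filter predicate."""
    if matches is None:
        matches = []
    if node.get("Node Type") == "Seq Scan" and "Filter" in node:
        matches.append(node)
    for child in node.get("Plans", []):
        find_seq_scans_with_filters(child, matches)
    return matches

# A trimmed-down plan of the kind EXPLAIN (FORMAT JSON) produces for a
# query with ORDER BY and LIMIT over a filtered sequential scan
plan = {
    "Node Type": "Limit",
    "Plans": [{
        "Node Type": "Sort",
        "Plans": [{
            "Node Type": "Seq Scan",
            "Relation Name": "inventory",
            "Filter": "(status = 'active')",
            "Total Cost": 431.0,
        }],
    }],
}

scans = find_seq_scans_with_filters(plan)
print([s["Relation Name"] for s in scans])  # ['inventory']
```

A real implementation would also record the surrounding sort and limit nodes, since their presence strengthens the indexing signal.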
Once we find a sequential scan, we mark it as a candidate for indexing and begin to apply additional heuristics. For example, the cost of a sequential scan node may account for a very small proportion of the overall costs associated with an explain plan. In that case, indexing is unlikely to materially impact performance, so we remove the node from further consideration. We also consider selectivity: the capacity of a query with one or more filter predicates to limit the proportion of rows in a table that it returns (the smaller the proportion, the more selective the node). Indexing is generally most beneficial for highly selective nodes. For low-selectivity nodes, indexing may not provide any net performance benefits compared with full-table scans.
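The cost-proportion and selectivity heuristics can be sketched as follows. This is an illustrative toy, not Datadog's implementation: the thresholds are invented, and the selectivity estimate uses the `"Actual Rows"` and `"Rows Removed by Filter"` statistics that Postgres includes in `EXPLAIN (ANALYZE, FORMAT JSON)` output:

```python
def is_index_candidate(node, plan_total_cost,
                       min_cost_fraction=0.2, max_selectivity=0.1):
    """Apply two heuristics to a candidate Seq Scan node:
    1. its cost must be a meaningful share of the whole plan's cost;
    2. its filter must be selective (few rows survive the filter).
    Both thresholds here are hypothetical, chosen for illustration."""
    cost_fraction = node["Total Cost"] / plan_total_cost
    if cost_fraction < min_cost_fraction:
        return False  # scan is cheap relative to the plan; skip it
    rows = node["Actual Rows"]
    removed = node["Rows Removed by Filter"]
    selectivity = rows / (rows + removed)  # smaller = more selective
    return selectivity <= max_selectivity

node = {"Total Cost": 380.0, "Actual Rows": 40,
        "Rows Removed by Filter": 9960}
print(is_index_candidate(node, plan_total_cost=431.0))  # True
```

A node that passes both checks scans many rows only to discard most of them, which is exactly the work an index can eliminate.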
Analyzing query text
Once we’ve identified an index-worthy node, our next step is to determine which table and columns it accesses, so that we know where, specifically, an index could be placed. In general, this information can be derived by parsing the full text of a query. With Postgres databases, however, we get a helping hand from the Postgres explain plan format, in which filter predicates are treated as node attributes that include valid query fragments.
This allows us to parse filters into abstract syntax trees (ASTs) in which we can search for semantically significant nodes. Postgres has a particular AST node that denotes a column access, called `ColumnRef`, which we use to find the columns referenced in the filter.
We also analyze the logical operations of the filter, since not all of these are indexable. For example, `col1 = 7` is indexable, whereas `col2 ILIKE '%tailstr'` is not. Predicate expressions appear in the AST as `A_Expr` nodes, which contain queries’ logical operators in their `name` attributes. This allows us to exclude from consideration any filter that isn’t actually indexable.
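To make the operator check concrete, here is a simplified stand-in for that `A_Expr` analysis. It classifies predicates that have already been extracted from the AST as `(column, operator, value)` triples; the operator coverage is illustrative and far from exhaustive:

```python
# Operators a plain btree index can generally serve (illustrative set)
BTREE_INDEXABLE_OPS = {"=", "<", "<=", ">", ">="}

def is_indexable_predicate(column, op, value):
    """Toy classifier for filter predicates, standing in for the
    A_Expr analysis described above."""
    op = op.upper()
    if op in BTREE_INDEXABLE_OPS:
        return True
    if op == "LIKE":
        # A pattern anchored at the start can use an index;
        # a leading wildcard defeats it.
        return not value.startswith("%")
    if op == "ILIKE":
        # Case-insensitive matching can't use a plain column index.
        return False
    return False

print(is_indexable_predicate("col1", "=", "7"))             # True
print(is_indexable_predicate("col2", "ILIKE", "%tailstr"))  # False
```

Predicates that fail this check are dropped from consideration before any index is recommended.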
Column vs. expression-based indexes
Indexes are typically column-based, but they can also be expression-based. Rather than referencing column values directly, expression-based indexes reference the results of computations from column values. When it comes to indexing recommendations, expression-based indexes require special consideration. Take the following query as an example:
```sql
SELECT * FROM inventory WHERE lower(sku) = 'abc';
```
Here, the `sku` column is passed to the `lower` function in the `WHERE` clause. This means that even if an index exists on `sku`, it won’t be used in the execution of this query, because the index is built on the raw (non-lowercased) values in `sku`. With DBM, we detect this type of scenario by scanning for `FuncCall` nodes in the AST and checking for `ColumnRef` arguments. If we find column values passed as function arguments in a query, we create a Function in Filter recommendation. This allows us to tailor the specifics of our recommendations to this scenario, as we’ll describe later in this post.
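That detection amounts to searching the parse tree for function calls whose arguments are column references. The sketch below uses toy dictionaries modeled loosely on Postgres's `FuncCall`, `ColumnRef`, and `A_Expr` node names; the exact field layout is a simplifying assumption, not the real parse-tree schema:

```python
def find_function_wrapped_columns(node, found=None):
    """Walk a simplified parse tree and report columns that appear as
    function-call arguments, which would block a plain column index."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if "FuncCall" in node:
            call = node["FuncCall"]
            for arg in call.get("args", []):
                if isinstance(arg, dict) and "ColumnRef" in arg:
                    found.append((call["name"], arg["ColumnRef"]))
        for value in node.values():
            find_function_wrapped_columns(value, found)
    elif isinstance(node, list):
        for item in node:
            find_function_wrapped_columns(item, found)
    return found

# Toy AST for the predicate: lower(sku) = 'abc'
ast = {"A_Expr": {
    "name": "=",
    "lexpr": {"FuncCall": {"name": "lower",
                           "args": [{"ColumnRef": "sku"}]}},
    "rexpr": {"Const": "abc"},
}}
print(find_function_wrapped_columns(ast))  # [('lower', 'sku')]
```

Any hit here triggers the Function in Filter recommendation rather than a plain column-index recommendation.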
Prioritizing indexing recommendations via query metrics
So far, we’ve outlined how we home in on indexing opportunities by analyzing explain plans for patterns that are frequently associated with poor query performance. Next, we prioritize our recommendations by using metrics for these queries to gauge the severity of their performance issues.
In order to gauge the impact of a query on database performance, we use the percent total duration by host metric, which tells us the proportion of time that an individual query contributes to the total processing time on an instance. Based on this metric, we label our index recommendations Low, Medium, or High severity in order to help users set priorities.
For example, let’s say we have three queries running on a database instance. In the past hour, these queries have been measured to have the following total execution times:
| Query | Execution time |
|---|---|
| q1 | 50 minutes |
| q2 | 2 minutes |
| q3 | 8 minutes |
Altogether, the instance has spent one hour executing queries, of which q1 accounts for about 83 percent, q3 for about 13 percent, and q2 for about 3 percent. In this case, it’s clear that we should prioritize any significant improvements we can make to q1. Having already assessed the selectivity of q1’s filter predicate, as discussed above, we can gauge the potential impact of indexing on the query time and set a priority level for the recommendation.
Scoping this calculation to individual hosts is essential, since processing times can differ greatly between instances with different CPU, memory, and disk capacity. For example, in a cluster with varying host types, q1 might account for only a small proportion of the cluster-wide processing time, giving it a low percent total duration even though it dominates the execution time on one replica. When a query runs in a cluster, we therefore use the maximum value of this metric across all hosts, since an incident on a single host can cause cascading failures throughout the cluster.
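The severity calculation described above can be sketched as follows. The `severity_from_query_times` helper and its thresholds are hypothetical, chosen only to illustrate the per-host scoping and the max-across-hosts rule:

```python
def severity_from_query_times(per_host_times, query, low=0.05, high=0.5):
    """Compute a query's percent total duration on each host, take the
    maximum across hosts, and map it to a severity label.
    The low/high cutoffs are illustrative, not Datadog's actual values."""
    fractions = []
    for host_times in per_host_times.values():
        total = sum(host_times.values())
        fractions.append(host_times.get(query, 0) / total)
    worst = max(fractions)  # one hot host is enough to raise severity
    if worst >= high:
        label = "High"
    elif worst >= low:
        label = "Medium"
    else:
        label = "Low"
    return worst, label

# Execution times (minutes) from the example above; replica-2 is an
# added host where q1 is unremarkable, to show the max-across-hosts rule
times = {
    "replica-1": {"q1": 50, "q2": 2, "q3": 8},
    "replica-2": {"q1": 1, "q2": 1, "q3": 1},
}
pct, label = severity_from_query_times(times, "q1")
print(f"{pct:.0%} -> {label}")  # 83% -> High
```

Even though q1 is negligible on `replica-2`, the maximum across hosts keeps the recommendation at High severity, reflecting the risk that one overloaded replica poses to the whole cluster.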
We also allow you to create monitors for DBM recommendations based on severity. You can create monitors on all of our recommendations, but configuring different types of alerts for different severity levels can help you stay vigilant while preventing alert fatigue. For example, you may want to be paged about missing indexes on queries with a high percent total duration by host, which can be crippling to the overall performance of a database. You can create a monitor for High-severity index recommendations by filtering for `@recommendation.severity:high` and `@recommendation.recommendation_type:missing_index` in a Database Monitoring monitor definition.
Presenting all-in-one contextual workflows
Having formulated and prioritized an index recommendation, we present it along with relevant contextual data and a step-by-step guide to evaluating and applying it.
For column-based indexes, we start by showing the recommended `CREATE INDEX` statement along with its associated query. Alongside this, we provide additional context to explain our recommendations and help guide your decision-making. The Percent Total Duration by Host graph shows the data behind our severity calculation, so you can see the scope and host-by-host impact of the query.
We then display the explain plan where we detected the offending sequential scan, highlighting what to look for in the plan visualization.
If schema collection is enabled, we also display the existing indexes for the given table; if not, we provide a query you can run to collect this information from the database manually. This is essential context, since the new index may render a similar existing index obsolete, in which case the old index should be dropped.
For queries passing column values as arguments to function calls, the Function in Filter recommendation highlights the call that we detected and recommends one of two courses of action: either modify the query to avoid the function call or create an expression-based index. Expression-based indexes can be relatively expensive to maintain, since they impose additional computation on every write to the tables on which they’re created. This extra expense may be worth it when a query is otherwise unable to use an index, such as with case-insensitive search via the `ILIKE` operator. Often, however, the best solution is to remove the function call from the query. For example, we could avoid calling `lower` in our queries by lowercasing our column values ahead of time, which might have broader benefits for our application by helping us ensure more consistent, normalized data.
Optimize database performance with confidence
In this post, we’ve examined how Datadog DBM draws on a holistic range of data sources to make targeted index recommendations. We’ve also discussed the guidance we provide for implementing these recommendations, including detailed workflows and contextual data to aid users’ decision-making and expedite their response to performance issues. In providing these insights into our index recommendation process, we have sought to show how DBM is uniquely positioned to provide effective recommendations and, above all, to enable DBM users to proactively optimize database performance with confidence.
To get started with Datadog DBM Recommendations, check out our documentation. And if you’re new to Datadog, you can sign up for a 14-day free trial.