If you’ve already read Parts 1 and 2 of this series, you’ve learned about the key metrics to monitor in PostgreSQL, and how to start collecting this data with native and open source tools.
Datadog’s PostgreSQL integration helps you automatically collect PostgreSQL data from the statistics collector, so that you can monitor everything in one place. And, because Datadog integrates with more than 800 other technologies, you’ll be able to correlate metrics from your PostgreSQL servers with other services throughout your environment.
In this post, we’ll walk through the process of installing Datadog on your PostgreSQL servers, so you can visualize database performance in an out-of-the-box screenboard like the one shown below. We’ll also show you how you can leverage Datadog Database Monitoring to get query-level insights into your databases. Last but not least, we’ll discuss how to identify bottlenecks in your code by tracing application requests (including PostgreSQL queries) with Datadog APM.
Datadog’s PostgreSQL integration
Instead of querying PostgreSQL metrics manually through the utilities covered in Part 2 of this series, you can use the Datadog Agent to automatically aggregate these metrics and make them visible in a customizable template dashboard that shows you how these metrics evolve over time.
Install the Datadog Agent
The Datadog Agent is open source software that aggregates and reports metrics from your servers, so that you can graph and alert on them in real time. Installing the Agent usually takes just a single command—to get started, follow the instructions for your platform here.
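For reference, on a Linux host the one-step install is typically a script invocation along the lines of the sketch below; copy the exact command (with your own API key and Datadog site) from the in-app installation instructions rather than from here:

# Sketch of the one-step Linux install; get the exact command, including your API key,
# from the Agent installation page in your Datadog account.
DD_API_KEY=<YOUR_API_KEY> DD_SITE="datadoghq.com" bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"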
GRANT the Agent permission to monitor PostgreSQL
Next, you’ll need to give the Agent permission to access statistics from the pg_stat_database view, by following the instructions in our documentation. Basically, you’ll need to log into a psql session as a user who has CREATEROLE privileges, create a datadog user and password, and grant it read access to pg_stat_database:
CREATE USER datadog WITH PASSWORD '<PASSWORD>';
GRANT SELECT ON pg_stat_database TO datadog;
For security reasons, PostgreSQL restricts certain track_activities statistics to superusers (including information about the queries that other users are currently executing, available via the pg_stat_activity view). If you’re running PostgreSQL version 10.x+, and you would like to give your datadog user access to monitoring-related statistics and configuration settings that are normally reserved for superusers, you can grant it pg_monitor permissions:
GRANT pg_monitor TO datadog;
Run this command on your PostgreSQL server to confirm that the datadog user can access your metrics:
psql -h localhost -U datadog postgres -c \
"select * from pg_stat_database LIMIT(1);" \
&& echo -e "\e[0;32mPostgres connection - OK\e[0m" \
|| echo -e "\e[0;31mCannot connect to Postgres\e[0m"
You’ll be prompted to enter the password you just created for your datadog user; once you’ve done so, you should see the following output: Postgres connection - OK.
Configure the Agent to collect PostgreSQL metrics
After you’ve installed the Agent on each of your PostgreSQL servers, you’ll need to create a configuration file that provides the Agent with the information it needs in order to begin collecting PostgreSQL data. The location of this file varies according to your OS and platform; consult the documentation for more details.
Copy the example config file (postgres.d/conf.yaml.example) and save it as conf.yaml. Now you can customize the config file to provide Datadog with the correct information and any tags you’d like to add to your metrics.
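For reference, on a Linux host running Agent v6 or v7 that copy step usually amounts to something like the following (the paths below are the defaults, so adjust them if your install differs):

# Assumes the default Agent v6/7 configuration directory on Linux
sudo cp /etc/datadog-agent/conf.d/postgres.d/conf.yaml.example \
        /etc/datadog-agent/conf.d/postgres.d/conf.yaml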
The example below instructs the Agent to access metrics locally through port 5432, using the datadog user and password we just created. You also have the option to add custom tag(s) to your PostgreSQL metrics, and to limit metric collection to specific schemas, if desired.
conf.yaml
init_config:

instances:
  - host: localhost
    port: 5432
    username: datadog
    password: <PASSWORD>
    tags:
      - optional_tag
Save your changes, restart the Agent, and run the info command to verify that the Agent is properly configured. If all is well, you should see a section like this in the resulting output:
Checks
======
[...]
postgres
-----------------
- instance #0 [OK]
- Collected 70 metrics, 0 events & 1 service check
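For reference, on a Linux host those restart-and-verify steps usually look something like this (newer Agent versions report the same check summary through the status command rather than info):

# Restart the Agent and confirm the postgres check is running (Linux, systemd)
sudo systemctl restart datadog-agent
sudo datadog-agent status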
Diving into your PostgreSQL data with dashboards
Now that you’ve integrated Datadog with PostgreSQL, you should see metrics populating an out-of-the-box PostgreSQL screenboard, located in your list of integration dashboards. This screenboard provides an overview of many of the key metrics covered in Part 1 of this series, including locks, index usage, and replication delay. You can also clone and customize it by adding your own custom PostgreSQL metrics. We’ll show you how to set up the Agent to collect custom metrics in the next section.
Quickly reference key metrics and commands in our PostgreSQL monitoring cheatsheet.
Collecting custom PostgreSQL metrics with Datadog
Datadog’s PostgreSQL integration provides you with an option to collect custom metrics that are mapped to specific queries. In the custom_queries section of the Datadog Agent’s example PostgreSQL configuration file, you’ll see some guidelines about the components you’ll need to provide:
- metric_prefix (required): the prefix to use across every custom metric name (by default, this is postgresql)
- query (required): the SQL query to run on your database
- columns (required): an ordered list of every column returned by the query above. Each item in the list should have a name (the custom metric name that will get appended to the metric_prefix) and a type. The type can be the metric type (gauge, count, rate, etc.), or submission method for the queried metric. Alternatively, you can set the type to tag to tag the metric with the data contained in this column.
- tags (optional): add this if you’d like to tag your custom metrics with additional metadata
For example, you can send a custom query to the pg_stat_activity view to continuously gauge the number of active backend connections, broken down by application name and user. Normally you’d query the view with something like:
SELECT application_name, usename, COUNT(*)
FROM pg_stat_activity
WHERE application_name NOT LIKE 'psql'
  AND (application_name <> '') IS TRUE
GROUP BY application_name, usename;
You can set up the Agent to automatically query this data for you on a regular basis, and report the results as a custom metric in Datadog.
Add your query to the custom_queries section of the Datadog Agent’s PostgreSQL configuration file:
postgres.d/conf.yaml
[ ... ]
    custom_queries:
      - metric_prefix: postgresql
        query: SELECT application_name, usename, COUNT(*) FROM pg_stat_activity WHERE application_name NOT LIKE 'psql' AND (application_name <> '') IS TRUE GROUP BY application_name, usename
        columns:
          - name: application_name
            type: tag
          - name: pg_user
            type: tag
          - name: count_by_applications
            type: gauge
        # additional optional tags
        tags:
          - <TAG_KEY>:<TAG_VALUE>
Save and exit the file, and restart the Datadog Agent (find the command for your OS here). We should now be able to see our custom postgresql.count_by_applications metric in Datadog, tagged with application_name and pg_user.
Consult the documentation and this article to see examples of other custom PostgreSQL metrics you can collect.
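A practical workflow is to prototype a query with psql before wiring it into custom_queries. For instance, the hypothetical query below counts backends by connection state; if the output looks right, you could map state to a tag column and the count to a gauge, just as in the example above:

# Hypothetical example: count client backends by connection state
psql -h localhost -U datadog postgres -c \
"SELECT state, COUNT(*) FROM pg_stat_activity WHERE state IS NOT NULL GROUP BY state;"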
Detect and optimize slow queries with Datadog Database Monitoring
Now that you’ve set up the Agent to track high-level PostgreSQL data from your servers, you can dive even deeper and get query-level performance insights with Datadog Database Monitoring (DBM). To get started with DBM, you’ll need to add a few parameters to your PostgreSQL configuration file (postgresql.conf) and grant the Agent additional permissions. Then, enable DBM in your Agent configuration by setting dbm to true, as shown below, and restart the Agent for the changes to take effect.
conf.yaml
init_config:

instances:
  - dbm: true
    host: localhost
    port: 5432
    username: datadog
    password: <PASSWORD>
    tags:
      - optional_tag
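The exact postgresql.conf parameters and grants are covered in the DBM setup documentation; the central prerequisite is the pg_stat_statements extension, which must be loaded via shared_preload_libraries (a server restart is required) and created in your database. As a rough sanity check that it’s in place before you restart the Agent, you can run something like:

# Sketch: confirm that pg_stat_statements is loaded and queryable
psql -h localhost -U datadog postgres -c "SHOW shared_preload_libraries;"
psql -h localhost -U datadog postgres -c "SELECT COUNT(*) FROM pg_stat_statements;"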
You can now navigate to the Databases view to track the performance of normalized queries across all of your databases, and identify ones that can be optimized. For each normalized query, DBM shows you detailed explain plans, which list the exact steps taken by the query planner to execute the query, along with their estimated cost. These plans help surface inefficient operations, like sequential scans, which you can avoid by creating indexes on your most frequently accessed columns.
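As a concrete (and hypothetical) illustration of how you might act on an explain plan: if a query that filters on a column shows up as a sequential scan over a large table, indexing that column is often the fix. The orders table and customer_id column below are placeholders:

# Hypothetical table/column: inspect the plan for a frequent filter...
psql -h localhost -U <YOUR_USER> <YOUR_DB> \
-c "EXPLAIN SELECT * FROM orders WHERE customer_id = 42;"

# ...and if it reports a Seq Scan, index the filter column.
# CONCURRENTLY avoids blocking writes while the index builds.
psql -h localhost -U <YOUR_USER> <YOUR_DB> \
-c "CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);"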
DBM also provides timeseries graphs of normalized query performance metrics, so you can monitor trends over the long term. You can add these graphs to the out-of-the-box PostgreSQL dashboard discussed earlier and correlate them with the rest of your monitoring data in Datadog.
To learn more about DBM, read our dedicated blog post.
Tracing PostgreSQL queries with APM
Distributed tracing is another important aspect of a comprehensive PostgreSQL database monitoring strategy. Datadog APM is bundled in the same lightweight, open source Datadog Agent we installed earlier. With all of your services, hosts, and containers reporting to one unified platform, you’ll be able to view key metrics from your applications in the same place as their underlying infrastructure. You’ll also be able to trace requests as they travel across service boundaries in your environment.
Distributed tracing and APM are designed to work with minimal configuration. For example, the Python tracing client auto-instruments web frameworks like Django and Flask, as well as client libraries for commonly used services like Redis and PostgreSQL. In the following example, we’ll show you how to start tracing a Django app that uses PostgreSQL as its database.
1. Install the Datadog Agent + Python tracing client
First, install the Datadog Agent on your app server, by following the instructions for your OS, as specified here. Now you will need to install the Python tracing client in your environment:
pip install ddtrace
2. Update your Django settings.py file
Add the tracing client’s Django integration to the INSTALLED_APPS section of your Django settings.py file:
settings.py
INSTALLED_APPS = [
    [...]
    'ddtrace.contrib.django',
]
You’ll also need to add a DATADOG_TRACE section to your settings.py file, making sure to specify the name of your service, and any tags you’d like to add to your service-level metrics:
settings.py
DATADOG_TRACE = {
    'DEFAULT_SERVICE': '<MY_SERVICE>',
    'TAGS': {'env': 'myenv'},
}
And, if you haven’t done so already, make sure you’ve specified the name of your database and the user credentials in the DATABASES section. You also have the option to add the name of your application, if desired:
settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': '<YOUR_DB>',
        'USER': os.environ['DATABASE_USER'],
        'PASSWORD': os.environ['DATABASE_PASSWORD'],
        'HOST': 'localhost',
        'PORT': '5432',
        'OPTIONS': {
            'application_name': '<MY_APP>'
        }
    }
}
Since we specified application_name, the Agent will include it in the metadata for each trace it collects and forwards to Datadog. Also, note that DEBUG mode needs to be off in order for Datadog to trace your application.
Save and exit the file. We’re ready to trace!
3. Run the ddtrace-run command
You’re just one command away from seeing traces and service-level metrics from your app in Datadog. Add ddtrace-run as a wrapper around the usual command you use to start your app server:
ddtrace-run python manage.py runserver
Inspecting PostgreSQL database queries in Datadog
The Agent will quickly start collecting metrics and traces from your application, and forwarding them to Datadog for visualization and alerting. Navigate to the Datadog APM page, select the environment you specified in the DATADOG_TRACE section of your settings.py file, and click on your service to see a dashboard of key metrics (latency, errors, and hits).
At the bottom of the dashboard, you can see a list of various endpoints recently accessed throughout your application. In this example, the endpoints map to views in our Django app. We can click into a trace to follow the path of an individual request as it travels across various components of our app, including multiple PostgreSQL database calls. In the flame graph below, we can click into any PostgreSQL span to view the exact query that was executed.
You can also filter through your sampled traces for errors, and start debugging the issue. In the example below, the stack trace shows us that our app is trying to query a nonexistent field in the database.
Auto-instrumentation gives you a head start on collecting traces from popular libraries and frameworks, but you can also set up the Agent to collect custom traces by instrumenting and tagging specific spans of your code. The Python tracing client also includes support for distributed tracing, so you can follow the path of each request across different hosts. Read the documentation for more details.
Creating custom dashboards
You can combine APM metrics with infrastructure-wide metrics on any dashboard in order to identify and investigate bottlenecks—simply click on the button in the upper right corner of any APM graph to add it to an existing timeboard.
Once you create dashboards that combine service-level metrics with infrastructure metrics, you’ll be able to correlate across graphs to help investigate issues. In the dashboard above, it looks like many rows were fetched recently, and requests were spending a higher percentage of time executing PostgreSQL queries. We can investigate further by viewing a trace of a request that occurred around that time:
This trace shows us that many different PostgreSQL queries were running sequentially within a single request. We can go even deeper by clicking on each span to see the exact SQL query that was executed, which can help us determine how to reduce the number of database calls and optimize performance.
Alerting
Once you’ve implemented APM and started tracing your applications, you can quickly enable default service-level alerts on latency, error rate, or throughput, or set up custom alerts.
You can also set up automated alerts on any of the PostgreSQL data you’re collecting. For example, you can configure Datadog to automatically notify you if replication delay increases beyond a certain level, or your databases accumulate too many dead rows.
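If you’re deciding where to set a dead-row threshold, it can help to spot-check the raw numbers directly on a server first (the right threshold depends on your table sizes and autovacuum settings):

# Spot-check dead rows per table while tuning an alert threshold
psql -h localhost -U datadog postgres -c \
"SELECT relname, n_dead_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 5;"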
Start monitoring PostgreSQL
In this post, we’ve shown you how to use Datadog to automatically collect, visualize, and alert on PostgreSQL data to ensure the health and performance of your databases. We’ve also walked through an example of how to auto-instrument traces from an application that relies on PostgreSQL. If you’ve followed along, you should now have increased visibility into your PostgreSQL databases, as well as the applications that rely on them.
If you’re new to Datadog, you can start monitoring PostgreSQL with a free trial.
Source Markdown for this post is available on GitHub. Questions, corrections, additions, etc.? Please let us know.