Glovo Scales On-Demand Delivery App and Eliminates Downtime With Datadog Database Monitoring | Datadog
Case study

Glovo scales on-demand delivery app and eliminates downtime with Datadog Database Monitoring

Technology

4,200 Employees

Barcelona, Spain

150K

About Glovo

Glovo connects users with businesses and couriers offering on-demand services from local restaurants, grocers and supermarkets, pharmacies and retail stores.

“Datadog is our one-stop shop for observability. We don’t even consider using anything else if it's available in Datadog. Having everything on the same platform saves us time and makes troubleshooting easier.”

Amith Reddy
Site Reliability Engineer
Glovo
Why Datadog?
  • Enables better visibility into databases
  • Helps engineers quickly identify and optimize inefficient queries to reduce computational load
Challenge

Glovo’s database resource consumption couldn’t keep pace with its projected growth. As the company launched a migration to microservices, it needed better visibility into its databases to reduce CPU usage and prevent costly downtime.

Key Results
3 → 0 Hours downtime

Improving inefficient queries reduced downtime

65% → 60% CPU Usage

Due to query optimization

1 → 90 Databases

Increased complexity of databases while improving performance

Moving to microservices to scale a fast-growing app

Glovo is an on-demand courier service that delivers products customers order through its mobile app, including food, medicines, and retail items. The company operates in 25 countries across Europe, the Middle East and Asia.

Glovo initially designed its application using a monolithic architecture. To keep pace with rapid growth, Glovo engineers recently began migrating the application to a microservice-based architecture. For a growing organization, such a major architectural change posed a significant challenge. As the number of databases and running queries increased, the engineers found that databases were provisioned incorrectly and would often reach CPU capacity. This resulted in outages adding up to three or four hours of downtime in 2022.

Amith Reddy, site reliability engineer, along with the rest of Glovo’s infrastructure team, wanted to improve visibility into databases so engineers could better understand performance issues and avoid costly downtime. The existing monitoring products Glovo used didn’t provide the insight they needed. Limited access to real-time monitoring and alerting hindered the team’s response to issues. They also lacked the ability to track and compare current and historical performance data, making investigations manual and tedious.

Finding and improving inefficient queries

Glovo was already using Datadog Application Performance Monitoring (APM), Infrastructure Monitoring, and Log Management. For Reddy, extending that tooling to Datadog Database Monitoring (DBM) felt natural: it had the same easy-to-use interface the team already knew, plus features such as granular query data, explain plans, query costs, and wait times. With Datadog DBM, Glovo engineers now have full visibility into their databases and can quickly identify and optimize inefficient queries to reduce computational load. “It's hard to believe that one query can cause big problems, but it happens often. That simple, inefficient query can consume all the resources and things stop working and all the alarms go off,” says Reddy. “With Datadog, we can identify inefficient queries and address them. We immediately saw resource consumption improvements.”

For example, Glovo was running a query from its high-traffic homepage. The query worked fine until it was revamped in a design change and caused a failure. “After a while the query became very inefficient and at one point that simple query blocked everything else,” says Reddy.

Using Datadog DBM, Reddy and his team observed that the query was being sent to a writer instance instead of a reader, a major pitfall because read traffic then competes with writes for the writer’s CPU. After identifying and fixing the issue, Glovo’s overall CPU usage dropped by five percentage points. “That was a big win for us because it was the single highest drop in CPU percentage we’ve seen at one time,” says Reddy. “Sometimes the data we need is easy to miss, but Datadog provides us a very clear indication when something is going wrong in our code.”
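To make the reader-versus-writer pitfall concrete, here is a minimal Python sketch of routing read-only statements to a reader endpoint. It is an illustration only, not Glovo's code or a Datadog API: the endpoint names, the `choose_endpoint` helper, and the rule of treating plain `SELECT` statements as safe for readers are all assumptions.

```python
import re

# Hypothetical endpoints, for illustration only.
WRITER = "writer.db.example.internal"
READER = "reader.db.example.internal"

# Only plain SELECT statements are treated as read-only in this sketch.
_SELECT = re.compile(r"^\s*select\b", re.IGNORECASE)


def choose_endpoint(sql: str, in_transaction: bool = False) -> str:
    """Return the endpoint a statement should be sent to."""
    # Statements inside a transaction stay on the writer so they
    # can see the transaction's own uncommitted writes.
    if in_transaction:
        return WRITER
    # SELECT ... FOR UPDATE takes row locks, so it needs the writer.
    if _SELECT.match(sql) and "for update" not in sql.lower():
        return READER
    # Everything else (INSERT, UPDATE, DELETE, DDL, ...) goes to the writer.
    return WRITER


if __name__ == "__main__":
    for statement in (
        "SELECT id, name FROM stores WHERE city_id = 7",
        "UPDATE orders SET status = 'delivered' WHERE id = 42",
        "SELECT * FROM orders WHERE id = 42 FOR UPDATE",
    ):
        print(choose_endpoint(statement), "<-", statement)
```

A real router would also need to account for replication lag and for read-your-own-writes requirements, which this sketch ignores.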

Glovo is also using DBM for capacity planning and to enable resource optimization across the organization. For example, the infrastructure team supports other teams across the business in provisioning and maintaining their own infrastructure. DBM helps the infrastructure team work with other business areas to ensure their workloads run efficiently and resources are allocated appropriately, thereby reducing costs.

Reducing database downtime

In addition to optimizing query performance, Glovo now has visibility into granular database information, which allows the infrastructure team to quickly pinpoint problems and keep small issues from becoming big ones. Reddy’s team needs to troubleshoot issues across the tech stack, even down to the query level, to determine the source of an issue. “We know the throughput, latency, and each query’s contribution to the overall load/resource consumption of the database, so we can identify inefficient queries very quickly,” he says.
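The idea of ranking queries by their contribution to overall load can be sketched in a few lines of Python. This is a simplified illustration, not how Datadog DBM computes its metrics: the `QueryStat` type, the `load_share` helper, the use of calls multiplied by average latency as a proxy for load, and the sample numbers are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    """Hypothetical per-query statistics over some time window."""
    query: str
    calls: int             # throughput: executions in the window
    avg_latency_ms: float  # average time per execution

    @property
    def total_time_ms(self) -> float:
        # Proxy for load: total time the database spent on this query.
        return self.calls * self.avg_latency_ms


def load_share(stats: list[QueryStat]) -> list[tuple[str, float]]:
    """Rank queries by their fraction of total execution time."""
    total = sum(s.total_time_ms for s in stats)
    if total == 0:
        return []
    ranked = sorted(stats, key=lambda s: s.total_time_ms, reverse=True)
    return [(s.query, s.total_time_ms / total) for s in ranked]


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        QueryStat("SELECT ... FROM stores", calls=1000, avg_latency_ms=2.0),
        QueryStat("SELECT ... FROM orders", calls=10, avg_latency_ms=100.0),
        QueryStat("UPDATE couriers ...", calls=500, avg_latency_ms=2.0),
    ]
    for query, share in load_share(sample):
        print(f"{share:6.1%}  {query}")
```

A cheap query that runs very often can outrank a slow query that runs rarely, which is why throughput and latency are considered together.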

Datadog’s unified platform also allows Glovo engineers to see all critical observability data in one place without switching tabs or programs. By unifying Datadog Infrastructure Monitoring, DBM, APM, and Log Management, Glovo engineers receive immediate feedback on system performance while improving communication and collaboration across teams.

As a result of its optimization efforts and improved troubleshooting, Glovo’s database-related capacity incidents and downtime dropped from three hours to zero. According to Reddy, preventing those incidents equates to savings of approximately $750,000. Most critically, improved observability and database optimization mean Glovo can continue to serve the approximately 15 million users across 25 countries who rely on it for timely deliveries. “Datadog is our one-stop shop for observability,” says Reddy. “We don’t even consider using anything else if it's available in Datadog.”

Resources

Deep Database Monitoring