CASE STUDY

Ensuring Complete Visibility into Kubernetes Networks and Workloads

Learn how Datadog helped Delivery Hero gain full visibility into their containerized environment and network, allowing them to scale up to meet demand

About Delivery Hero

Founded in 2011 and headquartered in Berlin, Delivery Hero is the world’s leading local delivery platform, operating in more than 40 countries across four continents. Currently, Delivery Hero is pioneering "quick commerce," the next generation of e-commerce, by aiming to bring groceries and household goods to customers in as little as 10 to 15 minutes.


Key Results

2x Orders

Increase in demand that Delivery Hero was able to meet without experiencing any outages.

0% to 100%

Increase in visibility into their Kubernetes network traffic.

1 Solution

Number of tools needed for complete coverage of their Kubernetes clusters.


Challenge

Delivery Hero leverages Kubernetes to scale and maintain their containerized environment, but visibility gaps threatened their ability to handle a sharp increase in application traffic caused by the global pandemic.


Why Datadog?

Datadog gave Delivery Hero out-of-the-box insight into the performance of their app and container communication, allowing them to eliminate critical blind spots. The single, unified platform also enabled them to reduce the number of observability tools they rely on, which freed up their engineers to focus their efforts on scaling the platform to meet demand.


A Year of Change

2020 came with a lot of uncertainty, instability, and change. As the global population shifted to spending most of their time at home, companies needed to quickly adapt to meet the challenges presented by this new and evolving landscape.

While many industries saw a sharp decrease in demand, Delivery Hero, the world’s leading local delivery platform, saw the number of orders placed through their application almost double in just a few months. This meant that Delivery Hero’s Pandora team, which operates the largest backend platform at the company, needed to ensure that their system could scale reliably to meet the needs of their customers, who were depending on Delivery Hero for food and other necessities. The challenge for Delivery Hero became: How can we adopt modern technology to aid us in rapidly scaling our applications and infrastructure without jeopardizing the digital experience?

Visibility Challenges in Containerized Environments

Delivery Hero decided to leverage Kubernetes for the automated scheduling, scaling, and maintenance of their containerized infrastructure. This choice helped them ensure that their applications remained highly available to their users, but the arduous setup process and management overhead added complexity and compromised visibility into their environment.

Before bringing in Datadog, the Pandora team relied on several open source tools to monitor their infrastructure and network, which created critical blind spots when they needed to perform updates or add new clusters. For example, when Delivery Hero upgrades Kubernetes versions, they first spin up a new cluster running on the new version and then slowly migrate traffic over from the old cluster. Their open source tooling, however, could only monitor one of the clusters at a time, which meant they had to choose between monitoring either their old or new cluster while the migration was in progress. They also lacked visibility into their DNS services, which are used by Kubernetes for service discovery and communication. This left them ill-equipped to diagnose performance issues and made them vulnerable to potentially large-scale outages.

Delivery Hero’s reliance on multiple monitoring tools also meant that they had no single source of truth for their telemetry data. Engineers were forced to context switch across this patchwork of tools, which slowed down the issue detection and resolution process. The Pandora team decided to see whether Datadog could help them reduce tool sprawl and close their visibility gaps at the same time.

“We had so many projects going on, but didn’t have enough human resources to do everything. Scaling and customizing open source tools was simply not a top priority.”

Miguel Mingorance
Systems Engineer, Delivery Hero

How Delivery Hero Does It Today

With Datadog, Delivery Hero gets complete, uninterrupted visibility into their Kubernetes clusters without having to manage any open source tooling. Additionally, Datadog’s unified platform allows them to view crucial monitoring data from their infrastructure, applications, and network through a single pane of glass, which has greatly streamlined their troubleshooting process.

Datadog Network Performance Monitoring (NPM) gives Delivery Hero deep insight into their network and DNS by providing key metrics, such as traffic volume, latency, and retransmits, all with extremely low overhead. Delivery Hero can now visualize the flow of network traffic between pods inside each of their Kubernetes clusters, as well as between other endpoints, such as services. This visibility is crucial during Kubernetes upgrades: it enables Delivery Hero to immediately spot any anomalous traffic patterns during the rollout of a new cluster and, if necessary, shift traffic back to the old cluster.

“Datadog Network Performance Monitoring gave us immediate visibility into all our Kubernetes cluster traffic. As soon as a new cluster is spun up, we can see if pods are communicating as expected and if internal DNS is doing its job.”

Miguel Mingorance
Systems Engineer, Delivery Hero
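
The upgrade workflow described above (shift traffic to the new cluster in stages, watch network metrics, and revert if something looks wrong) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the function names, the staged weights, the retransmit-rate figures, and the 1% limit are assumptions made for illustration, not Delivery Hero's tooling or a Datadog API.

```python
from typing import Callable, List, Sequence


def migrate_traffic(
    steps: Sequence[int],
    is_healthy: Callable[[int], bool],
) -> List[int]:
    """Apply each new-cluster traffic weight (in percent) in order.

    Returns the sequence of weights that were applied. If a step is
    judged unhealthy, a final weight of 0 is appended to represent
    sending all traffic back to the old cluster, and the rollout stops.
    """
    applied: List[int] = []
    for weight in steps:
        applied.append(weight)
        if not is_healthy(weight):
            applied.append(0)  # roll back: old cluster takes all traffic again
            break
    return applied


# Hypothetical retransmit rates observed at each traffic weight.
observed_retransmit_rate = {10: 0.001, 25: 0.002, 50: 0.08, 100: 0.001}
MAX_RETRANSMIT_RATE = 0.01  # assumed limit, chosen only for this example


def retransmits_ok(weight: int) -> bool:
    """Treat a step as healthy while the retransmit rate stays within the limit."""
    return observed_retransmit_rate[weight] <= MAX_RETRANSMIT_RATE


history = migrate_traffic([10, 25, 50, 100], retransmits_ok)
print(history)  # [10, 25, 50, 0]
```

In this sketch the rollout stops at the 50% step because the assumed retransmit rate exceeds the limit, so the 100% step is never attempted. In practice the health check would read live metrics instead of a fixed table.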

Datadog Log Management allows Delivery Hero to quickly explore system-wide activity in an intuitive and cost-effective way. For example, their team can see a breakdown of error logs from each application, which helps them stay attuned to early warning signs of potential failures. They can also set up monitors on these log counts, which will alert them when the volume of error logs surpasses a defined threshold.
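
The idea behind a threshold monitor on error log counts can be sketched in a few lines of Python. This is an illustrative example only, not how Datadog implements monitors: the log record fields (`service`, `status`), the sample data, and the threshold value are all assumptions.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def error_counts_by_service(logs: Iterable[Mapping[str, str]]) -> Counter:
    """Count error-level log entries per service (the per-application breakdown)."""
    counts: Counter = Counter()
    for entry in logs:
        if entry.get("status") == "error":
            counts[entry.get("service", "unknown")] += 1
    return counts


def services_over_threshold(counts: Mapping[str, int], threshold: int) -> List[str]:
    """Return the services whose error count surpasses the threshold, sorted by name."""
    return sorted(name for name, n in counts.items() if n > threshold)


# Hypothetical log records collected over one evaluation window.
sample_logs = [
    {"service": "checkout", "status": "error"},
    {"service": "checkout", "status": "error"},
    {"service": "checkout", "status": "error"},
    {"service": "search", "status": "error"},
    {"service": "search", "status": "info"},
]

counts = error_counts_by_service(sample_logs)
alerts = services_over_threshold(counts, threshold=2)
print(alerts)  # ['checkout']
```

Here only the hypothetical `checkout` service triggers an alert, because its three error logs exceed the assumed threshold of two.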

This monitoring data is tied together in Datadog’s unified platform, which enables Delivery Hero to pivot seamlessly between application performance data, logs, and network traffic. Datadog also provides customizable, out-of-the-box dashboards for more than 800 integrations, including Kubernetes, so Delivery Hero can get immediate visibility into the technologies they rely on. This means the Pandora team no longer has to switch between disparate data in several tools or rely on open source software with significant operational overhead to monitor and troubleshoot issues.

“Things became much easier once we started using Datadog. As soon as we deployed the Agent, we got access to more out-of-the-box metrics and had a single place to look at everything.”

Miguel Mingorance
Systems Engineer, Delivery Hero

Looking Ahead with Datadog

While the challenges of maintaining a scalable yet reliable platform seemed daunting, Delivery Hero was able to leverage Datadog to support their constantly growing business. Delivery Hero now feels confident when rolling out backend upgrades and adopting cutting-edge technology, even as new users are added by the hour. They are also planning to expand their partnership with Datadog by adopting more products from the platform, which will provide even deeper visibility into the health and performance of their system, all in one place.

Resources

OFFICIAL DOCS

Network Performance Monitoring Setup

BLOG

Monitor DNS with Datadog

BLOG

Debug application issues with APM and Network Performance Monitoring

BLOG

Monitor Windows hosts with Network Performance Monitoring