A Year of Change
2020 brought a great deal of uncertainty, instability, and change. As people around the world began spending most of their time at home, companies had to adapt quickly to the challenges of this new and evolving landscape.
While many industries saw a sharp decrease in demand, Delivery Hero, the world’s leading local delivery platform, saw the number of orders placed through their application almost double in just a few months. This meant that Delivery Hero’s Pandora team, which operates the largest backend platform at the company, needed to ensure that their system could scale reliably to meet the needs of their customers, who were depending on Delivery Hero for food and other necessities. The challenge for Delivery Hero became: How can we adopt modern technology to aid us in rapidly scaling our applications and infrastructure without jeopardizing the digital experience?
Visibility Challenges in Containerized Environments
Delivery Hero decided to leverage Kubernetes for the automated scheduling, scaling, and maintenance of their containerized infrastructure. This choice helped them keep their applications highly available to users, but the arduous setup process and management overhead of Kubernetes added complexity and reduced their visibility into their environment.
Before bringing in Datadog, the Pandora team relied on several open source tools to monitor their infrastructure and network, which created critical blind spots when they needed to perform updates or add new clusters. For example, when Delivery Hero upgrades Kubernetes versions, they first spin up a new cluster running on the new version and then slowly migrate traffic over from the old cluster. Their open source tooling, however, could only monitor one of the clusters at a time, which meant they had to choose between monitoring either their old or new cluster while the migration was in progress. They also lacked visibility into their DNS services, which are used by Kubernetes for service discovery and communication. This left them ill-equipped to diagnose performance issues and made them vulnerable to potentially large-scale outages.
Delivery Hero’s reliance on multiple monitoring tools also meant that they had no single source of truth for their telemetry data. Engineers were forced to context switch across this patchwork of tools, which slowed down the issue detection and resolution process. The Pandora team decided to see whether Datadog could help them reduce tool sprawl and close their visibility gaps at the same time.
“We had so many projects going on, but didn’t have enough human resources to do everything. Scaling and customizing open source tools was simply not a top priority.”
Miguel Mingorance
Systems Engineer, Delivery Hero
How Delivery Hero Does It Today
With Datadog, Delivery Hero is able to get complete, uninterrupted visibility into their Kubernetes clusters without having to manage any open source tooling. Additionally, Datadog’s unified platform allows them to view crucial monitoring data from their infrastructure, applications, and network in a single pane of glass, which has greatly streamlined their troubleshooting process.
Datadog Network Performance Monitoring (NPM) gives Delivery Hero deep insight into their network and DNS by providing key metrics, such as traffic volume, latency, and retransmits, all with extremely low overhead. Delivery Hero can now visualize the flow of network traffic between pods inside each of their Kubernetes clusters, as well as between other endpoints, such as services. This visibility is crucial during Kubernetes upgrades, as it enables Delivery Hero to immediately spot any anomalous traffic patterns during the rollout of a new cluster and shift traffic back to the old cluster if necessary.
“Datadog Network Performance Monitoring gave us immediate visibility into all our Kubernetes cluster traffic. As soon as a new cluster is spun up, we can see if pods are communicating as expected and if internal DNS is doing its job.”
Miguel Mingorance
Systems Engineer, Delivery Hero
Datadog Log Management allows Delivery Hero to quickly explore system-wide activity in an intuitive and cost-effective way. For example, their team can see a breakdown of error logs from each application, which helps them stay attuned to early warning signs of potential failures. They can also set up monitors on these log counts that alert them when the volume of error logs surpasses a defined threshold.
This monitoring data is tied together in Datadog’s unified platform, which enables Delivery Hero to pivot seamlessly between application performance data, logs, and network traffic. Datadog also provides customizable, out-of-the-box dashboards for more than 800 integrations, including Kubernetes, so Delivery Hero can get immediate visibility into the technologies they rely on. This means the Pandora team no longer has to switch between data scattered across several tools or rely on open source software with significant operational overhead to monitor and troubleshoot issues.
“Things became much easier once we started using Datadog. As soon as we deployed the Agent, we got access to more out-of-the-box metrics and had a single place to look at everything.”
Miguel Mingorance
Systems Engineer, Delivery Hero
Looking Ahead with Datadog
While the challenges of maintaining a scalable yet reliable platform seemed daunting, Delivery Hero was able to leverage Datadog to support their constantly growing business. Delivery Hero now feels confident when rolling out backend upgrades and adopting cutting-edge technology, even as new users are added by the hour. They are also planning to expand their partnership with Datadog by adopting more products from the platform, which will provide even deeper visibility into the health and performance of their system, all in one place.