Fundbox is an AI-powered fintech startup that offers credit and payment solutions to help small businesses meet their working capital needs. Having already connected with over 325,000 businesses since its founding in 2013, the company has gained popularity thanks to its fast and reliable customer experience.
In 2020, Fundbox looked to build on its success by seeking ways to strengthen its DevOps processes. Through a cross-team initiative, the company uncovered a key obstacle it had been facing in DevOps: Engineers were using eight different systems to monitor the hundreds of microservices that made up the Fundbox platform.
The complexity of the Fundbox monitoring infrastructure made it hard for DevOps engineers to quickly spot and resolve issues in production. And a few of these individual monitoring tools required extensive DIY customizations and ongoing maintenance. All this extra overhead reduced the time available for engineers to perform core DevOps functions, such as updating Fundbox services.
Fundbox wanted to empower its engineers to quickly find and fix issues on their own. And it wanted to reduce the overhead associated with maintaining its monitoring infrastructure, so that DevOps teams could concentrate instead on innovating and iterating. The Environments team, led by Tal Turgeman, knew that a single, all-inclusive monitoring system was crucial to solving these problems.
“We don’t want engineers and ops to focus on creating a monitoring system when we should focus on creating new products, making the CI/CD faster, and making production more robust and scalable.”
Tal Turgeman
Environments Lead, Fundbox
Also essential was a fully supported solution. “We looked into open source tooling,” said Turgeman, “but you need to calculate how many hours you’re going to spend on patching the OS, monitoring the disk space, increasing disk size, fixing features, re-installing versions, and fixing compatibility issues. For us, we didn’t want to create our own monitoring infrastructure. That would have been a complete waste of time for us.”
Fundbox ultimately chose Datadog for its ability to give a holistic view across infrastructure, applications, logs, networks, platform availability, external cloud-based services, and databases. Only a platform like Datadog that offered complete visibility could meet Fundbox’s goals of strengthening DevOps collaboration and reducing issue resolution times.
Datadog’s 800+ fully supported, third-party integrations gave Fundbox engineers the ability to pull metrics across diverse services and view them all in a centralized platform. Different Fundbox teams can now tap into this consolidated data to quickly build their own dashboards, graphs, and alerts based on metrics that are relevant to their specific needs.
“Our security team, for example, was able to quickly set up Cloudflare integration and gain immediate insights into abnormal user login activity,” said Turgeman. “We never tried to set up Cloudflare integrations before with previous solutions because it would take too many resources to implement. With Datadog, it was so easy to set up. It was just a click on the dashboard, and you’re done.”
DevOps teams rely on accurate alerts for conditions such as high CPU utilization, high memory usage, or error events. Before Datadog, Fundbox engineers would receive false alarms, such as ‘The production server is down’. The false alerts would set off hurried investigations, when in reality, everything was fine. These temporary “crises” happened at least once per quarter. Eventually, many engineers lost confidence in the validity of the alerts they were receiving.
After implementing Datadog, the era of false alarms ended. The appropriate teams get notified whenever an issue arises, and the notifications are always valid. Engineers no longer question whether alerts are real. Confidence has been restored.
A boost to DevOps collaboration and culture
As part of its initial efforts to improve its DevOps processes, Fundbox learned that it was challenging for its engineers to correlate events across different time zones, applications, infrastructures, and logging systems. Because Fundbox’s different monitoring tools weren’t unified in a single, centralized system, teams would spend hours trying to understand the root causes behind issues that arose. The ineffectiveness of the older tooling also discouraged engineers from adopting it, which in turn worsened already slow response times and weakened teams’ ability to collaborate.
Thanks to Datadog’s simple, easy-to-use interface, Fundbox teams now have a single source of truth and dramatically lower maintenance requirements. These benefits are helping to foster better communication and collaboration across teams. Teams can now also correlate events across different infrastructures and applications and determine the root causes behind alerts much more easily.
“The reaction that we had from the R&D teams was overwhelming: they really loved Datadog. One of the team leads told me ‘Datadog was the friend I never knew I was missing,’ and it was true!”
Tal Turgeman
Environments Lead, Fundbox
In the end, Datadog boosted confidence in Fundbox’s ability to quickly find and resolve issues. “Our DevOps success with Datadog came from the combination of time savings, very easy cross-referencing, a single unified UI, and ease of use,” said Turgeman. “Datadog really was the friend we never knew we needed.”