The Browser Company Uses Datadog to Accelerate Release Velocity With Fast, Reliable CI | Datadog
Case study

Software Development

50-100 Employees

New York City

About The Browser Company

The Browser Company is building Arc: a re-imagined default web browser. Arc is the solution to tab overwhelm, helping users organize everything in their browser in one simple, beautiful place.

“Our PR checks are now running fast enough on average to be completed before the time it takes a reviewer to respond to the PR. That means that CI is out of the way.”

Andrew Monshizadeh
Software Engineer
The Browser Company
Why Datadog?
  • Provides granular visibility into CI/CD pipelines
  • Helps quickly identify pipelines and tests that are running slowly
  • Helps identify the root cause of pipeline failures, eliminating unnecessary pipeline retries
  • Improves understanding of CI/CD performance for managers and developers
Challenge

The Browser Company sought to improve engineering velocity by fixing unreliable tests and infrastructure and decreasing CI/CD pipeline execution times.

Key results
1 hour → 30 minutes

50% pipeline time savings for developer feedback

5 minutes → 10 seconds

Improved performance of a blocking test suite

Identifying CI/CD as a bottleneck to the engineering process

The Browser Company developed the Arc browser to offer people a better way to use the internet, free of clutter and overwhelm. At its core, Arc was designed to feel more like a home on the internet than a traditional browser. Spaces and profiles help separate work and personal lives, and the browser’s sidebar makes it easy to organize tabs, change contexts, and focus on what’s most important. In 2024, The Browser Company team went from focusing on the Mac product to syncing and operating across Mac, Windows, and iOS. In February 2024, the company released Arc Search, an AI-enabled mobile browsing experience. Arc Search is growing rapidly—it recently surpassed 100,000 daily active users across a wide variety of operating systems and configurations. In April 2024, the company released its Windows app.

Initially, addressing CI/CD performance and reliability issues at The Browser Company meant frequent context switching, alert fatigue, manual intervention in GitHub Actions, and combing through Slack notifications for new failures. “We would hear that builds were failing and tests seemed flaky, but there was not enough information to debug,” says software engineer Andrew Monshizadeh. “Our CI pipeline for PRs used to take over an hour, and there wasn't really any insight into them other than going into individual artifacts and seeing where things were going wrong.”

The Browser Company strives to maintain and continuously improve a fast pace of development so they can ship new features quickly and consistently. They also want to ensure that new features perform up to standard for their high-tech, high-expectation user base. To accomplish that, Monshizadeh and security engineer Joel Henning sought to improve engineering velocity by increasing the performance and reliability of CI/CD for their development teams. They accomplished this by identifying and fixing slow tests as well as unreliable CI/CD pipeline runner infrastructure. They also wanted to make data-driven decisions rather than “following their gut” when it came to responding to slowdowns or failures in CI/CD.

Improving visibility into CI/CD pipelines

To get granular visibility into their CI/CD pipelines and tests, The Browser Company implemented Datadog CI Visibility, Test Visibility, and Infrastructure Monitoring on their self-hosted GitHub Actions runners. Being able to identify specific stages and jobs that slowed down or failed, and to correlate issues with changes to infrastructure or pipeline configurations, helped them find the cause of failures quickly. Now they can tell whether a pipeline failed because of a new code change, the tests that ran within the pipeline, or issues with a specific runner. “Joel and I are very busy people,” says Monshizadeh. “Datadog is helping us identify the tests that are causing problems. We are then able to attack and solve those relatively quickly.”

For example, CI Visibility recently helped them when a hardware failure on a runner caused jobs to fail. Because GitHub Actions randomly assigns pipelines to runners in the fleet, correlating CI/CD issues with machine failures was previously impossible. However, with Datadog, they were able to quickly identify the root cause. “We identified the machine was consistently failing across pipelines at different points in those pipelines,” says Monshizadeh. “We used Datadog CI Visibility to filter jobs to that host and clearly see that it had gone from rock solid to unstable. All of the analysis and validation would have been significantly harder and more time-consuming without Datadog.”

“It’s hard to visually correlate these kinds of failures,” adds Henning. “But with Datadog, we could see that over the history of the machine, it just started to go bad. Datadog gave us that visibility.”
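The kind of per-host analysis described above can be illustrated with a small sketch. This is not Datadog's API or The Browser Company's tooling: the `JobRun` record, the runner names, and the `threshold` and `min_jobs` parameters are all hypothetical, chosen only to show how grouping job outcomes by runner can surface a machine that fails across unrelated pipelines.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    """One CI job execution (hypothetical record shape, not a Datadog schema)."""
    runner_host: str
    pipeline: str
    succeeded: bool


def failure_rate_by_runner(runs):
    """Return {runner_host: fraction of that host's jobs that failed}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for run in runs:
        totals[run.runner_host] += 1
        if not run.succeeded:
            failures[run.runner_host] += 1
    return {host: failures[host] / totals[host] for host in totals}


def suspect_runners(runs, threshold=0.5, min_jobs=3):
    """Return hosts with at least min_jobs jobs and a failure rate >= threshold."""
    totals = defaultdict(int)
    for run in runs:
        totals[run.runner_host] += 1
    rates = failure_rate_by_runner(runs)
    return sorted(
        host for host, rate in rates.items()
        if rate >= threshold and totals[host] >= min_jobs
    )


# Illustrative data only: runner-b fails in several different pipelines.
runs = [
    JobRun("runner-a", "pr-checks", True),
    JobRun("runner-a", "nightly", True),
    JobRun("runner-a", "pr-checks", True),
    JobRun("runner-b", "pr-checks", False),
    JobRun("runner-b", "nightly", False),
    JobRun("runner-b", "release", True),
    JobRun("runner-b", "pr-checks", False),
]
print(suspect_runners(runs))  # ['runner-b']
```

Because jobs are assigned to runners without regard to pipeline, a host whose failure rate stands out across several pipelines points to the machine rather than to any one code change or test.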

Reducing CI pipeline time by 50 percent

The Browser Company’s engineers now use Datadog to identify and prioritize fixing the biggest bottlenecks in their CI pipeline runs, as well as to optimize test performance. As a result, tests take less time to run and the team experiences fewer failures. “Being able to say, ‘this specific test started failing at this specific time’ has been huge for course correction,” says Henning.

“Our PR checks are now running fast enough on average to be completed before the time it takes a reviewer to respond to the PR,” adds Monshizadeh. “That means that CI is out of the way.”

The company has shaved 50 percent off its average CI pipeline time, from one hour to 30 minutes, and substantially decreased its pipeline failure rate. Datadog also helps them identify machines and pipeline runners that have failed, helping their CI/CD system stay efficient and reliable as they scale the pace of development.

The broader development team at The Browser Company also experiences the benefits of CI/CD observability. “Developers are very busy people and will notice when something feels flaky, but don’t always have the ‘muscle memory’ to do the digging themselves,” says Henning. “They are now taking screenshots and linking to Datadog as a source of truth for those issues.”

Ultimately, Datadog helps The Browser Company improve engineering velocity by identifying unreliable tests and pipeline runners and enabling engineers to make data-driven decisions instead of shooting in the dark. “We are able to address CI and automation issues faster, so CI is more stable,” says Monshizadeh. “We can identify sets of tests that may be causing problems. We can identify when and why critical pipelines are slow or bottlenecked. That level of observability has dramatically changed the way we approach dealing with our CI and fixing it.”

Resources
  • Official docs: Continuous Integration Visibility
  • Blog: Best practices for CI/CD monitoring
  • Blog: How to use DORA metrics to improve software delivery