Navigating Challenges Caused by Hypergrowth
Compass set out to reinvent the real estate industry by building technology that gives its agents an intelligent and seamless experience. Having established a first-mover advantage, Compass saw exponential growth, which validated the need for the technology it had developed. After raising over $1 billion in capital, the company furthered its growth initiatives by acquiring some of the best brokerages around the country.
To continue attracting the best real estate agents, renters, buyers, and sellers, and to provide a top-notch experience for its user base, Compass built a customer-centric technology platform that simplifies and automates day-to-day real estate tasks. One of Compass’s differentiating features is private exclusives: off-market homes that agents can share directly with their buyers. To build, scale, and ensure the reliability of the technology platform, the company set out to find a monitoring tool that could help provide order and scale to the engineering organization.
The Pains of Context Switching
With the growth that Compass was experiencing, adopting a sustainable monitoring strategy that could scale along with the engineering organization became essential. At the time, the company was using a suite of different monitoring products, which meant that engineers typically had to loop through multiple tools to solve one problem.
To troubleshoot a problem, an engineer would look at a request trace in one tool, but wouldn’t be able to find relevant logs without jumping to another tool. When a synthetic test failed, multiple teams would have to fish around in their respective tools to figure out the root cause. Ultimately, this disparate monitoring strategy sent engineering teams down various rabbit holes to find the culprit for each specific issue.
This constant context switching also created friction between teams; not everyone had access to every tool, which led to a poor experience for developers and ultimately contributed to engineer burnout.
“Some of the monitoring tools we were using were even part of the same company, but it still felt like they were entirely separate products, because nothing was unified.”
Chris Seltzer
Engineering Manager, Compass
Compass was using a point solution to monitor its frontend stack, but the tool introduced excessive administrative overhead due to its poor support for user provisioning. Each time a new employee started at Compass, an engineering manager had to email the tool’s support team to ask them to manually add the new employee. After that, they typically had to wait up to 48 hours before access was granted. As Compass quickly expanded its engineering team—adding up to 15 engineers every week—this process became a bottleneck and was no longer acceptable.
Additionally, the point solution was difficult to install and often generated false positives due to poor configuration. The tool also offered limited out-of-the-box features, which made it a challenge to get full visibility into user-facing problems. As a result, Compass often learned about problems from its own agents, who would contact the support team to report broken features and bugs or raise an alarm in the engineering team’s outages Slack channel. This process may have worked when Compass was still a young startup, but it was no longer a viable approach once the platform needed to support over 13,000 agents.
“Our monitoring tools were forcing us to address issues ad hoc, but our vision was to adopt a proactive monitoring strategy.”
Chris Seltzer
Engineering Manager, Compass
Seamless Migration Toward Full-stack Visibility
To tackle these challenges, Compass needed to find a product that could fulfill all their monitoring requirements and create a frictionless experience for engineering teams across the organization. Initially, Compass turned to Datadog’s infrastructure monitoring solution, but then expanded their usage as they saw the value of monitoring all of their metrics, traces, and logs in the same platform. From there, the team focused on replacing their frontend point solution with Datadog Synthetic Monitoring, which would allow them to get full-stack visibility by seamlessly correlating frontend user experience with backend performance in a single pane of glass.
Migrating away from the point solution initially seemed like a daunting task, especially since it was all that Compass knew. However, after one of their engineering teams tested out Datadog Synthetic Monitoring internally, Compass saw how frictionless this migration could be. Thanks to Datadog’s robust APIs and ease of use, a single Compass employee was able to quickly complete the organization-wide migration.
Once Compass successfully migrated to Datadog, they were able to immediately improve their processes and operations. They began to implement monitoring best practices by setting up repeatable and scalable processes—a necessary step in transitioning from a young startup to an enterprise-grade engineering organization. With Datadog’s web recorder, teams were able to create multi-step browser tests to simulate user journeys, and configure them to execute from various locations around the globe. The web recorder helped Compass’s teams quickly scale their synthetic monitoring coverage; everyone was now able to record their own tests in a matter of minutes, without any coding skills or prior knowledge of testing frameworks. While setting up the tests, they leveraged Datadog’s flexible tagging capabilities to enrich each test with detailed metadata (e.g., the team in charge of that particular service). This also allowed them to confirm that failed test results would automatically get sent to the correct individuals.
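The tag-based routing described above can be pictured with a short sketch. This is only an illustration of the idea, not Compass's configuration or Datadog's API: the `team:` tag key, the team names, the channel names, and the `route_failure` helper are all hypothetical.

```python
# Hypothetical sketch: pick a notification target for a failed synthetic test
# from its tags. All tag values and channel names below are invented.

# Assumed mapping from a "team" tag value to that team's alert channel.
TEAM_CHANNELS = {
    "search": "#search-oncall",
    "listings": "#listings-oncall",
}

# Assumed fallback for tests that carry no recognized team tag.
DEFAULT_CHANNEL = "#outages"


def parse_tags(tags):
    """Turn ["team:search", "env:prod"] into {"team": "search", "env": "prod"}."""
    parsed = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:  # ignore tags that are not key:value pairs
            parsed[key] = value
    return parsed


def route_failure(test):
    """Return the channel that should hear about a failed test."""
    team = parse_tags(test["tags"]).get("team")
    return TEAM_CHANNELS.get(team, DEFAULT_CHANNEL)


failed_test = {"name": "Listing page loads", "tags": ["team:listings", "env:prod"]}
print(route_failure(failed_test))
```

Keeping ownership in the test's own metadata means that a newly recorded test is routed correctly as soon as it is tagged, with no separate routing table to update per test.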
Proactively Monitoring User-facing Issues with Synthetic Monitoring
Datadog Synthetic Monitoring was pivotal for helping Compass transition from a reactive to a proactive monitoring strategy. To maintain a strong brand reputation, and be seen as a leader in the real estate technology industry, Compass needs to get notified about issues before any customer is affected. Now, on-call engineers get notified when a synthetic monitoring check fails, and can quickly get the end-to-end context they need to answer questions that used to take days on end to investigate: When did this issue start? Was there a deployment associated with that outage? Who is this issue affecting?
The funnel for triaging these issues begins with Synthetic Monitoring and extends into the associated traces in APM (Application Performance Monitoring), where engineers identify the root cause. When viewing the results of a synthetic test, Compass’s engineers can quickly pivot to see associated data from the backend, including relevant logs, infrastructure metrics, the services called, and even a stack trace for the request. Having all the information in one view has allowed Compass to successfully adopt proper incident management practices, proactively resolve user-facing issues, and minimize customer impact.
Even after resolving an issue, teams continue referring back to Datadog Synthetic Monitoring to check up on the current state. No issue is officially marked as resolved until the synthetic tests are back to normal, so the incident team can feel confident that no users will be affected. Often, this all happens so rapidly that customers would never even know there was a problem.
Improving SEO for Property Listings
By leveraging Datadog’s unified platform, Compass has been able to reduce its mean time to resolution for high-severity incidents from 2 hours and 26 minutes to 16 minutes. This number serves as a testament to the monitoring best practices the organization has adopted. The improvement has streamlined workflows for Compass’s incident response team, and its benefits have also extended to the rest of the organization, including its nationwide network of real estate agents.
For example, to effectively sell and rent their properties, Compass’s agents need to ensure that all of their listings are ranked highly in search engine results. Datadog Synthetic Monitoring helps Compass detect if a page is loading slowly and then pinpoint the exact elements on the page that are causing it to be slow. Pinpointing the exact cause of latency helps the engineering teams identify ways to optimize the frontend performance of their platform. Ultimately, this helps improve SEO for their agents’ property listings, which drives more traffic to those listings.
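To put the resolution-time figures cited earlier in this section into relative terms, the arithmetic can be checked in a few lines. The two durations come from the text; the `percent_reduction` helper and variable names are just illustrative.

```python
from datetime import timedelta


def percent_reduction(before, after):
    """Percentage by which a duration shrank (timedelta / timedelta gives a float)."""
    return (before - after) / before * 100


# Mean time to resolution for high-severity incidents, as stated in the text.
mttr_before = timedelta(hours=2, minutes=26)  # 146 minutes
mttr_after = timedelta(minutes=16)

print(f"{percent_reduction(mttr_before, mttr_after):.0f}% shorter")  # 89% shorter
print(f"{mttr_before / mttr_after:.1f}x faster")                     # 9.1x faster
```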
The State of Monitoring at Compass Today
Today, Compass’s engineers are continuously discovering new ways to use Datadog to support their work—which ultimately helps them provide a better experience for users. For instance, Compass’s security teams have started using Datadog log management to analyze audit logs and detect threats across their environment more effectively.
“Each day we’re amazed by what Datadog’s platform offers. We discover new features all the time. At the end of the day, this is what true observability means to Compass: having the best tools in place to deliver top-notch customer experiences.”
Chris Seltzer
Engineering Manager, Compass