CASE STUDY

Building a large-scale, high-throughput platform with Datadog APM and Continuous Profiler

Learn how Datadog APM and Continuous Profiler helped Cvent gain full visibility into their load testing and production environments so they could support virtual, in-person, and hybrid events

About Cvent

Cvent is a market-leading meetings, events, and hospitality SaaS provider, with nearly 4,000 employees and more than 200,000 users worldwide. In September 2020, Cvent launched the Cvent Attendee Hub™, which helps organizations deliver immersive virtual, in-person, and hybrid events.


Key results

2+ weeks

Time saved in the Attendee Hub development process with the help of Datadog Continuous Profiler.

Hours → Minutes

Decreased time to detect and resolve problems during load testing.

28,000+

Number of participants who attended the first-ever virtual Cvent CONNECT conference, which was hosted on the brand-new Attendee Hub platform.


Challenge

Cvent had to build a new virtual and hybrid event solution on a tight timeline to help their customers adapt during the global pandemic. They needed a monitoring solution that would allow them to scale their containerized environment and microservices to support extremely high throughput while ensuring their code remained fast and efficient.


Why Datadog?

Datadog APM and Continuous Profiler gave Cvent robust visibility into the performance of their complex systems, which allowed them to build and successfully launch the Cvent Attendee Hub under tight deadlines. The unified Datadog platform also enabled Cvent to build a culture of observability across the organization and to ship code more frequently and reliably.


Adapting to a new reality

For over 20 years, Cvent’s technology teams have delivered innovative solutions that enable marketers and event professionals around the world to create engaging, impactful experiences. In 2020, the global COVID-19 pandemic forced events into virtual environments, which transformed the industry and accelerated its digital transformation.

To support their customers during this unprecedented time, Cvent had to pivot their entire product roadmap so they could focus on building a new solution for virtual, hybrid, and in-person events. The team planned not only to launch their new platform (now known as the Cvent Attendee Hub™) at Cvent CONNECT, Cvent’s annual customer conference, but also to host the event on it. This meant that the platform’s performance had to be impeccable, and they had only six months to deliver.

Ian Schell, Site Reliability Architect at Cvent, was given the task of ensuring that the Attendee Hub could accommodate the broad reach and increased registration volume of virtual events, which can often exceed that of in-person events. When reflecting on this undertaking, Schell recalled, “There were so many unknowns surrounding usage patterns and scale, because the product was completely new, and the number of participants could be an order of magnitude higher compared to some in-person events.” Cvent relied heavily on Datadog to meet these unique challenges and ensure a successful product launch.

A unified platform for full-stack visibility

Cvent engineers have always been very attuned to performance concerns, but before Datadog, they relied on siloed, disparate tools that offered limited or incomplete visibility into their system. This resulted in wasted time and low adoption of those tools. “We were jumping between different tools, and it was very hard and time consuming to tie the different parts of our stack together and put them in the right context to actually drive results,” recalled Brent Montague, Site Reliability Architect at Cvent.

With Datadog’s unified platform, Cvent has been able to achieve its observability goals, from frontend to backend, in a single pane of glass. Distributed tracing and APM were particularly important to Cvent, and instrumenting their Java, Node.js, and .NET services was a simple and seamless experience. When asked about the onboarding experience with APM, Schell said, “It just worked.”

“Before we started using Datadog, we were jumping between different tools and it was very hard and time consuming to tie the different parts of our stack together and put them in the right context to actually drive results.”

Brent Montague
Site Reliability Architect, Cvent

Unlocking the power of distributed tracing

As they were building the Attendee Hub, Cvent used Datadog APM every step of the way to validate design decisions and identify areas in need of optimization. At one point during the load testing process, APM helped Cvent engineers detect and fix a bug in their code that likely would not have been caught as quickly, if at all, with their previous tools. In another case, the team used the Datadog flame graph, which correlates service spans with relevant telemetry on the same screen, to detect an Amazon Elastic Container Service (Amazon ECS) instance that was experiencing elevated latency. Without having to switch contexts, the team quickly determined that the host was overloaded and that its Amazon ECS CPU reservation was too low. This information enabled them to make an informed decision to resize the instance appropriately.

These experiences led Schell to say, “If I had to pick my favorite feature within Datadog APM, I would say it’s distributed tracing. The robustness and responsiveness of the product when we’re sending tens of thousands of spans per second, as well as the unique way in which traces are visualized, makes it very easy to spot problems quickly and nip them in the bud.”

“The robustness, responsiveness, and visualizations of Datadog Distributed Tracing makes it very easy to spot problems quickly and nip them in the bud.”

Ian Schell
Site Reliability Architect, Cvent

Getting continuous, code-level visibility

Before Datadog, Cvent engineers who wanted to understand resource consumption at the method level had to connect remotely to one host at a time, and those sessions had to be brief because of the high performance overhead. This meant manual work, security concerns, and limited code-level visibility. With Datadog Continuous Profiler, the Cvent team can capture uninterrupted profiling data from every host and service in any environment, with minimal overhead. And because Continuous Profiler uses the same Agent and tracing library as APM, no additional installations or security evaluations were necessary.

Schell and his team put Continuous Profiler to the test when they discovered an egregiously slow service. Incoming requests were blocked on a lock and took a long time to execute because they were all trying to read and parse the same JAR file. Continuous Profiler saved the team two weeks of troubleshooting by showing the time individual requests spent waiting on locks, as well as an aggregate view of the lock profile for the entire service across all the hosts it was running on. “Datadog Continuous Profiler is extremely powerful and provides an aggregated profiling view of multiple hosts, which has been critical in our ability to quickly deliver new solutions to our customers,” said Schell.

“Datadog Continuous Profiler is extremely powerful and critical in our ability to quickly deliver new solutions to our customers.”

Ian Schell
Site Reliability Architect, Cvent

Ensuring a successful launch

As Cvent CONNECT approached, Cvent’s engineering teams took their new platform for a test drive with Datadog APM in the passenger seat. The Trace Search feature enabled them to load test their platform efficiently and remediate issues quickly, which helped them reduce their mean time to resolution (MTTR) from hours to minutes and ensured a highly performant launch. Montague, Schell, and their teams closely watched their Datadog dashboards throughout the conference, which showed that the platform was delivering a smooth, immersive virtual experience to more than 28,000 attendees. “I can’t really imagine how we would have done what we did with the Attendee Hub launch if we didn’t have Datadog as a one-stop shop for all aspects of monitoring our application and microservices,” Schell said.

Moving forward with Datadog

Cvent’s journey with Datadog did not end with the launch of the Attendee Hub. For example, Cvent regularly uses APM’s Deployment Tracking feature to monitor how code deployments affect the health of their applications. With Deployment Tracking, Cvent can detect when code changes introduce performance regressions, correlate deployments with other telemetry, and use a rolling deployment strategy to ship code to customers safely and more frequently.

Cvent also takes advantage of Watchdog, Datadog’s machine learning-based troubleshooting engine, to automatically detect outliers and anomalies and identify their root causes. With Watchdog, Cvent engineers can find unexpected anomalies that could lead to issues or affect their customers. When reflecting on the importance of Watchdog to Cvent’s daily operations, Montague said, “I wake up, grab a cup of coffee, and look at Watchdog.”

A bright, observable future

The Attendee Hub now supports virtual, in-person, and hybrid events, and Cvent’s technology teams are actively adding new features and releasing updates to further expand their offerings. This growth brings a new set of challenges, but Montague and Schell are continuing to drive a culture of full-stack observability and high performance standards. Today, Cvent engineers turn to Datadog whenever a performance question arises, and its unified platform enables them to troubleshoot with precision, resolve issues faster, and ship new code with confidence.

Resources


BLOG

End-to-end application monitoring with Datadog

BLOG

Analyze code performance in production with Datadog Continuous Profiler

BLOG

Datadog receives a "Leader" distinction in Gartner's 2021 Magic Quadrant for APM