E-Commerce Platform Increases Resilience at Scale With Datadog and AWS | Datadog
CASE STUDY

Learn how Neto maintained a strong customer experience during its cloud migration by using Datadog and AWS

About Neto

Neto is a retail management solution that allows retailers to run their web store, point of sale, inventory, and fulfilment operations through a central platform. With Neto, retailers can future-proof their businesses in an increasingly competitive industry by delivering exceptional customer experiences via any channel—be it in-store, online, or through a marketplace.


Key Results

Complete coverage

Datadog collects and correlates metrics from both legacy and cloud environments, which ensured coverage during Neto's cloud migration.

Less overhead

Datadog automatically scales to monitor Neto’s dynamic infrastructure.


Challenge

Neto was looking to move their existing legacy infrastructure to the cloud in order to drive automation and support their customers’ growth. To meet this challenge, they needed a monitoring solution that could provide real-time visibility across a highly automated environment. Their existing monitoring tools, however, were unable to scale dynamically and could not track services across ephemeral infrastructure components.


Why Datadog?

Datadog's scalable solution supported Neto’s automation efforts by providing insight into the health of every host and service in a single view. Additionally, Datadog’s ability to collect metrics and gather real-time insights from both Neto’s legacy and cloud environments helped accelerate their migration—and ensured its success.


Lack of Elasticity and Resilience in Legacy Environment Hampered Growth

Neto’s retail customers like Pelican, Zakkia, and Gingerlily rely heavily on the health of Neto’s infrastructure, which must scale often to support customers’ web stores and business management tools. Failure to meet the capacity needs of customers could result in degraded service and render retailers unable to capture sales or properly manage inventory during their most lucrative, high-traffic times. Yet prior to moving to the Amazon Web Services (AWS) public cloud, maintaining and scaling Neto’s legacy infrastructure (a fleet of virtual machines on a platform with limited capacity for automation) was slow, reactive, and prone to technical difficulties. Neto’s infrastructure environments often drifted out of sync, making it hard to increase capacity or deploy changes to production without engaging in manual, time-consuming processes. “We could spend up to a week preparing for a high-traffic event,” says Justin Hennessy, VP of Engineering at Neto. Such events included a television appearance by a Pelican product or Gingerlily’s holiday sale, and those were only the events that Neto knew about ahead of time.

“Doing anything in the old environment was very time-consuming.”

Justin Hennessy
VP of Engineering, Neto

Cloud Infrastructure Allows for Automatic Scaling and Provisioning

To confidently support their customers’ growth and allow for agile innovation internally, Neto needed to move to the cloud and introduce automation throughout their platform. Neto selected AWS because of its extensive and robust APIs, “making automation on pretty much every front possible,” explains Hennessy. “With Amazon, it’s a change of a number and a few minutes later, the environment’s pre-scaled for a particular event.” To bolster the efficiency and resiliency of their cloud environment, Neto adopted infrastructure-as-code practices that allow them to automatically provision, configure, and scale their infrastructure through APIs.

“One of the driving forces was improving the resilience of the platform.”

Justin Hennessy
VP of Engineering, Neto

Poor Visibility Threatens Digital Transformation

In order to migrate confidently and truly thrive in the cloud, Neto would need end-to-end visibility into their infrastructure before, during, and after their move to AWS. But as Neto prepared to migrate their applications and customer assets to the cloud, they found that their existing open source monitoring tools were unable to provide platform-wide visibility across a highly automated cloud environment. Neto’s legacy monitoring setup consisted of manually configured health checks for individual host machines, meaning that their monitoring coverage would not scale dynamically with their cloud environment, nor track services across ephemeral infrastructure components. Neto’s engineering team needed reliable, real-time insights into the state of their legacy infrastructure and their new AWS environment in order to track the progress of their migration and ensure success on the cloud.

“When you move to a highly dynamic environment, you want to move away from monitoring individual servers, and towards monitoring groups of services.”

Justin Hennessy
VP of Engineering, Neto

Migrating from Legacy Environment with Monitoring that Scales in the Cloud

Neto enlisted Datadog to ensure that their application and assets were transferred with minimal customer impact, and that Neto’s newly automated platform remained reliable and performant once in production on the cloud. Datadog’s ability to collect metrics from both of Neto’s environments and then display the health of every host and service in a single interface—regardless of where they were running—meant that Neto never experienced a lapse in visibility or platform reliability during their migration. In Neto’s new cloud infrastructure, Datadog helps support Neto’s overall automation efforts by monitoring new hosts as soon as they come online, allowing Neto to track the health and performance of any service, as it scales, at a glance.
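
The shift from per-host checks to service-level health described above can be illustrated with a minimal Python sketch. It is not Neto's or Datadog's implementation: the sample records, the `service` tag, the `cpu_pct` field, and the 80% threshold are all assumptions made for illustration. The point is only that hosts can come and go while the summary stays keyed by service.

```python
from collections import defaultdict
from statistics import mean


def service_health(host_samples, cpu_threshold=80.0):
    """Roll host-level CPU samples up into one summary per service tag.

    host_samples: iterable of dicts with hypothetical keys
    "host", "service", and "cpu_pct".
    """
    by_service = defaultdict(list)
    for sample in host_samples:
        by_service[sample["service"]].append(sample["cpu_pct"])

    return {
        service: {
            "hosts": len(values),
            "avg_cpu_pct": mean(values),
            # Assumed rule: a service is healthy while its average CPU
            # stays under the threshold.
            "healthy": mean(values) < cpu_threshold,
        }
        for service, values in by_service.items()
    }


# Hypothetical samples; new hosts only need to carry a service tag.
samples = [
    {"host": "web-1", "service": "webstore", "cpu_pct": 40.0},
    {"host": "web-2", "service": "webstore", "cpu_pct": 60.0},
    {"host": "pos-1", "service": "pos", "cpu_pct": 90.0},
]
summary = service_health(samples)
```

Because the grouping key is the service tag rather than the hostname, a newly launched host is counted as soon as it reports, with no per-host configuration.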

“At the end of the day, Datadog is our central portal to the platform. It’s the first place we go.”

Justin Hennessy
VP of Engineering, Neto

Maintaining the Customer Experience in all Phases of Migration

During Neto’s 18-month migration project, the visibility provided by Datadog was critical to maintaining platform reliability and ensuring business as usual for Neto’s customers. For six of those months, Neto’s legacy and AWS cloud infrastructures were running simultaneously as assets from customers like Zakkia were transferred from MySQL to hosted Amazon Aurora databases. Datadog helped ensure the accurate, on-time migration of these customer assets by collecting, aggregating, and displaying metrics from databases in both environments on a single platform. This made it easy for Neto to visually correlate metrics and troubleshoot across environments, reducing mean time to detection (MTTD) and allowing Neto to resolve issues before they were felt by customers.

By monitoring traffic, latency, and resource usage as workloads moved to the cloud, Neto was able to track performance in real time and make any needed adjustments to ensure that their re-architected application would function properly in the new environment. For instance, Neto kept a close watch on database performance using Datadog’s built-in integrations with AWS services as well as with the underlying database engine itself. “We’re using the Amazon integration and native MySQL metrics to build a comprehensive Aurora dashboard that allows us to look at all of our clusters together,” Hennessy says. “It’s pretty obvious when a cluster is misbehaving, and then we can drill down into that cluster in isolation and address wherever the issue or congestion is.”
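
To make the idea of viewing all clusters together concrete, here is a small Python sketch of one way a misbehaving cluster might stand out against its peers. It is an illustration under stated assumptions, not how Neto's dashboard works: the cluster names, the latency figures, and the rule of flagging anything above twice the fleet median are all hypothetical.

```python
from statistics import median


def misbehaving_clusters(latency_ms_by_cluster, factor=2.0):
    """Return clusters whose latency exceeds `factor` times the fleet median.

    latency_ms_by_cluster: dict mapping a cluster name to a recent
    latency reading in milliseconds (hypothetical input shape).
    """
    baseline = median(latency_ms_by_cluster.values())
    return sorted(
        name
        for name, latency in latency_ms_by_cluster.items()
        if latency > factor * baseline
    )


# Hypothetical readings for four clusters.
latencies = {
    "cluster-a": 12.0,
    "cluster-b": 14.0,
    "cluster-c": 55.0,
    "cluster-d": 11.0,
}
flagged = misbehaving_clusters(latencies)  # ["cluster-c"]
```

Comparing each cluster with the median of its peers, rather than with a fixed limit, mirrors the workflow in the quote: scan every cluster side by side, then drill down into the one that stands out.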

Mastering Automation for Improved Resilience on the Cloud

Datadog increases the efficiency and reliability of Neto’s platform by ensuring that Neto’s infrastructure and monitoring coverage seamlessly scale in parallel. “We build our infrastructure off a single golden image, so we just baked Datadog in and then it was pushed out to all of our environments,” Hennessy says. Neto deploys the Datadog Agent through Terraform, which they use to automatically provision and configure their dynamic infrastructure. By reducing the manual overhead of scaling and monitoring their environment, Neto has enabled product features and fixes to move nimbly between development phases, and has taken key components of their platform “from adequate to highly available and resilient,” Hennessy says.

With AWS and Datadog, Neto’s customers like Pelican, Zakkia, and Gingerlily can deliver an optimal customer experience, knowing that they can depend on the Neto platform to perform regardless of how much traffic comes their way.

“Now we have a new level of resilience. And on top of that, we now have platform-wide visibility, which we didn’t have before.”

Justin Hennessy
VP of Engineering, Neto
