Gain Comprehensive Visibility Into Your ECS Applications With the ECS Explorer


Amazon Elastic Container Service (ECS) is a container orchestration service that enables you to efficiently deploy new applications or modernize existing ones by migrating them to a containerized environment. Building on ECS gives you the flexibility, scalability, and security that containers offer, but also presents challenges in monitoring and troubleshooting your applications and infrastructure. To fully understand the performance of your ECS clusters, you need insight into their component parts, including tasks, which run your application’s containers using the images and resources specified in a task definition, and services, which automatically deploy and maintain the desired number of tasks. Understanding your entire ECS environment also demands visibility into the infrastructure that runs those tasks, which may be EC2 instances or AWS Fargate.

To simplify your ECS monitoring, the Datadog ECS Explorer provides a comprehensive view of your ECS clusters, services, and tasks—as well as the underlying infrastructure. With metrics, logs, and traces in a single view, the ECS Explorer makes it easy to understand the status and performance of your clusters and quickly troubleshoot any issues that arise. The ECS Explorer also surfaces the configurations that specify each task’s CPU and memory resources, enabling you to understand resource constraints and to fine-tune allocations. And the ECS Explorer gives you infrastructure visibility across both Fargate and EC2 hosting platforms, ensuring consistent monitoring regardless of launch type.

In this post, we’ll show you how the ECS Explorer helps you:

View ECS events to track your deployments, scaling activity, and more
Analyze task definitions to inspect and troubleshoot your tasks and containers
Monitor the infrastructure that backs your clusters

Monitor cluster activity with ECS events

ECS events describe changes in the cluster. For example, if you revise a task definition to specify a new container image, ECS will start new tasks to replace the ones based on the previous version and will generate events that reflect this activity. By providing a feed and a record of cluster activity, events help you better understand your clusters’ performance and provide context to help you troubleshoot.

The ECS Explorer gives you visibility into your clusters’ events so you can view cluster activity alongside your other cluster monitoring data. Event visibility is particularly valuable for tracking the progress of your deployments, enabling you to ensure that a service’s availability is not disrupted by the deployment process. ECS supports rolling updates, which allow you to replace running tasks with new versions while avoiding service downtime. You can set a minimumHealthyPercent value in a service’s deployment configuration to ensure that at least that percentage of the service’s desired task count stays running while the deployment replaces the old version with the new one. By monitoring deployment events alongside your application’s traces, performance metrics, and infrastructure telemetry, you can track the progress of your rollouts and adjust minimumHealthyPercent to optimize for speed or safety.
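
To make the effect of minimumHealthyPercent concrete, here is a minimal Python sketch of the task-count bounds during a rolling update. It assumes the rounding described in the ECS documentation (the lower bound rounds up, the upper bound rounds down). maximumPercent is the companion deployment setting and is not otherwise covered in this post; the function name, the default of 200, and the sample values are ours, for illustration only.

```python
def rolling_update_bounds(desired_count, minimum_healthy_percent, maximum_percent=200):
    """Return (fewest, most) tasks that may run during a rolling update."""
    # Lower bound: percentage of the desired count, rounded up (integer ceiling).
    fewest = -(-desired_count * minimum_healthy_percent // 100)
    # Upper bound: percentage of the desired count, rounded down.
    most = desired_count * maximum_percent // 100
    return fewest, most


print(rolling_update_bounds(4, 50))       # (2, 8)
print(rolling_update_bounds(3, 50, 150))  # (2, 4)
```

With four desired tasks and minimumHealthyPercent set to 50, ECS may stop up to two old tasks before their replacements are running, which speeds up the rollout at the cost of temporary capacity. A value of 100 keeps all four running throughout.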

Events can also provide insight into the health of your cluster. For example, if a task stops because its EC2 instance failed, ECS starts a replacement task and creates an event. ECS will also create an event if a service experiences an error, which can occur if the cluster doesn’t have enough resources available to place a task. You can investigate the cause of an issue like this in the ECS Explorer, which displays infrastructure metrics alongside the event so you can see correlated resource utilization data, along with traces that show specifically which services were involved.
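
As a rough illustration of how event text can point to a capacity problem, the Python sketch below filters a list of service events for placement failures. The event records are hypothetical sample data shaped like a timestamp plus a message, the messages are simplified, and the marker string is an assumption about their wording; real event text may differ.

```python
# Assumed substring that identifies a placement failure in an event message.
PLACEMENT_FAILURE_MARKER = "was unable to place a task"


def find_placement_failures(events):
    """Return the events whose message indicates a task could not be placed."""
    return [e for e in events if PLACEMENT_FAILURE_MARKER in e.get("message", "")]


# Hypothetical sample events for an invented service name.
sample_events = [
    {"createdAt": "2024-05-01T10:00:00Z",
     "message": "(service example-service) has started 2 tasks."},
    {"createdAt": "2024-05-01T10:05:00Z",
     "message": "(service example-service) was unable to place a task because "
                "no container instance met all of its requirements."},
    {"createdAt": "2024-05-01T10:20:00Z",
     "message": "(service example-service) has reached a steady state."},
]

for event in find_placement_failures(sample_events):
    print(event["createdAt"], event["message"])
```

The timestamp of a flagged event is the useful part: it tells you which window of CPU and memory reservation data to inspect.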

The ECS Explorer shows events from the synth-endpts service.

Examine and troubleshoot your task and container configurations

Each ECS task is launched from a task definition, a human-readable JSON document that specifies key configuration data about the task. Task definitions specify the container images that make up the task, as well as each container’s network configuration, tags, and environment variables. They also determine the storage, CPU, and memory resources available to each container.
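
For readers who have not worked with task definitions directly, the sketch below models a small one as a Python dictionary with the same shape as the JSON, then summarizes each container’s image and resources. The field names follow the ECS task definition format; the family, container names, images, and values are hypothetical.

```python
task_definition = {
    "family": "example-web",               # hypothetical task family
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "512",                           # task-level CPU units
    "memory": "1024",                       # task-level memory in MiB
    "containerDefinitions": [
        {
            "name": "web",
            "image": "example/web:1.4.2",   # hypothetical image
            "cpu": 384,
            "memory": 768,
            "environment": [{"name": "LOG_LEVEL", "value": "info"}],
        },
        {
            "name": "sidecar",
            "image": "example/sidecar:2.0",  # hypothetical image
            "cpu": 128,
            "memory": 256,
        },
    ],
}


def summarize_containers(task_def):
    """Return (name, image, cpu_units, memory_mib) for each container."""
    return [
        (c["name"], c["image"], c.get("cpu"), c.get("memory"))
        for c in task_def["containerDefinitions"]
    ]


for row in summarize_containers(task_definition):
    print(row)
```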

The ECS Explorer gives you visibility into your task definitions—alongside cluster, service, and infrastructure monitoring data—so you can gain complete context when you need to investigate or optimize your cluster’s performance. You can easily compare any two versions of a task’s definition to pinpoint changes that affect your cluster, then explore related logs and traces all within the same view.
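
Comparing two revisions comes down to a field-by-field diff. The following is a minimal sketch of that idea in Python, not a description of how the ECS Explorer implements its comparison; the revision numbers and container values are hypothetical.

```python
def diff_container(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


# Two hypothetical revisions of the same container definition.
revision_7 = {"name": "web", "image": "example/web:1.4.2", "cpu": 384, "memory": 768}
revision_8 = {"name": "web", "image": "example/web:1.5.0", "cpu": 384, "memory": 512}

for field, (before, after) in diff_container(revision_7, revision_8).items():
    print(f"{field}: {before} -> {after}")
```

Here the diff surfaces two changes at once, a new image and a lower memory limit, which is the kind of combination that is easy to miss when reading two full definitions side by side.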

The ECS Explorer Task Definition view shows a container definition within an ECS task manifest.

If you’re using the Fargate launch type, the Datadog Agent runs as a sidecar container that you include in each task definition. The ECS Explorer makes it easy to confirm that you’re using the latest available version of the Agent, troubleshoot Agent issues, and ensure that you’ve applied the necessary tags to enable Universal Service Monitoring.

View infrastructure metrics in context

The ECS Explorer expands on the visibility provided by the Containers view, which shows you real-time resource utilization of containers grouped by useful dimensions—including task family and cluster. By visualizing CPU and memory utilization metrics side by side with reservation data, the ECS Explorer makes it clear how much reserved capacity is actually used by your workload over time. The ECS Explorer adds deep context to your utilization data by making it easy to also see cluster-level reservation data, logs, traces, and events. Building on Cloud Cost Management’s existing insights into your ECS spend, the utilization data you see in the ECS Explorer helps you spot waste and optimize your clusters’ resource allocation to reduce unnecessary cloud costs.

Your task definitions also specify resource limits, which define the maximum amount of CPU and memory each container is allowed to use. The ECS Explorer makes it easy to visualize any task’s resource usage compared to its limit, which can help you decide whether to scale up your infrastructure and raise those limits. These visualizations can also help you spot when a task has reached its limit, which can explain application performance problems such as errors and latency.
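
The usage-versus-limit comparison can be expressed as a simple ratio. The Python sketch below flags tasks whose memory usage is close to their limit; the task names and numbers are invented, and the 90 percent threshold is an arbitrary starting point rather than a recommendation.

```python
def tasks_near_limit(samples, threshold=0.9):
    """Return sorted names of tasks whose usage is at or above threshold * limit.

    samples maps a task name to a (usage_mib, limit_mib) pair.
    """
    flagged = []
    for name, (usage_mib, limit_mib) in samples.items():
        # Skip entries without a positive limit to avoid dividing by zero.
        if limit_mib > 0 and usage_mib / limit_mib >= threshold:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical memory samples in MiB: (usage, limit).
samples = {
    "checkout": (950, 1024),
    "search": (300, 1024),
    "cart": (512, 512),
}

print(tasks_near_limit(samples))  # ['cart', 'checkout']
```

A task like "search" in this sample, which uses well under a third of its limit, is a candidate for a smaller allocation, while "cart" has no headroom left.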

Expand your ECS visibility with Datadog

The ECS Explorer organizes all of your relevant ECS log, metric, event, and trace data alongside ECS configuration information in one place. To access the ECS Explorer—and to also make use of the ECS out-of-the-box (OOTB) dashboard—enable the AWS integration and upgrade to the latest version of the Agent. For more information, see the ECS Explorer documentation, and check out our blog to learn more about monitoring ECS and AWS Fargate.

If you’re not already using Datadog, you can get started today with a 14-day free trial.
