
Hadoop & Spark monitoring with Datadog

Author: K Young

Published: May 16, 2016

Using Datadog, you can now immediately start monitoring the four most widely used technologies in the Hadoop ecosystem: HDFS, MapReduce, YARN, and Spark.

Out-of-the-box Hadoop + Spark dashboard (screenshot)

Apache Hadoop began as an open-source implementation of Google’s MapReduce algorithm and its supporting infrastructure. Over time, Hadoop evolved to accommodate execution engines beyond MapReduce, such as Spark. Today Hadoop is the standard data-processing platform for large data sets and is used by nearly every company that handles large amounts of data.

Hadoop is great, but sometimes opaque

Hadoop is incredibly widely used because it does a good job solving an important and complex problem: distributed data processing. But it is a complex technology, and it can be quite difficult to know what is happening, why jobs fail, and whether issues stem from the data itself, from upstream or downstream parts of your stack, or from Hadoop itself. And as with most distributed systems running across many machines, problems can be hard to solve collaboratively.

Datadog + Hadoop

So we decided to add the power of Datadog to Hadoop. After you turn on the integration, you can see hundreds of Hadoop metrics alongside your hosts’ system-level metrics, and you can correlate what’s happening in Hadoop with what’s happening throughout your stack. You can set alerts when critical jobs don’t finish on time, when a metric becomes an outlier, or for virtually any other problematic Hadoop scenario you want to be notified about.
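As a concrete sketch of that kind of alert, the snippet below uses the datadogpy client to create a metric monitor that fires when a MapReduce job runs longer than an hour. The metric name (mapreduce.job.elapsed_time, in milliseconds), the scope, the threshold, and the tags are illustrative assumptions; check the integration’s metric list for the exact names your Agent reports.

```python
# Minimal sketch: alert when a MapReduce job runs longer than one hour.
# Requires the `datadog` package (datadogpy) and valid API/application keys.
from datadog import initialize, api

initialize(api_key="<YOUR_API_KEY>", app_key="<YOUR_APP_KEY>")

# NOTE: the metric name, scope, and threshold below are assumptions for
# illustration; substitute the metrics your MapReduce integration reports.
monitor = api.Monitor.create(
    type="metric alert",
    query="max(last_5m):max:mapreduce.job.elapsed_time{env:prod} > 3600000",
    name="MapReduce job running longer than 1 hour",
    message="A MapReduce job has been running for over an hour. Please investigate.",
    tags=["team:data", "integration:mapreduce"],
)
print("Created monitor:", monitor.get("id"))
```

The same API (or the Datadog UI) also supports outlier and other monitor types, so the pattern extends to the other alerting scenarios described above.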

HDFS

Using the HDFS integration, you can monitor:

  • number of DataNodes: live, dead, stale, decommissioning
  • number of blocks: under-replicated, pending replication, pending deletion
  • disk space remaining on each host and on the cluster as a whole
  • NameNode load and lock queue length
  • and more

MapReduce

The MapReduce integration includes metrics for:

  • map and reduce jobs: pending, succeeded, failed
  • map and reduce input/output records
  • bytes read, by job or in total
  • any counter
  • and more

YARN

The YARN integration gives visibility into:

  • nodes: active, lost, unhealthy, rebooted, total
  • apps: submitted, pending, running, completed, failed, killed
  • cluster cores: allocated, available, reserved, total
  • cluster memory: in use, total available
  • and more

Spark

With the Spark integration, you can see:

  • driver and executor: RDD blocks, memory used, disk used, duration, etc.
  • RDDs: partitions, memory used, disk used
  • tasks: active, skipped, failed, total
  • job stages: active, completed, skipped, failed
  • and more

Turn it on

If you already use Datadog and Hadoop, you can turn on the integrations right now and add Hadoop to the long list of technologies you can monitor easily and collaboratively. If you’re new to Datadog, you can sign up for a free trial.
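Once the integrations are enabled, one quick way to confirm that Hadoop metrics are actually flowing is to query one of them back through the API. Below is a minimal sketch using datadogpy; the metric name (hdfs.namenode.capacity_remaining) is an assumption for illustration, and any metric reported by your HDFS, MapReduce, YARN, or Spark integration will do.

```python
# Minimal sketch: confirm Hadoop metrics are flowing by querying the last hour
# of one metric through the Datadog API (requires the `datadog` package).
import time

from datadog import initialize, api

initialize(api_key="<YOUR_API_KEY>", app_key="<YOUR_APP_KEY>")

now = int(time.time())
# NOTE: the metric name below is an assumption for illustration; substitute
# any metric reported by your Hadoop or Spark integrations.
result = api.Metric.query(
    start=now - 3600,
    end=now,
    query="avg:hdfs.namenode.capacity_remaining{*}",
)

for series in result.get("series", []):
    print(series["metric"], "->", len(series["pointlist"]), "data points")
```

If the query returns one or more series, the Agent checks are reporting and the out-of-the-box dashboards described above will populate.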