Operator vs. Helm: Finding the best fit for your Kubernetes applications

Authors: Nicholas Thomson, Celene Chang, Kennon Kwok

Published: September 26, 2024

Kubernetes operators and Helm charts are both tools used for deploying and managing applications within Kubernetes clusters, but they have different strengths, and it can be difficult to determine which one to use for your application. Helm simplifies the deployment and management of Kubernetes resources using templates and version-controlled packages. It excels in scenarios where repeatable deployments and easy upgrades or rollbacks are needed. Meanwhile, operators provide a more sophisticated and automated approach to managing applications by applying operational knowledge throughout their lifetime.

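Choosing between Helm charts and operators depends on the complexity of the application, its operational requirements, and the need for customization and automation. In this post, we’ll look at how to deploy with a Helm chart, how to deploy with a Kubernetes operator, and why Datadog migrated all production clusters to the Datadog Operator.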

Deploy with a Helm chart

Helm is a package manager for Kubernetes that simplifies the deployment and management of applications. A Helm chart is a reusable template that describes a set of Kubernetes resources necessary to run a particular application. This can include multiple Kubernetes resources such as deployments, services, ConfigMaps, and ingress rules. These objects can be saved in separate YAML files for ease of development. Parameters can be defined to tailor these resources, and default parameter values packaged with the Helm chart can be overridden with a command line flag or via a file.
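To make the override behavior concrete, here is a minimal Python sketch of layering user-supplied values over a chart's defaults. It is not Helm's own code (Helm is written in Go), and the value names (replicaCount, image, ingress) are hypothetical; the sketch only assumes that nested maps are merged key by key while scalars and lists are replaced outright.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with overrides layered on top."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested maps are merged key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists replace the default wholesale.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults packaged with a chart.
chart_defaults = {
    "replicaCount": 1,
    "image": {"repository": "example/app", "tag": "1.0.0"},
    "ingress": {"enabled": False},
}

# Hypothetical overrides, as they might come from a values file or a flag.
user_overrides = {"image": {"tag": "1.1.0"}, "ingress": {"enabled": True}}

effective = merge_values(chart_defaults, user_overrides)
```

Only the keys the user sets change; everything else keeps the packaged default, which is why a short override file is usually enough to tailor a chart.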

You can install a Helm chart with helm install <release> <chart>, update it with helm upgrade <release> <chart>, and uninstall it with helm uninstall <release>. Each installation is tracked as a Helm release, and earlier revisions of a release can be inspected with helm history and restored with helm rollback in the case of a faulty update.

Using Helm charts has several benefits over applying an application manifest and its dependencies (RBAC rules, ConfigMaps, Secrets, and so on) directly to a Kubernetes cluster. First, Helm charts are versioned and can be published to a chart repository, so a user doesn’t have to manage application configuration changes locally; they can just update the corresponding Helm repository on their machine to pick up the latest version.

A Helm chart is also lightweight, easy to install, and can be used across a plethora of different applications. When a user runs the helm install command, it’s a matter of seconds before the Kubernetes resources are applied. Similarly, upgrading or uninstalling the application with Helm is swift, making it a natural tool to use for testing during application development.

Because Helm is a templating tool with CLI commands, it can be easy to replicate a Helm-based deployment across different machines and environments. This makes it simple to test and verify application state with collaborators, and to deploy identical setups across multiple clusters.

Helm charts are also flexible with regard to parameterization. For example, you can use parameters in Helm charts to define a string (or a list of strings), conditionally include configuration blocks, include logic based on a version number, and more. You can also define intermediary variables to be reused elsewhere in the Helm chart and apply simple formatting (e.g., indentation). These capabilities allow maintainers to define parameters as they see fit for users’ varying requirements. To learn more, check out the Helm documentation.

Deploy with a Kubernetes operator

Operators are Kubernetes-native software extensions that help users manage the entire lifecycle of a specific application. An operator is a method of packaging, deploying, and automating management tasks in a Kubernetes application. Operators are designed to reconcile an application’s current state with its desired state (for instance, an updated configuration). The configuration is exposed to a user via a CustomResourceDefinition (CRD). The user can then create a custom resource of the kind defined by that CRD, and the operator will process the information and interact with the Kubernetes API to apply the desired settings to the cluster.
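The reconciliation idea can be sketched in a few lines of Python. This is a simplified model rather than how any particular operator is implemented: state is represented as plain dictionaries instead of Kubernetes API objects, and the names Action, plan_reconcile, and apply_actions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "create", "update", or "delete"
    name: str  # name of the resource the action applies to


def plan_reconcile(desired: dict, current: dict) -> list:
    """Compare desired and current state and list the changes needed."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in current:
            actions.append(Action("create", name))
        elif current[name] != spec:
            actions.append(Action("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


def apply_actions(actions: list, desired: dict, current: dict) -> dict:
    """Return the state that results from applying the planned actions."""
    new_state = dict(current)
    for action in actions:
        if action.verb == "delete":
            del new_state[action.name]
        else:
            new_state[action.name] = desired[action.name]
    return new_state


# Made-up example state: one resource is outdated, one is missing, one is stale.
desired_state = {"agent": {"image": "v2"}, "cluster-agent": {"replicas": 1}}
current_state = {"agent": {"image": "v1"}, "old-config": {}}

planned = plan_reconcile(desired_state, current_state)
reconciled = apply_actions(planned, desired_state, current_state)
```

A real operator runs this comparison repeatedly, so once the cluster matches the desired state, the next pass plans no further actions.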

In contrast to Helm, operators themselves are applications that run as Kubernetes pods. They need their own permissions and configurations. However, because they are code, they are extremely flexible in what they can control in application management. For instance, perhaps it is important for an application’s dependencies to be applied in a certain order, or for configuration conflicts to be handled gracefully. Or maybe you need additional features, such as canary deployments or distinct settings based on the operating system.

In addition, since operators are applications, they can respond dynamically to changes in a cluster or environment and update any necessary configuration as a result. For example, they can adapt the number of replica pods for an application based on cluster size or change resource allocations based on the type of node. Operators can also support remote configuration, so users can enable or disable certain configurations via a web app. The sky is the proverbial limit when it comes to the complex logic that can be written into an operator.
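As one illustration of this kind of dynamic logic, the sketch below derives a replica count from the number of nodes in a cluster. The function name, the ratio of one replica per 50 nodes, and the bounds are all assumptions chosen for the example, not values used by any real operator.

```python
import math


def desired_replicas(node_count: int, nodes_per_replica: int = 50,
                     minimum: int = 1, maximum: int = 10) -> int:
    """Scale replicas with cluster size, clamped to [minimum, maximum]."""
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    wanted = math.ceil(node_count / nodes_per_replica)
    return max(minimum, min(maximum, wanted))


# Made-up cluster sizes covering a small, medium, and very large cluster.
small, medium, large = desired_replicas(5), desired_replicas(120), desired_replicas(5000)
```

An operator could recompute this value whenever nodes join or leave the cluster and update the workload accordingly, which is not something a static template can do after installation.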

Operators can also handle API version updates of a CRD via conversion webhooks. While Helm charts are versioned and incremental changes are simple, a breaking change, such as restructuring how parameters are organized, would require users to refactor their values.yaml file by hand, which is error-prone. For operators, Kubernetes provides conversion webhooks that convert custom resources written against an older API version of a CRD to the latest version, so the user can use the newer resource specification going forward.
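The core of such a conversion is a function that maps one schema onto another. The Python sketch below shows that mapping for a made-up resource: the API group example.com, the kind ExampleAgent, and the field names are all hypothetical, and the ConversionReview request and response envelope that a real webhook exchanges with the API server is left out.

```python
from copy import deepcopy

OLD_VERSION = "example.com/v1"  # hypothetical API group and versions
NEW_VERSION = "example.com/v2"


def convert_v1_to_v2(resource: dict) -> dict:
    """Map flat v1 toggles onto a nested, feature-based v2 spec."""
    if resource.get("apiVersion") != OLD_VERSION:
        raise ValueError("expected a resource with apiVersion " + OLD_VERSION)
    old_spec = resource.get("spec", {})
    converted = deepcopy(resource)  # keeps kind and metadata unchanged
    converted["apiVersion"] = NEW_VERSION
    converted["spec"] = {
        "features": {
            "logCollection": {"enabled": bool(old_spec.get("logsEnabled", False))},
            "apm": {"enabled": bool(old_spec.get("apmEnabled", False))},
        }
    }
    return converted


# A made-up v1 resource that only sets one of the two toggles.
old_resource = {
    "apiVersion": "example.com/v1",
    "kind": "ExampleAgent",
    "metadata": {"name": "demo"},
    "spec": {"logsEnabled": True},
}

new_resource = convert_v1_to_v2(old_resource)
```

Because the mapping lives in code and runs on the cluster side, users do not have to rewrite their stored resources themselves when the schema is reorganized.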

Despite the flexibility operators provide, some users may be deterred by the need to run one application on their cluster to manage another. This is a valid concern that users should consider when examining resource allocation in their environment.

Why Datadog migrated all production clusters to the Datadog Operator

At Datadog, we migrated our infrastructure to run on Kubernetes several years ago. All workload and application deployments used custom Helm charts managed in version control, including the Datadog Agent. However, after the Datadog Operator and Datadog Agent CustomResourceDefinition became generally available in 2023, we changed all of our internal monitoring to be managed by the Operator—around 200 clusters, ranging between 5 and 5,000 nodes across multiple regions and data centers. Even at this scale, the migration didn’t cause any regressions.

The operator manages the state of a cluster via CustomResourceDefinitions.

Our choice to migrate was motivated by the benefits of using a controller to manage the Datadog Agent, Cluster Agent, and Cluster Checks Runners. Specifically, moving to an operator allowed us to consolidate configurations for all of these components into a single features-based hierarchy, so that users can focus less on which specific container-level settings they need and more on the desired product or user experience. With the Datadog Operator, developers can translate those product-oriented configurations into Kubernetes resource configurations in the code. When multiple features update the same resource, as is the case for the Cluster Agent’s ClusterRole settings, the code can merge these updates and handle any potential conflicts. The reconciliation loop also provides light validation for a subset of Agent configuration parameters.
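To show what merging updates from several features might look like, here is a small Python sketch that combines RBAC-style rules into one deduplicated list. It is an illustration only, not the Datadog Operator's implementation: the feature names and rules are invented for the example, and rules are modeled as plain dictionaries.

```python
def merge_rules(feature_rules: dict) -> list:
    """Union the verbs that all features request for each (apiGroup, resource)."""
    verbs_by_target = {}
    for feature in sorted(feature_rules):
        for rule in feature_rules[feature]:
            for group in rule["apiGroups"]:
                for resource in rule["resources"]:
                    target = (group, resource)
                    verbs_by_target.setdefault(target, set()).update(rule["verbs"])
    # Emit one rule per target, in a stable order, with sorted verbs.
    return [
        {"apiGroups": [group], "resources": [resource], "verbs": sorted(verbs)}
        for (group, resource), verbs in sorted(verbs_by_target.items())
    ]


# Hypothetical features that both need access to pods.
requested = {
    "featureA": [{"apiGroups": [""], "resources": ["pods", "nodes"], "verbs": ["list", "watch"]}],
    "featureB": [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]}],
}

merged_rules = merge_rules(requested)
```

Each feature declares only what it needs, and the merge step produces a single consistent set of permissions regardless of which features are enabled.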

In addition, switching to the Datadog Operator opened the door for us to develop and use complex features that we wouldn’t have been able to with Helm. For instance, we have a few advanced features that are in preview:

  • Introspection, which enables the Operator to detect the cloud provider that a Datadog Agent is running on and then adjust default configurations for that provider.
  • DatadogAgentProfiles (DAP), which allows overrides of Datadog Agent resource settings at the node group level. This feature works by matching DAP settings with node labels and applying a new DaemonSet with the overridden settings on those nodes.
  • Fleet Automation, which allows users to enable products on a cluster from the web app by using a remote configuration client in the Datadog Operator.
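The label matching behind a profile-style feature can be sketched as follows. This is a simplified Python model, not the DatadogAgentProfile implementation: the profile names, labels, and the rule that the alphabetically first matching profile wins are assumptions made for the example.

```python
def selector_matches(selector: dict, node_labels: dict) -> bool:
    """A node matches when it carries every label the selector asks for."""
    return all(node_labels.get(key) == value for key, value in selector.items())


def assign_profiles(nodes: dict, profiles: dict) -> dict:
    """Map each node name to the first matching profile, else "default"."""
    assignment = {}
    for node_name, labels in nodes.items():
        assignment[node_name] = "default"
        for profile_name in sorted(profiles):
            if selector_matches(profiles[profile_name], labels):
                assignment[node_name] = profile_name
                break
    return assignment


# Hypothetical node labels and profile selectors.
cluster_nodes = {
    "node-a": {"node-group": "gpu", "os": "linux"},
    "node-b": {"node-group": "general", "os": "linux"},
}
agent_profiles = {"high-memory": {"node-group": "gpu"}}

profile_by_node = assign_profiles(cluster_nodes, agent_profiles)
```

Nodes grouped under the same profile would then receive the same overridden settings, while unmatched nodes keep the default configuration.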

The Datadog Operator is not just for complex use cases, however. For users just getting started with Datadog, the Datadog Operator is also an excellent tool for setting up Kubernetes monitoring. With default values embedded in the code, a user can, at minimum, configure their Datadog credentials, define a cluster name, and then view their Kubernetes metrics and Kubernetes resources in Datadog.

From an operational perspective, we were able to update our deployment tooling so that the process of deploying the Datadog Agent via a custom resource was nearly identical to how we used Helm. The Datadog Operator deployment itself is managed by Helm, and since the configuration is relatively static, it doesn’t add a significant burden for teams to manage another application. We use Datadog and Prometheus to monitor the health of the Datadog Operator pods.

The Datadog Operator’s prepackaged controllers

Another useful benefit of the Datadog Operator—independent of the Datadog Agent deployment—is that it comes packaged with three other controllers that each manage a CRD: DatadogMonitor, DatadogSLO, and DatadogDashboard (in preview). These CRDs can be used to configure their respective Datadog resources, providing a way for users to manage these resources closer to their application definitions.

Monitoring the Datadog Operator

Because operators are applications, some users may be concerned about gaining the visibility needed to ensure they are running as expected (e.g., to detect if the Agent keeps restarting or if it is unable to report metrics). There are some simple strategies you can implement to monitor the Datadog Operator:

  • Analyze logs from the Operator’s pod (using kubectl or a monitoring platform like Datadog) to look for errors and/or reconciliation loop issues. Error logs that indicate failure to read the desired state, errors in applying changes, communication problems with the API server, or other issues could indicate misconfigurations or incorrect permissions in the Operator.
  • The Operator reports metrics to Datadog indicating when all desired Agents, Cluster Agents, and Cluster Checks Runners have deployed, as well as when the Operator has finished reconciling the state of your cluster with your desired configuration. If metrics such as datadog.operator.agent.deployment.success and datadog.operator.clusteragent.deployment.success do not show a constant value of 1, you should examine your Kubernetes cluster for possible issues with running the Agent.
  • Use the out-of-the-box Operator dashboard, which surfaces metrics like Operator CPU by cluster so you can confirm that resource usage is as expected and not spiking. It also shows the number of Operator events; an elevated count might indicate an error with the Agent.

The Datadog Operator dashboard surfaces metrics to help you monitor resource usage and Agent errors.
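The check on the deployment success metrics described above can be expressed as a short Python sketch. The metric names come from this post, but the sample data points are made up, and the code assumes the values have already been fetched as lists of numbers rather than querying any API.

```python
def deployment_healthy(points: list) -> bool:
    """Healthy means there is data and every point equals 1."""
    return bool(points) and all(value == 1 for value in points)


def unhealthy_metrics(series: dict) -> list:
    """Return the names of metrics that are missing data or dip below 1."""
    return sorted(name for name, points in series.items()
                  if not deployment_healthy(points))


# Made-up sample values for the two success metrics.
sample_series = {
    "datadog.operator.agent.deployment.success": [1, 1, 0, 1],
    "datadog.operator.clusteragent.deployment.success": [1, 1, 1, 1],
}

needs_attention = unhealthy_metrics(sample_series)
```

Treating an empty series as unhealthy is a deliberate choice in this sketch, since a metric that stops reporting is as much a signal as one that drops to 0.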

Choose the deployment method that reflects your needs

While both Helm charts and Kubernetes operators are ways of deploying applications on Kubernetes, it’s important to understand which method best applies to your use case so you can choose the right one. Helm charts are best suited for simple deployments, while Kubernetes operators are ideal for deploying complex applications that require custom lifecycle management. At Datadog, we recommend deploying the Agent on Kubernetes with the Operator whenever possible, because of the powerful capabilities that this method offers over using the Helm chart.

To learn more, check out our documentation. If you’re new to Datadog, sign up for a 14-day free trial.