Best Practices for Securing Kubernetes Applications | Datadog

Best practices for securing Kubernetes applications

Author Mallory Mooney

Last updated: November 14, 2024

Cloud-based Kubernetes applications have become the standard for modernizing workloads, but their multi-layered design can make them harder to secure. To protect your applications, you need security controls at each layer of your Kubernetes infrastructure. This approach to application security is an example of a defense-in-depth strategy, which helps teams strengthen their overall security posture and reduce single points of failure that could lead to a data breach.

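In this guide, we’ll walk through best practices for mitigating some of the common security risks that can occur in:

  • application code
  • container images and workloads
  • Kubernetes clusters
  • the cloud infrastructure that hosts Kubernetes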

Along the way, we’ll show how Datadog’s Security Platform, in addition to its integrations with popular security services, gives you end-to-end visibility into your Kubernetes environment. The security platform is made up of:

  • Datadog Cloud Security Management (CSM): visualizes the current and historic security posture of your cloud environment for real-time threat detection and continuous configuration audits
  • Datadog Cloud SIEM: provides real-time analysis of operational and security logs for robust threat detection in dynamic, cloud-scale environments
  • Datadog Application Security Monitoring (ASM): helps DevOps and security teams streamline application security to track suspicious requests, visualize the full scope of an attack, and surface vulnerabilities in code

With these offerings, you can easily identify risky service misconfigurations and mitigate legitimate threats at any level of your Kubernetes infrastructure.

Secure application code

Simple design flaws or implementation bugs in application code can be leveraged to compromise a cluster. The Open Worldwide Application Security Project (OWASP) tracks the most common code-level vulnerabilities, which include injection flaws such as SQL injection.

You can mitigate these types of risks with a few best practices for writing safer code, getting better visibility into code health, and detecting attacks that target your application and APIs. All of these steps are critical for building multi-layered security controls that protect Kubernetes infrastructure in both development and production environments.

Conduct regular audits with code analysis tools

Logging key application events is a good first step for proactively surfacing code-level gaps in security, but you may still introduce other weaknesses in your code’s design and structure—especially if your environment is rapidly evolving. Using code analysis tools to conduct regular audits of your application code can help you identify these types of security risks during development, so you can patch them before they are exploited by an attacker.

Analysis tools such as Datadog Code Security, a capability of Datadog ASM, flag problematic code and provide recommendations on how to resolve any issues. For example, form fields that do not sanitize or validate user-submitted data could be vulnerable to SQL injection attacks, in which an attacker submits database queries as input that can modify or delete data in your database. Datadog Code Security gives you a high-level overview of your code’s quality and any flagged issues, which can help you verify that applications follow recommended practices for protecting data, such as using parameterized queries for user input.

Datadog Code Security signal
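To make the parameterized-query recommendation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users table, its columns, and the helper names are hypothetical and only serve the illustration; your own database driver may use a different placeholder syntax.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    conn.commit()
    return conn


def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
    """Look up a user with a parameterized query.

    The `?` placeholder makes the driver treat `username` strictly as data,
    so input such as "' OR '1'='1" cannot change the structure of the query.
    """
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?",
        (username,),
    )
    return cursor.fetchall()
```

Because the user-supplied value never becomes part of the SQL text, a classic injection string is simply treated as a username that does not exist.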

Monitor third-party dependencies for security risks

Most applications leverage open source dependencies (e.g., libraries, packages, frameworks) that are managed by third parties, which means that you do not have as much control over their design or security. A vulnerability in a third-party library can easily jeopardize your application’s security, but remaining informed on the status of each dependency requires significant effort. For example, you may want to use a new version of a third-party library that includes a critical security patch, as well as a bug fix that may be incompatible with your application code and infrastructure resources. It’s critical to be aware of these kinds of caveats in order to ensure that you do not introduce breaking changes as you attempt to secure your application.

Scanning code dependencies regularly, and staying up to date on issues flagged by vulnerability databases, can help you stay aware of their state and better assess the risks of updating your code to a new or patched version of a dependency. Tools like OWASP Dependency-Check and Datadog Software Composition Analysis (SCA), a part of Datadog ASM, provide more details about a compromised library, such as the affected versions, the vulnerability’s severity, and which versions you should upgrade to in order to fix the issue.

Datadog Software Composition Analysis signal

These tools can also be integrated into your CI/CD pipelines, enabling you to identify the parts of your code that interact with compromised libraries before critical deployments. These measures allow you to make informed decisions about how to keep dependencies up to date while reducing the risk of introducing vulnerabilities or breaking changes.
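As a rough illustration of what a dependency check in a pipeline does, the sketch below compares installed versions against advisory ranges. The Advisory record, the package name examplelib, and the simple dotted-version format are all assumptions made for this example; real scanners such as the tools mentioned above pull advisories from vulnerability databases and handle far more version formats.

```python
from dataclasses import dataclass


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a simple dotted version such as "2.14.1" into (2, 14, 1)."""
    return tuple(int(part) for part in version.split("."))


@dataclass(frozen=True)
class Advisory:
    """Hypothetical advisory: versions in [introduced, fixed) are affected."""

    package: str
    introduced: str
    fixed: str


def affected(installed: dict[str, str], advisories: list[Advisory]) -> list[str]:
    """Return one message per installed package inside an affected range."""
    findings = []
    for adv in advisories:
        version = installed.get(adv.package)
        if version is None:
            continue  # package is not used, so the advisory does not apply
        low, high = parse_version(adv.introduced), parse_version(adv.fixed)
        if low <= parse_version(version) < high:
            findings.append(
                f"{adv.package} {version}: upgrade to {adv.fixed} or later"
            )
    return findings
```

A CI job could fail the build whenever affected() returns a non-empty list, which surfaces the problem before a deployment rather than after.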

When an attacker exploits code-level vulnerabilities, such as those in compromised versions of the Log4j library, to target your application, Datadog ASM can automatically alert you to the threat. Datadog will generate a security signal that includes more information about the affected code, the source IPs that triggered the vulnerability, and how to remediate it.

Datadog Application Security signal

Track application activity at the code and container levels

Regularly auditing application code and third-party dependencies is an important step for securing your application, but it may not be enough to protect against all attacks. Kubernetes environments are complex and highly dynamic, giving attackers more opportunities to hide their activity. For example, they may target individual containers or exploit smaller, more vulnerable application components that are easy to overlook. To mitigate this risk, it’s critical to have visibility into:

  • file, process, and kernel activity on containers
  • operations against application code and APIs
  • accounts that interact with application services

Tracking changes to application files, directories, and running processes can give you a better understanding of the path of an attack and an attacker’s overall goals. For example, an attacker may use a database process to launch a shell via a SQL injection attack. This type of attack takes advantage of poorly sanitized application fields, giving someone an entry point to compromise a host or gain access to other critical application services. An attacker may also take advantage of flaws like the Dirty Pipe vulnerability to escape from unprivileged Linux containers, including those running in Kubernetes environments.

Datadog CSM monitors file, process, and kernel activity on your containers in real time, with built-in detection rules to cover these types of commonly used techniques and tactics. You can also correlate this activity with signals generated by Datadog ASM, which automatically flags attacks that exploit application-level vulnerabilities.

For a better understanding of the source of these kinds of events, you can enable audit logging, which provides a wealth of information on Kubernetes activity. We’ll look at audit logging in more detail later.

Secure container images and workloads

In distributed environments, applications are broken down into smaller workloads, each of which runs in dedicated containers. Many teams leverage publicly available container images that already include the operating system and binaries needed for a particular workload, which can significantly reduce development time. But as containerized applications grow and leverage more resources, the chances of introducing new vulnerabilities to your workloads increase. In this section, we’ll explore some best practices you can follow to secure your container images and workloads.

Ensure container images come from a trusted source

The risks associated with pulling images from a public registry are similar to those involved in using third-party libraries in application code. You do not have full visibility into the structure of third-party container images, so you may inadvertently pull an image with outdated dependencies or malicious code.

To prevent these types of scenarios from occurring, you should validate that images are signed by authorized users and originate from a trusted source that actively maintains them, such as a known company or open source group. Ensuring that container images only come from a trusted source can help prevent attackers from taking advantage of flaws, like the Dirty Pipe vulnerability mentioned earlier. Pulling images from your cloud provider’s registries, such as Amazon Elastic Container Registry (Amazon ECR) and Azure Container Registry, and monitoring those registries can also significantly reduce risk.

Datadog CSM and Cloud SIEM provide detection rules that can help you monitor your container registries to ensure that you are pulling images that are safe to use in your applications. For example, Datadog can notify you when a container image is pulled from a registry that is not secure or when a new image is uploaded to a private Amazon ECR registry.

AWS ECR detection rule

A new image in your ECR registry could indicate that an attacker is attempting to establish persistence by uploading a container with malicious code, so it’s important to be aware when a new image is unexpectedly added.

Limit the use of privileged containers

Privileged containers have direct access to host resources and other devices running on the host. An attacker who has access to one of these containers can therefore perform a variety of actions to modify host resources, such as adding their SSH public keys to the host’s /root/.ssh/authorized_keys file. Though there are some benefits to using privileged containers, such as leveraging them to run GPU-enabled workloads in Kubernetes clusters, it’s important to restrict their usage and always be aware of their status in your environment.

Since privileged containers have the same capabilities as the host, it can be more difficult to distinguish between malicious and routine activity. This problem becomes more prevalent as applications leverage thousands of containers to support workloads, rapidly spinning up new containers at regular intervals. Keeping track of the state of individual containers is often not feasible.

Datadog CSM offers detection rules that are automatically mapped to CIS benchmarks for Docker and Kubernetes, providing deep visibility into container- and cluster-level settings. For example, you can quickly single out privileged containers from the rest of your fleet, so you can determine if they are legitimate or not as soon as they spin up.

Privileged pod compliance rule

Datadog CSM can also help you monitor other potentially harmful configurations that an attacker can use in tandem with privileged containers, such as sensitive mount paths or privileged port mappings.
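If you want a quick local check alongside these detection rules, a pod manifest can be inspected for the privileged flag directly. The function below is a minimal sketch that assumes the manifest has already been loaded into a Python dictionary; the function name is our own, while the field names follow the Kubernetes pod spec.

```python
def privileged_containers(pod: dict) -> list[str]:
    """Return the names of containers in a pod manifest that run privileged."""
    spec = pod.get("spec", {})
    names = []
    # Regular, init, and ephemeral containers can all carry a securityContext.
    for key in ("initContainers", "containers", "ephemeralContainers"):
        for container in spec.get(key) or []:
            security_context = container.get("securityContext") or {}
            if security_context.get("privileged") is True:
                names.append(container["name"])
    return names
```

Running this over every manifest in a repository gives a simple inventory of privileged workloads that you can compare against the ones you expect.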

Improve isolation between container workloads and host resources

Container isolation creates boundaries between container workloads and hosts, ensuring that workloads—and attackers—have limited access to system resources. While limiting the use of privileged containers is one way to protect host resources, there are also several configuration options that improve container isolation:

  • Container runtimes: use a runtime like CRI-O to leverage its built-in security features, such as the ability to enforce signed and encrypted images
  • Resource limits: set container I/O, memory, and CPU limits to help prevent denial-of-service attacks
  • Kernel capabilities: assign a reduced set of privileges (e.g., mount operations, filesystem access) to containers based on specific use cases to prevent access to critical resources
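The resource limit and kernel capability options above map to fields in a container's spec. The sketch below builds such a manifest as a Python dictionary and renders it as JSON, which kubectl accepts in place of YAML. The pod name, image, limit values, and the single added capability are placeholders chosen for illustration, and this example covers only CPU and memory limits, not I/O.

```python
import json


def hardened_pod(name: str, image: str) -> dict:
    """Build a pod manifest with resource limits and a reduced capability set."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    # Caps on CPU and memory limit the impact of a runaway
                    # or abusive process on neighboring workloads.
                    "resources": {
                        "requests": {"cpu": "250m", "memory": "128Mi"},
                        "limits": {"cpu": "500m", "memory": "256Mi"},
                    },
                    "securityContext": {
                        "allowPrivilegeEscalation": False,
                        # Drop every capability, then add back only what
                        # this (hypothetical) workload needs.
                        "capabilities": {
                            "drop": ["ALL"],
                            "add": ["NET_BIND_SERVICE"],
                        },
                    },
                }
            ]
        },
    }


def render_manifest(manifest: dict) -> str:
    """Serialize a manifest to JSON for use with `kubectl apply -f`."""
    return json.dumps(manifest, indent=2)
```

Starting from "drop all" and adding capabilities back one at a time tends to be easier to audit than removing individual capabilities from the default set.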

Collectively, these configuration options help you create multiple layers of security for your containers. Datadog CSM can detect suspicious activity on any container in your cluster, which can help you identify the ones that are not properly isolated from other workloads and hosts. For example, Datadog will flag any attempts to launch the kubectl utility directly in a container, which may indicate that an attacker is looking for information that would enable lateral movement (e.g., container to container, container to host).

A signal generated by Datadog CWS when a container management utility is launched in a container

Secure Kubernetes clusters

Kubernetes manages and scales your application containers in clusters, which group workloads into one or more pods that share network and storage resources. Kubernetes also provides an API server that allows users and service accounts to make changes to pods, services, deployments, and more. Because Kubernetes is responsible for orchestrating your application, cluster resources should be configured appropriately to reduce the likelihood of an attack. In this section, we will explore recommendations for securing Kubernetes clusters that supplement your container-level configurations.

Capture Kubernetes activity with audit logs

Audit logging captures all events between the API server, application services, and users, giving you more details about the source of malicious activity within an environment. You can forward your audit logs to Datadog Cloud SIEM, which provides detection rules that help you automatically flag potential threats, such as:

Events like these could indicate that there are other security gaps in your clusters, such as a misconfigured API server or pods with escalated privileges.
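To give a sense of what an audit-log check looks like, the sketch below scans JSON-lines audit events and flags a few patterns. The three conditions are illustrative assumptions of our own, not a reproduction of Datadog's detection rules; the field names (auditID, verb, user.username, objectRef) follow the Kubernetes audit event format.

```python
import json


def suspicious(event: dict) -> list[str]:
    """Return reasons an audit event deserves a closer look (illustrative)."""
    reasons = []
    user = (event.get("user") or {}).get("username", "")
    ref = event.get("objectRef") or {}
    if user == "system:anonymous":
        reasons.append("request made by an anonymous user")
    if ref.get("resource") == "pods" and ref.get("subresource") == "exec":
        reasons.append("exec into a pod")
    if ref.get("resource") == "secrets" and event.get("verb") in {"get", "list"}:
        reasons.append("secrets read")
    return reasons


def scan(lines):
    """Yield (auditID, reasons) for each flagged line of a JSON-lines log."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        reasons = suspicious(event)
        if reasons:
            yield event.get("auditID"), reasons
```

In practice you would forward the logs to a SIEM rather than scan them by hand, but a small script like this can help when you are first learning what the events contain.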

Limit access to the Kubernetes API

The Kubernetes API server exposes its APIs on ports that an attacker could take advantage of if they are reachable, and a large number of cloud-managed Kubernetes clusters expose their API server to the internet. To address this risk, it’s important to significantly limit access to the Kubernetes API. The API server provides several controls that you can configure to ensure that only authenticated users with the appropriate permissions can access the Kubernetes API.

For example, you can use OAuth2 authentication services like OpenID Connect to first authenticate any user who attempts to access the API server, which helps limit access to just your organization. You can also leverage models such as role-based access control (RBAC) to authorize requests from specific authenticated users to the server. RBAC allows you to create roles that mirror your organization’s structure, so you can easily grant access to Kubernetes resources, including the API server, to only the users or groups who need it.
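As an example of what a narrowly scoped RBAC grant can look like, the sketch below builds a Role that only allows reading pods in one namespace, plus a RoleBinding that attaches it to a group. The role name, binding name, namespace, and group are hypothetical; the manifests are expressed as Python dictionaries that can be serialized to JSON and applied with kubectl.

```python
RBAC_GROUP = "rbac.authorization.k8s.io"


def read_only_pod_access(namespace: str, group: str) -> list[dict]:
    """Build a Role and RoleBinding granting read-only access to pods."""
    role = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": "pod-reader"},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"namespace": namespace, "name": "pod-reader-binding"},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": "pod-reader", "apiGroup": RBAC_GROUP},
    }
    return [role, binding]
```

Binding roles to groups rather than individual users keeps the policy aligned with your organization's structure as people join or change teams.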

Limiting access to the Kubernetes API server also helps protect secrets stored there—such as API keys, user passwords, and certificates—across workloads, external services, and accounts. Secrets are stored unencrypted in the server’s underlying data store (i.e., etcd) by default, so anyone with access to etcd can view that data. Secrets can also be accidentally exposed to resources, such as via an environment variable for a pod. Anyone who manages that pod will also be able to see the exposed secret. To reduce these risks, it’s important to limit the number of secrets in your environment. For example, using short-lived secrets can reduce the likelihood of exposure.

You can also enable encryption at rest for existing secrets. Kubernetes supports several different encryption providers but recommends using your cloud provider’s key management service (KMS) to maximize security. KMS providers store decryption keys remotely instead of in Kubernetes, so an attacker would need to gain access to both the Kubernetes API server and the KMS to decrypt secrets.
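For self-managed control planes, encryption at rest is configured through an EncryptionConfiguration file that the API server loads via its --encryption-provider-config flag. The sketch below builds one that uses a KMS v2 provider. The plugin name and socket path are placeholders, and managed services such as EKS, GKE, and AKS generally expose this setting through their own configuration instead, so treat this as an outline rather than a drop-in file.

```python
def kms_encryption_config(plugin_name: str, socket_path: str) -> dict:
    """Build an EncryptionConfiguration that encrypts secrets with a KMS plugin."""
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [
            {
                "resources": ["secrets"],
                "providers": [
                    # The first provider is used to encrypt new writes.
                    {
                        "kms": {
                            "apiVersion": "v2",
                            "name": plugin_name,
                            "endpoint": f"unix://{socket_path}",
                            "timeout": "3s",
                        }
                    },
                    # `identity` lets the API server still read secrets that
                    # were stored unencrypted before encryption was enabled.
                    {"identity": {}},
                ],
            }
        ],
    }
```

Provider order matters: listing identity first would leave new secrets unencrypted, so the KMS provider goes at the top of the list.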

For better visibility into the state of a Kubernetes cluster, you can use Datadog CSM’s Kubernetes detection rules to quickly notify you of any configurations that make the cluster more vulnerable, such as not using an encryption provider to encrypt secrets or not using RBAC to restrict access.

Detection rule for Kubernetes RBAC rule

Create isolated pods with limited privileges

Pods share many of the same configurations and contexts as individual containers, such as network policies and resource limits, so you can leverage the same isolation rules to prevent attackers from creating or modifying pods or accessing other containers. Kubernetes provides out-of-the-box security policies via an admission controller to give you more control over pod configurations in a cluster: pods must be configured according to your policies in order to be deployed successfully. These policies offer various levels of protection based on Kubernetes recommendations, such as:

  • restricting privileged pods and privilege escalation
  • limiting pod capabilities (e.g., run mount operations, modify processes)
  • restricting access to the host’s namespace, ports, and filesystems
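One common way to apply these built-in policies is to label a namespace so that the Pod Security admission controller enforces one of the Pod Security Standards levels. The sketch below generates such a namespace manifest as a Python dictionary; the helper name is our own, and the choice to set the enforce, warn, and audit modes to the same level is an assumption made for simplicity.

```python
POD_SECURITY_LEVELS = {"privileged", "baseline", "restricted"}


def namespace_with_pod_security(name: str, level: str = "restricted") -> dict:
    """Build a Namespace manifest labeled for Pod Security admission."""
    if level not in POD_SECURITY_LEVELS:
        raise ValueError(f"unknown Pod Security Standards level: {level!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                # enforce rejects violating pods; warn and audit only report.
                "pod-security.kubernetes.io/enforce": level,
                "pod-security.kubernetes.io/warn": level,
                "pod-security.kubernetes.io/audit": level,
            },
        },
    }
```

When rolling this out to an existing namespace, it can be safer to start with only the warn and audit labels and add enforce once the reported violations are resolved.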

Datadog CSM provides an extensive list of posture management detection rules for Kubernetes infrastructure to help you identify pods that are not configured according to these policies. Additionally, Datadog Cloud SIEM complements these configuration checks by automatically detecting suspicious activity across Kubernetes clusters that may be outside of normal operations, including activity that could result in a misconfigured pod. For example, Datadog will flag new pods that could be suspicious, such as those with privileged permissions or that have access to the host network. These scenarios could indicate that a cluster is not configured with a security policy or that a policy is not restrictive enough.

It’s important to note that the pod security admission controller may not work for all use cases, such as granting specific rights to a workload while still limiting its access to cluster node resources. To mitigate this limitation, you can also leverage the Validating Admission Policy or widely used open source tools like Open Policy Agent Gatekeeper to implement pod policies across your cloud environments. Datadog’s Gatekeeper integration enables you to monitor the status of your Gatekeeper-managed policies and ensure that they are configured appropriately.

Datadog's Gatekeeper integration

Secure Kubernetes infrastructure in the cloud

The final layer of infrastructure is the cloud provider that hosts your application. Most providers offer managed services, such as Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), and Azure Kubernetes Service (AKS), to simplify the process of deploying and scaling your container environment, but these services can be vulnerable to some of the same security risks as other parts of your infrastructure (e.g., misconfigurations, insufficient monitoring). The following best practices can give you more visibility into activity across your platform and ensure that any cloud resources supporting your Kubernetes infrastructure are configured appropriately.

Enable audit logging

As we discussed earlier, Kubernetes audit logs provide more details about cluster-level activity. For insights into events across a cloud provider, including logins, edits to a profile or resource, and the status of a resource, you can also collect platform-specific audit logs. Enabling and understanding how to interpret these logs can help you uncover application resources and cloud accounts that are not configured according to your security policies. Misconfigurations like these are among the most common vulnerabilities in a cloud environment. Depending on your provider, you can enable AWS CloudTrail logs, Azure platform logs, or Google Cloud audit logs to capture activity.

Datadog Cloud SIEM leverages these logs to identify changes in cloud resources that warrant further investigation, such as an IAM policy that suddenly changes. You can also correlate these types of changes with Datadog CSM to help you determine if they are the result of a misconfigured account, such as an IAM user that has administrative access to your AWS environment, enabling them to change IAM policies.

Detection rule for AWS IAM

Use the principle of least privilege for cloud accounts

Cloud-based Kubernetes applications require different users and services to have varying levels of access, which can introduce permission misconfigurations that can be exploited by attackers. For example, an attacker can take advantage of a misconfigured IAM permission in order to take over a GKE service account and make changes to an application cluster. Creating minimally privileged user and service accounts—and granting additional permissions only when necessary—can help protect Kubernetes resources from unauthorized access. You can check out GKE’s, EKS’s, and AKS’s documentation for best practices on implementing secure identity-based policies, which can complement your existing RBAC policies and container-level configurations.
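As a small illustration of auditing for least privilege, the sketch below flags statements in an AWS IAM policy document that allow every action on every resource. It assumes the policy has been loaded as a Python dictionary in the standard IAM JSON format and only catches the bare "*" wildcard; narrower wildcards such as "iam:*" would need additional checks.

```python
def _as_list(value) -> list:
    """IAM allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant all actions on all resources."""
    flagged = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        if "*" in actions and "*" in resources:
            flagged.append(statement)
    return flagged
```

A statement flagged here is effectively administrative access, which is rarely what a workload or service account actually needs.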

Datadog CSM can help you monitor policies across cloud and multi-cloud environments, so you can ensure that all of your user and service accounts are configured appropriately. For example, Datadog will notify you when RBAC is not enabled on AKS instances.

Restrict access to the provider’s metadata API

Cloud platforms often provide a metadata API server to store metadata about environment resources, such as the name of virtual machine instances deployed on GKE. Metadata can include cloud credentials, identity tokens, and other sensitive information that running pods have full access to by default. Accessing a provider’s metadata API—such as Amazon EC2’s instance metadata service (IMDS)—is one way attackers explore Kubernetes infrastructure in order to find other resources that they can exploit. For example, an attacker can leverage a compromised pod in an EKS cluster to query the metadata service for an EC2 instance’s credentials. With this information, the attacker can access account-level details about the cluster (e.g., EC2 instance data, security groups, VPCs) and manipulate cluster resources.

Datadog CSM can notify you when a network utility like curl or wget accesses an EC2 IMDS via an interactive session. If Datadog detects this type of activity, it will generate a security signal that includes more details about that session, such as the executed commands and the affected host.

A finding generated when someone accesses an EC2 IMDS

In production environments, using an interactive session to access an EC2 IMDS is not a common operation. It’s important to be aware of this activity so you can determine whether it came from a legitimate source or is an indicator of a larger threat to your resources. Network policies can help you mitigate this type of threat by restricting traffic from pods to your cloud provider’s metadata API.

Using our previous EKS example, you can leverage the Calico network policy engine to create the appropriate policies for your clusters. You can also use Datadog Network Monitoring to easily visualize traffic between Kubernetes clusters, the metadata API service, and other cluster resources and verify that your policies are working as expected. These measures help ensure that attackers can’t retrieve credentials for other cloud resources should they gain access to a pod.
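As a rough sketch of such a policy, the function below builds a standard Kubernetes NetworkPolicy that allows pod egress everywhere except the IPv4 instance metadata address, 169.254.169.254. The policy name and the choice to select every pod in the namespace are assumptions for illustration. The policy only takes effect if your cluster runs a network plugin that enforces NetworkPolicy resources, such as Calico, and it does not cover an IPv6 metadata endpoint.

```python
METADATA_CIDR = "169.254.169.254/32"  # IPv4 instance metadata address


def block_metadata_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks pod egress to the metadata API."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "deny-metadata-api", "namespace": namespace},
        "spec": {
            "podSelector": {},  # an empty selector matches every pod
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [
                        {
                            "ipBlock": {
                                "cidr": "0.0.0.0/0",
                                "except": [METADATA_CIDR],
                            }
                        }
                    ]
                }
            ],
        },
    }
```

Workloads that legitimately need instance metadata can be placed in a separate namespace or matched with a narrower podSelector, so the default for everything else stays closed.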

A multi-layered approach to securing Kubernetes applications

In this guide, we looked at some best practices for securing every level of your Kubernetes application—from application code to the cloud provider hosting your Kubernetes resources. We also explored how Datadog enables you to easily monitor your Kubernetes stack in its entirety and identify critical issues in real time. This multi-layered approach to security helps you remediate exploitable misconfigurations and detect legitimate threats and attacks as soon as they occur.

Check out our documentation to learn more about the Datadog Security Platform. If you don’t already have a Datadog account, you can sign up for one.