Key Learnings From the 2024 State of Cloud Security Study

Minimize the use of long-lived cloud credentials

Credentials are considered long-lived if they are static (i.e., never change) and also never expire. These types of credentials are responsible for a number of documented cloud data breaches, and organizations should avoid using them.

In AWS, this means you should avoid using IAM users. Using IAM users to authenticate humans is both cumbersome and risky, especially in multi-account environments. AWS IAM Identity Center (formerly AWS SSO) and IAM role federation provide a more secure and convenient way of managing and provisioning human identities, while supporting common command-line applications such as the AWS CLI or Terraform.

AWS workloads, such as EC2 instances, should not use IAM users either. Depending on the compute service in use, AWS provides mechanisms that deliver short-lived credentials by design, so applications using the AWS SDKs or CLI can retrieve them transparently without additional effort. These mechanisms include IAM roles for EC2 instances (instance profiles), Lambda execution roles, IAM roles for ECS tasks, and IAM roles for service accounts on EKS.

You can also use a service control policy (SCP) to proactively block the creation of IAM users at the account or organization level.
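As a rough illustration of the SCP approach, the sketch below builds a policy document that denies the two IAM actions used to create IAM users and their access keys. The function name and the `Sid` value are hypothetical, and a real policy would likely need exemptions (for example, for a break-glass role), so treat this as a starting point rather than a recommended policy.

```python
import json


def build_deny_iam_users_scp() -> dict:
    """Return an SCP document that denies creating IAM users and access keys."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyIamUserCreation",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": ["iam:CreateUser", "iam:CreateAccessKey"],
                "Resource": "*",
            }
        ],
    }


# Serialize the policy so it can be pasted into AWS Organizations.
scp_json = json.dumps(build_deny_iam_users_scp(), indent=2)
```

The resulting JSON can be attached to an account, organizational unit, or organization root through AWS Organizations.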

In Google Cloud, humans should leverage their own identity—typically from Google Workspace—and should not authenticate to the Google Cloud APIs using service account keys.

Google Cloud workloads should not use service account keys, as they don’t expire and can easily become exposed. Instead, you can attach service accounts to workload resources, such as virtual machines or cloud functions. For the common case of workloads running in a Google Kubernetes Engine (GKE) cluster, you can leverage Workload Identity to transparently pass short-lived credentials for a specific service account to your applications. Google Cloud also provides several organization policy constraints that you can enable at the project, folder, or organization level to disable service account or service account key creation.

In Azure, users accessing resources should authenticate using their Entra ID identity. They should not use long-lived credentials of Entra ID applications. Similarly, Azure workloads such as virtual machines should not embed static credentials of an Entra ID application. Instead, you can leverage Managed Identities to transparently and dynamically retrieve short-lived credentials bound to a specific app registration. Services running in Azure Kubernetes Service (AKS) can use Entra Workload ID.

Using Datadog CSM to identify long-lived cloud credentials

You can use the Inventory in Datadog CSM to identify IAM users and Google Cloud service accounts or service account keys across all your cloud environments.

CSM Inventory in Datadog — You can use the CSM Inventory to identify long-lived cloud accounts, as well as their associated misconfigurations and related threats.

The cloud configuration rule “Service accounts should only use GCP-managed keys” (open in-app) also allows you to quickly identify Google Cloud service accounts with active user-managed access keys.

Track down stale cloud credentials

As discussed above, long-lived cloud credentials are problematic because they never expire and can become exposed in places such as source code, container images, and configuration files. Older credentials carry an even greater risk, since they have had more time to leak. Consequently, tracking down old and unused cloud credentials is a highly valuable investment for security teams. You can use an AWS credential report to identify IAM users with unused credentials. In Azure, you can use the Microsoft Graph API through the Azure CLI to list all Entra ID (formerly Azure AD) applications with credentials and pinpoint old ones:

az rest --uri \
  'https://graph.microsoft.com/v1.0/applications/?$select=id,displayName,passwordCredentials'
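To pinpoint old credentials in that output, a small script can compare each credential's `startDateTime` with a cutoff. The sketch below assumes the command's JSON response has been loaded into a dictionary with the standard Graph `value` list; the function names and the 90-day threshold are illustrative choices, and it does not follow pagination links for large tenants.

```python
from datetime import datetime, timedelta, timezone


def parse_graph_time(value: str) -> datetime:
    """Parse a Graph timestamp such as 2023-01-15T10:00:00Z, ignoring fractional seconds."""
    return datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def find_old_credentials(response: dict, max_age_days: int, now: datetime) -> list:
    """Return (application name, credential key ID) pairs created before the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    old = []
    for app in response.get("value", []):
        for cred in app.get("passwordCredentials", []):
            start = cred.get("startDateTime")
            if start and parse_graph_time(start) < cutoff:
                old.append((app.get("displayName"), cred.get("keyId")))
    return old


# Hypothetical sample shaped like the Graph API response.
sample_response = {
    "value": [
        {
            "displayName": "legacy-app",
            "passwordCredentials": [
                {"keyId": "key-1", "startDateTime": "2020-01-01T00:00:00Z"}
            ],
        }
    ]
}
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old_credentials = find_old_credentials(sample_response, 90, now)
```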

In Google Cloud, you can use Policy Analyzer to retrieve the last authentication time of every service account and identify unused ones:

gcloud policy-intelligence query-activity \
  --activity-type=serviceAccountLastAuthentication \
  --project=your-project

It’s also possible to use Recommender, including at the organization level.
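For the AWS credential report mentioned earlier, the same idea applies: once the CSV report is downloaded, a script can flag active access keys that have not been used recently. The sketch below relies on the report's `user`, `access_key_N_active`, and `access_key_N_last_used_date` columns; the function names and the 90-day threshold are assumptions for illustration.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Values the report uses when a key has no recorded usage.
NEVER_USED = {"N/A", "no_information"}


def parse_report_time(value):
    """Return a UTC datetime, or None if the key has no recorded usage."""
    if not value or value in NEVER_USED:
        return None
    return datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def find_stale_access_keys(report_csv: str, max_idle_days: int, now: datetime) -> list:
    """Return (user, key slot) pairs for active keys unused since the cutoff."""
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for slot in ("1", "2"):
            if row.get(f"access_key_{slot}_active") != "true":
                continue  # inactive or missing key slot
            last_used = parse_report_time(row.get(f"access_key_{slot}_last_used_date"))
            if last_used is None or last_used < cutoff:
                stale.append((row["user"], f"access_key_{slot}"))
    return stale
```

Keys that were never used are treated as stale here, which is a deliberate choice: an active key with no recorded usage is usually a candidate for removal.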

Use Datadog CSM to track down stale cloud credentials

With Datadog CSM, you can identify stale and risky cloud credentials at scale using its built-in detection rules.

Enforce the use of IMDSv2 on Amazon EC2 instances

The EC2 Instance Metadata Service Version 2 (IMDSv2) is designed to help protect applications from server-side request forgery (SSRF) vulnerabilities. By default, newly created EC2 instances allow using both the vulnerable IMDSv1 and the more secure IMDSv2. It’s critical that EC2 instances—especially publicly exposed ones hosting web applications—enforce IMDSv2 to protect against this type of vulnerability, as SSRF vulnerabilities are frequently exploited by attackers.
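One way to check whether existing instances enforce IMDSv2 is to inspect the `MetadataOptions` field returned by the EC2 `DescribeInstances` API. The sketch below assumes that response has already been loaded into a dictionary (for instance, from `aws ec2 describe-instances` output); the function name is hypothetical.

```python
def instances_allowing_imdsv1(describe_instances_response: dict) -> list:
    """Return IDs of instances whose metadata endpoint is on but IMDSv2 is not required."""
    instance_ids = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            # An instance with the metadata endpoint disabled exposes neither version.
            endpoint_enabled = options.get("HttpEndpoint", "enabled") == "enabled"
            # HttpTokens is "required" when IMDSv2 is enforced, "optional" otherwise.
            if endpoint_enabled and options.get("HttpTokens") != "required":
                instance_ids.append(instance["InstanceId"])
    return instance_ids
```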

AWS has released a guide to help organizations transition to IMDSv2, as well as a blog post about the topic. Several additional mechanisms are also available:

A mechanism to enforce IMDSv2 by default on specific Amazon Machine Images (AMIs), released in 2022
A set of more secure defaults for EC2 instances started from the console, released in 2023
A mechanism to enforce IMDSv2 by default at a regional level, released in 2024 (Datadog implemented support for this mechanism in the Terraform AWS provider)

While it’s more efficient to enforce IMDSv2 at the design phase—for instance, by updating the source infrastructure-as-code templates—it’s also possible as a last resort to use an SCP to block access to credentials retrieved using IMDSv1.
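As a sketch of that last-resort SCP, the policy below denies any API call made with role credentials that were delivered through IMDSv1, using the `ec2:RoleDelivery` condition key. The function name and `Sid` are hypothetical, and applying such a policy to workloads that still rely on IMDSv1 will break them, so it is best rolled out after measuring IMDSv1 usage.

```python
def build_block_imdsv1_credentials_scp() -> dict:
    """Return an SCP that denies calls made with credentials obtained through IMDSv1."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyImdsV1Credentials",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                # ec2:RoleDelivery is 1.0 for IMDSv1 and 2.0 for IMDSv2.
                "Condition": {"NumericLessThan": {"ec2:RoleDelivery": "2.0"}},
            }
        ],
    }
```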

Use Datadog CSM to identify EC2 instances that don’t enforce IMDSv2

You can use Datadog CSM to identify EC2 instances that don’t enforce IMDSv2 through the Misconfigurations rule “EC2 instances should enforce IMDSv2” (open in-app) and the CSM Issue “Publicly accessible EC2 instance uses IMDSv1” (open in-app).

In addition, the attack path “Publicly accessible EC2 host is running IMDSv1 and has an SSRF vulnerability” (open in-app) allows you to easily identify instances that are immediately at risk.

IMDSv1 finding in Datadog CSM — Datadog CSM identifies an EC2 instance that's publicly accessible and does not have IMDSv2 enforced.

Block public access proactively on cloud storage services

Cloud storage services such as Amazon S3 or Azure Blob Storage are highly popular and were among the earliest public cloud offerings. While storage buckets are private by default, they are frequently made public inadvertently, exposing sensitive data to the outside world. Thankfully, cloud providers have mechanisms to proactively protect these buckets, ensuring that a human error doesn’t turn into a data breach.

In AWS, S3 Block Public Access allows you to prevent past and future S3 buckets from being made public, either at the bucket or account level. It’s recommended that you turn this feature on at the account level and ensure this configuration is part of your standard account provisioning process. It’s important to note that since April 2023, AWS blocks public access by default for newly created buckets. However, this doesn’t cover buckets that have been created before this date.
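S3 Block Public Access consists of four independent settings, and all four need to be enabled for full protection. The sketch below checks a `PublicAccessBlockConfiguration` object, such as the one returned by the account-level or bucket-level "get public access block" API calls, and reports any setting that is not turned on. The function name is hypothetical.

```python
# The four settings that make up S3 Block Public Access.
REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_block_public_access_settings(config: dict) -> list:
    """Return the Block Public Access settings that are absent or not set to True."""
    return [name for name in REQUIRED_SETTINGS if config.get(name) is not True]
```

An empty result means the configuration blocks public access through both ACLs and bucket policies.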

A similar mechanism exists in Azure, through the “allow blob public access” parameter of storage accounts. Similarly to AWS, all Azure storage accounts created after November 2023 block public access by default.

In Google Cloud, you can block public access to Google Cloud Storage (GCS) buckets at the bucket, project, folder, or organization level using the “public access prevention” organization policy constraint.

When you need to expose a storage bucket publicly for legitimate reasons—for instance, hosting static web assets—it’s typically more cost-effective and performant to use a content delivery network (CDN) such as Amazon CloudFront or Azure CDN rather than directly exposing the bucket publicly.

Use Datadog CSM to identify vulnerable cloud storage buckets

You can use the built-in rules in Datadog CSM to identify vulnerable cloud storage buckets.

Datadog CSM finding that blob container allows anonymous access

Datadog also determines when a storage bucket is publicly accessible, and allows you to filter associated misconfigurations, so you can focus on the ones that matter most.

Filter results for public accessibility in Datadog CSM — You can also manually select or deselect public accessibility as a facet in the CSM sidebar.

Limit privileges assigned to cloud workloads

Cloud workloads such as virtual machines are frequently assigned permissions to perform their usual tasks, such as reading data from a cloud storage bucket or writing to a database. However, overprivileged workloads can allow an attacker to access a wide range of sensitive data in the cloud environment—or even gain full access to it. Ensuring that workloads follow the principle of least privilege is critical to help minimize the impact of a compromised application.

Right-sizing permissions is a continuous process that usually follows three steps:

Determine what actions the workload needs to perform.
Apply the associated policy to the workload role, with minimally scoped permissions. For instance, if an EC2 instance needs to read files from an S3 bucket, it should only be able to access this specific bucket.
Avoid “permissions drift” by ensuring that the workload still requires and actively uses these permissions.
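To make the second step concrete, here is a minimal sketch of a policy for the S3 example: it allows listing and reading objects in a single bucket and nothing else. The function name, `Sid` values, and bucket name are hypothetical.

```python
def build_read_only_bucket_policy(bucket_name: str) -> dict:
    """Return an IAM policy that allows reading objects from one specific bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOneBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,  # bucket-level action
            },
            {
                "Sid": "ReadObjectsInOneBucket",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",  # object-level action
            },
        ],
    }


policy = build_read_only_bucket_policy("example-app-assets")  # hypothetical bucket
```

Compared with a wildcard such as `s3:*` on `*`, a compromised instance holding this policy can only read the one bucket it was designed to use.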

At development time, you can use tools like iamlive to discover what cloud permissions a workload needs. At runtime, cloud provider tools such as AWS IAM Access Analyzer or Google Cloud Policy Intelligence can compare granted permissions with effective usage, to suggest scoping down permissions of specific policies. Note that Google decided to make Policy Intelligence accessible only to organizations that use Security Command Center at the organization level, instead of providing it as part of their base offering, starting January 2024.

In addition, it’s important to note that seemingly innocuous permissions can allow an attacker to gain full access to a cloud account through privilege escalation. For instance, while the AWS-managed policy AWSMarketplaceFullAccess may give the impression it’s only granting access to the AWS Marketplace, it also allows an attacker to gain full administrator access to the account by launching an EC2 instance and assigning it a privileged role.
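The escalation path described here depends on a policy allowing both `iam:PassRole` and `ec2:RunInstances`. The sketch below is a deliberately simplified check for that combination in an identity policy document. It only looks at `Allow` statements and `Action` patterns, and ignores `NotAction`, explicit denies, resources, and conditions, so it can produce false positives. All function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def allows_action(policy: dict, action: str) -> bool:
    """Return True if any Allow statement has an Action pattern matching the action."""
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        for pattern in _as_list(statement.get("Action", [])):
            # IAM action names are case-insensitive and support * and ? wildcards.
            if fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


def can_launch_instance_with_role(policy: dict) -> bool:
    """Flag policies that allow both passing a role and launching an EC2 instance."""
    return allows_action(policy, "iam:PassRole") and allows_action(policy, "ec2:RunInstances")
```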

Use Datadog CSM to identify overprivileged cloud workloads

You can use CSM Misconfigurations rules to identify risky workloads.

Datadog CSM finding that publicly accessible EC2 instances have highly privileged IAM roles

In addition to showing you a detailed context graph, Datadog CSM also indicates the “blast radius” of the impacted workload—i.e., further resources that can be accessed if the instance is compromised.

Blast radius visualization in Datadog CSM

Apply cloud-specific tuning to your managed Kubernetes clusters

Managed Kubernetes services like Amazon EKS, Azure AKS, and Google Cloud GKE are widely adopted in cloud environments, enabling teams to concentrate on deploying application workloads rather than handling the complexities of Kubernetes control plane management. Despite their popularity, these clusters often come with insecure default configurations. This can pose significant risks, as these clusters operate within cloud environments; compromising a managed cluster can provide attackers with pathways to access the underlying cloud infrastructure.

In general, it’s important to make sure of the following when working with managed clusters:

Limit internet exposure. This can typically be achieved by allow-listing IP ranges, using a site-to-site VPN tunnel, or using a zero-trust solution such as Google Cloud’s Identity-Aware Proxy (IAP), which is fully compatible with GKE.
Ensure that proper logging is enabled. On Amazon EKS, audit logs are not enabled by default and require explicit configuration.
Block pod access to the worker node’s metadata service. On EKS and AKS, this is typically done through a Kubernetes network policy. On GKE, this requires turning on Workload Identity.
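For the network policy approach, a common pattern is an egress rule that allows all destinations except the metadata service's link-local address. The sketch below generates such a manifest as a dictionary that can be serialized to JSON and applied with `kubectl`. It assumes the cluster runs a network plugin that enforces network policies, and it covers only the IPv4 endpoint. The policy name and function name are hypothetical.

```python
import json

# Link-local IPv4 address of the instance metadata service.
METADATA_CIDR = "169.254.169.254/32"


def build_block_metadata_network_policy(namespace: str) -> dict:
    """Return a NetworkPolicy allowing all pod egress except to the metadata service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "deny-instance-metadata", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: applies to every pod in the namespace
            "policyTypes": ["Egress"],
            "egress": [
                {"to": [{"ipBlock": {"cidr": "0.0.0.0/0", "except": [METADATA_CIDR]}}]}
            ],
        },
    }


manifest_json = json.dumps(build_block_metadata_network_policy("default"), indent=2)
```

Network policies are namespaced, so a policy like this would need to be created in each namespace that runs application pods.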

Use Datadog CSM to identify risky managed Kubernetes clusters

You can use CSM Misconfigurations rules to identify risky managed clusters in your cloud environment.

Insecure EKS cluster finding in Datadog CSM

Secure IAM roles used for third-party integrations

Third-party SaaS services often integrate with cloud environments. The most common way of doing so is through the use of IAM roles. In this situation, customers of that service create an IAM role with a trust relationship to a specific, vendor-owned AWS account.

This increases the risk of “confused deputy” attacks, where any third party with knowledge of your AWS account ID could enroll your account on the third-party SaaS service. To prevent this type of threat, it’s essential to enforce the use of an ExternalID, an identifier known only by the user and the third-party SaaS solution.

It’s also critical to make sure that your role does not have more permissions than the third-party service needs. For instance, a role used to read CloudTrail logs should not have permissions to read data from S3 buckets. Otherwise, if the role is compromised (such as through the lack of an ExternalID) or the third-party SaaS service is breached, an attacker would get access to sensitive data.
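As an illustration of the ExternalID requirement, the sketch below builds a role trust policy that only lets the vendor's account assume the role when it supplies the agreed external ID, along with a simplified check for trust policies that lack this condition. The account ID, external ID, and function names are hypothetical, and the check only recognizes the `StringEquals` form of the condition.

```python
def build_third_party_trust_policy(vendor_account_id: str, external_id: str) -> dict:
    """Return a trust policy requiring the vendor to present the agreed external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{vendor_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def enforces_external_id(trust_policy: dict) -> bool:
    """Return True if every Allow statement requires an sts:ExternalId match."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    allow_statements = [s for s in statements if s.get("Effect") == "Allow"]
    return bool(allow_statements) and all(
        "sts:ExternalId" in s.get("Condition", {}).get("StringEquals", {})
        for s in allow_statements
    )


# Hypothetical vendor account ID and external ID.
trust_policy = build_third_party_trust_policy("111122223333", "example-external-id")
```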

Use Datadog CSM to identify risky third-party roles

You can use Datadog CSM to identify third-party roles that don’t enforce an ExternalID, as well as roles that have excessive permissions.

You can also use a custom query in the CSM Misconfigurations explorer to identify roles matching either of these rules:

@workflow.rule.defaultRuleId:(def-000-qaw OR def-000-au9)

Finally, CSM Identity Risks lets you visually identify unused permissions and suggests a right-sized policy, based on correlation with CloudTrail logs and IAM Access Analyzer.

Datadog CSM finding that AWS IAM role has a large permissions gap

Suggested downsized policy in Datadog CSM

Secure your infrastructure against common cloud risks

Our 2024 State of Cloud Security study shows organizations have made continued improvements from 2023 when it comes to securely configuring their cloud infrastructure. Still, there is work to be done in several key areas—these include avoiding long-lived and unmanaged credentials, using the most up-to-date and secure configurations on compute and storage instances, and ensuring roles are not overprivileged, particularly those associated with third-party SaaS integrations. In this post, we highlighted several steps that teams can take to harden their security posture on these fronts, and they can use Datadog CSM to help in that effort.

To get started with CSM, check out our documentation. If you’re new to Datadog, sign up for a 14-day free trial.
