
Redact sensitive data on-prem using the Datadog Agent

Authors: Lutao Xie and Bowen Chen

Published: June 26, 2024

Logs and other telemetry generated by your applications and infrastructure may contain sensitive information such as authentication tokens, secret keys, or customer IDs. Without the ability to quickly and easily identify sensitive data in your environment, it can be difficult to maintain regulatory compliance and avoid data leaks that impact revenue, customer trust, and business operations.

Datadog Sensitive Data Scanner (SDS) helps you detect and redact sensitive information in your telemetry so you can confidently comply with various regulations and improve your data security. For some organizations, detecting sensitive data and redacting it in the cloud as it is ingested into Datadog is sufficient. But data privacy laws like GDPR and HIPAA may prevent organizations in strictly regulated industries, such as government and healthcare, from moving sensitive data off-premise. In these cases, it may be necessary to redact sensitive data on-prem—i.e., while it’s still within your environment and not yet ingested into Datadog.

Redacting sensitive data on-prem is already possible with Datadog Observability Pipelines—and now, we’re excited to announce that you can apply SDS scanning rules directly using the Datadog Agent. This enables you to obfuscate sensitive information in your logs with existing deployment tools and ensure data compliance prior to sending logs to downstream services. In this blog post, we’ll show you how to redact sensitive data on-prem using the Datadog Agent and how to choose between the Agent and Observability Pipelines for your use case.

Use the Datadog Agent to redact sensitive data within your premises

When your Datadog-instrumented infrastructure and applications generate events, metrics, logs, and traces, the Datadog Agent collects them. At this point, the data still resides within your local environment, and any processing done there is considered on-prem. The Agent then forwards this telemetry to Datadog, where you can monitor it in our observability platform. Once the data is ingested into Datadog, it has left your organization’s premises and resides in the cloud.

As data is ingested into Datadog, it shifts from being on-prem to in-cloud.

This creates two separate opportunities in the data collection process to detect and redact sensitive information: while your data is still on-prem, and when Datadog ingests it into the cloud. For organizations subject to regulations that require sensitive customer data to remain on-prem, it can be difficult to ensure compliance when sending telemetry to downstream services; this concern may even prevent them from using observability platforms and other monitoring tools altogether. Other organizations may simply want additional layers of protection to minimize the risk of data exposure. By scrubbing data both on-prem and as it is ingested into Datadog, organizations can strengthen customer privacy and reduce the risk of violating regulations and incurring fines.

By configuring Sensitive Data Scanner for the Datadog Agent, you gain access to nearly 90 out-of-the-box scanning rules that can be applied to your data on-prem. As the Agent collects logs from your applications, it automatically applies your configured scanning rules to redact and hash any detected sensitive data before sending it to Datadog. As long as Remote Configuration (which is enabled by default) is successfully connected and your hosts are running Agent v7.54 or later, you can begin configuring on-prem data redaction within the SDS UI without having to deploy any additional pipeline monitoring tools.
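You can verify the Remote Configuration prerequisite in the Agent’s main configuration file. The snippet below is a minimal sketch of the relevant `datadog.yaml` settings; the values are illustrative placeholders, and since Remote Configuration is enabled by default on recent Agent versions, this is typically only needed if it was previously turned off:

```yaml
# datadog.yaml -- minimal sketch (illustrative placeholder values)
api_key: <YOUR_API_KEY>   # placeholder; use your organization's API key
site: datadoghq.com       # adjust for your Datadog site

# Remote Configuration lets the Agent receive SDS scanning rules
# from Datadog without further local YAML changes.
# It is enabled by default on recent Agent versions.
remote_configuration:
  enabled: true

# Log collection must be enabled for the Agent to apply SDS rules to logs.
logs_enabled: true
```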

Configure scanning rules for the Agent to apply as it collects logs.

By centralizing management for the Agent, you can standardize, configure, and apply your scanning rules across your entire fleet of hosts, even at enterprise scale. You can also create scanning groups using host tags to target the specific services and environments you’d like to apply scanning rulesets to. For example, limiting a scanning group to your production environments enables you to redact live production data while you test features under development in separate environments, with no additional overhead.
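Because scanning groups match hosts by their tags, this targeting depends on each host reporting the right tags. A hedged example of how host tags might be declared in `datadog.yaml` (the tag keys and values here are illustrative):

```yaml
# datadog.yaml -- host tags used for scanning-group targeting
# (illustrative tag values)
tags:
  - env:production
  - service:payments
  - team:checkout
```

A scanning group scoped to `env:production` would then apply its ruleset only to logs collected on hosts carrying that tag.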

Create scanning groups to target specific services and environments.

The Agent identifies and applies rules to logs collected in the defined scanning groups via Remote Configuration, so no further YAML file configuration is required. Even with SDS enabled, the Datadog Agent adds minimal performance overhead and remains lightweight. You can view our out-of-the-box (OOTB) SDS overview dashboard to monitor trends, as well as a summary of SDS activity across your organization. If needed, Datadog will also provide dedicated assistance to help you monitor your Agent fleet’s health and performance and efficiently manage SDS rules.

Monitor your SDS activity with the OOTB SDS overview dashboard.

Choose between the Datadog Agent and Observability Pipelines for your business use case

Sensitive Data Scanner now supports sensitive data redaction on-prem through both the Datadog Agent and Observability Pipelines. With two Datadog solutions, you may be wondering which is best for your organization. We’ll go through how our Agent solution covers SDS needs for most organizations and the situations in which your organization should consider using Observability Pipelines.

Customers looking for a straightforward sensitive data scanning solution that is easy and quick to configure should consider using the Agent solution. Because it integrates with your existing Agent fleet, no additional YAML configuration or deployment is required beyond selecting your scanning rules in Datadog. SDS using the Agent is also the preferred solution if you’re planning on sending logs exclusively to Datadog.

Choose between the Agent and OP for your on-prem data redaction needs.

If your organization instruments your applications with other monitoring solutions (e.g., Splunk, Sumo Logic, or other sources) in addition to Datadog, or if you want to send logs to other destinations such as Amazon S3 and Google Cloud Storage, you’ll need to configure Observability Pipelines in order to scrub data on-prem. Additionally, if your business demands large-scale data aggregation or processing, we recommend that you run a dedicated service to handle Observability Pipelines workloads such as log processing, SDS, and data routing. Learn more about bootstrapping Observability Pipelines Workers or get started using our Sensitive Data Redaction pipeline template.

Start protecting your data with Datadog

Datadog Sensitive Data Scanner helps you manage sensitive information at scale, whether you’re looking to redact information on-prem or in the cloud. Our Agent features for SDS are now available in Preview. To get started, request access using this form. You can learn more about Sensitive Data Scanner and how it helps you triage and investigate ongoing data leaks in our blog post and in our documentation.

If you don’t already have a Datadog account, sign up for a free trial.