How financial services companies discover, classify, and manage sensitive data with Datadog

Author Pronoy Chaudhuri

Published: August 28, 2024

As financial services companies, such as banks, hedge funds, and stock exchanges, move to the cloud, sensitive data often unintentionally moves with them. To help avoid costly breaches and address governance, risk, and compliance (GRC) requirements such as PCI-DSS, GDPR, and SOC 2, these organizations may need to identify where in the cloud sensitive data can leak and be able to redact it at scale.

Financial services companies that use Datadog often enable Sensitive Data Scanner to meet these needs because it can discover, classify, and redact sensitive data across logs, traces, RUM, and events, all of which are common places for sensitive data leaks. Moreover, many Datadog customers who are required to meet PCI-DSS’s strict security controls use Sensitive Data Scanner to redact payment card information as it is ingested by Datadog.

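In this post, we’ll show you how DevOps and security teams in financial services companies use Sensitive Data Scanner to detect credit cards, bank account numbers, and other PII across observability data; classify sensitive data matches against common compliance standards such as PCI-DSS and GDPR; and redact results to help prevent sensitive data from leaking into Datadog.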

Detect credit cards, bank account numbers, and other PII across observability data

Most financial services companies scan all production data as well as services that process customer data, such as credit card or loan application forms, which often contain credit card numbers, bank account numbers, and personally identifiable information (PII), such as Social Security numbers (SSNs) and email addresses. They also typically use Sensitive Data Scanner to scan not only their logs but also their APM spans, RUM events, and custom events.

APM spans and RUM events often contain IP addresses, geolocation data, and credit card numbers, which can constitute sensitive data leaks if they are not classified and redacted. Scanning custom events is useful in cases where third-party alerts may contain sensitive data, such as those from GitHub pull request messages and ServiceNow tickets.

To meet these needs, financial services companies often use Sensitive Data Scanner to create multiple scanning groups, breaking down scans by business unit or service. This enables teams to fine-tune the rules they use to classify their data and the actions Sensitive Data Scanner should take in managing it. Creating a scanning group also lets you define which telemetry data should be scanned by that group. The following example shows settings for a scanning group that scans all logs, APM spans, RUM events, and custom events in the production environment.

Edit Scanning Group page with settings to scan the production environment for all logs, APM spans, RUM events, and custom events.
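
As a rough illustration of what a scanning group's scope means, the sketch below models a group as an environment filter plus a set of telemetry types and checks which events fall inside it. The `ScanningGroup` class, its field names, the product labels, and the `env` attribute are hypothetical stand-ins invented for this example; they are not Datadog's API or query syntax.

```python
from dataclasses import dataclass

# Telemetry types named in this post; the string labels are illustrative only.
PRODUCTS = frozenset({"logs", "apm_spans", "rum_events", "custom_events"})


@dataclass(frozen=True)
class ScanningGroup:
    """Hypothetical model of a scanning group: an environment filter plus products."""
    name: str
    env: str
    products: frozenset = PRODUCTS

    def in_scope(self, event: dict) -> bool:
        """Return True if this group should scan the given event."""
        return event.get("env") == self.env and event.get("product") in self.products


# A group that covers every telemetry type in the production environment.
production = ScanningGroup(name="production-all", env="prod")

events = [
    {"product": "logs", "env": "prod", "message": "payment accepted"},
    {"product": "rum_events", "env": "staging", "message": "page view"},
    {"product": "apm_spans", "env": "prod", "message": "POST /loan-application"},
]

# Only the two production events are handed to the group's scanning rules.
scanned = [e for e in events if production.in_scope(e)]
```

Splitting scans into several such groups, for example one per business unit, keeps each group's rules and actions independent of the others.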

Classify sensitive data matches against common compliance standards such as PCI-DSS and GDPR

Once a scanning group is created, scanning rules can be added to it. You can either choose from Datadog’s 90+ Sensitive Data Scanning rules or create custom rules by writing regular expressions. Sensitive Data Scanner offers four rule categories: Secrets and Credentials, Credit Cards and Banking, Personal Identifiable Information (PII), and Network and Device Information.

The Sensitive Data Scanner Rules Library showing scanning rules organized by scanning group.

Most financial services companies enable all “Credit Cards and Banking” rules, often to meet PCI-DSS requirements, along with all “Personal Identifiable Information (PII)” rules, often to meet internal data loss prevention (DLP) requirements. These rules are managed by Datadog and updated as needed. However, to reduce noise, we recommend enabling only the rules that make sense for your particular organization. “Secrets and Credentials” rules can be useful when securing financial data because they help you keep track of leaked keys, which bad actors could use to cause harm if they found them.

Scanning Groups page showing rules within default and custom rule categories.

Many financial services companies also create and use custom rules to meet data security requirements. For example, trading firms may leak stock ticker symbols in their telemetry data, which could reveal their stock positions. Depending on the exchange, stock ticker symbols can contain between 1 and 5 characters. However, there could be many other 1–5 character strings in your telemetry data, which could lead to false positives.

To improve accuracy, we recommend using the keyword dictionary, which enables you to provide a keywords list to check within a defined proximity of the matching pattern. If any of the keywords are found within the proximity check, then the rule will evaluate as true. If none of the keywords are found, then the rule will evaluate as false and the matched value will be excluded.

Common keywords used to fine-tune matching conditions for sensitive financial data are stock, stock_position, and other well-known attribute names in the organization. Because the keywords filter out unrelated matches, you can use more generic regular expressions for sensitive data classification without flooding your results with false positives.

Custom keywords being added to the keyword dictionary to fine-tune matching conditions.
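
The keyword proximity check described above can be sketched in a few lines. This is a minimal illustration, not Datadog's implementation: the regular expression, the keyword list, the 30-character window, and the choice to look only at the text before the match are all assumptions made for the example.

```python
import re

# Generic pattern for 1-5 uppercase letters; on its own it matches far too much.
TICKER_PATTERN = re.compile(r"\b[A-Z]{1,5}\b")

# Hypothetical keyword list and proximity window (in characters).
KEYWORDS = ("stock", "stock_position")
PROXIMITY = 30


def find_tickers(text: str, keywords=KEYWORDS, proximity=PROXIMITY) -> list[str]:
    """Return pattern matches that have a keyword within `proximity` characters before them."""
    matches = []
    for m in TICKER_PATTERN.finditer(text):
        # Look only at the text immediately preceding the candidate match.
        window = text[max(0, m.start() - proximity):m.start()].lower()
        if any(keyword in window for keyword in keywords):
            matches.append(m.group())
    return matches


log_line = "user=ABC placed order; stock_position: DDOG qty=100"
print(find_tickers(log_line))  # ['DDOG']
```

Here `ABC` fits the pattern but has no keyword nearby, so it is excluded, while `DDOG` is kept because `stock_position` appears just before it.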

Finally, we have seen financial services companies rely on rule targeting to include or exclude certain attribute values from being scanned. This is more useful for structured data, whereas the keyword dictionary is more useful for unstructured data. For example, if your customer and application IDs are also represented as 1–5 character strings, you can exclude scanning their values so that your scanning rules are only focused on attributes that are likely to contain sensitive information.

Using rule targeting to exclude certain attributes from scanning.
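
For structured data, excluding attributes amounts to skipping known keys before the pattern is applied. The sketch below assumes a flat event dictionary and uses hypothetical attribute names (`customer_id`, `application_id`); it illustrates the idea and is not how Sensitive Data Scanner is implemented.

```python
import re

# Same generic 1-5 uppercase-letter pattern used for ticker symbols.
TICKER_PATTERN = re.compile(r"\b[A-Z]{1,5}\b")

# Hypothetical attributes whose values look like tickers but are not sensitive.
EXCLUDED_ATTRIBUTES = frozenset({"customer_id", "application_id"})


def scan_attributes(event: dict, excluded=EXCLUDED_ATTRIBUTES) -> dict:
    """Return {attribute: [matches]} for string attributes that are not excluded."""
    findings = {}
    for key, value in event.items():
        if key in excluded or not isinstance(value, str):
            continue  # skip excluded attributes and non-string values
        hits = TICKER_PATTERN.findall(value)
        if hits:
            findings[key] = hits
    return findings


event = {
    "customer_id": "QWERT",
    "application_id": "APP",
    "stock_position": "DDOG",
    "quantity": 100,
}
print(scan_attributes(event))  # {'stock_position': ['DDOG']}
```

Without the exclusion list, `customer_id` and `application_id` would both be reported as matches, which is the noise that rule targeting is meant to remove.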

Redact results to help prevent sensitive data from leaking into Datadog

Once you’ve defined your rules and how they should be applied during a scan, you can define the actions Sensitive Data Scanner takes when it finds a match. If you have a policy where no sensitive data should be present in Datadog, we recommend using Redact, Partially Redact, or Hash.

Defining the actions that Sensitive Data Scanner should take upon finding a match.

Redacting will replace the match with whatever placeholder you choose. Partial redaction will redact the first n or last n characters, which is helpful if you need to investigate multiple instances of the same match, because the remaining characters stay visible. Hashing also lets you correlate multiple instances of the same match, since identical matches produce identical hashes, but none of the original match characters will be searchable. Many financial services companies prefer hashing, since they often have internal data loss prevention requirements to investigate leaks.
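
To make the difference between the three actions concrete, here is a small sketch of each one applied to a matched value. The function names, the `*` mask character, and the use of SHA-256 are assumptions for illustration; the post does not specify which hash Sensitive Data Scanner uses.

```python
import hashlib


def redact(match: str, placeholder: str = "[REDACTED]") -> str:
    """Replace the whole match with a placeholder."""
    return placeholder


def partially_redact(match: str, n: int, from_start: bool = True, mask: str = "*") -> str:
    """Mask the first n (or last n) characters and leave the rest visible."""
    n = max(0, min(n, len(match)))
    if from_start:
        return mask * n + match[n:]
    return match[:len(match) - n] + mask * n


def hash_match(match: str) -> str:
    """Replace the match with a hex digest; equal inputs give equal digests (SHA-256 assumed)."""
    return hashlib.sha256(match.encode("utf-8")).hexdigest()


card = "4111111111111111"  # a well-known test card number, not a real account
print(redact(card))                # [REDACTED]
print(partially_redact(card, 12))  # ************1111
print(hash_match(card) == hash_match(card))  # True
```

Both partial redaction and hashing let you tell that two log lines leaked the same value; only partial redaction leaves any of the original characters in place.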

You can also set a priority level for the scanning rule so that when a match is found, you know how important it is to address the issue. Financial services companies may be best served by assigning credit cards, secrets, and credentials as critical/high, customer emails as medium/low, and all other detections as info. Once the rule is saved, it evaluates all data streamed into Datadog that matches your scanning group criteria.

Setting a priority level for a rule.

Resulting matches are viewable on the Summary page, which aggregates all matches for easy triage and remediation. In the first row below, we notice that in the past day, 242k logs in “*” have positive matches for American Express Cards. From here, most financial services companies would want to investigate, verify the match, and identify which services were impacted so they can work with the service owner to stop leaking these credit card numbers.

Sensitive Data Scanner summary page showing matches.
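
The triage order implied by priority levels and the Summary page can be sketched as a simple aggregation: count matches per rule, then sort by priority. The rule names and the priority mapping below are illustrative choices that follow the guidance in this post; the data structures are hypothetical and unrelated to Datadog's internal model.

```python
from collections import Counter

# Most urgent first.
PRIORITY_ORDER = ["critical", "high", "medium", "low", "info"]

# Illustrative rule names; rules without an entry default to "info".
RULE_PRIORITY = {
    "American Express Card": "critical",
    "AWS Secret Access Key": "high",
    "Email Address": "medium",
}


def summarize(matches: list[dict]) -> list[tuple[str, str, int]]:
    """Aggregate matches into (priority, rule, count) rows, most urgent first."""
    counts = Counter(m["rule"] for m in matches)
    rows = [(RULE_PRIORITY.get(rule, "info"), rule, n) for rule, n in counts.items()]
    # Sort by priority, then by descending match count within a priority.
    return sorted(rows, key=lambda row: (PRIORITY_ORDER.index(row[0]), -row[2]))


matches = [
    {"rule": "Email Address"},
    {"rule": "American Express Card"},
    {"rule": "American Express Card"},
    {"rule": "IP Address"},
]
for row in summarize(matches):
    print(row)
```

With this ordering, credit card matches surface at the top of the list even when lower-priority rules produce more matches.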

Clicking on the issue gives you a few options. You can click on logs to view their contents and verify matches. You can create a case with Case Management to collaborate with other team members. You can also view the services, hosts, and environments affected. Assuming that you have verified the match, you could then reach out to the contact shown in this example, “alex@shopist.com”, for next steps.

The details page about a particular match, which enables you to take action to address it.

Discover, classify, and manage your sensitive financial data today

Join other financial services companies in using Sensitive Data Scanner to help prevent sensitive data loss across your logs, traces, RUM, and events. Start by scanning all your data for credit card numbers and other banking information. Then, ensure that you scan your critical services for secrets, credentials, and PII. Finally, follow up on sensitive data issues to help comply with your PCI-DSS obligations.

To learn more about Sensitive Data Scanner, see our documentation. If you don’t already have a Datadog account, you can sign up for a 14-day free trial today.