Transform and enrich your logs at query time with Calculated Fields

By Nicholas Thomson and Usman Khan

Published: October 28, 2024

As the number of distinct sources generating logs across systems and applications grows, teams face the challenge of normalizing log data at scale. This challenge can manifest when you’re simply looking to leverage logs “off-the-shelf” for investigations, dashboards, or reports, especially when you don’t control the content and structure of certain logs (like those collected from third-party applications and platforms). Without processing mechanisms that enable efficient search and analysis, teams risk losing some of the value of their most critical and expensive data streams.

Meanwhile, teams’ analytical needs also continue to evolve in ways that can be difficult to predict. This makes the ability to easily process or reprocess data at query time and dynamically reshape it on the fly incredibly valuable for on-call engineers tackling unique incidents, security analysts hunting for novel threats, and anyone else looking to perform in-depth investigations in highly dynamic environments. To help address these challenges, Datadog offers Calculated Fields in the Log Explorer. With Calculated Fields, every Log Management user now has the power to remodel and transform their log data on the fly during searches and investigations—unconstrained by pipelines, permissions, or time period.

In this post, we’ll show you how common workflows can benefit from Calculated Fields, including:

- Troubleshooting performance issues
- Investigating suspicious activity
- Understanding transactional data for business analysis

Troubleshoot performance issues

Calculated Fields make it easy to add new dimensions to your logs on the fly right from the Log Explorer. For example, say you’re an SRE at a large e-commerce platform, and you’ve just been alerted to an increase in latency across your app. You navigate to the Log Explorer for more context on what might be causing the problem.

Your instinct is to look for actions that are taking longer than usual. When you open one of your application logs to inspect it, though, you find only two timestamp attributes to work with, individually marking the beginning and end of the logged action. You consider filing a request with your Datadog admin to update your pipeline with an Arithmetic Processor that computes a “duration” attribute. However, you know it could take hours or days to hear back.

Instead, you pivot directly from the timestamp attributes to calculate the field you need. It’s a simple formula: @end_time - @start_time. And just like that, in mere seconds you’ve unblocked your investigation with a brand-new #DURATION field that you can filter, group, and sort by like any other.
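As a quick sketch, assuming @start_time and @end_time are numeric timestamps in milliseconds (the attribute names here mirror the example above), the field definition and a follow-up filter might look like this:

#DURATION = @end_time - @start_time

#DURATION:>2000

The first line is the formula behind the calculated field; the second is an example search that keeps only actions slower than a hypothetical two-second threshold.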

Sorting your logs by #DURATION, you see that the logs with the highest values are coming from the checkout service.

Filter your logs by the newly created calculated field

To investigate further, you click through to the Service Catalog, which reveals that the host running the checkout service is underprovisioned for CPU. With this knowledge in hand, you alert the service owner so they can remediate the problem.

Investigate suspicious activity

Calculated Fields can also help speed up time-sensitive investigations into security threats, preventing attacks from wreaking havoc on your system. For example, say you’re a security analyst for a banking application, and you’re alerted to a spike in failed login attempts, which could be a sign of a brute-force attack.

One piece of evidence that could help you determine whether the attempted logins are malicious is whether they actually came from your office location or originated externally. As a security user, you’re not familiar with log pipeline configuration; that’s handled by your Observability team. Luckily, Calculated Fields let you perform ad hoc filtering on your logs to carry out this investigation.

You navigate to the Log Explorer and filter for logs from the suspicious IP. Then, you click on one of the logs, opening the detail side panel. Here, you are able to calculate fields from any local Event Attribute, so you input the following formula:

@employee_office_location != @current_login_location

#EMPLOYEE_LOCATION is a Boolean field whose value is true when the login location doesn’t match the employee’s office location, and false when it does.
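You can then filter on the new field like any other facet. A minimal sketch, assuming the field was saved under the name shown above:

#EMPLOYEE_LOCATION:true

This keeps only the login events whose location differs from the employee’s office location.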

Easily create a calculated field from the Log Explorer

Now, when you filter these logs using the newly created calculated field, you find that all of the login attempts are coming from a location outside the employee’s office. This is cause for alarm. You immediately flag the suspicious IP, halting any additional login attempts until you can investigate and confirm your suspicion.

Understand transactional data for business analysis

Another great use case for Calculated Fields is adding context to your existing logs to gain previously obscured insights from your data. This can be especially useful if you are taking advantage of Flex Logs, which offers cost-effective retention options allowing you to keep logs queryable inside Datadog for up to 15 months.

For example, say you’re a business analyst at an e-commerce company looking for insights in the purchasing history of one of your most loyal customers. However, your third-party CRM tool sends logs with names split into “first” and “last” fields, which makes it hard to group data directly by customer. You could update your pipeline, but that would do nothing for the historical analysis you’re looking to complete now, as the changes would only apply to new logs.

Luckily, Calculated Fields provide a path to enriching your indexed data using these formulas:

#FIRST_STANDARDIZED = upper(@user.first_name)
#LAST_STANDARDIZED = upper(@user.last_name)
#NAME_STANDARDIZED = concat(#FIRST_STANDARDIZED, #LAST_STANDARDIZED)

This will standardize the case of the user’s name in your logs, making it easy to surface logs coming from this user in the Log Explorer.
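With the fields defined, surfacing this customer’s history is a one-line search. For instance, for a hypothetical customer named Jane Doe (note that concat as written joins the two fields without a separator):

#NAME_STANDARDIZED:JANEDOE

You can also group analytics by #NAME_STANDARDIZED to compare purchasing behavior across customers.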

Standardize names to more easily surface logs from certain users

Because you’re now able to quickly filter for logs from this user, you’re able to uncover insights about purchasing history, trends, frustration signals, security, and more. This visibility helps your team improve the end-user experience so they can make your application stand out in a crowded marketplace.

Improve ingestion-time processing

Calculated Fields can help all members of your team—from non-technical users entirely unfamiliar with logging schemas to advanced power users whose querying needs constantly change and evolve—format log data on the fly to serve their specific querying needs. In addition, Datadog enables you to standardize your logs’ processing, parsing, and transformation ahead of time via Log Pipelines and Processors. Our pipelines configuration page includes many out-of-the-box processors for commonly used technologies, such as AWS.

Log Pipelines and Processors enable users to store these data transformations long-term, which is helpful if you want to reuse them (e.g., classifying logs as error logs if they contain an error message). This also allows you to standardize log schemas, which can be helpful if, for example, 10 different teams are all using their own calculated fields and want to create a shared dashboard.
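For instance, the duration computation from the first example could be made permanent with an Arithmetic Processor, so that every new log arrives with the attribute already computed at ingestion. A rough sketch of such a processor’s configuration (the attribute names are illustrative, and the exact reference syntax may differ from query-time formulas):

expression: end_time - start_time
target: duration

Calculated Fields remain the better fit for one-off questions; pipelines pay off for transformations you expect to reuse across teams and dashboards.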

Enhance ad hoc queries with Calculated Fields

Calculated Fields allow all members of your team to add additional information to logs to enhance both time-sensitive investigations and ad hoc analysis. This new feature complements Flex Logs, Datadog Log Management, and Log Pipelines and Processors by enabling you to add new dimensions to your logs at query time. Datadog preconfigures 300+ pipelines out of the box via first-class integrations for common log sources and formats, helping you achieve faster time to value from your log analysis.

If you’re new to Datadog, sign up for a 14-day free trial.