Key Metrics for Monitoring AWS WAF | Datadog

Key metrics for monitoring AWS WAF

Author Mallory Mooney

Published: June 4, 2024

AWS WAF is a managed web application firewall that monitors network traffic to your AWS applications and resources. As a perimeter-based firewall, AWS WAF is designed to secure the boundaries between your applications and the public internet. This means that it’s capable of protecting all of the various elements of your AWS architecture, including Amazon API Gateways, load balancers, and Amazon CloudFront distributions.

Monitoring AWS WAF activity is essential for confirming that the firewall handles incoming request traffic, as well as broad-scale attacks from threat actors and bots, as expected. To help you monitor its performance in these areas, AWS WAF generates standard request metrics as well as metrics for its built-in CAPTCHA, challenge, and Bot Control components.

In this post, we’ll look at the following categories of AWS WAF metrics:

  • Standard request metrics
  • CAPTCHA and challenge metrics
  • Bot Control metrics

We’ll also briefly summarize the data you can extract from WAF logs so you can get full visibility into your WAF configurations and traffic. But first, we’ll describe the main components of AWS WAF and the roles they play in processing incoming traffic.

How AWS WAF works

Perimeter firewalls like AWS WAF monitor ingress network traffic that occurs at the application layer of the OSI model in order to protect applications from a wide variety of threats. For example, AWS WAF can be used to detect and prevent distributed denial-of-service (DDoS) attacks, which typically attempt to flood applications with requests in order to exhaust underlying resources. AWS WAF can also be used to prevent web application attacks, which send malformed HTTP requests—like you might see in a SQL injection attack—in order to gain control of internal resources such as databases. In addition to attacks like these, threat actors often use bots to scan applications for vulnerabilities or disrupt performance by flooding them with traffic as part of a more sophisticated DDoS attack.

Rules and rule statements

AWS WAF manages all request traffic via individual rules, which process any incoming request against a pre-configured set of criteria, known as rule statements. Rule statements describe which request components—such as HTTP method, headers, cookies, URI path, and more—the firewall should inspect and how to evaluate them.

AWS WAF offers the following types of rule statements for evaluating request components:

  • Match rule: Inspects a request component for a specific match, such as a geographic origin, a string, or malicious SQL code
  • Logical rule: Enables you to combine rule statements with AND, OR, and NOT logic
  • Rate-based rule: Detects matching requests that arrive at a rate above a limit you configure
  • Rule group rule: Enables you to reference custom or managed rule groups in your statements

Depending on how a request matches these criteria, the rule will then execute either a terminating or non-terminating action on it.
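To make the statement types above more concrete, the following Python sketch builds one rule of each kind as plain dictionaries shaped like the JSON that the AWS WAFV2 API accepts. The rule names, priorities, rate limit, metric names, and rule group ARN are hypothetical placeholders, and the structure is a simplified illustration rather than a complete web ACL.

```python
# Hypothetical rule definitions shaped like AWS WAFV2 rule JSON.
# All names, priorities, limits, and the ARN are illustrative placeholders.

def visibility(metric_name):
    """Visibility settings attached to each rule (controls metrics and sampling)."""
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }

# Match statement: look for SQL injection patterns in the query string.
sqli_statement = {
    "SqliMatchStatement": {
        "FieldToMatch": {"QueryString": {}},
        "TextTransformations": [{"Priority": 0, "Type": "URL_DECODE"}],
    }
}

# Match statement: requests that originate from a given country.
geo_statement = {"GeoMatchStatement": {"CountryCodes": ["US"]}}

# Logical rule: SQL injection pattern AND NOT from the expected country.
logical_rule = {
    "Name": "example-sqli-outside-us",
    "Priority": 0,
    "Statement": {
        "AndStatement": {
            "Statements": [sqli_statement, {"NotStatement": {"Statement": geo_statement}}]
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": visibility("example-sqli-outside-us"),
}

# Rate-based rule: aggregate requests per source IP and act above the limit.
rate_rule = {
    "Name": "example-rate-limit",
    "Priority": 1,
    "Statement": {"RateBasedStatement": {"Limit": 1000, "AggregateKeyType": "IP"}},
    "Action": {"Block": {}},
    "VisibilityConfig": visibility("example-rate-limit"),
}

# Rule group rule: references a rule group by ARN and uses OverrideAction.
rule_group_rule = {
    "Name": "example-rule-group",
    "Priority": 2,
    "Statement": {
        "RuleGroupReferenceStatement": {
            "ARN": "arn:aws:wafv2:us-east-1:123456789012:regional/rulegroup/example-group/EXAMPLE-ID"
        }
    },
    "OverrideAction": {"None": {}},
    "VisibilityConfig": visibility("example-rule-group"),
}

def statement_type(rule):
    """Return the top-level statement type of a rule definition."""
    return next(iter(rule["Statement"]))

for rule in (logical_rule, rate_rule, rule_group_rule):
    print(rule["Priority"], rule["Name"], statement_type(rule))
```

Keeping rule definitions as data like this makes it easier to review them in version control before they are deployed.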

Rule groups and web ACLs

You can apply a collection of rules directly to rule groups or web access control lists (web ACLs), both of which provide unique capabilities. Rule groups offer an additional organizational layer for a large collection of rules. You can create custom rule groups, or deploy managed rule groups from AWS or from third-party AWS Marketplace vendors. Managed rule groups, which are preconfigured and receive regular updates, are designed to support a wide variety of web applications, protect applications from well-known vulnerabilities and threats, and meet compliance requirements. For example, some AWS managed rule groups offer automatic threat detection for fraudulent activity, such as account creation and takeover.

In order to protect AWS resources from malicious traffic, you can deploy your rules and rule groups as part of a web ACL. Web ACLs are associated with specific AWS resources and deployed per region. When you use web ACLs, AWS WAF processes traffic in the order of your priority settings, which you are required to configure for each rule and rule group. As new requests are generated, AWS WAF evaluates them against the available rules and rule groups, starting with the lowest numeric priority and working up in ascending order until a rule applies a terminating action or all rules have been evaluated.

The following diagram illustrates how you can organize a set of rules and rule groups as part of a single web ACL:

Diagram of AWS WAF web access control lists

In the case of the example web ACL shown in the diagram, AWS WAF will evaluate incoming requests in the following order:

  • Rule Group 1, Rule 1
  • Rule Group 2, Rule 2
  • Rule Group 2, Rule 1

Rule actions

You can configure rules to have terminating and/or non-terminating actions, depending on the nature of a request. When AWS WAF evaluates a request against a rule with a terminating action, it will not process any other rules within the web ACL. Non-terminating actions instruct AWS WAF to continue applying rules and rule groups, based on their priority settings. You can configure rules to either automatically allow or block a request (i.e., both terminating actions) or count a request (i.e., a non-terminating action). The count action instructs AWS WAF to capture request information and then continue applying the remaining rules until it either allows or blocks the request from traveling to an AWS resource. You can also add CAPTCHA or challenge actions to obtain additional verification that the request is not coming from a bot; these actions can be either terminating or non-terminating.

If a rule doesn’t include an action, the web ACL will apply either a default allow or block action, depending on your configuration. Blocked requests will typically see a 403 (Forbidden) response, though you can create custom responses to fit your use cases.
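The evaluation flow described above can be sketched as a small simulation. This is a simplified model, not AWS WAF's implementation: the `Rule` class, the `evaluate` function, and the example rules are hypothetical, and CAPTCHA and challenge actions are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

TERMINATING = {"ALLOW", "BLOCK"}

@dataclass
class Rule:
    name: str
    priority: int
    action: str  # "ALLOW", "BLOCK", or "COUNT"
    matches: Callable[[dict], bool]

def evaluate(rules: List[Rule], request: dict,
             default_action: str = "ALLOW") -> Tuple[str, List[str]]:
    """Apply rules in ascending priority order.

    Returns the final action and the names of rules that counted the request.
    """
    counted = []
    for rule in sorted(rules, key=lambda r: r.priority):
        if not rule.matches(request):
            continue
        if rule.action in TERMINATING:
            # Terminating action: stop processing the remaining rules.
            return rule.action, counted
        # Non-terminating action: record the match and keep going.
        counted.append(rule.name)
    # No terminating rule matched, so the web ACL default action applies.
    return default_action, counted

rules = [
    Rule("block-bad-ip", 1, "BLOCK", lambda r: r["ip"] == "203.0.113.7"),
    Rule("count-admin-path", 0, "COUNT", lambda r: r["uri"].startswith("/admin")),
    Rule("allow-health-check", 2, "ALLOW", lambda r: r["uri"] == "/health"),
]

print(evaluate(rules, {"ip": "203.0.113.7", "uri": "/admin/login"}))
```

In this example the count rule has the lowest priority number, so it records the request first, and the block rule then terminates evaluation before the allow rule is ever considered.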

Labels

Each rule group generates labels and adds them to requests as it evaluates them. Labels provide additional details about a request based on the rule’s use case, and AWS WAF transforms that data into metrics and records it in logs. We’ll look at how you can interpret these labels in logs in more detail later. It’s important to note that AWS WAF will only generate metrics for the first 100 labels on an evaluated request, even if the rules applied more labels than that.

Now that we’ve looked at how AWS WAF manages web traffic and generates metric and log data, we’ll walk through which of these metrics are key for monitoring WAF configurations and performance.

Key AWS WAF metrics

AWS WAF generates metrics based on a web ACL and its rule groups, rules, and rule actions, in addition to any generated labels. For the sake of this post, we’ll classify non-label metrics as standard request metrics, CAPTCHAs and challenges metrics, and bot control metrics. The terminology we’ll use to describe these metrics refers to the categories from our Monitoring 101 series, which provides a framework for metric collection and alerting.

Standard request metrics

AWS WAF’s request metrics are based on a rule’s available actions. For example, when a web ACL allows an incoming request, AWS WAF will increment the appropriate metric. The following request metrics provide a high-level overview of WAF activity, which creates a solid starting point for understanding your configurations and how well they are managing traffic.

Name | Description | Metric type | Availability
AllowedRequests | The total number of requests that AWS WAF allowed | Work: Throughput | CloudWatch
BlockedRequests | The total number of requests that AWS WAF blocked | Work: Throughput | CloudWatch
PassedRequests | The total number of requests that do not match any configured rule | Work: Throughput | CloudWatch
CountedRequests | The total number of requests that match all conditions of a particular rule | Work: Throughput | CloudWatch
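As a rough illustration of how these metrics might be requested from CloudWatch, the sketch below only builds the query parameters. The `waf_metric_query` helper is hypothetical, and the `AWS/WAFV2` namespace and the `WebACL`, `Region`, and `Rule` dimension names are assumptions based on how AWS WAF typically publishes metrics, so verify them against your own account before relying on them.

```python
from datetime import datetime, timedelta, timezone

def waf_metric_query(metric_name, web_acl, region, rule="ALL",
                     minutes=60, period=300):
    """Build keyword arguments for a CloudWatch metric statistics request.

    Assumption: metrics live in the AWS/WAFV2 namespace with WebACL, Region,
    and Rule dimensions. The returned dict could be passed to a boto3
    CloudWatch client, e.g. client.get_metric_statistics(**query).
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/WAFV2",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "WebACL", "Value": web_acl},
            {"Name": "Region", "Value": region},
            {"Name": "Rule", "Value": rule},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,          # seconds per data point
        "Statistics": ["Sum"],     # request counts are summed per period
    }

# "example-acl" is a placeholder web ACL name.
query = waf_metric_query("BlockedRequests", "example-acl", "us-east-1")
print(query["MetricName"], query["Period"])
```

Building the parameters separately from the API call keeps the query easy to inspect and reuse across the four request metrics.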

Metrics to alert on: AllowedRequests, BlockedRequests

Keeping track of the total number of allowed requests creates a baseline for monitoring overall AWS WAF activity. Any changes in this metric could indicate an attack as well as gaps in a web ACL’s configuration. For example, a sudden spike in the number of allowed requests for an AWS resource could be a sign of a DDoS attack. In this case, you will need to update an ACL’s configuration to block the source of the attack in order to prevent it from happening again.

Number of allowed requests for AWS WAF
Reviewing the number of allowed requests across web ACLs can help you find unexpected surges in traffic.
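One simple way to flag a sudden spike in allowed requests is to compare the latest value against a recent baseline. The following sketch assumes you already have a list of per-interval AllowedRequests sums; the `is_spike` function and the threshold of three standard deviations are illustrative choices, not a recommended alerting policy.

```python
from statistics import mean, stdev

def is_spike(history, latest, threshold=3.0):
    """Return True when `latest` sits more than `threshold` standard
    deviations above the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any increase stands out.
        return latest > baseline
    return (latest - baseline) / spread > threshold

# Hypothetical AllowedRequests sums for five earlier intervals.
history = [100, 110, 90, 105, 95]
print(is_spike(history, 400))  # far above the baseline
print(is_spike(history, 108))  # within normal variation
```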

A sudden drop in the number of allowed requests for a resource, on the other hand, could indicate an issue with a deployed rule. If this metric’s value is inversely related to the number of blocked requests, meaning there is a spike in the latter, then you may need to modify the criteria that a rule is matching on.

Number of blocked and allowed requests for AWS WAF
Comparing the number of allowed and blocked requests for a single web ACL can help you determine if a rule is misconfigured.
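The inverse relationship described above, where allowed requests drop while blocked requests climb, can be checked with a small comparison between two adjacent time windows. The `allowed_blocked_shift` helper and its 50 percent change threshold are hypothetical and would need tuning for real traffic.

```python
def allowed_blocked_shift(prev, curr, min_change=0.5):
    """Return True when allowed requests fell and blocked requests rose
    by at least `min_change` (as a fraction) between two windows.

    `prev` and `curr` are dicts with "allowed" and "blocked" counts.
    """
    if prev["allowed"] == 0 or prev["blocked"] == 0:
        return False  # relative change is undefined without a baseline
    allowed_change = (curr["allowed"] - prev["allowed"]) / prev["allowed"]
    blocked_change = (curr["blocked"] - prev["blocked"]) / prev["blocked"]
    return allowed_change <= -min_change and blocked_change >= min_change

previous_window = {"allowed": 1000, "blocked": 20}
current_window = {"allowed": 300, "blocked": 720}
print(allowed_blocked_shift(previous_window, current_window))
```

A positive result does not say which rule is responsible; it only indicates that recently changed rules for that web ACL are worth reviewing.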

Alternatively, you can configure your web ACLs and rules to automatically block traffic based on certain conditions, such as requests that are malformed or coming from a known-malicious IP address. As with the allow action, monitoring the number of blocked requests can help you uncover misconfigurations in a web ACL rule. For example, a series of blocked requests from a new or atypical source that is quickly followed by an allowed request could indicate that a threat actor successfully bypassed AWS’s sizing limits on web request components.

By default, AWS WAF can inspect the first 8 KB of a request’s body, headers, and cookies. If a threat actor sends an attack payload in a request’s body with a size that is larger than 8 KB, AWS WAF will automatically forward it to the application. This gap is a particular risk when an application is vulnerable to attacks like SQL injections and server-side request forgeries. You can mitigate this activity by adjusting the SizeRestrictions_BODY, SizeRestrictions_Cookie_HEADER, and SizeRestrictions_QUERYSTRING rules for AWS’s core rule set rule group.
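To illustrate why oversized bodies matter, the sketch below models inspection as looking only at the first 8 KB of a body. It is a toy model with hypothetical helper names, and it uses a simple substring check in place of real SQL injection detection.

```python
INSPECTION_LIMIT_BYTES = 8 * 1024  # the default limit described above

def inspected_portion(body: bytes, limit: int = INSPECTION_LIMIT_BYTES) -> bytes:
    """Return the part of the body that falls inside the inspection limit."""
    return body[:limit]

def payload_escapes_inspection(body: bytes, pattern: bytes,
                               limit: int = INSPECTION_LIMIT_BYTES) -> bool:
    """True when `pattern` is in the body but not in the inspected portion."""
    return pattern in body and pattern not in inspected_portion(body, limit)

# An attacker pads the body so the payload starts after the limit.
padding = b"a" * INSPECTION_LIMIT_BYTES
oversized_body = padding + b"' OR 1=1 --"

print(len(oversized_body))
print(payload_escapes_inspection(oversized_body, b"OR 1=1"))
```

A size restriction rule closes this gap by acting on the request because of its size, before the content of the uninspected portion becomes relevant.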

As another example, a significant and sudden increase in the number of blocked requests could be the result of rate limiting requests from a legitimate IP address. AWS WAF will block requests from the IP if it hits your configured rate limit, and continue blocking it until the rate falls below your configured threshold. An influx of blocked requests could also indicate an active DDoS attack. In this case, AWS WAF is working as expected, but it’s important to determine the source of the attack before it escalates and negatively affects AWS resources.

Metrics to watch: PassedRequests, CountedRequests

Monitoring the number of passed or counted requests can help you test your AWS WAF rules before you deploy them. For example, an increase in the number of passed requests, which are requests that don’t match any rules, could indicate that a web ACL is overlooking traffic from new sources. In these cases, you may need to determine if the traffic should be allowed and then update your web ACL rules accordingly.

Monitoring the number of counted requests can be helpful for testing rules that are configured to block a segment of traffic, such as traffic from a specific source. For example, before you enable the block action on a rule, you can monitor how many requests it counted and compare that number to the total number of requests you expected to see. This test ensures that the rule will block traffic from the appropriate source when enabled.

CAPTCHA and challenge metrics

AWS WAF enables you to add rule actions that automatically apply CAPTCHAs or challenges to a request in order to verify that the source IP address is legitimate. Both actions are useful for protecting important workflows and resources in your application from a wide variety of automated threats, including external web scrapers, bots creating fake accounts, and ticket scalpers.

CAPTCHAs and challenges offer different capabilities that are worth noting. CAPTCHA puzzles aim to prove that the end user sending the request is human. They are typically used for filtering out traffic from bots while still allowing legitimate traffic. Because they require manual input, CAPTCHAs should only be implemented in specific areas of your application to ensure that the end user’s experience isn’t negatively affected—logins, forms, and search functions are common use cases. Challenges alternatively run in the background in order to verify the end user’s client session. This option can be useful when you want to protect static content like images or CSS but do not want to require manual input in order to request them.

Both CAPTCHAs and challenges are initiated based on a request’s token. If an incoming request has a valid token, AWS WAF will count the request (a non-terminating action), apply the appropriate labels, and continue processing it against the remaining rules. If the token is missing, invalid, or expired, AWS WAF will send either a CAPTCHA puzzle or challenge response back to the client, based on the rule’s configuration. Solving the puzzle or passing the challenge will generate a new token, which allows AWS WAF to continue processing the request. If the client is unable to provide a solution, AWS WAF will automatically block the request (a terminating action).

The following diagram illustrates this process for validating an incoming request before granting access to an AWS resource:

Diagram of AWS WAF rule
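The token-based flow can also be expressed as a small decision function. This is a simplified model under stated assumptions: the `Token` class, the state names, and the returned outcome strings are hypothetical, and token expiry is approximated as the token's age exceeding an immunity time.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    valid: bool        # whether the token passes integrity checks
    issued_at: float   # seconds since epoch

def token_state(token: Optional[Token], now: float,
                immunity_seconds: int = 300) -> str:
    """Classify a request's token as missing, invalid, expired, or valid."""
    if token is None:
        return "missing"
    if not token.valid:
        return "invalid"
    if now - token.issued_at > immunity_seconds:
        return "expired"
    return "valid"

def handle_request(token: Optional[Token], now: float,
                   solved: Optional[bool] = None,
                   immunity_seconds: int = 300) -> str:
    """Decide the outcome for a request hitting a CAPTCHA or challenge rule.

    `solved` is None when the client has not yet responded to the puzzle
    or challenge, and True or False once it has.
    """
    if token_state(token, now, immunity_seconds) == "valid":
        return "COUNT_AND_CONTINUE"        # non-terminating
    if solved is None:
        return "SEND_PUZZLE_OR_CHALLENGE"
    if solved:
        return "ISSUE_TOKEN_AND_CONTINUE"
    return "BLOCK"                          # terminating

print(handle_request(Token(valid=True, issued_at=1000.0), now=1100.0))
print(handle_request(Token(valid=True, issued_at=1000.0), now=1400.0))
```

The second call returns a puzzle or challenge because the token is older than the immunity time, even though it was originally valid.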

Monitoring CAPTCHA and challenge rule actions, which are captured by the following metrics, gives you a better understanding of how well rules are identifying requests from malicious bots or other unauthorized activity.

Name | Description | Metric type | Availability
CaptchaRequests | The total number of requests that triggered a CAPTCHA puzzle | Work: Throughput | CloudWatch
RequestsWithValidCaptchaToken | The total number of requests that triggered a CAPTCHA puzzle and generated a valid CAPTCHA token | Work: Throughput | CloudWatch
CaptchasAttempted | The total number of CAPTCHA attempts that were submitted by an end user | Work: Throughput | CloudWatch
CaptchasSolved | The total number of successful CAPTCHA attempts | Work: Throughput | CloudWatch
ChallengeRequests | The total number of requests that triggered a challenge action | Work: Throughput | CloudWatch
RequestsWithValidChallengeToken | The total number of requests that triggered a challenge action and generated a valid challenge token | Work: Throughput | CloudWatch

Metrics to alert on: CaptchasAttempted, CaptchasSolved

While CAPTCHAs can successfully prevent bot activity, they also directly affect your end-user experience. Monitoring the number of CAPTCHA puzzles that were generated, attempted, and solved can give you a better understanding of how their implementation affects your users. For example, end users may stop interacting with your application altogether if they need to solve too many CAPTCHAs—this can show up as a significantly lower number of attempted or solved puzzles compared to the number of total requests that triggered the action. In this scenario, you may need to adjust the CAPTCHA immunity time, which sets a limit on the frequency of puzzles generated for a user over a period of time. The default value is 300 seconds, but AWS WAF allows a maximum of three days for both CAPTCHA and challenge actions.
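A quick way to quantify this effect is to turn the three CAPTCHA counts into attempt and solve rates. The sketch below assumes you already have the metric sums for a given period; the function names and the review thresholds are hypothetical starting points rather than recommended values.

```python
def captcha_funnel(captcha_requests, attempted, solved):
    """Compute how many triggered puzzles were attempted and how many
    attempts succeeded, as fractions between 0 and 1."""
    attempt_rate = attempted / captcha_requests if captcha_requests else 0.0
    solve_rate = solved / attempted if attempted else 0.0
    return {"attempt_rate": attempt_rate, "solve_rate": solve_rate}

def needs_review(funnel, min_attempt_rate=0.5, min_solve_rate=0.7):
    """Flag a period where users appear to abandon or fail puzzles."""
    return (funnel["attempt_rate"] < min_attempt_rate
            or funnel["solve_rate"] < min_solve_rate)

# Hypothetical sums of CaptchaRequests, CaptchasAttempted, CaptchasSolved.
funnel = captcha_funnel(captcha_requests=1000, attempted=300, solved=270)
print(funnel, needs_review(funnel))
```

A low attempt rate alongside a healthy solve rate suggests that users are abandoning puzzles rather than failing them, which is the case where a longer immunity time is worth testing.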

Metrics to watch: CaptchaRequests, RequestsWithValidCaptchaToken, ChallengeRequests, RequestsWithValidChallengeToken

These metrics provide a baseline for monitoring the efficiency of your CAPTCHA and challenge rule actions. Having visibility into these areas can help you adjust your rules accordingly in order to accommodate new traffic patterns. Significant changes in traffic, such as a sudden increase in the number of requests that triggered a puzzle or challenge, should be evaluated for malicious activity. In this case, you can review your AWS WAF logs for more information about the types of requests that are triggering the action. An influx of requests from a single IP address, for example, could be the sign of an automated threat. You can implement a rate-based rule with a CAPTCHA or challenge rule action in order to prevent the automated threat while still allowing legitimate traffic.
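As an illustration of the last point, the following sketch builds a rate-based rule that responds with a CAPTCHA instead of a block. The dictionary is shaped like AWS WAFV2 rule JSON, but the helper function, rule name, and limit are hypothetical, and the validation only enforces the three-day maximum immunity time mentioned earlier.

```python
MAX_IMMUNITY_SECONDS = 3 * 24 * 60 * 60  # three days

def rate_limited_captcha_rule(name, priority, limit, immunity_seconds=300):
    """Build a rate-based rule definition that applies a CAPTCHA action.

    Assumption: the rule shape mirrors AWS WAFV2 rule JSON; verify field
    names against the API reference before deploying.
    """
    if not 0 < immunity_seconds <= MAX_IMMUNITY_SECONDS:
        raise ValueError("immunity time must be between 1 second and three days")
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {"Limit": limit, "AggregateKeyType": "IP"}
        },
        # Present a puzzle rather than blocking outright.
        "Action": {"Captcha": {}},
        "CaptchaConfig": {
            "ImmunityTimeProperty": {"ImmunityTime": immunity_seconds}
        },
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }

rule = rate_limited_captcha_rule("example-rate-captcha", priority=0, limit=500)
print(rule["Action"], rule["CaptchaConfig"])
```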

Bot Control metrics

AWS WAF offers a Bot Control managed rule group for protecting your applications from bot activity. The rule group includes rules for each category of bot as well as other signs of bot activity, such as requests from data centers that bots typically use. You can configure the Bot Control rule group to detect self-identifying bots, bots that do not self-identify, or both. For self-identifying bots, which are covered by the “common” level of protection for the rule group, AWS WAF will automatically block any bot that it cannot verify. The “targeted” level of protection incorporates CAPTCHA and challenge rule actions for any bot that does not self-identify. AWS charges for Bot Control, so it’s important to only apply the rule group to a subset of application pages or resources.

If you don’t use Bot Control, AWS WAF will still generate metrics for a sampling of requests for free. The following key metrics can give you a better understanding of bot activity in your environment before you implement more sophisticated control mechanisms.

Name | Description | Metric type | Availability
SampleAllowedRequest | The percentage of sampled requests that AWS WAF allowed | Work: Success | CloudWatch
SampleBlockedRequest | The percentage of sampled requests that AWS WAF blocked | Work: Success | CloudWatch

Metrics to alert on: SampleAllowedRequest, SampleBlockedRequest

Similar to the standard request metrics, alerting on the percentage of sampled requests that were allowed and blocked can help you detect misconfigurations in your web ACLs as well as legitimate attacks. A sudden spike in the percentage of sampled requests that AWS WAF allowed could indicate false positives, for example, especially if the triggered rule evaluates a historically low rate of traffic. This scenario can happen as a result of new traffic patterns that are not yet captured in a rule, causing AWS WAF to evaluate legitimate requests as bot activity.

If you use the Bot Control rule group for your application, you can take advantage of its generated labels in order to fine-tune the metrics you monitor. For example, you can monitor all of the standard request metrics without sampling in addition to specific scenarios, such as the top five categories of bots interacting with your application. This information can provide insight into how to fine-tune the Bot Control rule group based on which types of bots are targeting your application the most. Search engine bots, for example, are necessary for optimization and shouldn’t be completely blocked. If you find that the percentage of allowed requests from a search engine bot is affecting application performance, you can add a rate-based rule to the Bot Control rule group to limit the number of requests from that particular bot.
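For instance, the top bot categories can be derived by counting Bot Control category labels across log records. The sketch below assumes each record is a parsed activity log with a `labels` list, and that category labels share the `awswaf:managed:aws:bot-control:bot:category:` prefix seen in AWS WAF logs; the function name and sample records are hypothetical.

```python
from collections import Counter

CATEGORY_PREFIX = "awswaf:managed:aws:bot-control:bot:category:"

def top_bot_categories(log_records, n=5):
    """Count Bot Control category labels and return the `n` most common."""
    counts = Counter()
    for record in log_records:
        for label in record.get("labels", []):
            name = label.get("name", "")
            if name.startswith(CATEGORY_PREFIX):
                counts[name[len(CATEGORY_PREFIX):]] += 1
    return counts.most_common(n)

# Minimal, made-up records that contain only the labels field.
records = [
    {"labels": [{"name": CATEGORY_PREFIX + "advertising"}]},
    {"labels": [{"name": CATEGORY_PREFIX + "advertising"},
                {"name": "awswaf:managed:aws:bot-control:bot:unverified"}]},
    {"labels": [{"name": CATEGORY_PREFIX + "search_engine"}]},
    {},  # a record without labels is skipped
]
print(top_bot_categories(records))
```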

Key AWS WAF logs for monitoring traffic

For a better understanding of perimeter activity, you can collect activity and audit logs for your web ACLs. When you enable logging for a web ACL, AWS WAF can forward its activity logs to Amazon CloudWatch. Activity logs capture information about the requests that a web ACL processes. You can also monitor web ACL audit logs, which record any configuration changes to a web ACL. Audit logging is automatically enabled for AWS resources, and you can review these logs in AWS CloudTrail. We’ll look at these monitoring services in more detail in Part 2 of this series.

In this section, we’ll break down the information captured in activity and audit logs, and provide a few examples of key logs that can establish a baseline for monitoring traffic.

Activity logs

Web ACL activity logs provide comprehensive information about a request, including components like headers, associated HTTP methods, and the source IP. Activity logs also provide context for how AWS WAF responded to a particular request, such as the evaluating rule group, rule, rule action, or the first 100 labels. When reviewing activity logs, looking at these key details can help you determine if a request is legitimate and if a web ACL processed it accordingly.

The following snippet is an example of an activity log for a request that matched the SQL managed rule group:

{
    "timestamp": 1234567891234,
    "formatVersion": 1,
    "webaclId": "arn:aws:wafv2:ap-southeast-1:123456789111:regional/webacl/sample-aclt/1DEMO-123456EXAMPLE",
    "terminatingRuleId": "DEMO_SQLi_XSS",
    "terminatingRuleType": "REGULAR",
    "action": "BLOCK",
    "terminatingRuleMatchDetails": [
        {
            "conditionType": "SQL_INJECTION",
            "sensitivityLevel": "HIGH",
            "location": "HEADER",
            "matchedData": [
                "10",
                "AND",
                "1"
            ]
        }
    ],
    "httpSourceName": "-",
    "httpSourceId": "-",
    "ruleGroupList": [],
    "rateBasedRuleList": [],
    "nonTerminatingMatchingRules": [],
    "httpRequest": {
        "clientIp": "1.1.1.1",
        "country": "US",
        "headers": [
            {
                "name": "Host",
                "value": "localhost:1808"
            },
            {
                "name": "User-Agent",
                "value": "curl/7.63.1"
            },
            {
                "name": "Accept",
                "value": "*/*"
            },
            {
                "name": "x-demo-test",
                "value": "10 AND 1=1"
            }
        ],
        "uri": "/sampleUri",
        "args": "",
        "httpVersion": "HTTP/1.1",
        "httpMethod": "GET",
        "requestId": "rid"
    },
    "labels": [
        {
            "name": "value"
        }
    ]
}

With this log, we can see that AWS WAF detected a SQL injection attempt from the incoming request and automatically blocked it. The labels that are logged as part of the request can offer deeper insight into the nature of a request and why it was blocked. The following log snippet shows labels applied by the Bot Control managed rule group to a request:

"labels": [
      {"name": "awswaf:managed:captcha:accepted"},
      {"name": "awswaf:managed:aws:bot-control:bot:unverified"},
      {"name": "awswaf:managed:aws:bot-control:bot:category:advertising"}]
}

In this example, AWS WAF identified a request from a bot associated with third-party advertising services. The request’s token included a valid CAPTCHA solution as well as an unexpired CAPTCHA timestamp. However, AWS WAF could not verify its identity. In this case, AWS WAF will automatically block the unverified bot, based on the rule group’s CategoryAdvertising rule action.

While activity logs provide valuable information, finding the right logs to confirm an attack or web ACL misconfiguration can be difficult in environments that generate thousands of logs at any given time. Instead of reviewing individual logs, you can query specific scenarios that give you better insight into AWS WAF performance. The following scenarios provide a baseline for monitoring:

  • Top 10 IP addresses, countries, user-agents, hosts, and web ACLs: Provides details about where the bulk of your requests come from and their main characteristics
  • Top 100 IP addresses blocked by a rate-limiting rule: Helps you determine if AWS WAF is blocking requests from legitimate users or not
  • Top terminating rules: Provides insight into the root cause of blocked requests
  • Top URIs: Helps you determine if requests are targeting a specific path
  • Top labels: Provides insight into which rules are being matched by incoming requests the most

We’ll look at how to efficiently query your logs for these scenarios in Part 2 of this series.
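As a minimal local illustration of these scenarios, the sketch below aggregates parsed activity logs with a counter. It assumes each record follows the structure of the sample activity log shown earlier; the `top_values` helper and the sample records are hypothetical, and a log management service would normally run these queries for you.

```python
from collections import Counter

def top_values(records, extract, n=10):
    """Return the `n` most common values produced by `extract(record)`."""
    counts = Counter()
    for record in records:
        value = extract(record)
        if value is not None:
            counts[value] += 1
    return counts.most_common(n)

def client_ip(record):
    return record.get("httpRequest", {}).get("clientIp")

def uri(record):
    return record.get("httpRequest", {}).get("uri")

def terminating_rule_if_blocked(record):
    # Only blocked requests are relevant for finding top terminating rules.
    if record.get("action") == "BLOCK":
        return record.get("terminatingRuleId")
    return None

# Made-up records with the same field names as the sample activity log.
logs = [
    {"action": "BLOCK", "terminatingRuleId": "DEMO_SQLi_XSS",
     "httpRequest": {"clientIp": "192.0.2.10", "uri": "/login"}},
    {"action": "BLOCK", "terminatingRuleId": "DEMO_SQLi_XSS",
     "httpRequest": {"clientIp": "192.0.2.10", "uri": "/login"}},
    {"action": "ALLOW", "terminatingRuleId": "Default_Action",
     "httpRequest": {"clientIp": "198.51.100.4", "uri": "/home"}},
]

print(top_values(logs, client_ip))
print(top_values(logs, terminating_rule_if_blocked))
print(top_values(logs, uri))
```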

Audit logs

AWS CloudTrail logs any activity associated with a user updating a web ACL and/or accessing its information. Audit logs are a rich source of data on user activity, but there are a few scenarios that you can look out for in particular when assessing the efficiency of your web ACLs. For example, reviewing logs for when a user lists resources for a particular web ACL can help you identify possible signs of reconnaissance. You can also use audit logs to keep track of any newly created or deleted web ACLs.

The following audit log snippet captures information about the admin-1 user with administrative permissions deleting a web ACL in the us-east-1 region:

{
  "eventVersion": "1.05",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "principalId",
    "arn": "arn:aws:sts::123456789012:assumed-role/Admin/admin-webacl",
    "accountId": "123456789012",
    "accessKeyId": "accessKeyId",
    "sessionContext": {
      "sessionIssuer": {
        "type": "Role",
        "principalId": "principalId",
        "arn": "arn:aws:iam::123456789012:role/Admin",
        "accountId": "123456789012",
        "userName": "admin-1"
      },
      "webIdFederationData": {},
      "attributes": {
        "mfaAuthenticated": "false",
        "creationDate": "2024-02-06T11:11:10Z"
      }
    }
  },
  "eventTime": "2024-02-06T11:22:10Z",
  "eventSource": "wafv2.amazonaws.com",
  "eventName": "DeleteWebACL",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "10.0.0.1",

[...]
}

While this activity typically represents day-to-day administrative work, it can also be associated with an attack. If you see multiple logs for a single user deleting various web ACL resources, for example, you should investigate further to make sure that a threat actor hasn’t gained access to an administrative account.
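A check like this can be sketched by counting DeleteWebACL events per identity. The example assumes each event is a parsed CloudTrail record with the fields shown in the snippet above; the function name, the threshold of three deletions, and the sample events are hypothetical.

```python
from collections import Counter

def users_with_repeated_deletes(events, threshold=3):
    """Return identities (by ARN) with at least `threshold` DeleteWebACL events."""
    counts = Counter(
        event["userIdentity"]["arn"]
        for event in events
        if event.get("eventSource") == "wafv2.amazonaws.com"
        and event.get("eventName") == "DeleteWebACL"
    )
    return {arn: count for arn, count in counts.items() if count >= threshold}

admin_arn = "arn:aws:sts::123456789012:assumed-role/Admin/admin-webacl"
other_arn = "arn:aws:sts::123456789012:assumed-role/Dev/dev-user"

def make_event(arn, name):
    """Build a minimal, made-up CloudTrail-style event for the example."""
    return {"eventSource": "wafv2.amazonaws.com", "eventName": name,
            "userIdentity": {"arn": arn}}

events = (
    [make_event(admin_arn, "DeleteWebACL") for _ in range(3)]
    + [make_event(other_arn, "DeleteWebACL")]
    + [make_event(other_arn, "ListWebACLs")]
)
print(users_with_repeated_deletes(events))
```

In practice you would also restrict the events to a time window, so that routine deletions spread over several months do not trigger the same alert as a burst of deletions within minutes.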

Monitor AWS WAF activity to ensure application security

AWS WAF is a critical part of your security architecture, and in this post, we looked at key metrics for monitoring its performance as well as signs of broad-scale, malicious activity. In Part 2 of this series, we’ll walk through how you can monitor this data using Amazon CloudWatch, AWS’s built-in monitoring service. Finally, in Part 3, we’ll look at how you can use Datadog to monitor AWS WAF activity and leverage its built-in WAF to complement your perimeter-based firewalls.