With the alarming number of data breaches and vulnerabilities today, security is now a primary concern for organizations and their customers, but knowing how to efficiently develop and scale secure applications is still a problem. Tackling this challenge requires considering the potential security risks of a new feature or service much earlier in the development cycle, an idea that is foundational to the Secure by Design approach. With this strategy, teams can significantly reduce both the costs of fixing vulnerabilities and the risk of introducing them to customers.
At Datadog, we understand the importance—and are familiar with the challenges—of prioritizing security as we design and build our platform. In this post, we’ll discuss the issues inherent in secure software design, share the principles we’ve adopted to inform our approach, and highlight how we use our own platform to scale security.
Challenges with building secure software
The challenges that organizations face when building secure software are similar to those that go along with fostering a DevSecOps culture. Both require a dramatic shift in how teams design and develop applications, and two of the primary hurdles that are created as a result are siloed workstreams and rapidly evolving products.
Siloed development workstreams
Environments where teams are disconnected from each other are common in software development, and their impact on application security is evident. Because there are multiple stakeholders involved in development, it’s challenging to align everyone’s understanding of what features need to be built and how security impacts those design decisions. As a result, siloed workstreams can hinder communication and knowledge sharing among security, product, and platform teams.
For example, a siloed team of engineers may not have the bandwidth or experience to consider security measures for deploying a new Kubernetes cluster. Without input from a security team, they may release it with unexpected vulnerabilities, such as exposed Kubernetes ports or API endpoints.
An additional danger with this kind of disjointed workflow is that it reduces visibility into the systems and dependencies that can create security flaws and lengthens the feedback loops for identifying them. The longer the delay in identifying these issues, the higher the development costs and overall mean time to remediate (MTTR). In the same way that shifting left significantly reduces the cost and impact of fixing bugs, teams need visibility into security flaws earlier in development.
Rapidly evolving applications
Addressing security flaws earlier in a development cycle is simple in theory but difficult in practice. As an application grows, either through intentionally added functionality or feature creep, so does its reliance on third-party dependencies. In these cases, security teams often have no visibility into how those dependencies are used and no input on their design, and looping them in is often seen as additional overhead. As a consequence, if a third-party library releases a version that creates a vulnerability in a service, the security team may not be equipped to handle it appropriately, or may not even be aware of it.
With these challenges in mind, in addition to our own experience in application development at Datadog’s scale, our teams have found a path forward for successfully designing for security across our platform. We’ll look at how in more detail next.
Guiding principles for Secure by Design software
To recap, applications that are secure by design are more likely to be able to adequately protect their services, data, and resources from threats by default. This approach typically includes a defense-in-depth strategy, which implements multiple layers of security throughout the development cycle to minimize risk. There are a few principles that we follow in order to accomplish this:
- A decentralized model for workstreams
- A scalable system for sharing knowledge
- A customer-centric design for software
A decentralized model for workstreams
One of the challenges we previously highlighted is siloed workstreams, which result in a lack of shared context and inadequate communication among teams. To address this, we have made significant improvements in these areas by adopting a decentralized model for security, which is broken down by tools, processes, and people.
As seen in the preceding diagram, we use our own platform extensively to understand a service’s overall security posture. For example, we use the Service Catalog and its custom scorecard to identify all of the relevant service ownership and configuration metadata and context for a service along with its dependencies. Our code analysis product generates static analysis and software composition analysis scans, which we use to identify vulnerabilities in the build phase of the development cycle. We also rely on Cloud Security Management to identify and remediate misconfigurations in our deployed cloud resources.
When it comes to processes, dogfooding our platform enables us to continually refine how we use its features. It also enables us to strengthen the shared responsibility between security and engineering teams. In practice, this is driven by our Security Champions program. The program is made up of engineers from both product and platform engineering teams. They work together with our application security team to conduct security design reviews, which consist of identifying risks in our platform, understanding their scope and severity, and taking the lead on implementing fixes.
A scalable system for sharing knowledge
For this decentralized model to work, we needed to find a way to efficiently share information across all involved teams, which is one of the biggest challenges with implementing security by design. For our organization, we wanted to build a system that was easily accessible for our engineers, amenable to asking questions, and supportive of cross-team collaboration. In practice, we encourage our engineers to reach out to our application security team for a security design review or for input or answers to questions at any stage during development. Establishing this channel of communication has helped us cultivate and maintain a shared and updated context for security concerns, including the latest vulnerabilities and threats. Security design reviews have helped us proactively identify and address vulnerabilities in applications and services as well as foster collaboration between security and engineering teams.
We have also created training and educational resources to help engineers understand the “why” behind security practices, such as why a certain approach is preferred over another. This starting goal morphed into our Security Champions program, which now delivers customized training around security topics like threat modeling, threat detection, an overview of our security products, and more.
Creating effective communication and shared context among our teams allows us to implement practical security measures more efficiently. We’ll look at two primary examples of this next.
A customer-centric and secure design for software
In many cases, security measures are implemented at the expense of usability. Therefore, knowing where and how to implement security measures in a way that doesn’t disrupt users can be challenging, especially as applications scale. At Datadog, a core element of expanding our platform is balancing usability with security as we scale. To accomplish this, we use tools like webinars, support calls, and beta testing to research our customers’ needs and how they interact with our platform. The data from these tools allow us to better understand the purpose and business logic of new features, which informs how we design and scale our platform.
An example outcome of this work is our platform safety measures, which include features like security contacts, a dedicated security center for alerts, and token safety. Together, these measures show how we’ve baked security into our customers’ day-to-day workflows without sacrificing the platform’s usability.
Having the ability to efficiently develop these features requires standardizing how we secure the underlying services that support our platform, which we’ll look at next.
Build a pipeline for effective security design reviews
During security design reviews, security engineers often attempt to identify any and all risks and design flaws based on a new feature’s proposed design specifications. But according to our State of Application Security report, only three percent of critical vulnerabilities are worth prioritizing when you apply runtime context to adjust their level of severity. These factors can make security design reviews challenging and inefficient.
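To make the idea of adjusting severity with runtime context concrete, here is a minimal Go sketch. It is not how the State of Application Security report or any Datadog product computes severity. The `Finding` type, the three runtime signals (`InProduction`, `InternetExposed`, `ExploitAvailable`), and the one-level-per-missing-signal rule are all assumptions chosen for illustration.

```go
package main

import "fmt"

// Severity is an ordered scale; higher values are more urgent.
type Severity int

const (
	Low Severity = iota
	Medium
	High
	Critical
)

// Finding pairs a scanner-reported severity with runtime context.
// The three context fields are hypothetical examples of such signals.
type Finding struct {
	ID               string
	Base             Severity
	InProduction     bool
	InternetExposed  bool
	ExploitAvailable bool
}

// AdjustedSeverity lowers the base severity by one level for each
// runtime signal that is absent, never dropping below Low.
// This scoring rule is an assumption made for this sketch.
func AdjustedSeverity(f Finding) Severity {
	s := f.Base
	for _, present := range []bool{f.InProduction, f.InternetExposed, f.ExploitAvailable} {
		if !present && s > Low {
			s--
		}
	}
	return s
}

func main() {
	findings := []Finding{
		{ID: "VULN-1", Base: Critical, InProduction: true, InternetExposed: true, ExploitAvailable: true},
		{ID: "VULN-2", Base: Critical, InProduction: true},
	}
	for _, f := range findings {
		fmt.Printf("%s: base=%d adjusted=%d\n", f.ID, f.Base, AdjustedSeverity(f))
	}
}
```

Under this rule, a critical finding stays critical only when every signal is present, which is one way a large set of critical findings can shrink to a short list worth prioritizing.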
Adopting the guiding principles we previously discussed has enabled us to easily scale our own security design reviews and make them more effective at identifying the most critical risks. To this end, we’ve developed a high-level pipeline that includes every stage of these reviews, with each stage building upon data from the previous one. These stages are Threat Modeling, Secure Code Reviews, and Dynamic Testing.
We have seen a significant return on investment by following these stages in our security design reviews. They are typically done in one of the following scenarios:
- Launching a new product or feature
- Releasing a major update to an existing product or feature
- Publishing a minor update to a security-sensitive feature
- Conducting periodic reviews of all products
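The scenarios above can be expressed as a simple decision function, which is one way a team might automate the question "does this change need a review?" The following Go sketch is illustrative only: the `Change` type, its fields, and the 365-day `periodicReviewDays` interval are assumptions, not part of our actual process.

```go
package main

import "fmt"

// ChangeKind classifies a proposed change (hypothetical categories).
type ChangeKind int

const (
	NewFeature ChangeKind = iota
	MajorUpdate
	MinorUpdate
)

// Change describes a proposed release. All fields are assumptions
// made for this sketch.
type Change struct {
	Kind                ChangeKind
	SecuritySensitive   bool
	DaysSinceLastReview int
}

// periodicReviewDays is an assumed interval; the right value depends
// on the organization.
const periodicReviewDays = 365

// NeedsReview reports whether a change matches one of the four review
// scenarios and, if so, which one.
func NeedsReview(c Change) (bool, string) {
	switch {
	case c.Kind == NewFeature:
		return true, "new product or feature"
	case c.Kind == MajorUpdate:
		return true, "major update to an existing product or feature"
	case c.Kind == MinorUpdate && c.SecuritySensitive:
		return true, "minor update to a security-sensitive feature"
	case c.DaysSinceLastReview >= periodicReviewDays:
		return true, "periodic review is due"
	}
	return false, ""
}

func main() {
	ok, reason := NeedsReview(Change{Kind: MinorUpdate, SecuritySensitive: true})
	fmt.Println(ok, reason)
}
```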
Threat modeling
Effective threat modeling is only possible when a system’s design, implementation, and data flows are well-documented. Having this documentation in place, in addition to implementing strong feedback mechanisms like code analysis scans, helps better correlate and fine-tune recommendations during the threat modeling stage. The following screenshot illustrates how the Service Catalog helps us easily visualize the connections between each service, flagged security threats, errors in code, and more:
With this information, we can fully understand the scope of the system during risk assessments. This enables us to develop new features that are secure by default, or resilient to risks out of the box, and do not require the customer to take any additional security measures. In practice, to support our platform’s immense scale, our internal platform engineering teams embed security defaults into the systems and services that we use across the organization.
For example, our Go Secure SDK provides the following security defaults as part of its libraries:
- Robust cryptographic hash functions and cryptographic key management functions
- Hardened I/O and write operations
- SSRF-safe HTTP client implementations and strengthened TLS dialer functions
These measures make it easier for product teams to build new features on top of secure infrastructure.
Secure code review
Manually reviewing and keeping track of all code vulnerabilities is impossible given the volume of an application’s ephemeral services. These vulnerabilities surface not only from third-party, open-source libraries but also from custom code that’s developed in-house. That’s why having clear visibility into a service’s open vulnerabilities and their source, severity, and status is crucial for prioritization and remediation during the code review. The following screenshot shows how Datadog ASM provides the necessary context for conducting code reviews.
If we find more vulnerabilities originating in custom code, this may indicate that engineers need additional training. On the other hand, if more vulnerabilities are coming from third-party libraries, then teams may need to reevaluate their usage.
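As a rough illustration of that comparison, the following Go sketch tallies open vulnerabilities by origin and suggests a follow-up. The `Vulnerability` type, the `Origin` values, and the simple majority rule are assumptions for this example and do not reflect Datadog ASM's data model.

```go
package main

import "fmt"

// Origin records where a vulnerability was introduced (hypothetical values).
type Origin string

const (
	CustomCode Origin = "custom"
	ThirdParty Origin = "third-party"
)

// Vulnerability is a simplified record assumed for this sketch.
type Vulnerability struct {
	Service string
	Origin  Origin
	Open    bool
}

// OpenByOrigin counts only the vulnerabilities that are still open.
func OpenByOrigin(vulns []Vulnerability) map[Origin]int {
	counts := make(map[Origin]int)
	for _, v := range vulns {
		if v.Open {
			counts[v.Origin]++
		}
	}
	return counts
}

// FollowUp maps the tally to a suggested next step using a simple
// majority rule, which is an assumption made for illustration.
func FollowUp(counts map[Origin]int) string {
	custom, third := counts[CustomCode], counts[ThirdParty]
	switch {
	case custom > third:
		return "consider additional secure-coding training"
	case third > custom:
		return "reevaluate third-party library usage"
	default:
		return "no clear signal"
	}
}

func main() {
	vulns := []Vulnerability{
		{Service: "checkout", Origin: CustomCode, Open: true},
		{Service: "checkout", Origin: ThirdParty, Open: true},
		{Service: "search", Origin: ThirdParty, Open: true},
		{Service: "search", Origin: CustomCode, Open: false},
	}
	fmt.Println(FollowUp(OpenByOrigin(vulns)))
}
```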
Dynamic testing
Once our applications are live, we use Datadog ASM’s attack flow to close the loop between secure design, development, and deployment. The following visualization shows how a particular attack propagated across downstream services or databases, enabling us to easily pinpoint where the vulnerability originated.
Using our own platform has benefited our security, platform, and product teams in several ways. First, it’s allowed us to cultivate a DevSecOps culture, which has in turn improved overall developer experience and productivity—from establishing feedback loops to increasing the foundational knowledge of our systems. As a result, we can proactively discover and mitigate the most critical vulnerabilities in our applications and services while they are being developed in order to reduce business risk.
Adopt Secure by Design principles with Datadog
In this post, we looked at the challenges of building applications that are secure by design. We also looked at the guiding principles that inform our security and engineering teams’ processes and how they’ve enabled us to scale our security design reviews. Check out our documentation to learn more about our security products. If you don’t already have a Datadog account, you can sign up for a free 14-day trial.