
I use GitHub Actions for Datadog's Service Catalog, and you should, too

Author Mike Stemle
Principal Architect, Arc XP

Published: July 14, 2023

Today’s guest blog is by Mike Stemle, a software engineer and Principal Architect for the Arc XP division of the Washington Post. In his role, Mike focuses on AppSec and large-scale architecture.

Anybody who works with me knows that I love the Datadog Service Catalog. Service Catalog is great for our teams because it puts service ownership and production support information right next to the metrics we use to detect outages and performance issues. Having all that information in one place helps shift production support left, allowing us to proactively detect, mitigate, and resolve issues before they even make it to production—much in the same way we have seen with quality and security in the last 15 years.

But to get the full benefit from Service Catalog, you need to flesh out your organization’s service definitions. During incidents, it’s the definitions in Service Catalog that give you the most useful information, such as which team a service belongs to, where its documentation lives, and what repositories make up each service’s code.

service-catalog-1.png

So how do you update your service definitions with all this information? You have a number of options. You can use the GitHub integration to feed metadata to Datadog via your GitHub repository. You can also manage service definitions through Terraform. Or you can manually supply definition details to the Service Definition API by using any HTTP client, such as the Node Fetch API.
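As a rough sketch of that last option, here is what a manual push to the Service Definition API might look like with Node 18+’s built-in fetch. The endpoint and header names follow Datadog’s public API documentation, but treat the service definition fields, the example email domain, and the site hostname as placeholder assumptions you would adapt to your own setup:

```javascript
// Hypothetical sketch (not the DSCMP itself): posting a service definition
// directly to Datadog's Service Definition API using Node's built-in fetch.
function buildServiceDefinitionRequest(definition, apiKey, appKey, site = "api.datadoghq.com") {
  return {
    url: `https://${site}/api/v2/services/definitions`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "DD-API-KEY": apiKey,          // Datadog API key
        "DD-APPLICATION-KEY": appKey,  // Datadog application key
      },
      body: JSON.stringify(definition),
    },
  };
}

// A made-up example service definition (schema v2).
const definition = {
  "schema-version": "v2",
  "dd-service": "my-service",
  team: "my-team",
  contacts: [{ type: "email", contact: "my-team@example.com" }],
};

const { url, options } = buildServiceDefinitionRequest(
  definition,
  process.env.DATADOG_KEY,
  process.env.DATADOG_APP_KEY
);
// To actually send it (requires valid keys):
// await fetch(url, options);
```

This works, but as you’ll see below, it leaves you maintaining your own client code and key handling.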

service-catalog-2.png

For me, though, the best way to configure your service definitions is to use GitHub Actions to send your information to Datadog and Service Catalog. This method uses an open source custom action I authored, called the Datadog Service Catalog Metadata Provider (DSCMP), which is available in the GitHub Marketplace. What’s great about the DSCMP is that it allows you to supply service details to Datadog without giving any third parties special access or permissions to your GitHub organization, and without needing to add another GitHub integration to monitor (did I mention I was involved in AppSec?). Using this custom action is easy, and as a bonus, it can help you promote best practices in your organization through technical controls.

For the rest of this article, I will cover how to use GitHub Actions together with the DSCMP, but first, the obligatory disclaimers: This article represents my own expertise and lived experience. In no way should this article be construed as conveying thoughts or opinions held by my employer—The Washington Post—or any of its related entities. Nor should this article be seen as speaking for Datadog, GitHub, or any of their related entities.

GitHub Actions: automated workflows, actions, and triggers

GitHub Actions is sometimes assumed to be a CI/CD platform only, but it has other uses. It can be used to automate virtually any set of tasks whenever a specified event occurs (such as a code push) in your GitHub repository. For example, I’ve also used GitHub Actions for code scanning with CodeQL, and for automated documentation generation and publication with GitHub Pages, Jekyll, and JSDoc. Sending service metadata to Datadog on a code push, therefore, is perfectly viable.

In GitHub Actions, the tasks you want to automate are defined in YAML files called workflows, saved in the .github/workflows directory in your repository. These workflows start with a trigger and include steps that point to predefined actions—such as the DSCMP. With the DSCMP, the code is already written for you, but you can also author your own custom actions to incorporate into your workflows or make available to others. If you’re interested in learning the basics of how to automate workflows with GitHub Actions, there are many articles and tutorials available.

Note: Be aware that triggers can be a footgun, meaning that getting your triggers wrong can have major side effects. For example, overly noisy triggers put unnecessary load on the Service Catalog APIs. When your job runs too often or takes too long, it also wastes Actions minutes. And when you have triggers that activate on different working versions of code, you can end up with flapping (frequent changes) in your Service Catalog definitions.
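One way to sidestep all three problems is to scope the trigger narrowly. The sketch below (the workflow filename in the paths filter is an assumption for illustration) only fires when the service-definition workflow itself changes on main, rather than on every push:

```yaml
on:
  push:
    branches:
      - main
    # Only run when the service metadata could actually have changed,
    # not on every push to main.
    paths:
      - ".github/workflows/service-catalog.yml"
  workflow_dispatch:
```

The workflow_dispatch key preserves the option to re-send the metadata manually whenever you need to.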

GitHub Actions secrets

Another thing that’s important to remember, if you want to use the DSCMP as your metadata provider, is that you’ll have to create an API key and an application key in Datadog. You’ll also want to store those keys as encrypted secrets in GitHub Actions. Doing so will allow the custom action to transmit your information to Datadog when the action is run.

Since I’m an AppSec-type person, I can’t help but give you advice on secrets while we’re on this topic:

  1. Never commit a file containing a secret to any git repository, and never introduce secrets into your workflows or code files.
  2. Use GitHub’s secrets management for your secrets. GitHub’s documentation on secrets in Actions is worth reading and understanding well.


Warning: As with any other compute environment, it is possible for someone to deploy a malicious GitHub custom action. To protect secrets and other sensitive information in your GitHub organization, I recommend using only custom actions found in the GitHub Marketplace. GitHub has the ability to moderate those actions in the event that anybody deploys a malicious action.

Now that we’ve gotten these basics out of the way, let’s get nerdy.

Using the custom action

Because GitHub can already access the code for the DSCMP, using it is as simple as pointing to it in your workflow YAML file with the following key-value pairs:

  • name: Datadog Service Catalog Metadata Provider

  • uses: arcxp/datadog-service-catalog-metadata-provider@v1

Note: Although we are using v1 of the DSCMP in this demo, you can find the latest version in the GitHub Marketplace.

The workflow file itself can be very simple. Don’t believe me? Check this one out:

---
name: Datadog Service Catalog Metadata Provider 

on:
  push: 
    branches: 
      - main
  workflow_dispatch:

jobs: 
  deploy: 
    runs-on: ubuntu-latest 
    steps: 
      - uses: actions/checkout@v2 
      - uses: arcxp/datadog-service-catalog-metadata-provider@v1  
        with: 
          datadog-hostname: api.us5.datadoghq.com 
          datadog-key: ${{ secrets.DATADOG_KEY }} 
          datadog-app-key: ${{ secrets.DATADOG_APP_KEY }}
          service-name: my-service
          team: my-team 
          email: my-team@sirius-cybernetics-corporation.com 

This is a fully functional workflow that will tell Datadog information about your service. It is triggered by a push to the main branch, or manually by clicking on the “Run workflow” button on the repository’s Actions tab (thanks to the workflow_dispatch key). Once triggered, this workflow will tell Datadog that your service is called my-service and that it is maintained by a team called my-team. That team can be reached at my-team@sirius-cybernetics-corporation.com if something goes wrong. Having a team name and email address associated with a service might not seem like much information, but it is highly useful for support teams during incidents. You can also see that this workflow instructs the action to use the us5 host for Datadog (information required by the Datadog API), and it provides an API key as well as an application key.

But you might also want to add other useful information to this YAML file, such as a Jira board URL, a runbook document, and a PagerDuty integration URL. That’s not hard to do. Just look at this example:

---
name: Datadog Service Catalog Metadata Provider

on:
  push: 
    branches: 
      - main 
  workflow_dispatch: 

jobs: 
  deploy: 
    runs-on: ubuntu-latest 
    steps: 
      - uses: actions/checkout@v2 
      - uses: arcxp/datadog-service-catalog-metadata-provider@v1
        with:
          datadog-hostname: api.us5.datadoghq.com 
          datadog-key: ${{ secrets.DATADOG_KEY }} 
          datadog-app-key: ${{ secrets.DATADOG_APP_KEY }}
          service-name: my-service 
          team: my-team 
          email: my-team@sirius-cybernetics-corporation.com
          # Adding Jira, Runbook, and PagerDuty
          docs: |
            - name: Jira board
              url: https://somelegitorg.validatlassiancloudurl.com/blah/blah
              provider: jira
          links: |
            - name: Downtime Runbook
              url: https://totally-normal-url.com/runbooks/downtime
              type: runbook
          integrations: |
            pagerduty: https://valid-pagerduty-url-here.com

It’s that easy. By using this GitHub Action, teams can now push their information to the Datadog Service Catalog without needing to approve any integrations. You can find the full schema for this action in the DSCMP repository.

To be clear, Datadog does already give you all the tools you need to use the Service Catalog product. Datadog already has a GitHub integration that can grab a simple YAML file that has this info. Datadog also already has an API that lets you send this info to Service Catalog with custom automation. Like many of you, though, I have constraints I’m operating under, which means I have had to innovate! If you’re using GitOps for deployments, for example, more integrations and webhooks can be concerning. The custom action also allows you to have full control and visibility over this process, including full control over when this information is sent to Datadog.

Use of the pipe character in YAML

Those who are familiar with YAML syntax may have noticed that within the workflow, the nodes for docs, links, and integrations look a little unusual: They have a multi-line string pipe character (“|”) after the colon. This is because in GitHub Actions, all of the inputs are constrained to being scalar. In order to work around this limitation, the custom action takes these values in as a multi-line string and then parses them as YAML. It’s a little weird, but it gets the job done.
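To illustrate the difference, compare the two forms below (the tag values are just examples). Without the pipe, the input would be a real YAML list, which Actions inputs cannot accept; with the pipe, the whole block is delivered as one scalar string that the DSCMP then re-parses as YAML:

```yaml
with:
  # Invalid as an Actions input: this is a YAML list, not a scalar.
  # tags:
  #   - division:complaints

  # Valid: the pipe turns the block into a single multi-line string,
  # which the DSCMP parses as YAML on its own.
  tags: |
    - division:complaints
    - env:production
```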

Organizational controls

A major advantage of the DSCMP is that it lets you supply service metadata to Datadog without opening your repository to any third parties. Another benefit of using this custom action is that it provides light-touch technical controls to enforce policies that define which metadata must be included in your organization’s service definitions. For example, with the help of the DSCMP, you can set controls such as “everybody must have a tag called division that matches one of the division names within the organization,” or “every service running in production must have a runbook link.”

To set and enforce these organizational controls, you can include them in a file called the Organization Rules file (or Org Rules file), specifically named service-catalog-rules.yml. This file is saved to the directory your-organization-name/.github/. The DSCMP by default will look for the Org Rules in this location and enforce the rules defined there.

To support this function, I created a new YAML schema that has three primary components: rules, selection criteria, and requirements. Rules contain a name, a list of selection criteria, and a list of requirements. The selection criteria are the fields that determine whether a given rule applies to a repository’s use of the DSCMP. Requirements are the constraints that a repository’s DSCMP workflow must adhere to if the selection criteria apply to the repository.

Before we move on to some examples of how to enforce organizational controls, I want to mention a few “universal truths” about policy enforcement:

  1. There are always going to be exceptions.
  2. If the controls aren’t centrally located and auditable, they can sometimes mutate into a gap rather than a control. (To be clear, this problem can happen in all cases, but centrally locating and auditing them is helpful in preventing mutations.)
  3. In order for organizational controls to be helpful:
    • The control must be visible.
    • Version control is a must.
    • Exceptions must be supportable.
    • In the event of non-compliance, the error message must be clear about what needs to change.

Finally, remember that using an Org Rules file is optional. If the DSCMP cannot find such a file, the workflow will execute without restrictions.

Now, let’s dig into some examples.

Example 1: Requiring a “division” tag

The following Org Rules file, stored in sirius-cybernetics-corporation/.github/service-catalog-rules.yml, enforces a tagging requirement for all DSCMP workflows within a GitHub organization named Sirius Cybernetics Corporation:

service-catalog-rules.yml

---
org: sirius-cybernetics-corporation

rules: 
  - name: Division Tag Requirement 
    selection: all
    requirements: 
      tags: 
        division: ANY

When this Org Rules file is in place, if you try to run the first DSCMP workflow above—which doesn’t include a division tag—you will see an error reporting that it failed to satisfy the “Division Tag Requirement.” To meet the new rule’s requirement, you would need to modify the DSCMP workflow to include the tags and division keys with a value, as follows:

The updated DSCMP workflow:

--- 
name: Datadog Service Catalog Metadata Provider

on:
  push: 
    branches: 
      - main

  workflow_dispatch:

jobs: 
  deploy: 
    runs-on: ubuntu-latest 
    steps: 
      - uses: actions/checkout@v2 
      - uses: arcxp/datadog-service-catalog-metadata-provider@v1  
        with: 
          datadog-hostname: api.us5.datadoghq.com 
          datadog-key: ${{ secrets.DATADOG_KEY }} 
          datadog-app-key: ${{ secrets.DATADOG_APP_KEY }}
          service-name: my-service
          team: my-team 
          email: my-team@sirius-cybernetics-corporation.com 
          tags: |
            - division:complaints 

Now we have identified this service as belonging to the Complaints division of the Sirius Cybernetics Corporation.

Example 2: Constraining the “division” tag

Let’s say that we get the Org Rules file up and running, and folks start adding their division tag. The following week, however, someone makes a typo in the workflow file and specifies the “compliants” division—which is not a real division! To prevent this from happening again, we can iterate on the original Org Rules file to limit which values people can put into that division tag, as follows:

service-catalog-rules.yml

--- 
org: sirius-cybernetics-corporation

rules: 
  - name: Division Tag Requirement 
    selection: all
    requirements: 
      tags:
        division: 
          - complaints 
          - marketing 

With this new version, we have restricted the division tag values to only complaints and marketing. Any other value will fail to satisfy this rule.

Example 3: Selective requirements within an organization

Let’s say the time comes for the services in the marketing division to require an issue tracker. We can configure the Org Rules file as follows to include a second rule requiring a Jira board for only the marketing division:

service-catalog-rules.yml

---
org: sirius-cybernetics-corporation

rules: 
  - name: Division Tag Requirement 
    selection: all
    requirements: 
      tags:
        division: 
          - complaints 
          - marketing
  - name: Marketing Jira Board Requirement
    selection:
      tags:
        division: marketing
    requirements: 
      docs: 
        provider: jira

There, now we have two rules: one requiring a division tag, and one requiring all DSCMP workflows with a division of marketing to also have at least one docs entry with a provider value of jira. Keep in mind the following guidelines and restrictions regarding organizational controls:

  • Lowercase is preferred. For any of the fields outside of name, lowercase values are likely to work more consistently as expected.
  • The Org Rules file must exist within the same org as the repository that hosts the DSCMP workflows.
  • YAML expects a space after a colon, but Datadog tags use the key:value syntax without one. If you notice this inconsistency, don’t worry: the DSCMP handles the spacing automatically.

Wrap-up

I’ll conclude by restating what I think is the value-add of the custom GitHub Action I’ve been talking about: Datadog’s Service Catalog offering has the potential to substantially improve quality of life during production support. If a service goes down, this product can help your teams bring it back up more quickly.

But for your teams to get the most out of this Datadog product, you need to find a way to supply metadata about your organization’s services to Service Catalog. I’ve authored the open source GitHub Actions module Datadog Service Catalog Metadata Provider for this very purpose. The DSCMP makes life easier by facilitating an easy setup that can be owned by the same engineers who stand to benefit from the quality-of-life improvements introduced by Service Catalog. It also helps ensure that organizations can establish and enforce, with technical controls, internal policies that formalize operational and support expectations. It does all this within the spaces that developers already use, with tools they already know, and without exposing their GitHub organization to any additional integrations.

And yes, as the title of this blog post says, I’m already using the DSCMP, as are several teams I work with. I hope you find it as useful as we do.