Forcing HTTPS for Git actions with Terraform via Atlantis

In late 2017, while I was working at a previous employer, we adopted Terraform as our Infrastructure as Code toolchain in response to several challenges facing our engineering teams. The two largest were enforcing standards across the organisation for newly delivered infrastructure and reducing the overall time to provision and deploy it.

Prior to implementing Terraform, builds were configured and deployed via the console, with a handful of in-house scripts to handle truly repetitive tasks. This approach worked, but it was neither particularly future-proof nor flexible enough for our rapidly expanding needs. Adopting Terraform, together with some in-house modules, let us successfully tackle our two most significant challenges. In time, however, it brought challenges of its own.

A new challenger appears!

As time passed and Terraform usage grew within the organisation, we started to run into what I am going to call the “human” factor; namely, engineers making changes locally, running Terraform from their workstations, and forgetting to commit/push their changes. As a result, other engineers were often forced to reverse engineer those changes from Terraform's output before implementing their own. This compromised our efficiency and productivity, and caused a good deal of operator frustration.

After being bitten by this problem myself, I set out to improve our working methods and processes further. Whilst our work with Terraform had brought benefits, it was becoming apparent that an element was still missing. I spent a few days researching and experimenting with a variety of solutions before coming across Atlantis, which is designed specifically for use with Terraform.

What is Atlantis

Atlantis is an open-source application for automating Terraform via pull requests (PRs). It integrates with GitHub, GitLab and Bitbucket via webhooks. When a PR is opened on a repo configured to send events to Atlantis, Atlantis automatically runs terraform plan and posts the output as a comment on the PR. Another engineer can then review the plan alongside the code in the PR itself. If they're happy with both, they can go ahead and approve it. After approval, they or the submitter can tell Atlantis to run terraform apply by commenting atlantis apply on the PR. Atlantis then runs the apply and posts its output as another comment on the PR. Additionally, in our setup, if the apply succeeds and there are no other pending items, Atlantis automatically merges the PR into our main branch (this requires a config option).

The problem

By default, Terraform run by Atlantis does not have access to private repositories (i.e. repositories requiring authentication). Since we use private repositories for some of our modules, this was an immediate show-stopper. The usual recommendation from the Atlantis community at the time was either to build an image with the needed secrets baked in, or to add support for extra image arguments that specify how to fetch secrets (e.g. downloading an SSH key from an external source).

Neither of these seemed acceptable from a security standpoint, nor did they meet my desire to avoid maintaining our own fork or container unnecessarily. Where possible, I prefer to use upstream and contribute back to the project. Maintaining your own fork means that you still need to monitor upstream and merge in any security fixes or features that you need, consuming time and engineering resources better spent elsewhere.

Initial Solution

Atlantis requires by default that it be able to authenticate to your Git platform to be able to comment on PRs and clone the project repositories. Best practice dictates the usage of a unique user so that its access can be appropriately scoped. Then all you need is the username and OAuth token or equivalent for it to be able to authenticate with your respective Git platform.

There was already a GitHub issue about this particular problem, where people had discussed various solutions. One that caught my attention was the use of gitcredentials, specifically in combination with git-credential-store. This allows you to write out a .git-credentials file and tell Git to use it to authenticate with the respective Git provider for repositories with HTTPS remotes.
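
To make the file format concrete, here is a minimal Go sketch that builds one line in the form git-credential-store expects (https://user:token@host). The function name credentialLine, the bot account and the token are all hypothetical and purely illustrative; this is not the code that was merged into Atlantis.

```go
package main

import (
	"fmt"
	"net/url"
)

// credentialLine builds one entry in the format git-credential-store keeps
// in ~/.git-credentials: https://<user>:<token>@<host>
// url.UserPassword percent-encodes any reserved characters in the values.
func credentialLine(user, token, host string) string {
	u := url.URL{
		Scheme: "https",
		User:   url.UserPassword(user, token),
		Host:   host,
	}
	return u.String()
}

func main() {
	// Hypothetical bot account and token, for illustration only.
	fmt.Println(credentialLine("atlantis-bot", "example-token", "github.com"))
}
```

In practice the resulting line would be written to ~/.git-credentials with restrictive permissions, and Git would be pointed at it with git config --global credential.helper store.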

With a solution identified, it was time to test it and hopefully, get it working. My initial implementation was to add the functionality into the Atlantis Docker image. I accordingly submitted a PR, hoping to get it merged. The maintainer came back to me asking that this change be made optional, a reasonable request. After further thought, it was suggested that this kind of thing should ideally be implemented directly into Atlantis itself, to be agnostic to the runtime environment (Docker or otherwise).

Now comes the first real challenge: Atlantis is written in Golang. I'd never written anything in Golang, and my programming skills were and still are relatively marginal in comparison to some. But I nonetheless gave it a shot, and to my surprise got it working and merged after only a few days of work. Overall I found it simple to implement after being pointed at some other examples in the codebase.

The remaining problem

This initial solution worked well. However, it only worked for modules that had an HTTPS source defined; anything that wasn't explicitly configured as an HTTPS source would fail.

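# This works: the github.com shorthand is fetched over HTTPS
module "example" {
  source = "github.com/example/private-module.git?ref=v1.0.0"
}

# This doesn't work: an SSH source never uses the stored HTTPS credentials
module "example" {
  source = "git@github.com:example/private-module.git?ref=v1.0.0"
}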

For some, this isn't a significant issue. However, in our case, our workflow and security policies meant we wanted to stick to SSH remotes on staff workstations, whilst simultaneously using HTTPS remotes from Atlantis itself. Could we find a way to have these use cases coexist without ever needing to touch the module URLs?

The Tweaked Solution

The tweak I ended up settling on was a git config directive to have SSH remotes be implicitly rewritten as HTTPS remotes internally to git.

As an example, we want git@github.com:example/private-module.git?ref=v1.0.0 to become https://github.com/example/private-module.git?ref=v1.0.0, and then Git should use the credentials it already has to authenticate with GitHub over HTTPS.

The config below should have worked just fine:

[url "https://github.com/"]
        insteadOf = git@github.com:

Alas, that'd be too easy, wouldn't it? It worked on my laptop when testing with manual Git commands, which I then naively assumed would work fine with Terraform. Spoiler alert, it didn't.

Once I started testing directly with Terraform itself, I discovered that Terraform doesn't call out to Git directly; instead, it uses a library called hashicorp/go-getter, which does some extra magic to source URLs when it realises that they're Git URLs. For example, git@github.com: becomes ssh://git@github.com, which is then handed to Git to clone the module. This means the remote ends up being ssh://git@github.com/example/private-module.git.
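
The effect of that rewrite can be illustrated with a short Go sketch. To be clear, this is not go-getter's actual implementation; the regular expression and the toSSHURL function are my own assumptions, written only to reproduce the observed transformation from the scp-style address to the ssh:// form.

```go
package main

import (
	"fmt"
	"regexp"
)

// scpStyle matches scp-like Git addresses such as git@github.com:owner/repo.git.
var scpStyle = regexp.MustCompile(`^([^@/]+)@([^:/]+):(.+)$`)

// toSSHURL rewrites an scp-style address into the explicit ssh:// form.
// Anything that does not match is returned unchanged.
func toSSHURL(src string) string {
	m := scpStyle.FindStringSubmatch(src)
	if m == nil {
		return src
	}
	return fmt.Sprintf("ssh://%s@%s/%s", m[1], m[2], m[3])
}

func main() {
	fmt.Println(toSSHURL("git@github.com:example/private-module.git"))
}
```

The point to take away is that any insteadOf rule has to match the ssh:// form that Git eventually receives, not the scp-style form written in the module source.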

The config I needed was therefore:

[url "https://github.com"]
        insteadOf = ssh://git@github.com

This solution ended up working for GitHub and GitLab, but not Bitbucket. Bitbucket needs the username in the URL for it to work correctly. Thankfully, GitHub and GitLab are both happy to take the username in the URL as well, so I didn't need any special-case logic for Bitbucket.

The final config was thus:

[url "https://user@github.com"]
        insteadOf = ssh://git@github.com
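
To see what this rule does to a remote, here is a small Go sketch of the prefix substitution. It is a simplification for illustration only: rewriteRemote is a hypothetical helper that handles a single rule, whereas Git itself picks the longest matching prefix when several rules apply.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteRemote mimics a single url.<base>.insteadOf rule: if the remote
// starts with the insteadOf prefix, that prefix is swapped for base.
func rewriteRemote(remote, base, insteadOf string) string {
	if strings.HasPrefix(remote, insteadOf) {
		return base + strings.TrimPrefix(remote, insteadOf)
	}
	return remote
}

func main() {
	fmt.Println(rewriteRemote(
		"ssh://git@github.com/example/private-module.git",
		"https://user@github.com",
		"ssh://git@github.com",
	))
}
```

The same rule can be set from the command line with git config --global url."https://user@github.com".insteadOf "ssh://git@github.com".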

Now that I had all the pieces of the puzzle, putting them together in Atlantis was pretty straightforward. I created another PR with the necessary changes, which was eventually merged.

Future improvements

This has primarily been a retrospective on work I did 12-18 months ago, but I'd like to make further improvements if and when I have the time. For example, we currently write the credentials to disk and leave them there permanently, which does pose a slight security risk.

To avoid that risk, we could use git-credential-cache just before the init happens and clear the cached credentials before the plan stage. This way, the credentials are held in memory only for a short time and never touch the disk. We'd also need a credential helper to hand the secret to Git the first time it asks, before it is cached. One potential problem is concurrency: if Atlantis is running terraform init on multiple projects at once, a short-running init could clear the cache while a longer-running one still needs it, breaking the latter. However, we could avoid this by applying the strategy per repo rather than globally.
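
As a rough sketch of how the secret might be handed to the cache, the Go snippet below formats the key=value block that git credential approve reads on standard input. Nothing here exists in Atlantis today; credentialInput, the bot account and the token are hypothetical names used only to illustrate the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// credentialInput formats the key=value block that `git credential approve`
// reads on stdin; a blank line terminates the block.
func credentialInput(host, user, token string) string {
	var b strings.Builder
	b.WriteString("protocol=https\n")
	fmt.Fprintf(&b, "host=%s\n", host)
	fmt.Fprintf(&b, "username=%s\n", user)
	fmt.Fprintf(&b, "password=%s\n", token)
	b.WriteString("\n")
	return b.String()
}

func main() {
	// Hypothetical bot account and token, for illustration only.
	fmt.Print(credentialInput("github.com", "atlantis-bot", "example-token"))
}
```

With credential.helper set to cache, piping this block into git credential approve before the init would place the secret in the in-memory cache, and git credential-cache exit would clear it afterwards. Giving each repo its own cache via the helper's --socket option is one possible way to keep concurrent inits from clearing each other's credentials.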