- root
-- /envs
--- dev.tfvars
--- prod.tfvars
- main.tf
When it gets deployed by the CI/CD pipeline, the right tfvars file is passed in via the `-var-file` parameter. A standard `env` variable is also passed in and used as the basis for a naming convention. The backend is also set by the pipeline. The rationale here is that our environments should be almost identical, and any variations should be accomplished through parameterization.
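As a sketch, the pipeline invocation described above might look like this; the flags are standard Terraform CLI, but the env-var plumbing and state key are illustrative:

```sh
#!/bin/sh
# ENV is injected by the pipeline, e.g. "dev" or "prod".
ENV="${ENV:-dev}"

# The backend is set by the pipeline too, via -backend-config.
terraform init -backend-config="key=myapp/${ENV}/terraform.tfstate"

# Pass the right tfvars file, plus the env name used for naming conventions.
terraform apply -var-file="envs/${ENV}.tfvars" -var="env=${ENV}"
```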
Modules are kept either in separate repos, if they need to be shared between many workspaces, or under the `modules` subfolder.
- root
-- /envs
--- .dev.env
--- .test.env
--- .prod.env
--- dev.tfvars
--- test.tfvars
--- prod.tfvars
- 1_create_network.tf
- 2_create_storage.tf
- 3_create_service.tf
https://www.youtube.com/watch?v=WgPQ-nm_ers (Compliance At Scale: Hardened Terraform Modules at Morgan Stanley)

Then sure, sprinkle a few per-env toggles `count: 1 if env == dev else 0` (or whatever the latest nicest way to do that is today), but it feels weird to have to set up an artificial TF module around our entire codebase to be able to share the bulk of the code?
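For reference, the per-env toggle described above is usually written with Terraform's ternary on `count`; the `env` variable and resource here are illustrative:

```hcl
variable "env" {
  type = string
}

# Create this resource only in dev; elsewhere count is 0 and it is skipped.
resource "null_resource" "dev_only_helper" {
  count = var.env == "dev" ? 1 : 0
}
```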
All of them either don't address multiple environments or use multiple directories to handle them (and assume a static list). What?! No, use Terraform workspaces, or a single module that you instantiate for each environment. Or Terragrunt if you really want. TFA is just a blogspam mess from either AI or someone who's spent about 20 minutes with a YouTube video to learn Terraform and push out their latest deep dive content, guys.
If you read my first post related to this, I was giving myself a refresher to understand different dynamics that people think about.
I did not watch one YouTube video, spend 20 minutes on this, or create it with GPT.
The original source of inspiration came from me wanting to understand the examples our Eng team put together on how our config file correlates to what customers are actually using to find any gaps.
https://docs.resourcely.io/concepts/other-features-and-setti...
This is also a part 1 of the article and I clearly asked what was missing.
Storing state in S3 or TFC or Spacelift or somewhere else is out of scope. S3 is where 90% of the world stores their state and writing those configuration lines is not in scope. You can find other resources on that.
I struggled to find an exhaustive list of how people manage their directory structures and hence the focus of this piece.
If you’d like to provide constructive feedback and avoid comments regarding scope creep, please share.
It just struck me as a collection of ways you could split your configuration where none of them is something I would suggest. You could split your frontend/backend or services I guess sure, but then why are you assuming it's in the same repo, and why does that matter anyway, it has nothing to do with Terraform.
Why have dev & prod as different projects, assuming they're supposed to look the same? Use workspaces, or different state, or Terragrunt. Ditto regions; use provider aliases.
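The provider-alias approach mentioned above looks roughly like this; the module path and region choices are made up:

```hcl
# Default provider configuration.
provider "aws" {
  region = "us-east-1"
}

# A second configuration of the same provider, addressed by alias.
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

# The same module, deployed into the second region.
module "replica" {
  source = "./modules/bucket" # hypothetical module path
  providers = {
    aws = aws.west
  }
}
```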
If you say Well I'm not assuming they are the same, this is about structuring directories on the basis that they are already isolated Terraform configurations... Ok sure, but why does how you do that matter and what does it have to do with Terraform? It's the same as talking about structuring the directories for their application code isn't it?
If the state isn't local, then if you want that split in order to apply them separately, there would be no reason to physically separate & duplicate them into different directories.
Composable modules such as `terraform-aws-lambda/modules/standard-function`, `terraform-aws-iam/modules/role-for-aws-lambda`, etc., which get composed for a specific use case in a root module (which we call stacks). The stack has directories under it such as `dev/main/primary/`, `dev/sandbox-a/primary/`, `dev/sandbox-a/test-a/`, etc., where `dev` is the environment, `main`/`sandbox-a` is the tenant, and `primary`/`test-a` is the namespace. The namespaces contain a `tfvars` file and potentially some namespace-specific assets, READMEs, documentation, etc. The CD system then deploys the root module for each namespace present.
Stacks are then optionally (sometimes deeply) nested under parent directories, which are used for change control purposes, variable inference and consistency testing.
OpenTofu >1.8.0 is required for all of this to keep it nice and tidy.
Outside of being able to use variables in very niche places that you can't in Terraform (which you can easily work around, and which last I heard is on the roadmap for OpenTofu), what does Terragrunt do that regular module imports in Terraform don't?
This may be anecdotal but every terragrunt repository I've ever seen was a mess of spaghetti trying too hard to stay DRY.
We get this question a lot because Terragrunt did in fact start as a feature shim for Terraform. But that was a long time ago and Terragrunt has evolved into a first-class "orchestration" tool for Terraform or OpenTofu.
We wrote a blog post addressing exactly your concern that you might find helpful: https://blog.gruntwork.io/terragrunt-opentofu-better-togethe....
2) I use terragrunt to provide inherited values, such as region, environment, etc. I have a directory tree of `dev/us-west-2/` I can set a variable in my `dev/environment.hcl` that is inherited across everything under that environment. This is useful if you have more than one dev, prod, etc environment.
3) I use terragrunt to allow shared, versioned root modules. I don't include any terraform in my terragrunt repo. The terragrunt repo is just configuration. In my terragrunt.hcl I can reference something like `source = "github.com/example.com/root.git//ipv6-vpc?ref=v0.1.3"` to pull in the version of the root module I want. Again this is useful if you have multiple dev/stage/prod environments.
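A minimal sketch of the layout in points 2 and 3, using Terragrunt's `read_terragrunt_config` and `find_in_parent_folders` helpers; the file names mirror the parent comment but the values are illustrative:

```hcl
# dev/environment.hcl -- values shared by everything under dev/
locals {
  environment = "dev"
}

# dev/us-west-2/vpc/terragrunt.hcl -- pure configuration, no Terraform code
locals {
  # Walk up the tree and load the nearest environment.hcl.
  env = read_terragrunt_config(find_in_parent_folders("environment.hcl"))
}

terraform {
  # Versioned, shared root module pulled from its own repo.
  source = "github.com/example.com/root.git//ipv6-vpc?ref=v0.1.3"
}

inputs = {
  environment = local.env.locals.environment
}
```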
None of this is actually possible with plain terraform.
With OpenTofu as of the latest release, you can already use constant variables in places like module sources and versions and backend configurations, and even use for_each on providers!
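As a sketch of what that static evaluation looks like in OpenTofu 1.8+ (the repo URL and version value are made up):

```hcl
variable "module_version" {
  type    = string
  default = "v0.1.3"
}

# OpenTofu (1.8+) can statically evaluate variables/locals in module
# sources, something plain Terraform does not allow.
module "network" {
  source = "github.com/example/terraform-modules//network?ref=${var.module_version}"
}
```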
Disclaimer: involved in OpenTofu
This gave me a refresher on how they are organizing their cloud infrastructure within their source control systems. I took a lens from the world of Terraform since that's mostly the world I live in today and have for the last few years.
I explored 10 different ways to structure your Terraform config roots, each promising scalability but delivering varying degrees of chaos. From single-environment simplicity to multi-cloud madness, customers are stuck navigating spaghetti directories and state file hell.
I probably missed things. Might have gotten things wrong. Take a look and let me know what you think.
What patterns are you using that I missed?
This is split over hundreds of microservice repositories, each of which maintains its own Terraform.
We don't read state from other Terraform deployments, and use published reusable modules when convenient and a tfvars file for every deployment.
At this point I can't imagine doing Terraform any other way.
It'd be nice to show the other dimension: the Git branching strategies to apply. GitHub flow/feature branches vs. per-env branches off main vs. Git flow. How and when to apply changes in different environments (before vs. after PRs), etc.
TFC uses workspaces, which annoyingly aren't the same thing as terraform workspaces. I've divided up our workspaces into dev, qa, staging, and prod, and each group of workspaces has the OIDC setup to allow management of a specific cloud account. So dev workspaces can only access the dev account, etc etc. Each grouping of workspaces also has a specific role that can access them. Each role then has its own API key.
The issues I've run into are mostly management of workspace variables. So now I have a manager repo and matching workspace that controls all the vars for the couple hundred TFC workspaces. I use a TFC group API key for the terraform enterprise provider, one provider per group. This prevents potential mistakes where dev vars could get written to qa, etc etc.
Manager repo
- dev TFE provider
- qa TFE provider
- staging TFE provider
- prod TFE provider
Workspace variables are set by a single directory of terraform, so there's good sharing of the data and locals blocks.

I use lists of workspaces categorized by "pipeline deployers" and "application resource deployers", along with lists of dev, qa, staging, and prod workspaces. I then use terraform's "setintersection" function to give me "dev pipeline" workspaces, "prod app" workspaces, etc. I also do the same with groups of variables, as there's some that are specific to pipeline workspaces, and so on. It works well, and it's nice to have an almost 100% terraform control of vars and workspaces.
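The `setintersection` trick described above might look like this; the workspace names are made up:

```hcl
locals {
  # One axis: what the workspace deploys.
  pipeline_workspaces = ["pipeline-dev", "pipeline-prod"]
  app_workspaces      = ["app-dev", "app-prod"]

  # Another axis: which environment it belongs to.
  dev_workspaces  = ["pipeline-dev", "app-dev"]
  prod_workspaces = ["pipeline-prod", "app-prod"]

  # Cross-cut the two categorizations with setintersection:
  dev_pipeline_workspaces = setintersection(local.pipeline_workspaces, local.dev_workspaces) # yields pipeline-dev
  prod_app_workspaces     = setintersection(local.app_workspaces, local.prod_workspaces)     # yields app-prod
}
```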
I split app and pipeline workspaces based on historical decisions, I'm not sure if I'd replicate that on a new project. The workflow there is that an app workspace creates the resources for a given deployment, then saves pertinent details to a couple of parameters. The pipeline workspace then pulls those parameters and uses them to create a pipeline that builds and deploys the code.
Unfortunately I can't share code from this particular setup, but I do intend to write about it "someday".
> Multi-Environment Setup with Shared Modules
But the con, "versioning is tricky across modules", understates it: it's damn near impossible to reliably manage, especially because if I'm introducing a new variable to a shared module, I also need to add that variable to the inputs of each environment.
I haven't found a way to manage multiple versions of the modules across environments if all using the same shared modules. Is it even possible?
Define a default which is backwards compatible.
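Concretely, a new input with a backwards-compatible default means existing environments don't have to change their inputs at all (variable name illustrative):

```hcl
variable "enable_access_logs" {
  type        = bool
  description = "Turn on access logging for the service."
  default     = false # callers that don't set this keep today's behavior
}
```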
I am a big fan of modularisation, it is possible to extend this approach to divide logically your infrastructure and mirror that by separating out the terraform state files too.
The number of TF deployments increases, but they each have a smaller blast radius, and you now need to manage making the outputs of builds available to those that depend on them.
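One common way to wire those outputs up is the `terraform_remote_state` data source; the bucket and key names here are made up:

```hcl
# Consumer deployment reads the outputs of the "network" deployment's state.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "example-tf-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  # Any value the network deployment exposes as an output is available here.
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}
```

Others avoid cross-state reads entirely and publish values through a parameter store instead, as described elsewhere in this thread.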
Python deterministically generating terraform HCL files based on yaml.
Execution wrappers that encapsulate terraform in CI/CD to parse the json output and prevent database deletion, but apply everything else.
Scripts that pull every git repo and execute every terraform file they can find while walking the directory tree.
Terraform is about 80% of the way to a good tool, that last 20% is a ball-ache and solved totally differently every time; the best setups I’ve seen is where terraform just “hands off” to something else after making a minimum infrastructure.
But, otherwise, it can get incredibly messy.
That sounds terrible. I'm sorry you had to deal with that.
If they could go back in time, would modules have been good enough?
Yet as a language, it's quirky as heck. For example, how modules are basically wrappers on providers, and how different modules can almost "see inside" other modules to iron out dependency ordering, but also can't. And speaking of which, circular dependencies suck to work around in a modular way without tearing half your structure apart.
Like I said, I am not anywhere close to an expert on terraform and can only describe my limited experience building a fairly simple stack on top of it. The whole thing is just… both amazing and also weird and a bit frustrating. And I have yet to “grow” into multiple environments… lots of my complaints are probably down to my limited experience with it and, honestly, not much out there in terms of best practices for maintaining scalable configuration (or maybe my ADD brain refuses to dive into that, who knows?)
My last adventure into infrastructure as code was with Puppet and Salt. All of that was provisioning on top of bare metal. It was all file operations and the “provider specific modules” were really just wrappers to nicely encapsulate things like nginx or apt. Perhaps it is because of Puppet or Salt’s much more limited scope that didn’t have me feeling the same way.
I mean terraform can be used to configure just about anything that has an API if you wanted. Maintaining a declarative language around that is bound to have its quirks.
For environment-specific things use conditionals:

`nodes = terraform.workspace == "prod" ? 2 : 1`
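When the per-environment values multiply, a lookup map keeps the conditionals from sprawling; this is a sketch, not the parent's code:

```hcl
locals {
  nodes_by_workspace = {
    prod    = 2
    staging = 2
    default = 1
  }

  # Fall back to "default" for any workspace not listed above.
  nodes = lookup(local.nodes_by_workspace, terraform.workspace, local.nodes_by_workspace["default"])
}
```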