Mark Meyer: Cloud & Platform Architecture for Digital Products.

Managing Terraform Complexity

Most of the Terraform code I see, probably close to 80%, is not idiomatic: it doesn't follow the design principles the language creators set forth. In this article I will talk about module structure, one example of a Terraform (anti-)pattern that has a direct impact on operations.

Architecture as a source of Toil and Complexity

I will describe a nested Terraform code structure. This uses the term layer, but I think you will see why it is nested.

Layer 1: At the lowest level, we have resources. Resources are managed by providers. They're a low-level component, from the vantage point of a Terraform user.
Layer 2: The next level up, we have resource customizations. Each of these is implemented as a Terraform module that wraps exactly one resource (see layer 1). The rationale here is to set a baseline. If we use resources in a haphazard and uncontrolled way, chaos will ensue. Making sure that resources are only instantiated in the custom resource layer makes it easy to see that the project sets certain properties of a resource.
Layer 3: Now comes the components layer. Every item in this layer is implemented as a module and exclusively uses modules from layer 2 to assemble a component. In this setting, a component might be a web terminus, assembled from a load balancer, its rules and maybe some security mechanisms.
Layer 4: The application layer in turn is implemented as Terraform modules exclusively built out of modules from layer 3.
Layer 5: Applications are grouped into environments, which in turn are - of course - modules. One such environment might be called "stage". They describe specific arrangements that are part of this style of environment.
Layer 6: Finally, the "root" module instantiates multiple modules from layer 5 and feeds each environment a set of parameters.
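
To make the nesting concrete, here is a minimal sketch of layers 1 to 3. The module paths, variable names, and the choice of an AWS load balancer are assumptions for illustration, not taken from a real project. Each comment header marks a separate file.

```hcl
# --- modules/custom_lb/main.tf (layer 2, hypothetical path) ---
# Wraps exactly one layer-1 resource and pins the project baseline.
variable "name" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_lb" "this" {
  name               = var.name
  load_balancer_type = "application"
  internal           = false # baseline chosen by the project
  subnets            = var.subnet_ids
}

output "arn" {
  value = aws_lb.this.arn
}

# --- modules/web_terminus/main.tf (layer 3, hypothetical path) ---
# Uses only layer-2 modules, never resources directly.
variable "subnet_ids" {
  type = list(string)
}

module "lb" {
  source     = "../custom_lb"
  name       = "web-terminus"
  subnet_ids = var.subnet_ids
}
```

With layers 4 to 6 stacked on top (using hypothetical names "shop" and "stage"), the load balancer ends up at an address like module.stage.module.shop.module.web_terminus.module.lb.aws_lb.this, so any restructuring changes the address Terraform uses to track it.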

This example illustrates the deep nesting of modules.

Structuring your Terraform in this way is horrible.

The better way to structure your Terraform is a "sea of modules", or "module composition" as the Terraform documentation calls it. The Terraform docs have the following tidbit:

In most cases, we strongly recommend keeping the module tree flat, with only one level of child modules [...] We call this flat style of module usage module composition, because it takes multiple composable building-block modules and assembles them together to produce a larger system. Instead of a module embedding its dependencies, creating and managing its own copy, the module receives its dependencies from the root module, which can therefore connect the same modules in different ways to produce different results. source: tf docs

I concur with this recommendation. There are possibly many reasons to give this advice, but my main concern is operational toil. A Terraform module is much like an object in an object-oriented language. It has an interface (inputs and outputs), and it has internal state. The problem here is rooted in the persistence of this internal state. Terraform couples persistent external state to object identity. Whenever you need to change this relationship, you end up with fixups that I don't consider clean code. This coupling leads to failed deployments (lost data, downtime, excess work), because the Terraform structure in the first example does not account for changing state. In the sea of modules, accounting for such changes is fairly easy.

To give an example, imagine an application that depends on a database. With the first approach, deleting the application naively will lead to the deletion of the database, because the database is nested. In the composition approach, both live next to each other, and the database might be connected to a new application at any time.
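
A minimal sketch of the composition approach for this case. The module paths, the database module's "endpoint" output, and the application module's "database_endpoint" variable are assumptions for illustration.

```hcl
# Root module: both modules are direct children (hypothetical paths and names).
module "database" {
  source = "./modules/database"
}

module "application" {
  source = "./modules/application"

  # The application receives its dependency as plain data instead of
  # creating and managing its own copy of the database.
  database_endpoint = module.database.endpoint
}
```

Removing the module "application" block here leaves module.database and everything inside it untouched, and a second application block can be pointed at the same endpoint later.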

As the sea of modules grows, it can give way to complex dependencies between modules. If people complain about the arbitrary and unintended order of change application (by Terraform) and cyclic dependencies, it's a sure sign that your sea of modules is too complex.

Reduce Complexity

There are two ways to reduce complexity.

Make smaller root modules
Structure module dependencies

Creating smaller root modules can be accomplished by subdividing one root module into multiple root modules. To do this, you need an external mechanism to exchange configuration data between modules. One such external mechanism is Vault (see https://www.hashicorp.com/en/products/vault). Another way to do this is referencing remote state (see: tf docs).
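
As a sketch of the remote state variant, assume a "network" root module that stores its state in an S3 backend and declares an output named "private_subnet_ids". The bucket, key, region, and output name below are hypothetical.

```hcl
# Root module "database" reads outputs published by root module "network".
# Backend type, bucket, key, region, and output names are hypothetical.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "network/terraform.tfstate"
    region = "eu-central-1"
  }
}

resource "aws_db_subnet_group" "main" {
  name       = "main"
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

Only outputs declared in the other root module are visible this way, so the set of outputs effectively becomes the contract between the two deployments.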

This approach creates implicit dependencies between root modules. These dependencies are implicit in the sense that a module deployment might fail, depending on the values its dependencies provide. If two such modules are owned and deployed by different teams, your deployment immediately becomes a cross-team meeting (best case).

If you want to preserve the sea of modules feeling, structuring module dependencies is an alternative. This can be a layered approach, but contrary to the approach above, we will not use module nesting.

This approach uses so-called gates to separate layers. Several layers are thought to form a stack. Layers are functional layers, in the sense that they reflect classic system diagrams and bootstrap order. Layer n-1 is always deployed before layer n. A classic example of this kind of functional subdivision is:

network layer
database layer
application layer

Gates can be implemented in Terraform via null resources. Let me give an example.

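resource "null_resource" "gate_database" {
  # The gate closes the database layer: it depends on the previous gate
  # and on every resource that belongs to this layer.
  depends_on = [
    null_resource.gate_network,
    aws_db_instance.rds_1,
    aws_db_instance.rds_2,
  ]
}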

Implementing these leads to the following dictums:

every resource conceptually in layer n depends on gate n-1
gate n depends on all resources in layer n
gate n transitively depends on gate n-1
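
A minimal sketch of a complete three-layer stack, in which each gate depends on the previous gate and on the resources of its own layer. To keep it self-contained, the real resources are replaced by null_resource stand-ins. All names except gate_network, gate_database, and rds_1 are hypothetical.

```hcl
# Stand-in for a real network resource, so the sketch needs no cloud account.
resource "null_resource" "vpc" {}

# Gate 1 closes the network layer.
resource "null_resource" "gate_network" {
  depends_on = [null_resource.vpc]
}

# A database-layer resource waits for the gate below it.
resource "null_resource" "rds_1" {
  depends_on = [null_resource.gate_network]
}

# Gate 2 closes the database layer.
resource "null_resource" "gate_database" {
  depends_on = [null_resource.gate_network, null_resource.rds_1]
}

# An application-layer resource waits for the database gate.
resource "null_resource" "app_1" {
  depends_on = [null_resource.gate_database]
}

# Gate 3 closes the application layer.
resource "null_resource" "gate_application" {
  depends_on = [null_resource.gate_database, null_resource.app_1]
}
```

On Terraform 1.4 and later, the built-in terraform_data resource could play the same role without the null provider. The sketch stays with null_resource to match the example above.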

This allows you to target a defined layer in the stack for provisioning. For example, if you want to change something in the application layer, you start by provisioning the target "gate_database". That might take 20 minutes. After that's done, you do a targeted deploy of "gate_application". If that fails, you can do a short and orderly revert to "gate_database" without losing much time.
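
The two-step deployment could look like the following fragment, which extends the gate_database example. The resource aws_instance.app_1 is a hypothetical application resource, and the commands in the comments use Terraform's standard -target option.

```hcl
# Hypothetical application gate, closing the layer above gate_database.
resource "null_resource" "gate_application" {
  depends_on = [
    null_resource.gate_database,
    aws_instance.app_1, # hypothetical application-layer resource
  ]
}

# Step 1: provision everything up to and including the database layer.
#   terraform apply -target=null_resource.gate_database
#
# Step 2: provision the application layer on top of it.
#   terraform apply -target=null_resource.gate_application
```

Because -target also applies everything the target depends on, naming a gate is enough to select its whole layer and the layers below. Terraform prints a warning whenever -target is used, which is expected with this workflow.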

This simplifies development. It does not constrain or overburden operations. The drawbacks are the "horrible" syntactic verbosity (according to a former colleague) and the required discipline to maintain a strict dependency order.

Final notes

Let me share an observation.

Terraform does not allow for dependency injection. Modules cannot be parameters of modules (remember the comparison between modules and objects above). Modules are also not introspectable. In Strachey's words, modules are not first-class objects of the Terraform language. The only way in the Terraform language to move data in and out of a module is by means of plain data values (strings, lists, maps, &c).
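
A sketch of what moving data by value looks like in practice. The module paths, the "database" variable, and the "endpoint" and "port" outputs are assumptions for illustration. Each comment header marks a separate file.

```hcl
# --- modules/application/variables.tf (hypothetical path) ---
# The module cannot accept "a database module", only a value describing one.
variable "database" {
  type = object({
    endpoint = string
    port     = number
  })
}

# --- main.tf in the root module ---
module "database" {
  source = "./modules/database"
}

module "application" {
  source = "./modules/application"

  # The root module copies the relevant outputs into the expected shape.
  database = {
    endpoint = module.database.endpoint
    port     = module.database.port
  }
}
```

The typed variable checks the shape of the value, but nothing ties it to a particular module implementation. The wiring stays the root module's job.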

Terraform does allow for distributing module source via registries, even remote ones. This feature does help in large infra development settings. At its core, however, the Terraform language does not allow for interchangeable implementations of modules. Terraform modules do not provide an interface or interface contract. It's not a general-purpose programming language. It remains a configuration language that is fairly good at managing resource lifecycle and complexity, if you let it.

I'll close this article by reiterating a remark I made above: If your infra deployment design discussions become cross-team meetings, this is the best case. These meetings are the place where infrastructure architecture should be happening. And this is a place where conversational architecture is eminently applicable.