DevOps & Cloud | May 30, 2026 | 7 min read

Migrating a Terraform Monorepo to Stacks: What We'd Do Differently

We moved a 40-module Terraform monorepo to HCP Terraform Stacks. Here's what broke, what we gained, and the four decisions we'd reverse if we started over.

We spent most of Q1 migrating a client's Terraform monorepo — 40-ish modules, three environments, two cloud accounts — onto HCP Terraform Stacks. The migration worked. It also taught us that "Stacks-shaped" is not the same as "monorepo-shaped," and that a few decisions we made in week one cost us about three weeks in month two.

This is the honest version of that project: what Stacks actually solved, what it didn't, and the four calls we'd make differently if we started again next Monday.

Why we left the flat monorepo

The original layout was the classic one most teams end up with after two or three years of growth: a single repo, a modules/ directory, and an envs/ directory with one folder per environment per region. State lived in S3 with DynamoDB locking. CI was GitHub Actions running terraform plan on PRs and terraform apply on merge to main.

It worked. It also had three problems we couldn't shake:

Plan times. A change to a shared module triggered plans across every environment that referenced it. End-to-end PR feedback was 18–25 minutes on a good day.
Blast radius. Anyone with merge rights could ship to production. We had OPA policies, but the review surface was too big to enforce them consistently.
Environment drift. Staging and prod were "the same" until someone needed to test a thing in staging and forgot to backport. We had three modules where the prod version was six months ahead of staging.

Stacks promised a cleaner answer to all three: deployments as first-class objects, a single configuration that fans out across environments, and orchestrated rollouts with policy gates between them. On paper, that's exactly what we wanted.

What Stacks actually changed

The mental model shift is the important part. In a flat monorepo, an environment is a directory. In Stacks, an environment is a deployment of a single declared component graph. You describe the graph once in components.tfstack.hcl and the rollout strategy once in deployments.tfdeploy.hcl.

A stripped-down version of what ours looked like:

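# components.tfstack.hcl
# Stripped down: variable declarations and the provider block are omitted.
# Stacks requires every component to name its providers explicitly;
# "provider.aws.this" is a placeholder for the provider the Stack declares.
component "network" {
  source = "./modules/network"
  inputs = {
    cidr_block = var.cidr_block
    region     = var.region
  }
  providers = {
    aws = provider.aws.this
  }
}

component "data" {
  source = "./modules/data"
  inputs = {
    vpc_id     = component.network.vpc_id
    subnet_ids = component.network.private_subnet_ids
    db_size    = var.db_size
  }
  providers = {
    aws = provider.aws.this
  }
}

component "app" {
  source = "./modules/app"
  inputs = {
    vpc_id  = component.network.vpc_id
    db_host = component.data.db_endpoint
    image   = var.app_image
  }
  providers = {
    aws = provider.aws.this
  }
}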
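
The component blocks above lean on declarations that the stripped-down listing leaves out. A minimal sketch of what those might look like, assuming the AWS provider; the file name, the provider label "this", and the version constraint are illustrative rather than taken from the project, and authentication is left out entirely:

```hcl
# variables.tfstack.hcl (hypothetical file name)

# Stack variables must declare a type.
variable "region" { type = string }
variable "cidr_block" { type = string }
variable "db_size" { type = string }
variable "app_image" { type = string }

required_providers {
  aws = {
    source  = "hashicorp/aws"
    version = "~> 5.0" # assumed constraint
  }
}

# Stack provider blocks take a type and a name; components refer to this
# one as provider.aws.this. Credentials (for example OIDC role assumption)
# are omitted from this sketch.
provider "aws" "this" {
  config {
    region = var.region
  }
}
```

Each component then selects the provider with providers = { aws = provider.aws.this }, because modules sourced by a component cannot configure providers themselves.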

And the deployment side:

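# deployments.tfdeploy.hcl
# Deployment configuration has no variable blocks, so the image tag is a
# local here. The tag value is a placeholder.
locals {
  app_image = "registry.example.com/app:1.0.0"
}

deployment "staging" {
  inputs = {
    region     = "us-east-1"
    cidr_block = "10.10.0.0/16"
    db_size    = "db.t4g.medium"
    app_image  = local.app_image
  }
}

deployment "prod" {
  inputs = {
    region     = "us-east-1"
    cidr_block = "10.20.0.0/16"
    db_size    = "db.r6g.xlarge"
    app_image  = local.app_image
  }
}

# Gate between environments: staging plans that remove nothing are approved
# automatically; prod plans always wait for a manual approval.
# Rule block names have changed between Stacks releases, so check the
# deployment configuration reference for the version you run.
orchestrate "auto_approve" "staging_no_removals" {
  check {
    condition = context.plan.deployment == deployment.staging && context.plan.changes.remove == 0
    reason    = "Only staging plans that remove no resources are auto-approved."
  }
}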

That's the win: staging and prod are now two deployments of the same component graph. Introducing drift between them means typing a different value in a single file, in the open, on a PR. Stacks doesn't prevent drift, but it makes drift visible.

Plan times got better, but not for free

Plan times dropped roughly 40% on average for changes that only touched one component, because Stacks only re-plans the components whose inputs changed. Changes that touched the network component (which everything depends on) still took as long as before — sometimes longer, because the dependency walk is more thorough. If you were hoping Stacks would magically parallelise your whole graph, it won't. The dependency edges you declared are the dependency edges you get.

The state migration was the scary part

We had ~40 modules with live state. Stacks doesn't import flat-monorepo state; you're effectively starting fresh and asking Terraform to adopt existing resources into the new component model.

We used three techniques, in this order:

1. moved blocks within modules, to refactor anything we wanted to rename before the cutover. Cheap and safe.
2. import blocks (the HCL kind, not the CLI command), generated by a script that walked the old state files. We ran these in a non-Stacks workspace first to confirm the imports planned cleanly.
3. removed blocks in the old workspaces, to release resources from the old state without destroying them, paired with import blocks in the new Stacks configuration.
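
To make the first technique concrete, here is a minimal sketch of a moved block. It records a rename so that the next plan updates the state address instead of destroying and recreating the resource. The resource addresses are hypothetical, not taken from the project:

```hcl
# Inside the module being refactored, next to the renamed resource block.
# Both addresses are hypothetical.
moved {
  from = aws_nat_gateway.main
  to   = aws_nat_gateway.egress
}
```

A plan run after adding this block should report the resource as moved, with no create or destroy actions. That is what makes it a cheap pre-cutover step.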

The order matters. If you import into Stacks before removing from the old state, both the old workspace and the new Stack think they own the resource, and you get duplicate-apply races. We learned that on a NAT gateway. The NAT gateway was fine. The on-call engineer's evening was not.

The rule we settled on: remove first, import second, never in the same PR. One PR to release, merge, verify, then a second PR to adopt.
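
The two-PR rule can be sketched as follows. This assumes Terraform 1.7 or later for the removed block; the resource address and the NAT gateway ID are placeholders:

```hcl
# PR 1, old workspace: delete the resource block and add this in its place.
# Terraform forgets the resource but leaves the real NAT gateway running.
removed {
  from = aws_nat_gateway.egress

  lifecycle {
    destroy = false
  }
}

# PR 2, new configuration, opened only after PR 1 is applied and verified.
# A matching resource "aws_nat_gateway" "egress" block must exist for the
# import to bind to. The ID is a placeholder.
import {
  to = aws_nat_gateway.egress
  id = "nat-0123456789abcdef0"
}
```

Between the two PRs the resource is unmanaged. That gap is the point: at no moment do two states claim the same resource.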

Four decisions we'd reverse

Here's the part worth bookmarking.

1. We made the component graph too granular

We started with 14 components because the old monorepo had 14 "logical" modules. That was wrong. Components in Stacks are units of dependency and units of plan parallelism. They should be coarse enough that an engineer can reason about one without loading the others into their head.

We collapsed to seven components in month two: network, data, secrets, app, workers, observability, edge. Plans got faster and PR reviews got shorter. If we did it again, we'd start at five and split only when something hurt.
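
Collapsing components usually means the component's module becomes a thin wrapper around several of the old fine-grained modules. A sketch of what a coarser data module might look like; the child module names, paths, and their inputs and outputs are hypothetical:

```hcl
# modules/data/main.tf (hypothetical layout)
variable "vpc_id" { type = string }
variable "subnet_ids" { type = list(string) }
variable "db_size" { type = string }

# Formerly separate components, now planned and reviewed as one unit.
module "database" {
  source     = "../database"
  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids
  size       = var.db_size
}

module "cache" {
  source     = "../cache"
  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids
}

# Only what other components need is exposed.
output "db_endpoint" {
  value = module.database.endpoint
}
```

The Stack then sees a single data component with a small output surface, while the internal split between database and cache stays a module-level detail.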

2. We put the app image in the Stack

Passing app_image as a Stack input meant every application deploy triggered a Stacks run. That's IaC and CD getting tangled up — slower, more expensive, and politically awkward when the app team wanted to deploy and the platform team's CI was queued.

We should have kept image rollouts in the existing deployment pipeline (in this case ECS service updates via a separate workflow) and only used Stacks for infrastructure shape. HCP Terraform bills per managed resource per hour; pulling routine image bumps out of Stacks would have meaningfully reduced our managed-resource-hour count.
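
One common way to draw that line is to let Terraform create the ECS service but ignore later changes to its task definition, so the deploy pipeline owns image rollouts. This is a sketch under that assumption, not necessarily how this project was wired; the variable names and service settings are placeholders:

```hcl
# Hypothetical excerpt from modules/app.
variable "cluster_arn" { type = string }
variable "bootstrap_task_definition_arn" { type = string }

resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = var.cluster_arn
  task_definition = var.bootstrap_task_definition_arn
  desired_count   = 2

  lifecycle {
    # The deploy pipeline registers new task definition revisions and
    # updates the service; Terraform should not revert them on the next plan.
    ignore_changes = [task_definition]
  }
}
```

With this in place the image tag no longer needs to be a Stack input at all, so an application release does not trigger an infrastructure run.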

3. We didn't budget for the CI rewrite

The existing GitHub Actions workflows did not translate. Stacks expects you to use HCP Terraform's run pipeline, with VCS-driven triggers and its own policy checks. We spent about a week trying to keep our old CI in front of it, gave up, and rebuilt the workflow around HCP Terraform's native triggers and Sentinel policies.

If you're migrating, budget two engineer-weeks for CI/CD plumbing alone. We budgeted zero.

4. We onboarded all environments at once

We migrated dev, staging, and prod in the same two-week window because "the configuration is shared anyway." In hindsight, dev should have lived in Stacks for at least a month before we touched prod. Several of the import-order bugs we hit in prod were latent in dev for days; we just didn't have time to notice them because everything was moving.

The lesson is boring and true: migrate one environment, let it bake, then migrate the next. The fact that Stacks makes environments cheap to clone is exactly why you can afford to stage the rollout.
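
In deployment terms, staging the rollout simply means the deployments file starts with one block. A sketch of what the bake period might look like; every input value here is a placeholder:

```hcl
# deployments.tfdeploy.hcl during the bake period.
deployment "dev" {
  inputs = {
    region     = "us-east-1"
    cidr_block = "10.0.0.0/16"
    db_size    = "db.t4g.small"
    app_image  = "registry.example.com/app:1.0.0"
  }
}

# staging and prod deployment blocks are added in later PRs, one at a time,
# once dev has survived real changes.
```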

When Stacks is not the right answer

A few cases where we'd stay on a flat monorepo:

Single environment, single account. The orchestration features are the whole point. If you have one prod and nothing else, Stacks is overhead.
You're heavily invested in OpenTofu. Stacks is HCP Terraform / Terraform Enterprise only at time of writing. If you've already picked OpenTofu for licensing reasons, this isn't a migration, it's a fork.
Your modules are not idempotent. Stacks will surface every flaky module you have, because it re-plans aggressively. Fix the modules first.
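
One common form of a non-idempotent module is a value that changes on every plan. The sketch below shows the pattern and a stable alternative; the bucket name and the release variable are hypothetical:

```hcl
variable "release" { type = string }

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket" # placeholder name

  tags = {
    # Flaky: timestamp() is re-evaluated on every plan, so this resource
    # would never report "no changes".
    # DeployedAt = timestamp()

    # Stable: an explicit input only changes when someone changes it.
    Release = var.release
  }
}
```

Running plan twice in a row against an unchanged environment is a quick check: if the second plan is not empty, the module needs fixing before it goes into a Stack.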

If you want a second opinion on which of these applies to you, our DevOps & cloud engineering team does these reviews regularly, and we've written more on IaC tradeoffs over on the blog.

Where we'd start

If you're staring at a flat Terraform monorepo today and wondering whether Stacks is worth it, here's the order we'd recommend:

1. Spend a day drawing the component graph you would want: five to seven nodes, not fifteen.
2. Pick your least important environment and migrate only that. Treat it as throwaway. Time how long it takes.
3. Pull anything that changes more than once a week (images, feature flags, config) out of the Stack before you migrate prod.
4. Budget the CI rewrite explicitly and tell whoever owns the roadmap.
5. Migrate prod last, after at least two weeks of the new model surviving real changes in a lower environment.

Stacks is genuinely good. It is not a refactor you can squeeze into a sprint, and it punishes teams that try to lift-and-shift their existing module layout without rethinking it. Give it the design time it needs, and it'll pay you back in shorter plans and fewer 2am drift surprises.

Tags: Terraform, IaC, DevOps, HCP Terraform, Migration
