DevOps & Cloud | June 9, 2026 | 6 min read

Pulumi vs Terraform in 2026: A Real Migration, Not a Bake-Off

We moved part of a production AWS estate from Terraform to Pulumi over six months. Here's what actually changed, what broke, and where we'd quietly stay on HCL.

Six months ago we started moving a chunk of a client's AWS estate from Terraform to Pulumi. Not because Terraform was failing — it wasn't — but because the platform team wanted to write infrastructure in TypeScript alongside their CDK-adjacent application code. This is what we learned, including the parts that made us quietly keep Terraform for a couple of subsystems.

If you're Googling "should we switch to Pulumi in 2026", skip the marketing pages. Here's the version with the bruises.

The estate we were working with

The scope wasn't a toy. Roughly:

~140 Terraform modules across three AWS accounts (dev, staging, prod)
A Terraform Cloud workspace per environment per service (~60 workspaces)
VPCs, EKS, RDS Aurora, a pile of Lambda, SQS, EventBridge, and a CloudFront + S3 static frontend
Secrets in AWS Secrets Manager, with some legacy SSM Parameter Store leftovers
A separate Pulumi project the data team had been running for ~18 months for their Glue and Step Functions work

The goal: collapse to one IaC tool where it made sense, keep the team productive, and avoid a Big Bang weekend.

Why Pulumi, honestly

Not because of any single killer feature. The actual reasons, ranked by how often they came up in planning:

The platform team already wrote TypeScript daily. Code review fatigue on HCL was real.
They wanted real loops, real conditionals, real testing — not for_each gymnastics.
Pulumi's component model fit how they thought about "a service" (a bundle of queue + Lambda + alarm + dashboard).
They wanted to unit-test infra logic with Jest, not just terraform plan diffs in PRs.

Reasons we explicitly did not care about:

"Pulumi is faster." In our runs, plan/preview times were within 10–20% of Terraform for equivalent stacks. Noise.
"No HCL." HCL is fine. The cost of HCL is reviewing it, not writing it.
The OpenTofu fork situation. By 2026 it's settled enough that it wasn't a forcing function either way.

What actually went well

Component resources earned their keep

The single biggest win was Pulumi's ComponentResource. We modelled "a service" as one component that produced its queue, DLQ, Lambda, log group, alarms, and dashboard. A new service became ~30 lines of TypeScript:

const orders = new Service("orders", {
  handler: "./dist/orders",
  memory: 512,
  queue: { visibilityTimeoutSec: 60, dlqMaxReceives: 5 },
  alarms: { errorRateThreshold: 0.02 },
}, { provider: prodProvider });

We had a Terraform module that did roughly the same thing, but it was 400 lines of HCL with three dynamic blocks and a locals section that nobody enjoyed touching. The TypeScript version got proper types, autocomplete, and — crucially — refactoring tools that worked.

Testing infra logic

We wrote Jest tests that asserted things like "every Lambda in prod has an error-rate alarm wired to PagerDuty" and "no S3 bucket is created without a lifecycle policy". These run in CI in under 20 seconds per project. You can do equivalents in Terraform with terraform test or Checkov-style policies, and we used both, but the developer ergonomics of a custom matcher like expect(resources).toContainAlarmFor(lambda) are hard to match.
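A plain-TypeScript sketch of the kind of check such a test might make. The ResourceRecord shape, the findLambdasWithoutAlarm helper, and the convention that an alarm's FunctionName dimension matches the Lambda's logical name are all assumptions for illustration. In a real Pulumi project the records would come from mocked resource registrations rather than a hand-built array, and this is not our actual matcher.

```typescript
// Simplified record of a resource captured during a test run.
// Hypothetical shape: only type, logical name, and raw inputs.
interface ResourceRecord {
  type: string;
  name: string;
  inputs: Record<string, unknown>;
}

const LAMBDA = "aws:lambda/function:Function";
const ALARM = "aws:cloudwatch/metricAlarm:MetricAlarm";

// Returns the names of Lambdas that have no "Errors" alarm pointing at them.
// Assumes the alarm's FunctionName dimension equals the Lambda's logical name.
function findLambdasWithoutAlarm(resources: ResourceRecord[]): string[] {
  const covered = new Set<string>();
  for (const r of resources) {
    if (r.type !== ALARM || r.inputs["metricName"] !== "Errors") continue;
    const dims = r.inputs["dimensions"] as Record<string, string> | undefined;
    const fn = dims?.["FunctionName"];
    if (fn !== undefined) covered.add(fn);
  }
  return resources
    .filter((r) => r.type === LAMBDA && !covered.has(r.name))
    .map((r) => r.name);
}

const sample: ResourceRecord[] = [
  { type: LAMBDA, name: "orders", inputs: { memorySize: 512 } },
  { type: LAMBDA, name: "billing", inputs: { memorySize: 256 } },
  {
    type: ALARM,
    name: "orders-errors",
    inputs: { metricName: "Errors", dimensions: { FunctionName: "orders" } },
  },
];

// "billing" has no alarm, so it is the only name reported.
console.log(findLambdasWithoutAlarm(sample));
```

Wrapped in a Jest matcher, the same function gives a failure message that names the offending Lambdas, which is most of what makes these tests pleasant to work with.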

Secrets handling was less awful

Pulumi's encrypted config (with AWS KMS as the secrets provider) meant we stopped having half our secrets in Terraform Cloud variable sets and half in Secrets Manager. We standardised on Secrets Manager for runtime secrets and Pulumi config for build-time things like third-party API keys used during deployment. Two places, clearly delineated. Previously we had four.

What went badly

State migration is where you bleed

The official path is pulumi import. It works. It is also tedious and produces code you will rewrite. With ~140 modules, importing everything by hand was not an option, so we wrote a script that walked the Terraform state, generated the matching Pulumi import definitions, and ran them in batches per stack. Expect:

~10–15% of resources to need manual fix-up (mostly IAM policies with embedded JSON and anything using aws_iam_policy_document data sources)
Generated code that is technically correct but stylistically nothing like what you'd write
A non-trivial amount of "why is this resource showing as a replacement?" debugging, usually due to attribute ordering or default values

We ended up doing two passes: import to get the state right, then refactor the code to match our component patterns. Skipping the refactor pass would have left us with worse code than the HCL we started with.
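For a sense of what the first pass involves, here is a minimal sketch of turning Terraform state into a bulk import file. It is not our actual script. It assumes the v4 state layout (resources with instances and an id attribute) and the JSON shape that pulumi import --file accepts (a resources array of type, name, id). TYPE_MAP is a hypothetical, deliberately tiny mapping, and the id attribute is not the correct import ID for every resource type.

```typescript
// Minimal slice of a Terraform v4 state file: only the fields used here.
interface TfInstance {
  index_key?: string | number;
  attributes: { id?: string };
}
interface TfResource {
  mode: "managed" | "data";
  type: string;
  name: string;
  instances: TfInstance[];
}
interface TfState {
  resources: TfResource[];
}

// One entry in the JSON file passed to `pulumi import --file`.
interface PulumiImport {
  type: string;
  name: string;
  id: string;
}

// Hypothetical, partial mapping from Terraform types to Pulumi type tokens.
const TYPE_MAP: Record<string, string> = {
  aws_sqs_queue: "aws:sqs/queue:Queue",
  aws_s3_bucket: "aws:s3/bucket:Bucket",
  aws_lambda_function: "aws:lambda/function:Function",
};

function toImportFile(state: TfState): { resources: PulumiImport[]; skipped: string[] } {
  const resources: PulumiImport[] = [];
  const skipped: string[] = [];
  for (const res of state.resources) {
    if (res.mode !== "managed") continue; // data sources have nothing to import
    const type = TYPE_MAP[res.type];
    for (const inst of res.instances) {
      // count/for_each instances share a name, so append the index key.
      const suffix = inst.index_key === undefined ? "" : `-${inst.index_key}`;
      const name = `${res.name}${suffix}`;
      const id = inst.attributes.id;
      if (type === undefined || id === undefined) {
        skipped.push(`${res.type}.${name}`); // left for manual fix-up
        continue;
      }
      resources.push({ type, name, id });
    }
  }
  return { resources, skipped };
}

const sampleState: TfState = {
  resources: [
    { mode: "data", type: "aws_iam_policy_document", name: "assume", instances: [{ attributes: { id: "123" } }] },
    {
      mode: "managed",
      type: "aws_sqs_queue",
      name: "orders",
      instances: [{ index_key: 0, attributes: { id: "https://sqs.eu-west-1.amazonaws.com/111111111111/orders" } }],
    },
    { mode: "managed", type: "aws_s3_bucket", name: "assets", instances: [{ attributes: { id: "example-assets" } }] },
    { mode: "managed", type: "aws_iam_policy", name: "deploy", instances: [{ attributes: { id: "arn:aws:iam::111111111111:policy/deploy" } }] },
  ],
};

const importResult = toImportFile(sampleState);
console.log(JSON.stringify({ resources: importResult.resources }, null, 2));
console.log("needs manual import:", importResult.skipped);
```

Keeping an explicit skipped list matters more than the happy path: it is the worklist for the manual fix-up share of the migration.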

Blast radius felt larger

This is the uncomfortable one. With Terraform, the worst a junior could do in a PR was usually scoped by the module boundary. With Pulumi and a general-purpose language, we caught two PRs in review that — through perfectly innocent TypeScript — would have iterated over the wrong array and tried to delete production subnets. Both were caught. Neither would have been possible to write in HCL.

We responded with:

Mandatory pulumi preview output posted to PRs, with a required human ack for any deletion
A custom policy pack that blocks deletion of resources tagged protection=hard outside of an explicit "destructive" pipeline
Per-stack IAM roles in CI so a dev stack literally cannot touch prod, regardless of code

You should do the equivalent in Terraform too. The difference is that in Pulumi you must.
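A rough sketch of the deletion gate, written as a plain function over preview steps rather than as a real policy pack. The PreviewStep shape is loosely modelled on the steps array in pulumi preview --json output; treat the field names as assumptions and check them against your CLI version. findBlockedDeletes and the destructivePipeline flag are hypothetical names.

```typescript
// Assumed shape of one step from a preview, reduced to what the gate needs.
interface PreviewStep {
  op: string; // e.g. "create", "update", "delete", "replace"
  urn: string;
  oldState?: { inputs?: { tags?: Record<string, string> } };
}

// Operations that remove an existing resource.
const DESTRUCTIVE_OPS = new Set(["delete", "replace"]);

// Returns URNs of hard-protected resources the preview would remove.
// The explicit destructive pipeline is the only path allowed to proceed.
function findBlockedDeletes(steps: PreviewStep[], destructivePipeline: boolean): string[] {
  if (destructivePipeline) return [];
  return steps
    .filter((s) => DESTRUCTIVE_OPS.has(s.op))
    .filter((s) => s.oldState?.inputs?.tags?.["protection"] === "hard")
    .map((s) => s.urn);
}

const previewSteps: PreviewStep[] = [
  { op: "create", urn: "urn:pulumi:prod::orders::aws:sqs/queue:Queue::orders" },
  {
    op: "delete",
    urn: "urn:pulumi:prod::network::aws:ec2/subnet:Subnet::private-a",
    oldState: { inputs: { tags: { protection: "hard" } } },
  },
  {
    op: "delete",
    urn: "urn:pulumi:prod::orders::aws:sqs/queue:Queue::legacy",
    oldState: { inputs: { tags: { team: "orders" } } },
  },
];

const blocked = findBlockedDeletes(previewSteps, false);
if (blocked.length > 0) {
  console.log("Refusing to continue; protected resources would be removed:", blocked);
}
```

A check like this only covers resources that carry the tag, so it complements the per-stack IAM roles rather than replacing them.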

CI minutes went up

Our Pulumi preview jobs on the larger stacks (EKS + 40-ish services) take 3–6 minutes; the Terraform plan jobs on the same scope took 2–4 minutes. Those are whole-job CI times, so they include dependency and plugin setup on top of the preview itself. Not catastrophic, but on a busy day with 30+ PRs, the bill is noticeable. Caching node_modules and the Pulumi plugin directory helped; pre-warming a Docker layer with the providers helped more.

Where we kept Terraform

We did not migrate everything. Two areas stayed on Terraform, deliberately:

The VPC and account baseline

The networking layer changes maybe twice a year. It's read by every other stack. It has a very stable, well-known Terraform module ecosystem (the terraform-aws-modules ones are battle-tested). Migrating it would have bought us nothing except risk. We exposed its outputs via SSM parameters, and the Pulumi stacks read them through a small wrapper that works much like a StackReference lookup.
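The boundary can be sketched roughly like this. The /baseline/<env>/<key> naming convention, the BaselineOutputs class, and the injected ParameterFetcher are all hypothetical; a real version would back the fetcher with an SSM lookup. The fetcher is injected here only so the sketch has no SDK dependency.

```typescript
// Fetches one parameter value by name. Stands in for an SSM lookup.
type ParameterFetcher = (name: string) => Promise<string>;

// Hypothetical naming convention for parameters published by the Terraform baseline.
function baselineParamName(env: string, key: string): string {
  return `/baseline/${env}/${key}`;
}

// SSM StringList values are comma-separated.
function parseStringList(value: string): string[] {
  return value
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Read-only view over the baseline's outputs for one environment.
class BaselineOutputs {
  private readonly env: string;
  private readonly fetchParam: ParameterFetcher;

  constructor(env: string, fetchParam: ParameterFetcher) {
    this.env = env;
    this.fetchParam = fetchParam;
  }

  async getString(key: string): Promise<string> {
    return this.fetchParam(baselineParamName(this.env, key));
  }

  async getList(key: string): Promise<string[]> {
    return parseStringList(await this.getString(key));
  }
}

// In-memory fetcher for local runs and tests.
const fakeParams = new Map<string, string>([
  ["/baseline/prod/vpc/id", "vpc-0abc"],
  ["/baseline/prod/vpc/private-subnet-ids", "subnet-1, subnet-2"],
]);
const fakeFetcher: ParameterFetcher = async (name) => {
  const value = fakeParams.get(name);
  if (value === undefined) throw new Error(`missing parameter: ${name}`);
  return value;
};

const baseline = new BaselineOutputs("prod", fakeFetcher);
baseline.getList("vpc/private-subnet-ids").then((ids) => console.log(ids));
```

The point of the wrapper is that consuming stacks depend on a handful of named keys, not on how the Terraform side is organised.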

Compliance-scoped resources

Our client's auditors had pre-approved Terraform modules for some PCI-adjacent resources. Re-certifying equivalent Pulumi components would have taken a quarter of someone's time. Not worth it for code that changes twice a year.

The lesson: "one IaC tool" is a nice slogan, but the actual answer is "one tool per blast-radius zone, and stable boundaries between them".

Numbers, with the usual caveats

In our experience on this migration:

Total engineering time: ~14 person-weeks over six months, spread across two engineers part-time
Resources migrated: ~1,800
Production incidents caused by the migration: 1 (a misconfigured CloudFront origin, caught within 8 minutes by synthetic checks, ~3 minutes of degraded cache hit ratio)
Lines of IaC: down roughly 35% after the refactor pass
Developer satisfaction on the infra surveys: up, but they also got a new coffee machine, so calibrate accordingly

Don't take these as benchmarks. Your estate is not our estate.

Would we do it again?

For this team, yes. For a team that doesn't write TypeScript daily, almost certainly no — the productivity story collapses if the language isn't already in your bloodstream. Python Pulumi is fine but the typing story is weaker, and Go Pulumi is verbose enough that you'll wonder why you left HCL.

If your main complaint about Terraform is "the syntax is annoying", that is not a good enough reason. If your complaint is "we can't model our domain and we can't test our infra logic", Pulumi is worth a serious pilot.

Where we'd start

If you're considering this migration in 2026, do this first:

Pick one non-critical service. Migrate it end-to-end, including CI, secrets, alarms, and on-call runbooks. Time it honestly.
Write the policy pack before you migrate the second service. Blast-radius controls are not a Phase 2 item.
Decide upfront which subsystems you will not migrate, and write that down. Re-litigating this every sprint is exhausting.
Budget for a refactor pass after import. The imported code is a starting point, not a destination.

If you'd like a second pair of eyes on an IaC migration plan, our DevOps and cloud team does this work — usually starting with a two-week assessment rather than a quote-on-day-one. And if you want more of these write-ups, the rest of the blog has the war stories that didn't fit here.

#DevOps #Cloud #Pulumi #Terraform #AWS #IaC
