Terraform vs Pulumi in 2026: A Migration We Half-Finished
We spent six months partially migrating a production AWS estate from Terraform to Pulumi. Here's what we kept, what we rolled back, and the boring reasons IaC choices rarely come down to language.

Last year a client asked us to move their AWS infrastructure from Terraform to Pulumi because "the team already knows TypeScript." Six months in we stopped halfway, on purpose. This is what we learned about where each tool actually earns its keep — and why the language argument is usually the least interesting part of the decision.
The setup we walked into
The client ran a mid-sized SaaS on AWS: roughly 40 microservices on ECS Fargate, three RDS clusters, a handful of Lambdas, S3, CloudFront, and the usual VPC plumbing. Their Terraform was about 18,000 lines spread across 22 modules, managed with Terraform Cloud, and had grown organically over four years. Drift was real but tolerable. Apply times sat around 4–7 minutes for typical PRs.
The pitch for Pulumi was reasonable on paper:
- Engineers wanted loops, conditionals, and abstractions in a real language
- HCL had started feeling cramped around dynamic ECS task definitions
- A few engineers had been burned by count vs for_each index shifts
- Pulumi's component resources looked like a clean answer to "why is this module 600 lines of copy-paste?"
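The index-shift problem is easiest to see with the addresses spelled out. Here's a minimal sketch in plain TypeScript. It assumes a simplified model where a count resource is addressed by its list position and a for_each resource by its key. The helper names are ours, and this is not Terraform's planner, just the addressing rule that causes the churn.

```typescript
// Simplified model of Terraform resource addressing (hypothetical helper names).
// count:    address = queue[index]   -> tied to list position
// for_each: address = queue["key"]  -> tied to a stable key

type Plan = { keep: string[]; replace: string[]; destroy: string[]; create: string[] };

// Map each address to the item (config) it would hold under each style.
function addresses(items: string[], mode: "count" | "for_each"): Map<string, string> {
  const out = new Map<string, string>();
  items.forEach((item, i) => {
    const addr = mode === "count" ? `queue[${i}]` : `queue["${item}"]`;
    out.set(addr, item);
  });
  return out;
}

// Compare old and new address->config maps the way a planner would.
function diff(before: Map<string, string>, after: Map<string, string>): Plan {
  const plan: Plan = { keep: [], replace: [], destroy: [], create: [] };
  before.forEach((oldItem, addr) => {
    const newItem = after.get(addr);
    if (newItem === undefined) plan.destroy.push(addr);
    else if (newItem === oldItem) plan.keep.push(addr);
    // Same address, different config: an update, or a replacement if the name changes.
    else plan.replace.push(addr);
  });
  after.forEach((_item, addr) => {
    if (!before.has(addr)) plan.create.push(addr);
  });
  return plan;
}

const beforeItems = ["orders", "billing", "emails"];
const afterItems = ["orders", "emails"]; // "billing" removed from the middle

const countPlan = diff(addresses(beforeItems, "count"), addresses(afterItems, "count"));
const forEachPlan = diff(addresses(beforeItems, "for_each"), addresses(afterItems, "for_each"));

console.log(countPlan);
console.log(forEachPlan);
```

Removing "billing" from the middle of the list rewrites queue[1] and destroys queue[2] under count, while for_each only destroys queue["billing"] and leaves the others alone.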
None of that was wrong. It just wasn't the whole picture.
What Pulumi clearly did better
We started by porting one bounded context: the ingestion pipeline. About 3,000 lines of HCL became roughly 1,400 lines of TypeScript. That ratio held across the next two services we migrated.
Real abstractions instead of module gymnastics
The biggest win was component resources. In Terraform, a "service" module had to expose every knob via variables and outputs, because consumers couldn't reach inside. In Pulumi, we wrote a FargateService class that encapsulated the ALB target group, task definition, autoscaling policy, and CloudWatch alarms — and exposed only the surface that mattered.
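```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Inputs the component accepts (fields elided).
export interface FargateServiceArgs { /* ... */ }

export class FargateService extends pulumi.ComponentResource {
    public readonly url: pulumi.Output<string>;
    constructor(name: string, args: FargateServiceArgs, opts?: pulumi.ComponentResourceOptions) {
        // Register the component under its own type token.
        super("app:infra:FargateService", name, {}, opts);
        // parent: this nests each child under the component in state and previews.
        const tg = new aws.lb.TargetGroup(`${name}-tg`, { /* ... */ }, { parent: this });
        const task = new aws.ecs.TaskDefinition(`${name}-task`, { /* ... */ }, { parent: this });
        // autoscaling, alarms, log groups...
        this.url = tg.arn.apply(/* ... */);
        // Expose only the outputs consumers should depend on.
        this.registerOutputs({ url: this.url });
    }
}
```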
That pattern collapsed three of our chattiest modules. Code review got faster because reviewers could read intent instead of HCL plumbing.
Testing that doesn't feel like a chore
Pulumi's unit tests with mocks let us assert things like "every Fargate task has a log group with retention set" in plain Jest. We'd done similar things in Terraform with terraform-compliance and OPA, but the feedback loop was slower and the tests were harder to maintain. With Pulumi we got compliance checks running in under a second per file.
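A dependency-free sketch of the log-retention rule. It assumes `resources` stands in for what a Pulumi mock captured (type token, logical name, inputs); in a real test you'd collect these through pulumi.runtime.setMocks and assert with Jest. Matching a task to its log group through the awslogs-group option is our simplification, not a Pulumi feature, and the sample resources are hypothetical.

```typescript
// One recorded resource, roughly what a mock's newResource hook would see.
interface RecordedResource {
  type: string;
  name: string;
  inputs: Record<string, unknown>;
}

const TASK = "aws:ecs/taskDefinition:TaskDefinition";
const LOG_GROUP = "aws:cloudwatch/logGroup:LogGroup";

// Returns human-readable violations; an empty array means compliant.
function checkTaskLogging(resources: RecordedResource[]): string[] {
  // Index log groups by physical name -> retention setting (may be undefined).
  const retentionByGroup = new Map<string, unknown>();
  for (const r of resources) {
    if (r.type === LOG_GROUP) retentionByGroup.set(String(r.inputs["name"]), r.inputs["retentionInDays"]);
  }

  const violations: string[] = [];
  for (const r of resources) {
    if (r.type !== TASK) continue;
    // containerDefinitions is a JSON string on an ECS task definition.
    const containers = JSON.parse(String(r.inputs["containerDefinitions"] ?? "[]")) as Array<{
      name: string;
      logConfiguration?: { logDriver?: string; options?: Record<string, string> };
    }>;
    for (const c of containers) {
      const group = c.logConfiguration?.options?.["awslogs-group"];
      if (!group) {
        violations.push(`${r.name}/${c.name}: no awslogs group`);
      } else if (typeof retentionByGroup.get(group) !== "number") {
        violations.push(`${r.name}/${c.name}: log group ${group} has no retention`);
      }
    }
  }
  return violations;
}

// Hypothetical captured resources: the worker's log group is missing retention.
const awslogs = (group: string) => ({ logDriver: "awslogs", options: { "awslogs-group": group } });
const resources: RecordedResource[] = [
  { type: LOG_GROUP, name: "api-logs", inputs: { name: "/ecs/api", retentionInDays: 30 } },
  { type: LOG_GROUP, name: "worker-logs", inputs: { name: "/ecs/worker" } },
  { type: TASK, name: "api-task", inputs: { containerDefinitions: JSON.stringify([{ name: "api", logConfiguration: awslogs("/ecs/api") }]) } },
  { type: TASK, name: "worker-task", inputs: { containerDefinitions: JSON.stringify([{ name: "worker", logConfiguration: awslogs("/ecs/worker") }]) } },
];

console.log(checkTaskLogging(resources));
```

In a Jest suite the assertion would be something like expect(checkTaskLogging(recorded)).toEqual([]).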
Async values are a real model
HCL presents everything as if it were synchronous: the dependency graph is inferred from references, and values that won't exist until apply just show up as "(known after apply)". Pulumi's Output<T> makes the async nature explicit. That sounds like a tax until you hit a case where you need to compute something from a resource that doesn't exist yet, at which point Terraform's depends_on hacks suddenly look much worse.
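To show what "explicit" buys you, here's a toy Output-like wrapper. It's a sketch built on a plain Promise, not Pulumi's implementation; the real Output<T> also tracks resource dependencies and unknown values during preview. The class and helper names are hypothetical. The point is that the only way to use a value that doesn't exist yet is to derive from it with apply.

```typescript
// Toy deferred value: derivation happens only through apply().
class Deferred<T> {
  private readonly promise: Promise<T>;

  constructor(promise: Promise<T>) {
    this.promise = promise;
  }

  // Derive a new deferred value; f runs only after the source value resolves.
  apply<U>(f: (value: T) => U): Deferred<U> {
    return new Deferred(this.promise.then(f));
  }

  // Escape hatch for this example only; Pulumi deliberately doesn't offer one.
  resolve(): Promise<T> {
    return this.promise;
  }
}

// Simulate a resource whose ID is only known after a (fake) create call.
function createBucket(name: string): { id: Deferred<string> } {
  const id = new Promise<string>((res) => setTimeout(() => res(`${name}-a1b2c3`), 10));
  return { id: new Deferred(id) };
}

const bucket = createBucket("ingest");
// Build an ARN from an ID that doesn't exist yet when this line runs.
const arn = bucket.id.apply((id) => `arn:aws:s3:::${id}/*`);

arn.resolve().then((v) => console.log(v)); // arn:aws:s3:::ingest-a1b2c3/*
```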
What Terraform quietly did better
After the third service, the energy shifted. Not because Pulumi broke, but because the things Terraform handled invisibly started costing us.
State, drift, and the blast radius question
Terraform Cloud's state locking, run history, and policy-as-code (Sentinel/OPA) were boring and worked. Pulumi has equivalents in Pulumi Cloud, and they're fine, but moving meant retraining the on-call rotation on a new console, new audit trail, new RBAC model. That's a real cost nobody puts on the migration ticket.
We also hit a class of bug we hadn't anticipated: when a TypeScript refactor accidentally changed a resource's logical name, Pulumi wanted to destroy and recreate it. Terraform would have done the same without a moved block, but our engineers had years of muscle memory for terraform state mv. Pulumi has equivalent tools, pulumi state rename and resource aliases. They were just unfamiliar, and unfamiliar in production is how you take an RDS instance down.
We didn't take an RDS instance down. We got close enough during a staging dry-run that the room went quiet for about thirty seconds.
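A toy model of why the rename was dangerous. Pulumi identifies resources by URN, and the logical name is part of the URN, so a renamed resource looks like a brand-new one unless an alias points back at the old URN. The helpers below are hypothetical and simplified; in a real program the fix is the aliases resource option, for example { aliases: [{ name: "orders-db" }] }.

```typescript
// Simplified URN: stack, project, type token and logical name.
function urn(stack: string, project: string, type: string, name: string): string {
  return `urn:pulumi:${stack}::${project}::${type}::${name}`;
}

interface DesiredResource {
  urn: string;
  aliases: string[]; // previous URNs this resource used to have
}

type Step = { op: "same" | "create" | "delete"; urn: string };

// Match program resources to state by URN or alias; leftovers in state get deleted.
function planSteps(stateUrns: string[], program: DesiredResource[]): Step[] {
  const steps: Step[] = [];
  const claimed = new Set<string>();
  for (const r of program) {
    const match = [r.urn, ...r.aliases].find((u) => stateUrns.includes(u) && !claimed.has(u));
    if (match) {
      claimed.add(match);
      steps.push({ op: "same", urn: r.urn });
    } else {
      steps.push({ op: "create", urn: r.urn });
    }
  }
  for (const u of stateUrns) {
    if (!claimed.has(u)) steps.push({ op: "delete", urn: u });
  }
  return steps;
}

const RDS = "aws:rds/instance:Instance";
const oldUrn = urn("prod", "platform", RDS, "orders-db");
const newUrn = urn("prod", "platform", RDS, "orders-primary-db"); // the refactor renamed it

// Without an alias: create a new database, delete the existing one.
console.log(planSteps([oldUrn], [{ urn: newUrn, aliases: [] }]));
// With an alias pointing at the old URN: the existing database is kept.
console.log(planSteps([oldUrn], [{ urn: newUrn, aliases: [oldUrn] }]));
```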
The ecosystem gap is smaller than it was, but it's still there
Terraform's provider and module ecosystem in 2026 is still the larger one. For 90% of resources this doesn't matter — Pulumi can consume Terraform providers via its bridge. But when something goes wrong inside that bridge, the stack traces are not friendly. We lost a full afternoon to a bug where a Pulumi-bridged provider serialized a null differently than the underlying Terraform provider expected.
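We won't reproduce the bridge bug itself, but the general hazard is easy to show: in JSON, a field set to null and a field that is absent are different payloads, and a translation layer that turns one into the other changes what the provider receives. A minimal illustration with a hypothetical attribute name:

```typescript
// Two configs a user might consider "the same": no KMS key.
const explicitNull = { name: "logs", kmsKeyId: null };
const omitted = { name: "logs", kmsKeyId: undefined };

// JSON keeps null but drops undefined entirely.
console.log(JSON.stringify(explicitNull)); // {"name":"logs","kmsKeyId":null}
console.log(JSON.stringify(omitted)); // {"name":"logs"}

// A consumer that checks for key presence sees two different things.
function hasKey(json: string, key: string): boolean {
  return key in JSON.parse(json);
}

console.log(hasKey(JSON.stringify(explicitNull), "kmsKeyId")); // true
console.log(hasKey(JSON.stringify(omitted), "kmsKeyId")); // false
```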
Plans you can actually read
terraform plan output is verbose but greppable, diff-friendly, and every reviewer on the team can parse it. pulumi preview is cleaner visually but harder to paste into a PR comment for an async review. Small thing. Adds up.
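One way to make plan output friendlier for async review is to post a compact summary instead of the raw plan. A sketch for the Terraform side, assuming the JSON plan format printed by terraform show -json (resource_changes[].address and change.actions); only the fields used here are typed, and the layout of the summary is our own choice.

```typescript
interface ResourceChange {
  address: string;
  change: { actions: string[] };
}
interface PlanJson {
  resource_changes?: ResourceChange[];
}

// Collapse Terraform's action lists into one label; ["delete","create"] means replace.
function label(actions: string[]): string {
  if (actions.includes("delete") && actions.includes("create")) return "replace";
  return actions[0] ?? "no-op";
}

// Header with per-action counts, then one line per changed resource.
function summarize(plan: PlanJson): string {
  const lines: string[] = [];
  const counts: Record<string, number> = {};
  for (const rc of plan.resource_changes ?? []) {
    const l = label(rc.change.actions);
    if (l === "no-op" || l === "read") continue; // skip noise
    counts[l] = (counts[l] ?? 0) + 1;
    lines.push(`${l.padEnd(7)} ${rc.address}`);
  }
  const header = Object.keys(counts).sort().map((k) => `${k}: ${counts[k]}`).join(", ");
  return [header || "no changes", ...lines].join("\n");
}

// Hypothetical plan excerpt.
const example: PlanJson = {
  resource_changes: [
    { address: "aws_ecs_service.api", change: { actions: ["update"] } },
    { address: "aws_db_instance.orders", change: { actions: ["delete", "create"] } },
    { address: "aws_s3_bucket.logs", change: { actions: ["no-op"] } },
  ],
};

console.log(summarize(example));
```

A replace on a database then stands out on its own line instead of being buried in attribute diffs.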
CI cost and cold starts
Pulumi programs in Node spin up a language host on every operation. On our self-hosted runners, that added 15–40 seconds per stack per run compared to Terraform. Across hundreds of PRs a week, that's not free. You can mitigate it with persistent runners and dependency caching, but it's another knob.
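To put "not free" in numbers, a back-of-envelope calculation. Only the 15–40 second range comes from our runners; PR volume, stacks per PR, and runs per PR are hypothetical placeholders to replace with your own.

```typescript
// Extra runner time per week from language-host startup.
function weeklyOverheadHours(
  prsPerWeek: number,
  stacksPerPr: number,
  runsPerPr: number,
  secondsPerStackRun: number,
): number {
  return (prsPerWeek * stacksPerPr * runsPerPr * secondsPerStackRun) / 3600;
}

const prs = 300; // hypothetical: "hundreds of PRs a week"
const stacks = 2; // hypothetical: stacks touched per PR
const runs = 2; // hypothetical: preview on push plus one re-run

const low = weeklyOverheadHours(prs, stacks, runs, 15);
const high = weeklyOverheadHours(prs, stacks, runs, 40);
console.log(`${low.toFixed(1)}–${high.toFixed(1)} runner-hours/week`); // 5.0–13.3 runner-hours/week
```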
Why we stopped halfway
Around month five we sat down and asked: what does fully completing this migration buy us that the hybrid state doesn't?
The honest answer was: not much. The services where Pulumi shone — high-cardinality, dynamic, component-heavy stuff — were already migrated. What remained was the boring foundation: VPC, IAM baselines, Route53, the org-level guardrails. That code barely changed. It had been written carefully, was well-tested, and rewriting it just to have one IaC tool would have introduced risk for no operational gain.
So we drew a line:
- Pulumi owns application infrastructure: ECS services, Lambdas, per-service S3 buckets, application-level alarms
- Terraform owns platform infrastructure: VPC, IAM, KMS, Route53, shared RDS, org-level SCPs, CloudTrail
- A small Pulumi stack reads Terraform outputs from Terraform's remote state (via the @pulumi/terraform package)
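The remote-state read itself needs the Pulumi SDK, so we don't reproduce it here. What's worth writing down is the contract: the Pulumi side should fail fast if an output it depends on disappears from the Terraform side. A dependency-free sketch, assuming the outputs arrive in the shape terraform output -json prints; the output names are hypothetical.

```typescript
// Shape printed by `terraform output -json`: { name: { sensitive, type, value } }.
interface TfOutput {
  sensitive: boolean;
  type: unknown;
  value: unknown;
}
type TfOutputs = Record<string, TfOutput>;

// The platform outputs application stacks may depend on (hypothetical names).
interface PlatformContract {
  vpc_id: string;
  private_subnet_ids: string[];
}

// Validate presence and types once, at the boundary between the two tools.
function readContract(outputs: TfOutputs): PlatformContract {
  const get = (key: string): unknown => {
    const out = outputs[key];
    if (!out) throw new Error(`Terraform output "${key}" is missing; the platform contract changed`);
    return out.value;
  };
  const vpcId = get("vpc_id");
  const subnets = get("private_subnet_ids");
  if (typeof vpcId !== "string") throw new Error("vpc_id must be a string");
  if (!Array.isArray(subnets) || !subnets.every((s) => typeof s === "string")) {
    throw new Error("private_subnet_ids must be a list of strings");
  }
  return { vpc_id: vpcId, private_subnet_ids: subnets as string[] };
}

const sample: TfOutputs = {
  vpc_id: { sensitive: false, type: "string", value: "vpc-0abc" },
  private_subnet_ids: { sensitive: false, type: ["list", "string"], value: ["subnet-1", "subnet-2"] },
};

console.log(readContract(sample));
```

Keeping the contract this narrow also documents exactly which platform outputs the application side is allowed to lean on.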
It's not pretty on a slide. It works in practice.
How to decide for your own team
If you're staring at the same choice in 2026, the language is the wrong axis. Better questions:
Who runs IaC on call at 3 a.m.?
If the answer is "application engineers," Pulumi's affinity for their daily language is a genuine ergonomic win. If the answer is "a platform team that lives in HCL," the migration ROI is much smaller than it looks.
How much of your infra is dynamic vs static?
Static infra (networking, IAM, DNS) doesn't benefit from a general-purpose language. Dynamic infra (per-tenant resources, fan-out services, generated configs) does. Map your repo against that axis before deciding.
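To make "dynamic infra" concrete: per-tenant resources generated from a config list, with a conditional that would be a count = var ? 1 : 0 trick in HCL. A minimal sketch that builds plain specs rather than calling Pulumi; the tenant fields, naming scheme, and tier rule are hypothetical.

```typescript
interface Tenant {
  id: string;
  tier: "standard" | "enterprise";
  region: string;
}

interface BucketSpec {
  logicalName: string; // keyed by tenant id, so reordering tenants changes nothing
  region: string;
  versioning: boolean;
  replicateTo?: string;
}

// One bucket spec per tenant; enterprise tenants also get versioning and DR replication.
function tenantBuckets(tenants: Tenant[], drRegion: string): BucketSpec[] {
  return tenants.map((t) => {
    const spec: BucketSpec = {
      logicalName: `tenant-${t.id}-data`,
      region: t.region,
      versioning: t.tier === "enterprise",
    };
    // A conditional resource setting is a plain if-statement.
    if (t.tier === "enterprise") spec.replicateTo = drRegion;
    return spec;
  });
}

const specs = tenantBuckets(
  [
    { id: "acme", tier: "enterprise", region: "us-east-1" },
    { id: "globex", tier: "standard", region: "eu-west-1" },
  ],
  "us-west-2",
);

console.log(specs);
```

In a real program each spec would become a resource; the loop and the if-statement are where a general-purpose language earns its keep.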
What's your testing posture?
If you already invest in policy-as-code and pre-apply validation, Pulumi's unit tests are a meaningful upgrade. If you don't test infra at all today, switching tools won't fix that — process will.
How allergic is your org to vendor lock-in?
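Both tools have free-to-use cores (Terraform's under the BSL since 2023, with OpenTofu as the open-source fork) and paid backends. The state-backend question matters more than the language question. Self-hosting Pulumi state in S3 is straightforward (its DIY backend keeps lock files in the same bucket); so is Terraform's S3 backend, which has traditionally used DynamoDB for locking and in recent releases can lock natively in S3. Pick deliberately.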
Where we'd start
If you're considering this today, don't migrate. Pilot. Pick one bounded service, ideally one with a lot of repetitive resource definitions, and port it to Pulumi behind a feature branch. Run it in parallel in staging for two weeks. Measure the things that actually hurt: review time, preview time, on-call comfort, and how many times someone had to ask "how do I do X in this tool?"
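For the measurement side of a pilot, something this small is enough to compare the two tools across the same kind of PRs. The metric names and sample values are hypothetical; on-call comfort doesn't reduce to a number and is better gathered as a short survey.

```typescript
type Tool = "terraform" | "pulumi";

interface PilotSample {
  tool: Tool;
  reviewMinutes: number; // PR opened -> approved
  previewSeconds: number; // plan/preview wall time in CI
  howDoIQuestions: number; // "how do I do X in this tool?" asks during the PR
}

function median(xs: number[]): number {
  if (xs.length === 0) return NaN;
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Medians resist the one PR that sat over a long weekend.
function summarizePilot(samples: PilotSample[], tool: Tool) {
  const mine = samples.filter((s) => s.tool === tool);
  return {
    tool,
    prs: mine.length,
    medianReviewMinutes: median(mine.map((s) => s.reviewMinutes)),
    medianPreviewSeconds: median(mine.map((s) => s.previewSeconds)),
    totalQuestions: mine.reduce((n, s) => n + s.howDoIQuestions, 0),
  };
}

const samples: PilotSample[] = [
  { tool: "terraform", reviewMinutes: 30, previewSeconds: 60, howDoIQuestions: 0 },
  { tool: "terraform", reviewMinutes: 50, previewSeconds: 70, howDoIQuestions: 1 },
  { tool: "terraform", reviewMinutes: 40, previewSeconds: 65, howDoIQuestions: 0 },
  { tool: "pulumi", reviewMinutes: 20, previewSeconds: 90, howDoIQuestions: 2 },
  { tool: "pulumi", reviewMinutes: 25, previewSeconds: 100, howDoIQuestions: 1 },
];

console.log(summarizePilot(samples, "terraform"));
console.log(summarizePilot(samples, "pulumi"));
```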
Then — and this is the part most teams skip — write down the criteria for not continuing. "We'll migrate the rest if A, B, and C" is a much healthier commitment than "we're moving to Pulumi." Half-migrations are fine when they're chosen. They're a disaster when they're the residue of a stalled mandate.
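Writing the criteria down can be literal. A sketch that encodes "A, B, and C" as named predicates evaluated against pilot results; the metrics and thresholds are hypothetical examples, not recommendations.

```typescript
interface PilotResult {
  reviewTimeChangePct: number; // negative = faster reviews with the new tool
  previewOverheadSeconds: number; // extra CI time per stack vs the old tool
  prodIncidents: number; // incidents attributable to the new tool
}

interface Criterion {
  description: string;
  passes: (r: PilotResult) => boolean;
}

const criteria: Criterion[] = [
  { description: "A: reviews at least 20% faster", passes: (r) => r.reviewTimeChangePct <= -20 },
  { description: "B: preview overhead under 30 s per stack", passes: (r) => r.previewOverheadSeconds < 30 },
  { description: "C: zero production incidents caused by the tool", passes: (r) => r.prodIncidents === 0 },
];

// "We'll migrate the rest if A, B, and C": every criterion must pass.
function decide(r: PilotResult): { migrate: boolean; failed: string[] } {
  const failed = criteria.filter((c) => !c.passes(r)).map((c) => c.description);
  return { migrate: failed.length === 0, failed };
}

// B fails here, so migrate is false and the reason is recorded.
console.log(decide({ reviewTimeChangePct: -35, previewOverheadSeconds: 38, prodIncidents: 0 }));
```

The useful part is less the code than the fact that "stop here" has a written, reviewable definition before anyone is invested in the answer.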
If you want a second pair of eyes on an IaC estate before you commit, our DevOps and platform team does this kind of review regularly. The interesting answer is almost never "rewrite everything."
Want a team like ours?
72Technologies builds production software for the kind of teams who actually read this blog.
