DevOps & Cloud | May 16, 2026 | 6 min read

Terraform vs Pulumi in 2026: A Migration We Half-Finished

We spent six months partially migrating a production AWS estate from Terraform to Pulumi. Here's what we kept, what we rolled back, and the boring reasons IaC choices rarely come down to language.

Last year a client asked us to move their AWS infrastructure from Terraform to Pulumi because "the team already knows TypeScript." Six months in we stopped halfway, on purpose. This is what we learned about where each tool actually earns its keep — and why the language argument is usually the least interesting part of the decision.

The setup we walked into

The client ran a mid-sized SaaS on AWS: roughly 40 microservices on ECS Fargate, three RDS clusters, a handful of Lambdas, S3, CloudFront, and the usual VPC plumbing. Their Terraform was about 18,000 lines spread across 22 modules, managed with Terraform Cloud, and had grown organically over four years. Drift was real but tolerable. Apply times sat around 4–7 minutes for typical PRs.

The pitch for Pulumi was reasonable on paper:

Engineers wanted loops, conditionals, and abstractions in a real language
HCL had started feeling cramped around dynamic ECS task definitions
A few engineers had been burned by count vs for_each index shifts
Pulumi's component resources looked like a clean answer to "why is this module 600 lines of copy-paste?"
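The count versus for_each complaint in that list is easy to reproduce without Terraform at all. The sketch below is a plain TypeScript toy model, not Terraform's planner: it assumes resources are addressed either by list position (as count does) or by a stable key (as for_each does). The function names and service names are made up for illustration.

```typescript
// Toy model of resource addressing; not Terraform's actual planner.
type Addressed = Record<string, string>; // address -> service name

// count-style: the address is the position in the list.
function addressByIndex(services: string[]): Addressed {
  const out: Addressed = {};
  services.forEach((svc, i) => { out[`svc[${i}]`] = svc; });
  return out;
}

// for_each-style: the address is the service's own key.
function addressByKey(services: string[]): Addressed {
  const out: Addressed = {};
  services.forEach(svc => { out[`svc["${svc}"]`] = svc; });
  return out;
}

// Addresses that changed meaning or disappeared between two configurations.
function touched(before: Addressed, after: Addressed): string[] {
  return Object.keys(before).filter(addr => after[addr] !== before[addr]);
}

const original = ["api", "billing", "worker"];
const trimmed = ["api", "worker"]; // "billing" removed from the middle

const byIndex = touched(addressByIndex(original), addressByIndex(trimmed));
const byKey = touched(addressByKey(original), addressByKey(trimmed));

console.log(byIndex); // svc[1] and svc[2]: "worker" slid into billing's slot
console.log(byKey);   // only svc["billing"]
```

Removing one service touches two index-based addresses but only one key-based address, which is the shape of the surprise the engineers had run into.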

None of that was wrong. It just wasn't the whole picture.

What Pulumi clearly did better

We started by porting one bounded context: the ingestion pipeline. About 3,000 lines of HCL became roughly 1,400 lines of TypeScript. That ratio held across the next two services we migrated.

Real abstractions instead of module gymnastics

The biggest win was component resources. In Terraform, a "service" module had to expose every knob via variables and outputs, because consumers couldn't reach inside. In Pulumi, we wrote a FargateService class that encapsulated the ALB target group, task definition, autoscaling policy, and CloudWatch alarms — and exposed only the surface that mattered.

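import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// FargateServiceArgs is the component's input type, defined elsewhere.
export class FargateService extends pulumi.ComponentResource {
  public readonly url: pulumi.Output<string>;

  constructor(name: string, args: FargateServiceArgs, opts?: pulumi.ComponentResourceOptions) {
    // The type token identifies this component in state and previews.
    super("app:infra:FargateService", name, {}, opts);

    // Children are parented to the component so they group under it.
    const tg = new aws.lb.TargetGroup(`${name}-tg`, { /* ... */ }, { parent: this });
    const task = new aws.ecs.TaskDefinition(`${name}-task`, { /* ... */ }, { parent: this });
    // autoscaling, alarms, log groups...

    this.url = tg.arn.apply(/* ... */);
    // Signal that the component has finished registering its outputs.
    this.registerOutputs({ url: this.url });
  }
}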

That pattern collapsed three of our chattiest modules. Code review got faster because reviewers could read intent instead of HCL plumbing.

Testing that doesn't feel like a chore

Pulumi's unit tests with mocks let us assert things like "every Fargate task has a log group with retention set" in plain Jest. We'd done similar things in Terraform with terraform-compliance and OPA, but the feedback loop was slower and the tests were harder to maintain. With Pulumi we got compliance checks running in under a second per file.
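To make that concrete, here is a rough sketch of the check itself, kept free of Pulumi's mocking API so it runs on its own. It assumes a test harness has already captured the resources a program registered into a plain array. The Recorded shape, the props.logGroup field linking a task to its log group, and the function name are all hypothetical; only the two type tokens and the retentionInDays property come from the AWS provider.

```typescript
// Hypothetical shape for resources captured during a mocked run.
interface Recorded {
  type: string;                    // Pulumi type token
  resourceName: string;
  props: Record<string, unknown>;
}

const TASK = "aws:ecs/taskDefinition:TaskDefinition";
const LOG_GROUP = "aws:cloudwatch/logGroup:LogGroup";

// Returns the names of task definitions whose log group has no retention set.
// Assumes, for this sketch only, that each task records its log group's
// name in props.logGroup.
function tasksMissingRetention(resources: Recorded[]): string[] {
  const retained = resources
    .filter(r => r.type === LOG_GROUP && typeof r.props.retentionInDays === "number")
    .map(r => r.resourceName);
  return resources
    .filter(r => r.type === TASK && retained.indexOf(String(r.props.logGroup)) === -1)
    .map(r => r.resourceName);
}

const recorded: Recorded[] = [
  { type: LOG_GROUP, resourceName: "api-logs", props: { retentionInDays: 30 } },
  { type: LOG_GROUP, resourceName: "worker-logs", props: {} },
  { type: TASK, resourceName: "api-task", props: { logGroup: "api-logs" } },
  { type: TASK, resourceName: "worker-task", props: { logGroup: "worker-logs" } },
];

const offenders = tasksMissingRetention(recorded);
console.log(offenders); // worker-task is the only offender
```

In a Jest suite the same function would sit behind a single expectation that the offenders list is empty.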

Async values are a real model

HCL pretends everything is synchronous and papers over the dependency graph. Pulumi's Output<T> makes the async nature explicit. That sounds like a tax until you hit a case where you need to compute something from a resource that doesn't exist yet — at which point Terraform's depends_on hacks suddenly look much worse.
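A toy model helps show what "explicit" buys you. The Deferred class below is an illustrative stand-in, not Pulumi's implementation: the real Output<T> is promise-based and also tracks secrets and unknown values during preview. This sketch models only two ideas, that a value can be transformed before it exists and that dependencies travel with the value.

```typescript
// Toy stand-in for an Output-like value; not Pulumi's implementation.
// It records which resources a value depends on and defers computing it.
class Deferred<T> {
  constructor(
    private readonly compute: () => T,
    public readonly dependsOn: string[],
  ) {}

  // Transform the eventual value; dependencies carry over automatically.
  apply<U>(fn: (value: T) => U): Deferred<U> {
    return new Deferred<U>(() => fn(this.compute()), this.dependsOn);
  }

  // Combine two deferred values; the result depends on both sources.
  static all<A, B>(a: Deferred<A>, b: Deferred<B>): Deferred<[A, B]> {
    return new Deferred<[A, B]>(
      () => [a.compute(), b.compute()],
      a.dependsOn.concat(b.dependsOn),
    );
  }

  // Only the simulated engine calls this, once the resources exist.
  resolve(): T {
    return this.compute();
  }
}

// Values that are unknown until the simulated deployment runs.
const created: Record<string, string> = {};
const dbHost = new Deferred(() => created["db"], ["db"]);
const bucketName = new Deferred(() => created["bucket"], ["bucket"]);

// Built before anything exists; nothing is computed yet.
const envLine = Deferred.all(dbHost, bucketName)
  .apply(([host, bucket]) => `DB_HOST=${host};BUCKET=${bucket}`);

console.log(envLine.dependsOn); // db and bucket

// The simulated deployment fills in real values, then the value resolves.
created["db"] = "db.internal";
created["bucket"] = "ingest-raw";
console.log(envLine.resolve()); // DB_HOST=db.internal;BUCKET=ingest-raw
```

The dependency list falls out of the data flow, so there is no separate depends_on declaration to keep in sync.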

What Terraform quietly did better

After the third service, the energy shifted. Not because Pulumi broke, but because the things Terraform handled invisibly started costing us.

State, drift, and the blast radius question

Terraform Cloud's state locking, run history, and policy-as-code (Sentinel/OPA) were boring and worked. Pulumi has equivalents in Pulumi Cloud, and they're fine, but moving meant retraining the on-call rotation on a new console, new audit trail, new RBAC model. That's a real cost nobody puts on the migration ticket.

We also hit a class of bug we hadn't anticipated: when a TypeScript refactor accidentally changed a resource's logical name, Pulumi wanted to destroy and recreate it. Terraform does the same when a resource address changes without a moved block, but our engineers had years of muscle memory for terraform state mv. Pulumi offers equivalents in pulumi state rename and the aliases resource option, so the operations were available, just unfamiliar, and unfamiliar in production is how you take an RDS instance down.

We didn't take an RDS instance down. We got close enough during a staging dry-run that the room went quiet for about thirty seconds.
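The mechanics behind that near miss can be sketched in a few lines. This is a simplified model of name-based diffing, not Pulumi's engine: it assumes state is just a list of logical names and that an alias is a previous logical name. The types, the planSteps function, and the orders-db example are hypothetical.

```typescript
// Toy model of how a logical-name change is diffed; not Pulumi's engine.
interface Declared {
  logicalName: string;
  aliases?: string[]; // previous logical names this resource used to have
}

type Step = { op: "same" | "create" | "delete"; logicalName: string };

function planSteps(stateNames: string[], program: Declared[]): Step[] {
  const steps: Step[] = [];
  const matched: Record<string, boolean> = {};

  for (const res of program) {
    // A resource matches state by its current name or by any alias.
    const candidates = [res.logicalName].concat(res.aliases || []);
    const hit = candidates.filter(c => stateNames.indexOf(c) !== -1)[0];
    if (hit !== undefined) {
      matched[hit] = true;
      steps.push({ op: "same", logicalName: res.logicalName });
    } else {
      steps.push({ op: "create", logicalName: res.logicalName });
    }
  }
  // Anything left in state that no declaration claimed gets deleted.
  for (const old of stateNames) {
    if (!matched[old]) steps.push({ op: "delete", logicalName: old });
  }
  return steps;
}

const priorState = ["orders-db"];

// Refactor renames the resource with no alias: create the new, delete the old.
const renamed = planSteps(priorState, [{ logicalName: "orders-database" }]);

// Same rename, but the declaration remembers its old name: nothing is replaced.
const renamedWithAlias = planSteps(priorState, [
  { logicalName: "orders-database", aliases: ["orders-db"] },
]);

console.log(renamed.map(s => `${s.op} ${s.logicalName}`));
console.log(renamedWithAlias.map(s => `${s.op} ${s.logicalName}`));
```

In a real Pulumi program the second case corresponds to passing the aliases resource option alongside the new name, which is the habit the team had not yet built.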

The ecosystem gap is smaller than it was, but it's still there

Terraform's provider and module ecosystem in 2026 is still the larger one. For 90% of resources this doesn't matter — Pulumi can consume Terraform providers via its bridge. But when something goes wrong inside that bridge, the stack traces are not friendly. We lost a full afternoon to a bug where a Pulumi-bridged provider serialized a null differently than the underlying Terraform provider expected.

Plans you can actually read

terraform plan output is verbose but greppable, diff-friendly, and every reviewer on the team can parse it. pulumi preview is cleaner visually but harder to paste into a PR comment for an async review. Small thing. Adds up.

CI cost and cold starts

Pulumi programs in Node spin up a language host on every operation. On our self-hosted runners, that added 15–40 seconds per stack per run compared to Terraform. Across hundreds of PRs a week, that's not free. You can mitigate it with persistent runners and dependency caching, but it's another knob.
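For a sense of scale, here is the arithmetic as a small helper. The 15 and 40 second bounds are the measurements above; the PR count, the stacks-per-PR figure, and the one-operation-per-PR simplification are assumptions picked for illustration, so substitute your own numbers.

```typescript
// Back-of-envelope estimate; the PR and stack counts below are assumptions.
function weeklyOverheadMinutes(
  prsPerWeek: number,
  stacksPerPr: number,
  secondsPerStack: number,
): number {
  return (prsPerWeek * stacksPerPr * secondsPerStack) / 60;
}

const prsPerWeek = 300; // assumed; the text says only "hundreds"
const stacksPerPr = 2;  // assumed
const lowEstimate = weeklyOverheadMinutes(prsPerWeek, stacksPerPr, 15);
const highEstimate = weeklyOverheadMinutes(prsPerWeek, stacksPerPr, 40);

console.log(`${lowEstimate} to ${highEstimate} extra runner-minutes per week`);
// 150 to 400 extra runner-minutes per week
```

Under those assumptions the overhead is a few runner-hours a week, and it grows with every extra push to an open PR.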

Why we stopped halfway

Around month five we sat down and asked: what does fully completing this migration buy us that the hybrid state doesn't?

The honest answer was: not much. The services where Pulumi shone — high-cardinality, dynamic, component-heavy stuff — were already migrated. What remained was the boring foundation: VPC, IAM baselines, Route53, the org-level guardrails. That code barely changed. It had been written carefully, was well-tested, and rewriting it just to have one IaC tool would have introduced risk for no operational gain.

So we drew a line:

Pulumi owns application infrastructure: ECS services, Lambdas, per-service S3 buckets, application-level alarms
Terraform owns platform infrastructure: VPC, IAM, KMS, Route53, shared RDS, org-level SCPs, CloudTrail
A small Pulumi stack reads Terraform outputs through a remote state reference, Pulumi's counterpart to the terraform_remote_state data source
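Whatever mechanism carries the values across, it helps to treat the Terraform outputs as a typed contract and fail fast when one is missing. The sketch below is not the mechanism described in the list: it parses the JSON that terraform output -json prints, where each output is wrapped in an object with sensitive, type, and value fields. The output names vpc_id and private_subnet_ids and the PlatformOutputs interface are hypothetical.

```typescript
// Shape of `terraform output -json`: each output is wrapped with metadata.
interface TfOutput {
  sensitive: boolean;
  type: unknown;
  value: unknown;
}

// Hypothetical contract between the platform (Terraform) and app (Pulumi) layers.
interface PlatformOutputs {
  vpcId: string;
  privateSubnetIds: string[];
}

function requireOutput(raw: Record<string, TfOutput>, key: string): unknown {
  const entry = raw[key];
  if (entry === undefined) throw new Error(`missing Terraform output: ${key}`);
  return entry.value;
}

function parsePlatformOutputs(json: string): PlatformOutputs {
  const raw = JSON.parse(json) as Record<string, TfOutput>;
  const vpcId = requireOutput(raw, "vpc_id");
  const subnets = requireOutput(raw, "private_subnet_ids");
  if (typeof vpcId !== "string") throw new Error("vpc_id must be a string");
  if (!Array.isArray(subnets)) throw new Error("private_subnet_ids must be a list");
  return { vpcId, privateSubnetIds: subnets.map(String) };
}

// Sample payload in the documented shape, with made-up values.
const sample = JSON.stringify({
  vpc_id: { sensitive: false, type: "string", value: "vpc-0abc" },
  private_subnet_ids: {
    sensitive: false,
    type: ["list", "string"],
    value: ["subnet-1", "subnet-2"],
  },
});

const platform = parsePlatformOutputs(sample);
console.log(platform.vpcId, platform.privateSubnetIds.length);
```

A missing or mistyped output then fails at the boundary between the two tools instead of surfacing later as an undefined subnet in an ECS service.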

It's not pretty on a slide. It works in practice.

How to decide for your own team

If you're staring at the same choice in 2026, the language is the wrong axis. Better questions:

Who runs IaC on call at 3 a.m.?

If the answer is "application engineers," Pulumi's affinity for their daily language is a genuine ergonomic win. If the answer is "a platform team that lives in HCL," the migration ROI is much smaller than it looks.

How much of your infra is dynamic vs static?

Static infra (networking, IAM, DNS) doesn't benefit from a general-purpose language. Dynamic infra (per-tenant resources, fan-out services, generated configs) does. Map your repo against that axis before deciding.

What's your testing posture?

If you already invest in policy-as-code and pre-apply validation, Pulumi's unit tests are a meaningful upgrade. If you don't test infra at all today, switching tools won't fix that — process will.

How allergic is your org to vendor lock-in?

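Both tools have publicly available cores and paid backends, though the licenses differ: Pulumi's engine is Apache 2.0, while Terraform moved to the Business Source License in 2023 and OpenTofu carries on as the open-source fork. The state-backend question is more important than the language question. Self-hosting Pulumi state in an S3 bucket is straightforward; so is Terraform with its S3 backend and DynamoDB locking. Pick deliberately.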

Where we'd start

If you're considering this today, don't migrate. Pilot. Pick one bounded service, ideally one with a lot of repetitive resource definitions, and port it to Pulumi behind a feature branch. Run it in parallel in staging for two weeks. Measure the things that actually hurt: review time, preview time, on-call comfort, and how many times someone had to ask "how do I do X in this tool?"

Then — and this is the part most teams skip — write down the criteria for not continuing. "We'll migrate the rest if A, B, and C" is a much healthier commitment than "we're moving to Pulumi." Half-migrations are fine when they're chosen. They're a disaster when they're the residue of a stalled mandate.

If you want a second pair of eyes on an IaC estate before you commit, our DevOps and platform team does this kind of review regularly. The interesting answer is almost never "rewrite everything."

#DevOps #Terraform #Pulumi #AWS #IaC
