DevOps & Cloud · June 17, 2026 · 6 min read

GCP Cloud Run vs AWS Lambda for a Real Next.js Backend: What We Picked and Why

We ran the same Next.js API workload on Cloud Run and Lambda for three months. Cold starts, cost, observability, and one nasty timeout bug shaped the decision.

We had a Next.js app with a chatty backend — about 40 API routes, mixed read/write, a couple of long-running webhook handlers, and one route that streamed LLM responses. The frontend lived on Vercel. The question was where to put everything else: AWS Lambda behind API Gateway, or Cloud Run on GCP. We ran both for three months. Here's what actually mattered.

Why this comparison, not Vercel Functions

Before the obvious question: yes, we considered keeping the API on Vercel Functions. We didn't because we needed long-lived connections to a self-hosted vector DB inside a VPC, a 5-minute timeout on one ingestion route, and tighter control over egress costs. Vercel is excellent for the web tier; we just didn't want it owning the data plane too.

So the real choice was Lambda vs Cloud Run. Both are container-friendly in 2026, both autoscale to zero, both have decent OpenTelemetry stories. On paper they look interchangeable. They aren't.

The workload, honestly

A few things to set context, because "it depends" is the only honest answer to serverless comparisons:

~3.2M requests/month, very spiky (peaks at 80 RPS, idles at 2 RPS overnight)
p50 backend work: 60–120ms (DB read, JSON out)
p95: 400–900ms (a couple of routes do RAG retrieval)
One streaming route holds connections open for 20–60s
One ingestion route runs 90–180s on PDF uploads
All routes need VPC access to Postgres and a vector store

That combination (VPC access, long requests, and streaming) is what made this interesting.

Round 1: cold starts and the streaming route

We deployed the same containerised Node 20 app to both platforms. Lambda via a container image behind a Function URL (we dropped API Gateway after measuring its overhead). Cloud Run with min-instances set to 0 for a fair fight, then later to 1.

Cold start numbers, measured over a week of synthetic traffic from us-east-1 and us-east4:

Lambda (512MB → 2GB): 800ms–1.4s for the container init, plus 200–400ms for Node warm-up. Provisioned concurrency dropped this to ~80ms but cost us roughly $42/month per provisioned instance.
Cloud Run (1 vCPU, 1GB): 600ms–1.1s cold, ~60ms warm. Min-instances=1 cost us about $28/month and killed the cold start problem entirely.
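If you want to repeat the measurement, ranges like the ones above fall out of collapsing probe samples into percentiles. Here is a minimal sketch, assuming the latencies were collected in milliseconds by some external probe. The `percentile` helper is ours for illustration, and the sample values are made up, not our measured data.

```typescript
// Nearest-rank percentile over latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical cold-start samples, not measured data.
const coldStartsMs = [600, 700, 800, 900, 1100];

const summary = {
  p50: percentile(coldStartsMs, 50),
  p95: percentile(coldStartsMs, 95),
};
console.log(summary); // { p50: 800, p95: 1100 }
```

Keep cold and warm samples in separate buckets before summarising, otherwise the warm requests drown out the number you care about.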

That already gives Cloud Run a slight edge, but the streaming route is what made it lopsided. Lambda's Function URL supports response streaming, but you have to use awslambda.streamifyResponse, and the developer experience is rough, especially with Next.js route handlers that assume a standard Response. Cloud Run, being just a container that speaks HTTP, streamed without any glue code.

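```typescript
// Cloud Run: this just works.
// `llm` is the app's own LLM client; its stream() is assumed to return
// a ReadableStream that the Response constructor can consume directly.
export async function POST(req: Request) {
  const stream = await llm.stream(await req.json());
  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream' },
  });
}
```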

On Lambda we ended up wrapping the handler, and the wrapper leaked memory under load until we pinned a specific runtime version. Not Lambda's fault exactly, but a real cost.
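For contrast, here is roughly what the Lambda side of a streaming handler looks like. This is a simplified sketch, not the wrapper we shipped. `awslambda.streamifyResponse` and `awslambda.HttpResponseStream.from` come from the global that the Lambda Node.js runtime injects. The loose type declaration, the `ResponseSink` interface, `pipeSse`, and the `fakeTokens` stand-in for the LLM stream are all hypothetical names introduced for illustration.

```typescript
// The one part of a writable stream this sketch relies on.
interface ResponseSink {
  write(chunk: string): unknown;
  end(): unknown;
}

// Loose, hand-written typing of the runtime-injected global (an assumption,
// not an official type package).
declare const awslambda: {
  streamifyResponse(
    handler: (event: unknown, responseStream: ResponseSink, context: unknown) => Promise<void>
  ): unknown;
  HttpResponseStream: {
    from(
      stream: ResponseSink,
      metadata: { statusCode: number; headers: Record<string, string> }
    ): ResponseSink;
  };
};

// Hypothetical stand-in for the LLM client's token stream.
async function* fakeTokens(): AsyncGenerator<string> {
  yield "hello";
  yield " world";
}

// Writes each chunk as a server-sent event and always closes the stream,
// even if the source throws partway through.
async function pipeSse(source: AsyncIterable<string>, sink: ResponseSink): Promise<number> {
  let count = 0;
  try {
    for await (const chunk of source) {
      sink.write(`data: ${JSON.stringify(chunk)}\n\n`);
      count++;
    }
  } finally {
    sink.end();
  }
  return count;
}

// Only defined inside the Lambda runtime, where the global exists.
const handler =
  typeof awslambda === "undefined"
    ? undefined
    : awslambda.streamifyResponse(async (_event, responseStream) => {
        const http = awslambda.HttpResponseStream.from(responseStream, {
          statusCode: 200,
          headers: { "Content-Type": "text/event-stream" },
        });
        await pipeSse(fakeTokens(), http);
      });
```

The point of the comparison is the amount of platform-specific code: none of this is a standard Request or Response, so a Next.js route handler cannot be dropped in unchanged.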

Round 2: the 180-second ingestion route

Lambda's max timeout is 15 minutes — fine. Cloud Run goes up to 60 minutes on a request. Both cover us.

The problem was concurrency. Lambda is one-request-per-instance. Our PDF ingestion route fans out to embedding calls, and during a burst of uploads we hit account concurrency limits faster than expected. We bumped the limit, then bumped it again. Each instance also opened its own DB connection, which meant we needed RDS Proxy. That's fine, but it's another piece.

Cloud Run lets you set concurrency per instance (default 80, we used 20 for this workload). One container handles many requests, shares one DB pool, and the math gets simpler. For the ingestion service specifically, we needed roughly 60% fewer container-seconds on Cloud Run to handle the same burst, because requests overlapped on the same instance.
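To see why overlapping requests shrink billable time, here is a toy model. It is a sketch under loud assumptions: an instance is billed for the span during which it has at least one request in flight, requests are packed onto the first instance with spare capacity, and CPU contention, startup time, and idle billing are ignored. The `instanceSeconds` function and the burst of uploads are hypothetical, not our production numbers.

```typescript
interface Req {
  start: number;    // arrival time, seconds
  duration: number; // processing time, seconds
}

interface Instance {
  ends: number[];    // end times of requests assigned to this instance
  billed: number;    // seconds with at least one request in flight
  busyUntil: number; // latest end time seen so far
}

// Total billed instance-seconds when each instance serves up to
// `concurrency` requests at once.
function instanceSeconds(reqs: Req[], concurrency: number): number {
  const instances: Instance[] = [];
  const sorted = [...reqs].sort((a, b) => a.start - b.start);
  for (const r of sorted) {
    const end = r.start + r.duration;
    // First instance with fewer than `concurrency` requests still running.
    let inst = instances.find(
      (i) => i.ends.filter((e) => e > r.start).length < concurrency
    );
    if (!inst) {
      inst = { ends: [], billed: 0, busyUntil: Number.NEGATIVE_INFINITY };
      instances.push(inst);
    }
    // Bill only the part not already covered by earlier requests here.
    const from = Math.max(r.start, inst.busyUntil);
    if (end > from) inst.billed += end - from;
    inst.busyUntil = Math.max(inst.busyUntil, end);
    inst.ends.push(end);
  }
  return instances.reduce((total, i) => total + i.billed, 0);
}

// Hypothetical burst: 10 uploads arriving 5s apart, 120s of work each.
const burst: Req[] = Array.from({ length: 10 }, (_, i) => ({
  start: i * 5,
  duration: 120,
}));

const onePerInstance = instanceSeconds(burst, 1); // 1200
const sharedInstance = instanceSeconds(burst, 20); // 165
console.log({ onePerInstance, sharedInstance });
```

The model overstates the saving compared with the roughly 60% we measured, because real requests compete for CPU on a shared instance and run slower when packed together. It does show where the saving comes from: with one request per instance, every overlapping second is billed once per request.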

What about Lambda's SnapStart?

SnapStart for Node landed properly in 2025 and it's good. We tested it. Cold starts dropped to ~150–250ms, which is competitive. But SnapStart adds restore-time gotchas — anything cached at init (DB clients, secret manager pulls) needs careful handling, and we burned half a day on a stale JWT signing key that got snapshotted. Workable, but more cognitive load than "just keep one Cloud Run instance warm".
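One general way to defuse snapshotted state is to make cached values expire by elapsed wall-clock time, so anything captured at init ages out after a restore. This is a generic pattern, not a SnapStart API, and not necessarily how we fixed the signing key issue. The `expiring` helper, the fake clock, and the key loader below are hypothetical.

```typescript
// Wraps a loader so its result is reloaded once it is older than ttlMs.
// Age is checked on every read, so a value captured in a snapshot
// is refreshed on first use after a late restore.
function expiring<T>(
  load: () => T,
  ttlMs: number,
  now: () => number = Date.now
): () => T {
  let value: T | undefined;
  let loadedAt = Number.NEGATIVE_INFINITY;
  return () => {
    const t = now();
    if (t - loadedAt >= ttlMs) {
      value = load();
      loadedAt = t;
    }
    return value as T;
  };
}

// Hypothetical usage with a fake clock and a loader that counts its calls.
let clock = 0;
let loads = 0;
const getSigningKey = expiring(() => `key-v${++loads}`, 60000, () => clock);

const first = getSigningKey();  // "key-v1", loaded at init
clock = 30000;
const second = getSigningKey(); // "key-v1", still fresh
clock = 3600000;                // pretend the snapshot is restored an hour later
const third = getSigningKey();  // "key-v2", reloaded after restore
console.log({ first, second, third });
```

If the loader is asynchronous, `T` is a Promise and the cached promise is reused until it expires. The same wrapper works for DB clients and secret pulls, as long as reloading is safe to do on the request path.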

Round 3: cost, measured not theorised

We ran both in parallel for a month with real traffic mirrored. Rough monthly numbers, excluding data transfer and the databases themselves:

Lambda: $310 compute + $58 provisioned concurrency + $24 CloudWatch Logs ingestion = ~$392
Cloud Run: $245 compute (with min-instances=1 on two services) + $11 Cloud Logging = ~$256

Cloud Run won by about 35%, but the gap came almost entirely from two things: (a) min-instances being cheaper than provisioned concurrency, and (b) Cloud Run's per-instance concurrency reducing total billable time on the ingestion service. For a stateless CRUD API with no streaming and no long requests, the gap closes to maybe 10%.
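The arithmetic behind those totals, spelled out so you can plug in your own line items. The figures are the monthly numbers quoted above; the variable names are ours.

```typescript
// Monthly figures from the comparison above, in USD.
const lambda = { compute: 310, provisionedConcurrency: 58, logs: 24 };
const cloudRun = { compute: 245, logs: 11 }; // compute includes min-instances=1 on two services

const lambdaTotal = lambda.compute + lambda.provisionedConcurrency + lambda.logs;
const cloudRunTotal = cloudRun.compute + cloudRun.logs;

// Where the difference comes from, line by line.
const gap = {
  compute: lambda.compute - cloudRun.compute,
  provisionedConcurrency: lambda.provisionedConcurrency,
  logs: lambda.logs - cloudRun.logs,
};

const savingsPct = ((lambdaTotal - cloudRunTotal) / lambdaTotal) * 100;

console.log({ lambdaTotal, cloudRunTotal, gap, savingsPct: savingsPct.toFixed(1) });
// { lambdaTotal: 392, cloudRunTotal: 256,
//   gap: { compute: 65, provisionedConcurrency: 58, logs: 13 }, savingsPct: '34.7' }
```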

A fair warning: CloudWatch Logs ingestion costs creep up on you. We've seen teams pay more for logs than for Lambda itself. Set retention policies on day one.

Round 4: observability and the bug that picked the winner

We instrument everything with OpenTelemetry exporting to a self-hosted collector, then on to our backend of choice. Both platforms support OTel, but the integration story differs.

On Lambda, we used the AWS-managed OTel layer. It works, but the layer adds ~150ms to cold starts and we found gaps in span context propagation across Function URL → internal SDK calls. Workable with manual context injection, annoying.
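"Manual context injection" here means carrying the W3C `traceparent` header across the hop yourself. In a real service the OpenTelemetry API's propagator would normally do this for you. The dependency-free sketch below only shows what is inside the header; `formatTraceparent` and `parseTraceparent` are hypothetical helper names, and the parser accepts only the four-field layout.

```typescript
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string;  // 16 lowercase hex characters
  sampled: boolean;
}

// Builds a version-00 traceparent value: version-traceId-spanId-flags.
function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Returns null for anything that is not a well-formed traceparent.
function parseTraceparent(header: string): TraceContext | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    header.trim()
  );
  if (!m) return null;
  const [, version, traceId, spanId, flags] = m;
  if (version === "ff") return null; // reserved, invalid version
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null; // all-zero ids are invalid
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Example ids taken from the W3C Trace Context specification.
const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const ctx = parseTraceparent(incoming);

// Re-attach the context to an outgoing call so the trace is not broken.
const outgoingHeaders: Record<string, string> = ctx
  ? { traceparent: formatTraceparent(ctx) }
  : {};
console.log(outgoingHeaders);
```

A production version would create a new span id for the outgoing call rather than forwarding the parent's, which is one more reason to let the SDK's propagator handle it where it works.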

On Cloud Run, OTel is just "start the SDK in your container". No layer, no init weirdness, full control over batching and sampling.

The bug: during a traffic spike, our Lambda streaming route started returning 502s. CloudWatch showed nothing useful — the function logs said the handler completed successfully. After two days we found it in X-Ray: API Gateway (we hadn't fully migrated to Function URLs at that point) was timing out at 29s on responses that were still streaming. The function was happy, the gateway wasn't, and the error surfaced as a generic 502 with no correlation ID back to the function invocation.
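The failure is easy to state as arithmetic: a request survives only as long as the shortest timeout on its path. A small sketch of the check we wish we had run earlier follows. The 29s gateway limit and the 60s upper bound on streams come from this post. The Lambda value assumes the function was configured at the 15-minute maximum, and the `Hop` type and `tightestHop` helper are hypothetical.

```typescript
interface Hop {
  name: string;
  timeoutSeconds: number;
}

// The hop with the smallest timeout is the one that cuts the request first.
function tightestHop(hops: Hop[]): Hop {
  if (hops.length === 0) throw new Error("no hops");
  return hops.reduce((min, h) => (h.timeoutSeconds < min.timeoutSeconds ? h : min));
}

const requestPath: Hop[] = [
  { name: "API Gateway", timeoutSeconds: 29 },
  { name: "Lambda function", timeoutSeconds: 15 * 60 }, // assumed: set to the platform maximum
];

const longestStreamSeconds = 60; // streaming route holds connections for 20-60s
const limit = tightestHop(requestPath);
const fits = longestStreamSeconds <= limit.timeoutSeconds;

console.log(`${limit.name} allows ${limit.timeoutSeconds}s; 60s stream fits: ${fits}`);
// API Gateway allows 29s; 60s stream fits: false
```

Running a check like this against every hop, including load balancers and proxies, is cheap insurance before putting a long-lived route behind a new front door.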

On Cloud Run, when something times out, you get a clean log line with the trace ID, the request, and the upstream. We've had incidents on Cloud Run too — it's not magic — but the debugging loop is shorter.

Where Lambda actually wins

This isn't a Cloud Run victory lap. Lambda is the better choice when:

You're deep in AWS already and IAM/VPC/SQS integration matters more than raw ergonomics
Your workload is event-driven: S3 triggers, DynamoDB streams, EventBridge. Cloud Run has Eventarc but it's less mature
You need very fine-grained per-function permissions
You're running Go or Rust and cold starts are sub-100ms anyway
Your traffic is genuinely bursty with long idle stretches: Lambda's scale-to-zero with no min-instance cost is hard to beat for low-volume internal tools

We still use Lambda for our event pipeline (S3 → Lambda → SQS → worker). It's the right tool there.

The decision

We moved the Next.js backend to Cloud Run. Three reasons, in order: streaming was painless, per-instance concurrency cut our ingestion cost meaningfully, and the observability story let us debug a real incident in minutes instead of days.

If you're a small team picking one platform for a Next.js app with mixed workloads, Cloud Run is the lower-friction default in 2026. If you're already operating at AWS scale with mature IaC and a platform team, the integration gravity of Lambda is real and probably wins.

Where we'd start

If you're making this call today: run the comparison on your actual worst route, not your average one. Pick whichever route makes you nervous — the long one, the streaming one, the one that spikes — and deploy it to both platforms behind a feature flag for two weeks. Measure cold starts at 3am, not at noon. Look at your logs bill, not just your compute bill. Most importantly, deliberately cause a failure and see how long it takes to find the root cause. That last test told us more than any benchmark.

If you want a second pair of eyes on a migration like this, our DevOps and cloud team does this kind of evaluation regularly, and we've written about related infrastructure decisions elsewhere on the blog.
