DevOps & Cloud | June 1, 2026 | 7 min read

The CloudFront-to-Vercel Edge Migration That Almost Broke Auth

We moved a Next.js app from CloudFront + Lambda@Edge to Vercel and learned the hard way that signed cookies, edge regions, and middleware ordering don't translate cleanly. Here's what bit us.

We spent six weeks moving a mid-sized Next.js app off CloudFront + Lambda@Edge and onto Vercel. On paper it was a simplification: fewer moving parts, no more wrestling with cloudfront-viewer-* headers, faster deploys. In practice, our 02:00 UTC cutover shipped a bug that quietly logged out about 4% of authenticated users for nine hours before our session-duration dashboards picked it up.

This is what actually went wrong, what the migration playbook should have looked like, and the bits of the new setup we still like.

The setup we were leaving

The old stack had grown organically over three years:

  • CloudFront in front of an S3 origin for static assets and an ALB origin for the Next.js SSR tier on ECS Fargate.
  • A Lambda@Edge function on the viewer-request event that validated a signed session cookie, rewrote the request, and occasionally redirected to a login flow.
  • A second Lambda@Edge on origin-response that set security headers and patched Set-Cookie domains for a couple of legacy subdomains.

It worked. Cold starts on Lambda@Edge were painful — in our experience, p99 added 200–400ms on a fresh region — but caching hid most of it. The real cost was operational: deploys took 15–25 minutes because Lambda@Edge replicates globally, rollbacks were awkward, and nobody on the team enjoyed debugging it.

Why we wanted Vercel

The Next.js app was already the centre of gravity. We wanted:

  • Preview deployments per PR without bespoke pipelines.
  • Middleware co-located with the app code.
  • Faster iteration on the auth flow, which product wanted to redesign.

We were not chasing performance. The CloudFront setup was, honestly, fine on latency. This is worth saying out loud because it shaped which tradeoffs we were willing to accept.

What we got wrong about edge middleware

Lambda@Edge and Vercel Edge Middleware look similar from a job-description distance: both intercept requests close to the user, both run a slim runtime, both can rewrite/redirect/respond. They are not the same thing.

Three differences bit us:

1. The cookie domain assumption

On CloudFront, our Lambda@Edge ran on requests to app.example.com and could read/write cookies scoped to .example.com without ceremony, because we owned the whole CloudFront distribution and the cert. On Vercel, the production deployment was on app.example.com via a CNAME, but preview deployments lived on *.vercel.app.

Our middleware did this:

// middleware.ts — the version that broke
import { NextResponse, type NextRequest } from 'next/server'
import { verifySession } from '@/lib/auth'

export async function middleware(req: NextRequest) {
  const token = req.cookies.get('sid')?.value
  const session = token ? await verifySession(token) : null

  if (!session && req.nextUrl.pathname.startsWith('/app')) {
    const url = req.nextUrl.clone()
    url.pathname = '/login'
    return NextResponse.redirect(url)
  }

  const res = NextResponse.next()
  if (session?.refreshed) {
    res.cookies.set('sid', session.newToken, {
      domain: '.example.com', // <-- the problem
      httpOnly: true,
      secure: true,
      sameSite: 'lax',
    })
  }
  return res
}

On preview URLs, setting domain: '.example.com' from a vercel.app host means the browser silently drops the cookie, because the request host does not belong to that domain. QA hit /app, got logged in, then got bounced to /login on the next request because the cookie they thought they had was never actually stored. We caught this in staging, but only on *.vercel.app preview URLs, which most engineers weren't using; previews served from a production-domain alias accepted the cookie and hid the problem.
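To make the rejection concrete, here is a minimal sketch of the domain-match check browsers apply before storing a cookie that carries an explicit Domain attribute. It is a simplified reading of RFC 6265 (it ignores IP addresses and the public suffix list), and the function name and the preview hostname are hypothetical.

```typescript
// Simplified RFC 6265 domain-match: may a response from `requestHost`
// set a cookie with Domain=`cookieDomain`?
function domainMatches(requestHost: string, cookieDomain: string): boolean {
  const host = requestHost.toLowerCase().replace(/:\d+$/, '') // drop any port
  const domain = cookieDomain.toLowerCase().replace(/^\./, '') // a leading dot is ignored
  if (host === domain) return true
  // Otherwise the host must be a subdomain of the cookie domain.
  return host.endsWith('.' + domain)
}

// Hypothetical hosts for illustration.
console.log(domainMatches('app.example.com', '.example.com'))             // true: cookie is stored
console.log(domainMatches('my-branch-abc123.vercel.app', '.example.com')) // false: cookie is dropped
```

The browser raises no error when the check fails, which is why the symptom shows up one request later as a redirect to /login.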

The fix was to only set the explicit domain when the request host matched our apex:

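// Only pin the cookie to the apex domain when the request arrived on it.
// On *.vercel.app previews, omitting `domain` yields a host-only cookie
// that the browser will accept.
const host = (req.headers.get('host') ?? '').replace(/:\d+$/, '') // drop any port
const isProdHost = host.endsWith('.example.com') || host === 'example.com'

res.cookies.set('sid', session.newToken, {
  httpOnly: true,
  secure: true,
  sameSite: 'lax',
  ...(isProdHost ? { domain: '.example.com' } : {}),
})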

Obvious in hindsight. Less obvious when you're porting code that ran in exactly one environment for three years.

2. Region semantics

Lambda@Edge runs in the AWS region closest to the viewer. Our session verification called a Redis cluster in eu-west-1 via a regional Lambda — slow from us-west-2, but predictable.

Vercel Edge Middleware runs on a different network topology. We configured preferredRegion for the route handlers but middleware doesn't accept it the same way. The first week post-migration, our session verifications were hitting Redis from edge POPs we'd never thought about, and the p95 of the auth check went from ~25ms to 80–140ms in our measurements for users in regions far from eu-west-1.

The fix wasn't on Vercel's side; it was architectural. We moved session verification to use a stateless JWT with a short TTL and a refresh path that hits Redis only when the JWT is near expiry. We should have done this years ago. The migration just forced the conversation.
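As a rough sketch of that refresh logic: once the JWT signature has been verified locally (not shown here), the only remaining question is whether the token is comfortably valid, close enough to expiry to justify a Redis round trip, or already expired. The type names, the function name, and the 120-second window are assumptions for illustration, not our production values.

```typescript
type SessionClaims = { sub: string; exp: number } // exp: expiry as Unix seconds

type SessionDecision = 'valid' | 'refresh' | 'expired'

// Decide what to do with a JWT whose signature has already been verified.
// Only 'refresh' needs a round trip to the session store.
function decideSession(
  claims: SessionClaims,
  nowSeconds: number,
  refreshWindowSeconds = 120, // assumed value; tune it to the token TTL
): SessionDecision {
  const remaining = claims.exp - nowSeconds
  if (remaining <= 0) return 'expired'
  if (remaining <= refreshWindowSeconds) return 'refresh'
  return 'valid'
}

const now = 1700000000
console.log(decideSession({ sub: 'u1', exp: now + 600 }, now)) // 'valid'
console.log(decideSession({ sub: 'u1', exp: now + 60 }, now))  // 'refresh'
console.log(decideSession({ sub: 'u1', exp: now - 1 }, now))   // 'expired'
```

With a short TTL, most requests take the 'valid' branch and never leave the edge, so latency to the data store matters only for the small share of requests inside the refresh window.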

3. Middleware execution order vs. caching

On CloudFront, the cache key was something we controlled explicitly. On Vercel, the default caching for static and ISR routes is aggressive and correct — but middleware that varies responses based on cookies has to declare that variance, or you get cross-user contamination.

We never shipped a cache-poisoning bug, but we caught a near-miss in a preview: a marketing page that read a feature-flag cookie in middleware and rewrote to a variant, with no Vary consideration. On CloudFront we'd had a behaviour-level config forwarding that cookie into the cache key. On Vercel we needed to either skip middleware on that path or make the variance explicit via the response.
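The near-miss is easier to see in a toy model. The sketch below is not how Vercel's cache works internally; it is a plain Map standing in for any shared cache, with hypothetical names throughout, showing what happens when a cookie-driven variant is left out of the cache key.

```typescript
type Variant = 'control' | 'experiment'

// Toy render function: the page body depends on the feature-flag cookie.
function renderPage(path: string, variant: Variant): string {
  return `${path} rendered as ${variant}`
}

// Serve a page through a shared cache, using a caller-supplied key function.
function serve(
  cache: Map<string, string>,
  keyFor: (path: string, variant: Variant) => string,
  path: string,
  variant: Variant,
): string {
  const key = keyFor(path, variant)
  const cached = cache.get(key)
  if (cached !== undefined) return cached
  const body = renderPage(path, variant)
  cache.set(key, body)
  return body
}

const pathOnlyKey = (path: string, _variant: Variant) => path
const variantAwareKey = (path: string, variant: Variant) => `${path}|flag=${variant}`

// Path-only key: the second visitor is served the first visitor's variant.
const naive = new Map<string, string>()
serve(naive, pathOnlyKey, '/pricing', 'experiment')
console.log(serve(naive, pathOnlyKey, '/pricing', 'control')) // "/pricing rendered as experiment"

// Variant-aware key: each variant gets its own cache entry.
const keyed = new Map<string, string>()
serve(keyed, variantAwareKey, '/pricing', 'experiment')
console.log(serve(keyed, variantAwareKey, '/pricing', 'control')) // "/pricing rendered as control"
```

A feature flag leaking across users is embarrassing; the same mistake with anything session-derived is a security incident, which is why we treated the preview finding as seriously as a production bug.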

The incident

Nine hours of intermittent logouts. Here's what the timeline actually looked like:

  • T+0: cutover at 02:00 UTC on a Tuesday. Smoke tests pass.
  • T+45min: Sentry shows a small uptick in SessionNotFound errors. We attribute it to the cutover and watch it.
  • T+3h: rate is steady at ~4% of authenticated requests. Still not screaming.
  • T+6h: support tickets trickle in. Users on Safari are over-represented.
  • T+9h: we correlate the 4% to users who were still holding a stale .example.com sid cookie from an earlier session, which was now colliding with the new one on a slightly different path.

The smoking gun was a Path=/ vs Path=/app mismatch between the old Lambda@Edge cookie and the new middleware cookie. Browsers were sending both, and our backend picked the wrong one.

Rollback would have meant repointing DNS back to CloudFront, which we could do, but the TTL and the fact that the new cookies were already in the wild made it messy. We shipped a forward-fix: middleware that, on detecting two sid cookies, invalidated both and forced a clean re-login. About 2,000 users got logged out once. Better than the alternative.
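The forward-fix boils down to two steps: notice that the raw Cookie header carries more than one sid, then expire the cookie at every path it could be stored under. The sketch below shows that logic as plain functions rather than our actual middleware; the function names are hypothetical, and the Domain and Path values are taken from the cookies described in this post.

```typescript
// Count how many cookies with the given name appear in a raw Cookie header.
function countCookie(cookieHeader: string, name: string): number {
  return cookieHeader
    .split(';')
    .map((pair) => pair.trim())
    .filter((pair) => pair.startsWith(name + '=')).length
}

// Build Set-Cookie values that expire the cookie at each path it may live on.
// A deletion only takes effect if Domain and Path match the stored cookie.
function expireCookie(name: string, domain: string, paths: string[]): string[] {
  return paths.map(
    (path) =>
      `${name}=; Domain=${domain}; Path=${path}; Max-Age=0; HttpOnly; Secure; SameSite=Lax`,
  )
}

const header = 'theme=dark; sid=old-token; sid=new-token'
if (countCookie(header, 'sid') > 1) {
  // Emit one Set-Cookie per path, then redirect the user to /login.
  for (const line of expireCookie('sid', 'example.com', ['/', '/app'])) {
    console.log(line)
  }
}
```

Reading the raw header matters here: a lookup that returns a single value for sid cannot tell you that two were sent.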

What the playbook should have been

If we ran this migration again, the order would be:

  1. Audit every cookie the old edge layer touches. Name, domain, path, sameSite, secure, max-age. Put it in a table. Diff it against what the new middleware will emit. Do this before writing a line of middleware code.
  2. Run both edges in parallel for at least a week, with a header or query flag routing a small percentage of real traffic to the new path. We did this for the SSR tier but not for the auth-bearing middleware, because middleware is itself the routing layer and there was no obvious place to split traffic in front of it. Building a shadow path anyway would have exposed the cookie mismatch with real user sessions before cutover.
  3. Treat preview deployments as a real environment. Add CI checks that exercise auth flows on a preview URL, not just on a production-domain alias.
  4. Decouple session verification from edge geography before migrating, not during. The JWT change should have been a separate, boring PR shipped a month earlier.
  5. Instrument cookie-state explicitly. We had auth metrics. We did not have a metric for "user is presenting two session cookies simultaneously", which is exactly the failure mode that bit us. Now we do.
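The audit in step 1 can start as a small script. This is a minimal sketch with hypothetical names, assuming you have already written down what each layer emits; the example values are illustrative, and which layer used Path=/ versus Path=/app is an assumption made only for the demo.

```typescript
// The attributes the audit in step 1 lists.
type CookieSpec = {
  name: string
  domain: string
  path: string
  sameSite: 'lax' | 'strict' | 'none'
  secure: boolean
  maxAge: number // seconds
}

// Return one human-readable line per attribute that differs between layers.
function diffCookieSpecs(oldSpecs: CookieSpec[], newSpecs: CookieSpec[]): string[] {
  const findings: string[] = []
  const newByName = new Map(newSpecs.map((spec) => [spec.name, spec] as const))
  const keys: (keyof CookieSpec)[] = ['domain', 'path', 'sameSite', 'secure', 'maxAge']
  for (const old of oldSpecs) {
    const next = newByName.get(old.name)
    if (!next) {
      findings.push(`${old.name}: missing from new edge layer`)
      continue
    }
    for (const key of keys) {
      if (old[key] !== next[key]) {
        findings.push(`${old.name}: ${key} ${String(old[key])} -> ${String(next[key])}`)
      }
    }
  }
  return findings
}

// Illustrative values only.
const oldEdge: CookieSpec[] = [
  { name: 'sid', domain: '.example.com', path: '/app', sameSite: 'lax', secure: true, maxAge: 86400 },
]
const newEdge: CookieSpec[] = [
  { name: 'sid', domain: '.example.com', path: '/', sameSite: 'lax', secure: true, maxAge: 86400 },
]
console.log(diffCookieSpecs(oldEdge, newEdge)) // one finding: "sid: path /app -> /"
```

Run in CI against both configurations, an empty result becomes a cheap precondition for cutover.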

What we actually like about the new setup

Not everything was painful. Deploys are genuinely faster — minutes instead of tens of minutes. Preview deployments have changed how product reviews auth changes; designers can click through a real login flow on a branch. The middleware code lives next to the app code and gets reviewed by the same people who own the routes.

We also stopped paying for Lambda@Edge invocations, though the Vercel bill ate most of that saving. The win was operational, not financial.

Where we'd start

If you're considering this migration, start with a cookie inventory and a list of every place your current edge layer reads request state. Then ask whether your session model assumes anything about the network path — region, latency to a data store, header presence. If the answer is yes, fix that first, in your current environment, and migrate a boring service. The migration itself should be the least interesting part.

If you want a second pair of eyes on an edge or auth migration, our DevOps and cloud team does this kind of work, and we've written more migration postmortems on the blog.

#Vercel #AWS #Next.js #Edge #Migration
