Vercel Edge Middleware Cold Starts Wrecked Our p95. Here's the Fix.
Edge middleware promised sub-50ms execution. Our p95 said otherwise. Here's what we found when we instrumented it properly, and the three changes that brought latency back under control.
Edge middleware is sold as the magic layer: auth, redirects, A/B tests, all running close to the user in under 50ms. We bought the pitch. Then our p95 TTFB on authenticated routes started climbing past 600ms and the dashboard kept insisting the middleware was "fast".
This is the breakdown of what actually happened, how we measured it, and the three changes that fixed it without abandoning the edge runtime.
The symptom: dashboards green, users complaining
Our setup was a fairly standard Next.js 14 App Router deployment on Vercel. A single middleware.ts handled three things:
- JWT verification for /app/* routes
- Geo-based redirects for marketing pages
- A feature flag check that called an external service
Vercel's analytics showed middleware execution times averaging 12-18ms. Sentry's web vitals told a different story: p75 TTFB was 340ms, p95 was 680ms, and a slow tail of requests was hitting 1.2s. Support tickets matched the tail, not the average.
The gap between "middleware execution time" and "time the user actually waits" turned out to be the entire problem.
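To make the average-versus-tail gap concrete, here is a minimal sketch of a nearest-rank percentile next to a plain mean. The sample latencies are made up for illustration (they are not our production data), and the function names are ours, not part of any monitoring SDK.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function mean(samples: number[]): number {
  if (samples.length === 0) throw new Error('no samples');
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Illustrative latencies in ms: 18 fast requests and 2 slow ones.
const latenciesMs = [
  12, 13, 13, 14, 14, 14, 15, 15, 15, 16,
  16, 16, 17, 17, 17, 18, 18, 18, 650, 900,
];

const summary = {
  mean: mean(latenciesMs),
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
};
```

With a sample shaped like this, the median stays in the teens while the high percentiles land on the slow requests, which is the same pattern our dashboards and support tickets were showing.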
What Vercel's metric actually measures
The execution time Vercel reports is the time your middleware function spends running. It does not include:
- Cold start / isolate boot time for the edge runtime
- DNS, TLS, and connection setup to any fetch() you make from inside middleware
- Time waiting for the underlying request to be routed to a region after middleware completes
In our case, the feature flag fetch() was the silent killer. The middleware logic ran in 14ms. The fetch to our flag service took anywhere from 40ms to 900ms depending on region and whether the connection was warm.
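One way to reason about the gap is as a simple latency budget. The sketch below is a hypothetical model, not something Vercel exposes: the field names are ours, and the example values are picked from within the ranges quoted in this post rather than taken from a real trace.

```typescript
// Hypothetical breakdown of one request's wait, in milliseconds.
interface WaitBreakdown {
  isolateBootMs: number;     // cold start of the edge isolate (0 when warm)
  connectionSetupMs: number; // DNS + TLS to the upstream (0 when reused)
  upstreamFetchMs: number;   // time spent awaiting the upstream response
  executionMs: number;       // the part the platform metric reports
}

function totalWaitMs(b: WaitBreakdown): number {
  return b.isolateBootMs + b.connectionSetupMs + b.upstreamFetchMs + b.executionMs;
}

// Fraction of the total wait that the execution-time metric accounts for.
function reportedShare(b: WaitBreakdown): number {
  return b.executionMs / totalWaitMs(b);
}

// Illustrative values only.
const warmPath: WaitBreakdown = {
  isolateBootMs: 0, connectionSetupMs: 0, upstreamFetchMs: 40, executionMs: 14,
};
const coldPath: WaitBreakdown = {
  isolateBootMs: 120, connectionSetupMs: 200, upstreamFetchMs: 600, executionMs: 14,
};
```

In this model the reported 14ms is roughly a quarter of the warm path and under 2% of the cold one, so the same "fast" metric can sit on top of very different user experiences.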
Instrumenting it properly
We wired OpenTelemetry into the middleware using the @vercel/otel package and exported to our existing collector. The edge runtime doesn't support the full Node SDK, but @vercel/otel handles the constraints.
// instrumentation.ts
import { registerOTel } from '@vercel/otel';

export function register() {
  registerOTel({
    serviceName: 'web-edge',
    instrumentationConfig: {
      fetch: {
        propagateContextUrls: ['*'],
      },
    },
  });
}
Then we added explicit spans around each logical block in the middleware:
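import type { NextRequest } from 'next/server';
import { trace } from '@opentelemetry/api';

// verifyJwt, FLAG_URL, and applyDecisions are app-specific and defined elsewhere.
const tracer = trace.getTracer('edge-middleware');

export async function middleware(req: NextRequest) {
  // Root span: covers everything the middleware does for this request.
  return tracer.startActiveSpan('middleware', async (span) => {
    try {
      const token = req.cookies.get('session')?.value;
      // Child span: JWT verification.
      const user = await tracer.startActiveSpan('verify-jwt', async (s) => {
        try { return await verifyJwt(token); }
        finally { s.end(); }
      });
      // Child span: the call to the external flag service.
      const flags = await tracer.startActiveSpan('fetch-flags', async (s) => {
        try {
          return await fetch(FLAG_URL, {
            headers: { 'x-user': user.id },
            cache: 'no-store',
          }).then(r => r.json());
        } finally { s.end(); }
      });
      return applyDecisions(req, user, flags);
    } finally {
      // End the root span even if a step throws, so the trace still exports.
      span.end();
    }
  });
}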
Once the traces landed, the picture was obvious. The fetch-flags span owned 60-85% of total middleware wall time on cold paths. JWT verification was a flat 8ms. Everything else was noise.
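The "share of wall time" figure is simple arithmetic over exported span durations. The following is a minimal sketch of that calculation, assuming you have already pulled span names and durations out of your tracing backend. The SpanRecord shape and the sample trace are hypothetical, not an OpenTelemetry type or a real trace of ours.

```typescript
// Hypothetical flattened view of a child span from one trace.
interface SpanRecord {
  name: string;
  durationMs: number;
}

// Share of the root span's wall time spent in each named child span.
function shareOfRoot(rootMs: number, children: SpanRecord[]): Record<string, number> {
  if (rootMs <= 0) throw new Error('root duration must be positive');
  const shares: Record<string, number> = {};
  for (const child of children) {
    shares[child.name] = (shares[child.name] ?? 0) + child.durationMs / rootMs;
  }
  return shares;
}

// Illustrative cold-path trace: a 100ms root span with two children.
const exampleShares = shareOfRoot(100, [
  { name: 'verify-jwt', durationMs: 8 },
  { name: 'fetch-flags', durationMs: 80 },
]);
```

Running this per trace and then taking percentiles of each span's share is one way to find the span that owns the tail.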
Cold isolates are real, just not what you think
The edge runtime keeps isolates warm per region per deployment. Deploy frequently, or have traffic from a low-volume region, and you pay isolate boot cost. In our experience this is typically 30-120ms on top of execution, not the multi-second cold starts you'd see on a Lambda VPC config.
The more painful cost was connection setup. Every cold isolate had to redo DNS resolution and TLS handshake to our flag service. That alone added 80-200ms on first request.
Three changes that actually helped
We tried the obvious stuff first and most of it didn't matter. Bundling smaller, removing a logger, switching JSON parsers — single-digit-ms improvements. Here's what actually moved p95.
1. Stop calling external services from middleware
This is the unpopular one. The whole point of middleware is to do something before the request hits your route. But every fetch() from middleware is a serial dependency added to TTFB, and you're often calling from a region that isn't optimal for the upstream service.
We moved the feature flag evaluation into an edge-cached snapshot. A background job publishes the flag ruleset as a small JSON blob to Vercel's Edge Config every 30 seconds. Middleware reads from Edge Config, which is colocated with the runtime.
import { get } from '@vercel/edge-config';
const flagRules = await get('flag-rules');
const flags = evaluateRules(flagRules, user);
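The evaluateRules helper is not shown in this post. As a rough sketch of what a purely local evaluator could look like, the version below assumes a hypothetical rule shape (a master switch, an optional allowlist, and an optional percentage rollout) and buckets users with an FNV-1a hash so that no I/O is needed at request time. None of these names or fields come from Edge Config itself.

```typescript
// Hypothetical rule shape; the real ruleset format is not shown in this post.
interface FlagRule {
  enabled: boolean;         // master switch
  allowUserIds?: string[];  // users who always get the flag
  rolloutPercent?: number;  // 0-100 for everyone else; defaults to 100
}
type FlagRules = Record<string, FlagRule>;

// FNV-1a 32-bit hash over UTF-16 code units: deterministic and I/O-free.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

function evaluateRules(rules: FlagRules, user: { id: string }): Record<string, boolean> {
  const flags: Record<string, boolean> = {};
  for (const [name, rule] of Object.entries(rules)) {
    if (!rule.enabled) {
      flags[name] = false;
    } else if (rule.allowUserIds?.includes(user.id)) {
      flags[name] = true;
    } else {
      // Same user and flag always land in the same bucket (0-99).
      const bucket = fnv1a(`${name}:${user.id}`) % 100;
      flags[name] = bucket < (rule.rolloutPercent ?? 100);
    }
  }
  return flags;
}
```

Hashing the flag name together with the user ID keeps rollouts independent across flags, so a user who falls in the first 10% for one flag is not automatically in the first 10% for all of them.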
Reads dropped from 40-900ms to 1-4ms. We lost the ability to flip a flag and have it take effect in under a second, but a 30-second propagation window was a fine trade for the latency win.
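For the publishing side, a sketch of how the background job might build its request is below. It assumes Vercel's REST endpoint for updating Edge Config items (a PATCH to /v1/edge-config/{id}/items with an "upsert" operation); verify the endpoint and body shape against the current Vercel docs before relying on it. The function name and the EdgeConfigPatch type are ours, and a team-scoped token may also need a teamId query parameter.

```typescript
// Hypothetical container for the request the publisher job would send.
interface EdgeConfigPatch {
  url: string;
  init: { method: 'PATCH'; headers: Record<string, string>; body: string };
}

// Builds (but does not send) the request that replaces the 'flag-rules' item.
function buildFlagRulesPatch(
  edgeConfigId: string,
  apiToken: string,
  rules: unknown,
): EdgeConfigPatch {
  return {
    url: `https://api.vercel.com/v1/edge-config/${encodeURIComponent(edgeConfigId)}/items`,
    init: {
      method: 'PATCH',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        items: [{ operation: 'upsert', key: 'flag-rules', value: rules }],
      }),
    },
  };
}
```

A scheduled job would then call fetch(patch.url, patch.init) on its 30-second interval and alert on any non-2xx response, since a silently failing publisher means middleware keeps serving a stale ruleset.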
2. Move expensive checks out of middleware entirely
Not everything belongs at the edge. We were doing JWT verification with a remote JWKS fetch on first invocation. Even with caching, the cache miss case was brutal.
We split the work:
- Middleware: cheap signature check using a public key embedded at build time, plus expiry validation
- Route handler: full claims validation, revocation check, session lookup
If the cheap check fails, middleware redirects to login immediately. If it passes, the route handler does the deeper check with access to a regional database. This pushed the expensive path off the critical TTFB and into a place where we could parallelize it with other server work.
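The middleware half of that split reduces to a small, synchronous decision once the signature result is known. The sketch below shows only that decision. It assumes the signature has already been verified against the embedded key elsewhere, and the type names and the 30-second clock-skew allowance are our own illustrative choices.

```typescript
type GateDecision = 'continue' | 'redirect-to-login';

interface CheapCheckInput {
  signatureValid: boolean; // result of verifying against the embedded public key
  exp?: number;            // JWT "exp" claim, in seconds since the epoch
}

// Allowed clock skew between issuer and edge; the value is an assumption.
const CLOCK_SKEW_SECONDS = 30;

// Cheap gate for middleware: no network, no database, no revocation lookup.
function cheapGate(input: CheapCheckInput, nowSeconds: number): GateDecision {
  if (!input.signatureValid) return 'redirect-to-login';
  if (input.exp === undefined) return 'redirect-to-login';
  // Reject tokens at or past expiry, allowing for a little skew.
  if (nowSeconds >= input.exp + CLOCK_SKEW_SECONDS) return 'redirect-to-login';
  return 'continue';
}
```

In middleware, nowSeconds would be Math.floor(Date.now() / 1000). Anything this gate lets through still has to pass the full claims and revocation checks in the route handler.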
3. Use matcher aggressively
This sounds obvious but we found middleware running on routes that didn't need it: static assets that snuck past the default exclusions, healthcheck endpoints, OG image routes. Every one of those was paying middleware cost for no reason.
export const config = {
  matcher: [
    '/((?!_next/static|_next/image|favicon.ico|api/health|api/og|.*\\.(?:svg|png|jpg|jpeg|gif|webp)$).*)',
  ],
};
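A matcher like the one above is easy to get subtly wrong, so it helps to check it against a list of real paths. The sketch below uses a plain RegExp that approximates the pattern. Next.js compiles matcher strings itself, so treat this as a local sanity check under that assumption rather than a reproduction of the framework's matching.

```typescript
// Plain-RegExp approximation of the matcher string, anchored at both ends.
const matcherApprox =
  /^\/((?!_next\/static|_next\/image|favicon.ico|api\/health|api\/og|.*\.(?:svg|png|jpg|jpeg|gif|webp)$).*)$/;

// True when the approximation says middleware would run for this pathname.
function runsMiddleware(pathname: string): boolean {
  return matcherApprox.test(pathname);
}

// Paths to audit; swap in a sample from your own access logs.
const auditPaths = [
  '/app/dashboard',
  '/api/users',
  '/api/health',
  '/api/og/post',
  '/logo.png',
  '/_next/static/chunks/main.js',
];

const audit = auditPaths.map((path) => ({ path, runs: runsMiddleware(path) }));
```

Note that the exclusions are prefix matches: under this pattern a path such as /api/healthz is skipped as well, which may or may not be what you want.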
We also put the logic that only applies to authenticated routes behind a path check at the very top of the function, before any async work. In some cases this is simpler than pushing every rule into the matcher: you keep one middleware file and still skip the expensive branches.
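A sketch of that early path check is below. It decides synchronously, from the pathname alone, which stages a request needs. The stage names and the assumption that flags are only evaluated on authenticated routes are ours, chosen for illustration.

```typescript
type Stage = 'geo-redirect' | 'verify-jwt' | 'evaluate-flags';

// Runs before any await: a string comparison, no I/O.
function stagesFor(pathname: string): Stage[] {
  const isAuthenticatedRoute = pathname === '/app' || pathname.startsWith('/app/');
  if (isAuthenticatedRoute) {
    return ['verify-jwt', 'evaluate-flags'];
  }
  // Marketing pages only need the cheap geo redirect.
  return ['geo-redirect'];
}
```

Checking for "/app/" with the trailing slash (plus the exact "/app" case) avoids accidentally treating a path like /apple as authenticated.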
What the numbers looked like after
We rolled the changes out over two weeks, one at a time, so we could attribute the wins. Rough shape of what we saw across our traffic:
- Edge Config migration: p95 TTFB dropped from ~680ms to ~280ms on authenticated routes
- JWT split: p95 dropped another 60-80ms, p99 dropped more (the tail was the JWKS fetch)
- Matcher cleanup: minor on p95, but our middleware invocation count dropped about 22%, which mattered for billing
Your numbers will differ. The point isn't the specific gains — it's that the diagnosis was only possible once we stopped trusting the platform's execution-time metric and measured end-to-end with our own traces.
The mental model we use now
Middleware is for decisions that need to happen before routing and that can be made with local or near-local data. If you find yourself calling a service from middleware, ask:
- Can this data live in Edge Config, a signed cookie, or the JWT itself?
- Can this check be deferred to the route handler without hurting UX?
- Is the worst-case latency of that external call acceptable as part of every request's TTFB?
If the answer to all three is no, the work probably doesn't belong in middleware. Edge isn't free latency — it's relocated latency, and the relocation only helps if the work you're doing is genuinely local.
Where we'd start
If you suspect middleware is dragging your TTFB, do this in order before changing anything:
- Add OpenTelemetry via @vercel/otel and wrap every async block in middleware with its own span. One afternoon of work.
- Look at p95 and p99 of each span, not the average. Averages hide everything that matters.
- For any span that calls fetch(), ask whether that data could live in Edge Config or be deferred to the route handler.
- Audit your matcher and add early-exit conditions for paths that don't need the full pipeline.
If you want a second pair of eyes on a performance regression like this, our team works on exactly these problems — see our DevOps and platform engineering work. And if you're earlier in the stack and trying to decide whether edge middleware is the right tool at all, the answer is usually "only for the cheap stuff".