Streaming Suspense Boundaries: Where to Put Them So TTFB Actually Drops
Suspense in the Next.js App Router is a TTFB lever, not a loading spinner. Here's how we decide where the boundaries go on real product pages — and where they backfire.

Most teams adopt the Next.js App Router, sprinkle a few loading.tsx files around, and call it streaming. Then they wonder why TTFB barely moved and LCP got worse. Suspense boundaries are a precision tool, and the default of "wrap everything that fetches" is almost always wrong.
This is a field guide to placing boundaries on real product pages — what we look for, what we avoid, and the gotchas that only show up under load.
What a Suspense boundary actually does
In the App Router, every <Suspense> boundary (and the loading.tsx convention, which is just sugar over one) is a flush point. The server sends the shell up to that boundary, with the fallback in place, then streams the rest as the inner Server Components resolve. The browser starts parsing HTML, running scripts, and painting whatever has already arrived before the slow stuff is ready.
Three consequences fall out of this:
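- TTFB is measured at the first byte of the shell, not the full document. The less awaited work sits outside a boundary, the earlier that first byte goes out.
- LCP is the render time of the largest image or text block in the viewport. If that element sits inside a Suspense boundary, it can't paint until the boundary resolves, so you've actively delayed it.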
- CLS is sensitive to fallback geometry. A fallback that doesn't match the resolved component's dimensions will shift the page when it swaps.
That's the whole mental model. Everything below is just applying it.
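To make the flush-point model concrete, here is a minimal sketch in plain TypeScript with no React or Next.js involved. The names renderWithBoundaries, Section, and demo, and the delays, are hypothetical stand-ins: the shell is flushed immediately, and each section is flushed whenever its own promise resolves.

```typescript
// Toy model of streaming order. This is not how React implements it,
// only the sequence of flushes a reader of the response would observe.
type Section = { id: string; load: () => Promise<string> };
type Flush = (chunk: string) => void;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function renderWithBoundaries(
  shell: string,
  sections: Section[],
  flush: Flush,
): Promise<void> {
  // First flush: the shell plus fallbacks. TTFB is measured here.
  flush(shell);

  // Every section starts immediately and flushes on its own schedule.
  await Promise.all(
    sections.map(async (section) => {
      const html = await section.load();
      flush(`<section id="${section.id}">${html}</section>`);
    }),
  );
}

// Hypothetical page: a slow reviews block declared before a fast cart badge.
async function demo(): Promise<string[]> {
  const chunks: string[] = [];
  await renderWithBoundaries(
    "<main>shell with skeletons</main>",
    [
      { id: "reviews", load: async () => { await sleep(40); return "reviews"; } },
      { id: "cart", load: async () => { await sleep(5); return "cart"; } },
    ],
    (chunk) => chunks.push(chunk),
  );
  return chunks;
}
```

Calling demo() collects the shell first, then the cart chunk, then the reviews chunk, regardless of the order in which the sections were declared.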
The decision tree we actually use
Before wrapping anything, we ask four questions in order:
1. Is this data on the critical render path for LCP?
If the slow component contains or sits above the LCP element, do not wrap it. You'll just push LCP later. Fetch its data eagerly in the parent, or move the LCP element out of the boundary.
A common mistake: a product page where the hero image URL comes from the same query as the reviews. Teams wrap the whole product card in Suspense, the hero image is held back behind the fallback, and LCP regresses by 400–900 ms in our experience.
Fix: split the query. Fetch the hero data eagerly in the page component, and only wrap the reviews.
// app/product/[id]/page.tsx
import { Suspense } from 'react';
import { getProductHero } from '@/lib/product';
import { ProductHero } from './product-hero'; // assumed path for the hero component
import { Reviews, ReviewsSkeleton } from './reviews';

export default async function ProductPage({
  params,
}: {
  params: Promise<{ id: string }>;
}) {
  const { id } = await params;
  const hero = await getProductHero(id); // fast, indexed query
  return (
    <>
      <ProductHero data={hero} /> {/* contains the LCP image */}
      <Suspense fallback={<ReviewsSkeleton />}>
        <Reviews productId={id} /> {/* slow aggregation */}
      </Suspense>
    </>
  );
}
2. Is the slow work independent, or does it block later UI?
Suspense only helps if the work is genuinely independent. If component B needs data from component A, putting them in sibling boundaries doesn't parallelise anything — A still has to finish first. In that case, hoist the shared fetch up and pass props down, or use React.cache to dedupe.
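The dedupe idea can be sketched without React. memoizePerRequest below is a hypothetical helper that mimics what React's cache gives you within a single server render: two components asking for the same key share one in-flight promise, so neither has to wait for the other to finish first. queryProduct, priceBlock, and stockBlock are made-up names for illustration; in an App Router app you would wrap the fetcher with cache from react instead.

```typescript
// Hypothetical stand-in for React's cache(): shares one in-flight promise per key.
// Create a fresh memoized function per request so results never leak across users.
function memoizePerRequest<T>(fetcher: (key: string) => Promise<T>) {
  const inFlight = new Map<string, Promise<T>>();
  return (key: string): Promise<T> => {
    let promise = inFlight.get(key);
    if (!promise) {
      promise = fetcher(key);
      inFlight.set(key, promise);
    }
    return promise;
  };
}

let productQueries = 0;

// Pretend database call that counts how often it really runs.
async function queryProduct(id: string): Promise<{ id: string; name: string }> {
  productQueries += 1;
  return { id, name: `Product ${id}` };
}

const getProduct = memoizePerRequest(queryProduct);

// Two "components" that both need the product. Each asks for it directly.
async function priceBlock(id: string): Promise<string> {
  const product = await getProduct(id);
  return `price for ${product.name}`;
}

async function stockBlock(id: string): Promise<string> {
  const product = await getProduct(id);
  return `stock for ${product.name}`;
}

// Both blocks start together; the underlying query runs once.
async function renderBoth(id: string): Promise<string[]> {
  return Promise.all([priceBlock(id), stockBlock(id)]);
}
```

The design choice is that each block declares its own data need instead of receiving it from a sibling, which is what lets sibling boundaries resolve independently.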
3. Will the fallback match the final layout?
A fallback that's 200 px tall replacing content that's 600 px tall is a guaranteed CLS hit. We require skeletons to be dimensionally accurate, not "close enough." If you can't predict the height, reserve it with min-height based on the typical case and accept some empty space below.
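One way to pick the reserved height, sketched under the assumption that you have a handful of measured heights for the resolved component: take the median as the "typical case" and apply it as min-height to both the skeleton and the real wrapper. typicalHeight and reservedBoxStyle are hypothetical helpers, not part of any library.

```typescript
// Returns the median of the observed heights, used here as the "typical case".
function typicalHeight(observedHeightsPx: number[]): number {
  if (observedHeightsPx.length === 0) {
    throw new Error("need at least one observed height");
  }
  const sorted = [...observedHeightsPx].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : Math.round((sorted[mid - 1] + sorted[mid]) / 2);
}

// Style object to spread onto the skeleton and the resolved wrapper alike,
// so the swap does not change the box height in the typical case.
function reservedBoxStyle(observedHeightsPx: number[]): { minHeight: string } {
  return { minHeight: `${typicalHeight(observedHeightsPx)}px` };
}
```

Content taller than the median will still push the page down, so this limits the shift rather than eliminating it.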
4. Is this above or below the fold?
Below-the-fold content rarely benefits from a Suspense boundary unless it is genuinely slow. The user won't see it for a few hundred milliseconds anyway, so wrapping cheap sections just adds streaming overhead and a fallback that flashes during fast renders on good connections. For genuinely non-critical widgets we tend to skip Suspense entirely and lazy-load them via dynamic() with ssr: false (in recent Next.js versions that call has to live in a Client Component).
The four placements that earn their keep
After a few dozen audits, the patterns that consistently help are narrow:
Personalised slots inside a mostly-static page
Marketing pages, product detail pages, blog posts — these have one or two personalised regions (cart count, recently viewed, recommendation rail) sitting in an otherwise cacheable shell. Wrap only the personalised slot. The static shell streams in immediately, often before the edge has even started the personalised query.
Expensive aggregations below the fold
Dashboards with a header KPI strip and a long tail of charts. Boundary at the start of the chart grid, not around each chart. One boundary, one fallback, one flush — multiple small boundaries multiply overhead without helping perceived performance, because the user reads top-to-bottom.
Third-party data with unpredictable latency
Any call to an external API whose p99 you don't control (payment provider status, shipping estimates, social embeds). Always wrap. Always set a timeout inside the component too — Suspense doesn't save you from a 30-second hang.
async function ShippingEstimate({ sku }: { sku: string }) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 2500);
  try {
    const res = await fetch(`https://api.example.com/ship/${sku}`, {
      signal: controller.signal,
      next: { revalidate: 60 },
    });
    if (!res.ok) return <ShippingFallback />;
    const data = await res.json();
    return <ShippingDetails data={data} />;
  } catch {
    return <ShippingFallback />;
  } finally {
    clearTimeout(timeout);
  }
}
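The AbortController pattern above covers fetch. For third-party SDK calls that don't accept an AbortSignal, racing the call against a timer is a common fallback. withTimeout below is a hypothetical helper, not part of Next.js or React, and it only stops waiting: the underlying work keeps running in the background.

```typescript
// Resolves with `fallback` if `work` has not settled within `ms` milliseconds.
// This does not cancel `work`; it only stops the caller from waiting on it.
async function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so a fast result does not leave it pending.
    clearTimeout(timer);
  }
}
```

Inside a Server Component this would wrap the SDK call, with the fallback value mapped to the same fallback UI the catch branch renders.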
Authenticated regions on otherwise public pages
The "sign in" vs "hello, name" header slot. Wrapping this lets the public shell — and crucially, the LCP element below it — render without waiting on session lookup. This is often the single highest-ROI boundary on a site.
Places we've stopped putting boundaries
- Around the whole page. That's just loading.tsx, and it defeats the point. The shell can't stream if the entire route is suspended.
- Around fast queries. If a fetch reliably finishes in under ~50 ms, the boundary adds more streaming and reconciliation overhead than it saves. Measure first.
- Around forms. Server Actions and form state get awkward when the form itself is inside a fallback that may swap mid-interaction. Render forms eagerly.
- Inside lists. One boundary per list item turns a single response into N flush points. Group them.
Measuring whether your boundaries actually help
Vibes won't tell you. We instrument three things:
- Server timing headers on the route handler, broken down by which async work the boundary is waiting on. Next.js exposes some of this via experimental.serverComponentsHmrCache debug output; we usually add our own Server-Timing entries.
- Real-user TTFB and LCP segmented by route, via the web-vitals library reporting to whatever analytics endpoint you use. Synthetic Lighthouse runs hide tail latency.
- A before/after diff on the streamed HTML. Run curl --no-buffer against the route and watch where the chunks land. If your "flush boundary" produces one chunk, the boundary isn't doing anything, usually because something upstream awaited the same data.
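# -N (--no-buffer) disables curl's output buffering; -s hides the progress meter.
# %3N (milliseconds) needs GNU date.
curl -sN https://your-app.example.com/product/123 \
  | while IFS= read -r line; do
      printf '%s %s\n' "$(date +%H:%M:%S.%3N)" "$line"
    done
The timestamp in front of each line shows when it arrived, which makes it obvious whether you're streaming or just pretending to.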
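For the custom Server-Timing entries in the first item of the list above, a minimal sketch could look like this. timed and formatServerTiming are hypothetical helpers; only the header syntax (name;dur=milliseconds, comma-separated) comes from the Server-Timing specification.

```typescript
type TimingEntry = { name: string; durationMs: number };

// Runs `work`, records its wall-clock duration under `name`, and returns its result.
async function timed<T>(
  entries: TimingEntry[],
  name: string,
  work: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await work();
  } finally {
    entries.push({ name, durationMs: Date.now() - start });
  }
}

// Builds a Server-Timing header value such as "hero;dur=12.0, reviews;dur=431.5".
// Metric names must be plain tokens (no spaces or commas).
function formatServerTiming(entries: TimingEntry[]): string {
  return entries
    .map((entry) => `${entry.name};dur=${entry.durationMs.toFixed(1)}`)
    .join(", ");
}
```

Headers go out with the first byte, so in a streamed response only work that finished before the shell flushed can appear in the header. Timings for work inside a boundary have to go to logs or a tracing backend instead.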
The gotchas that bite in production
- Awaiting in a layout above the boundary. A layout that does await getUser() blocks every child's flush, boundary or not. Push the await down into a Server Component that the boundary wraps.
- cookies() and headers() opt routes into dynamic rendering. That's fine, but it means the boundary's parent can't be static. If you're chasing static-first delivery, isolate dynamic APIs behind the boundary.
- Middleware that rewrites or sets cookies runs before any streaming and adds to TTFB. Audit it. We've seen 200 ms middlewares completely mask the wins from careful boundary placement.
- Edge runtime cold starts can dominate TTFB on low-traffic routes. Streaming doesn't help a 600 ms cold start. Either keep the route warm or move to the Node runtime if its cold start profile is better for your workload.
- Error boundaries are separate. A thrown error inside a Suspense boundary doesn't render the fallback; it bubbles to the nearest error.tsx. Pair every meaningful Suspense boundary with an error boundary unless you've thought hard about what "this section failed" should look like.
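The first gotcha in the list above, an await sitting above the boundary, can be illustrated with a toy event log in plain TypeScript. Everything here is hypothetical (getUser, the 20 ms delay, the event strings); it only models the ordering, not React's renderer.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical session lookup that takes roughly 20 ms.
async function getUser(events: string[]): Promise<string> {
  await sleep(20);
  events.push("user resolved");
  return "Ada";
}

// Variant 1: the layout awaits first, so nothing is sent until the lookup ends.
async function blockingLayout(events: string[]): Promise<void> {
  const user = await getUser(events);
  events.push("shell flushed");
  events.push(`greeting flushed for ${user}`);
}

// Variant 2: the shell goes out first; the await lives where a boundary would wrap it.
async function streamingLayout(events: string[]): Promise<void> {
  events.push("shell flushed");
  const user = await getUser(events);
  events.push(`greeting flushed for ${user}`);
}

// Runs both variants and returns their event logs for comparison.
async function compare(): Promise<{ blocking: string[]; streaming: string[] }> {
  const blocking: string[] = [];
  const streaming: string[] = [];
  await blockingLayout(blocking);
  await streamingLayout(streaming);
  return { blocking, streaming };
}
```

In the blocking log the shell only appears after the user lookup; in the streaming log it is the first event, which is the difference a curl trace of a real route should show.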
Where we'd start
If you're inheriting an App Router app and want to make it faster this week: open the slowest route, identify the LCP element, and remove every Suspense boundary that sits above it. Then add exactly one boundary around the slowest below-the-fold region. Measure RUM for a week. That single pass usually recovers 100–300 ms of LCP in our experience, and it costs almost nothing.
Everything else — granular boundaries, parallel data loading, PPR — is worth it, but only after the boundaries you already have are pulling their weight. If you'd like a second pair of eyes on a route that isn't behaving, our team does these audits as part of our web development engagements.