Streaming SSR and Suspense Boundaries: Where to Draw the Line
Streaming SSR is free performance — until it isn't. Where you place Suspense boundaries decides whether your page feels fast or stutters its way through a waterfall of spinners.

Streaming SSR is one of those features that sounds like free performance. Flip it on, ship HTML in chunks, watch the LCP number drop. In practice, the win lives or dies on where you put your Suspense boundaries — and most teams put them in the wrong place the first time.
This is a field guide based on what we keep fixing on client projects: how streaming actually works in the Next.js App Router, why a single loading.tsx is rarely enough, and how to think about boundary placement as a perceived-performance decision rather than a code-organisation one.
What streaming SSR actually buys you
With traditional SSR, the server renders the whole tree, then flushes the HTML. If one query in the footer takes 800ms, the user stares at a blank tab for 800ms. Streaming SSR — built on React 18+'s renderToPipeableStream and wired into the App Router — lets the server flush HTML in pieces. Anything outside a Suspense boundary ships immediately. Anything inside ships when it's ready, with the fallback rendered in the meantime.
The practical effect is twofold:
- Time to First Byte stays low, because the shell flushes without waiting for slow data.
- Largest Contentful Paint can improve — but only if your LCP element is in the fast part of the tree.
That second condition is where teams trip. If you wrap your hero in Suspense to "unblock the rest of the page," you've just pushed your LCP element behind a fallback. Congratulations, you made things worse on one of the Core Web Vitals Google actually grades you on.
The mental model: fast shell, slow islands
Think of every route as a shell plus a set of islands. The shell is everything the user needs to see immediately to feel like the page has loaded: header, hero copy, primary CTA, layout scaffolding. Islands are everything that can arrive a beat later: personalised recommendations, review counts, inventory badges, comment threads, analytics-driven modules.
The shell should be rendered from fast data — cached, static, or co-located with the request (cookies, geo). The islands go inside Suspense boundaries, each with a fallback that matches its final layout to avoid CLS.
A worked example: a product page
Here's a stripped-down App Router page that gets the boundaries roughly right:
// app/products/[slug]/page.tsx
import { Suspense } from 'react';
import { ProductHero } from './_components/product-hero';
import { ReviewsPanel } from './_components/reviews-panel';
import { RecommendationsRail } from './_components/recommendations-rail';
import { InventoryBadge } from './_components/inventory-badge';
import {
  ReviewsSkeleton,
  RecommendationsSkeleton,
  InventorySkeleton,
} from './_components/skeletons';

interface PageProps {
  params: Promise<{ slug: string }>;
}

export default async function ProductPage({ params }: PageProps) {
  const { slug } = await params;

  // Fast, cached read: blocks the shell on purpose.
  // We *want* the hero to be in the first flush.
  return (
    <main>
      <ProductHero slug={slug} />
      <Suspense fallback={<InventorySkeleton />}>
        <InventoryBadge slug={slug} />
      </Suspense>
      <Suspense fallback={<ReviewsSkeleton />}>
        <ReviewsPanel slug={slug} />
      </Suspense>
      <Suspense fallback={<RecommendationsSkeleton />}>
        <RecommendationsRail slug={slug} />
      </Suspense>
    </main>
  );
}
Notice what's not wrapped: ProductHero. It pulls from a cached catalog read and contains the LCP element (the product image and title). Blocking on it is the right call — it ships in the first HTML flush, and the browser can start the image request immediately.
The other three modules each get their own boundary. That's deliberate. If you wrapped all three in a single Suspense, the slowest one would hold the other two hostage.
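To put rough numbers on that, here is a framework-free model of reveal times. It assumes the three modules fetch in parallel as siblings and that a boundary swaps its fallback for content only once every async child inside it is ready. The latencies are invented for illustration, and `revealTimes` is a hypothetical helper for this sketch, not a React or Next.js API.

```typescript
// Illustrative latencies in milliseconds (made-up numbers, not measurements).
const latencies: Record<string, number> = {
  inventory: 80,
  reviews: 300,
  recommendations: 900,
};

// Hypothetical model: a boundary reveals its content only when every async
// child inside it has resolved, so all members share the slowest member's time.
function revealTimes(
  boundaries: string[][],
  latencyByModule: Record<string, number>,
): Record<string, number> {
  const revealedAt: Record<string, number> = {};
  for (const members of boundaries) {
    const slowest = Math.max(...members.map((name) => latencyByModule[name] ?? 0));
    for (const name of members) {
      revealedAt[name] = slowest;
    }
  }
  return revealedAt;
}

// One shared boundary: everything waits for recommendations.
const shared = revealTimes([["inventory", "reviews", "recommendations"]], latencies);
// { inventory: 900, reviews: 900, recommendations: 900 }

// One boundary per module: each appears as soon as its own data is ready.
const separate = revealTimes([["inventory"], ["reviews"], ["recommendations"]], latencies);
// { inventory: 80, reviews: 300, recommendations: 900 }

console.log({ shared, separate });
```

In this model the inventory badge goes from 900ms to 80ms purely by getting its own boundary; no fetch got faster.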
Why not just use loading.tsx?
loading.tsx is a route-level Suspense boundary. It's fine for navigations where the entire page needs to swap, but it's a blunt instrument. If it's the only boundary on the route, all of the page content sits behind one fallback until the slowest data fetch resolves. You've effectively un-streamed your stream.
Use loading.tsx as a safety net for the shell, not as your primary streaming strategy. The real work happens with in-page Suspense boundaries near the slow data.
The boundary placement heuristics we use
After shipping enough of these, a few rules of thumb hold up:
- Never wrap your LCP element in Suspense. If the hero image, headline, or primary call-to-action lives behind a fallback, LCP degrades even though TTFB looks great.
- One boundary per independent slow source. If reviews and recommendations come from different services with different latency profiles, give them separate boundaries. A shared boundary couples their worst cases.
- Fallbacks must reserve final dimensions. A skeleton that's 40px tall replaced by a 320px component is a CLS landmine. Match the final layout, or use min-height aggressively.
- Push boundaries down the tree, not up. A boundary high in the tree blocks more siblings. The closer the boundary sits to the actual async data, the less collateral damage.
- Don't wrap synchronous components. Suspense around a component that doesn't await anything adds overhead with no benefit and confuses the next person reading the code.
The waterfall trap
The most common production gotcha: nesting async server components inside async server components, each with their own data fetch, all inside a single Suspense.
// Anti-pattern
<Suspense fallback={<Skeleton />}>
  <OuterAsync> {/* awaits A */}
    <InnerAsync /> {/* awaits B, can't start until A resolves */}
  </OuterAsync>
</Suspense>
The inner fetch can't start until the outer one resolves, because React doesn't begin rendering InnerAsync until OuterAsync has finished its own await and returned its children. You've created a serial waterfall inside what looks like one Suspense unit.
The fix is either to hoist data fetching to a parent that kicks off both requests in parallel with Promise.all, or to give the inner component its own Suspense boundary so at least the outer content can paint:
// Better: parallel fetches, single boundary
async function ProductPanel({ slug }: { slug: string }) {
  const [details, stock] = await Promise.all([
    getProductDetails(slug),
    getStockLevel(slug),
  ]);
  return <PanelView details={details} stock={stock} />;
}
When in doubt, log your server fetch timings. If two fetches that should be parallel are clearly sequential, you've got an accidental waterfall.
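One way to do that logging, sketched without any framework: wrap each fetch in a helper that records start and end timestamps, then check whether the recorded intervals overlap. `timed`, `hasOverlap`, and `fakeFetch` are hypothetical names for this sketch; in a real route you would wrap your own data functions instead of the fake one.

```typescript
interface Span {
  label: string;
  start: number; // ms since epoch
  end: number;
}

// Runs fn, records when it started and finished, and passes its result
// (or its error) straight through to the caller.
async function timed<T>(label: string, fn: () => Promise<T>, log: Span[]): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    log.push({ label, start, end: Date.now() });
  }
}

// True if any two recorded fetches were in flight at the same time.
function hasOverlap(spans: Span[]): boolean {
  return spans.some((a, i) =>
    spans.slice(i + 1).some((b) => a.start < b.end && b.start < a.end),
  );
}

// Stand-in for a real data call: resolves after the given delay.
function fakeFetch(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function demo(): Promise<void> {
  // Accidental waterfall: the second fetch starts only after the first ends.
  const serial: Span[] = [];
  await timed("details", () => fakeFetch(50), serial);
  await timed("stock", () => fakeFetch(50), serial);

  // Parallel: both fetches are kicked off before either is awaited.
  const parallel: Span[] = [];
  await Promise.all([
    timed("details", () => fakeFetch(50), parallel),
    timed("stock", () => fakeFetch(50), parallel),
  ]);

  console.log("serial overlaps:", hasOverlap(serial)); // expected: false
  console.log("parallel overlaps:", hasOverlap(parallel)); // expected: true
}

void demo();
```

If two fetches you believed were parallel never overlap in the log, that is the waterfall.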
Streaming, error boundaries, and the empty-shell problem
A boundary that streams a fallback also needs to handle failure. Without a nearby error boundary, an error thrown inside a Suspense boundary bubbles up to the nearest parent error boundary, often the route's error.tsx, which replaces the entire page content. You streamed a beautiful shell, then replaced it with a full-page error because the reviews service timed out.
Pair each major Suspense with a co-located error boundary. In the App Router, that usually means a small client component wrapping the server component, or per-segment error.tsx files when the boundary aligns with a route segment. The goal: a failed island degrades to an inline error message, not a nuked page.
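Stripped of React, the isolation idea looks like the toy model below. It treats an island as a function that returns markup or throws, and compares a single top-level guard with one guard per island. This is only a sketch of the failure containment, under the assumption that string concatenation stands in for rendering; `renderWithRouteGuard` and `renderWithIslandGuards` are hypothetical names, and a real App Router project would use a client-side error boundary component or error.tsx instead.

```typescript
// Toy model: an island is a function that returns markup or throws.
interface Island {
  name: string;
  render: () => string;
}

// A single guard at the top, like relying on the route's error.tsx:
// any island failure replaces the whole page.
function renderWithRouteGuard(shell: string, islands: Island[]): string {
  try {
    return shell + islands.map((island) => island.render()).join("");
  } catch {
    return "<p>Something went wrong.</p>";
  }
}

// One guard per island, like a co-located error boundary:
// a failure degrades to an inline message and the rest still renders.
function renderWithIslandGuards(shell: string, islands: Island[]): string {
  const parts = islands.map((island) => {
    try {
      return island.render();
    } catch {
      return `<p>${island.name} are unavailable right now.</p>`;
    }
  });
  return shell + parts.join("");
}

const islands: Island[] = [
  {
    name: "Reviews",
    render: () => {
      throw new Error("reviews service timed out");
    },
  },
  { name: "Recommendations", render: () => "<ul><li>Similar product</li></ul>" },
];

console.log(renderWithRouteGuard("<h1>Product</h1>", islands));
// <p>Something went wrong.</p>

console.log(renderWithIslandGuards("<h1>Product</h1>", islands));
// <h1>Product</h1><p>Reviews are unavailable right now.</p><ul><li>Similar product</li></ul>
```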
Watch your headers
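One quiet production bug worth flagging: once streaming starts, the status code and response headers have already been sent. From inside a streamed Suspense child you can no longer set cookies, and redirect() can't produce a real HTTP redirect; Next.js falls back to a client-side redirect through an injected meta tag while the response keeps its original status. Redirects that depend on a slow check belong in the shell, before the first flush, or in middleware; cookie writes belong in middleware, a Route Handler, or a Server Action. We've debugged this one more than once on auth-gated routes that worked locally and silently failed in production.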
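The underlying constraint is plain HTTP: the status line and headers go out before the first body chunk. Node's own http.ServerResponse enforces this by throwing if setHeader is called after the headers have been sent. The toy class below imitates that rule to make the ordering visible; `ToyResponse` is a hypothetical stand-in for illustration, not Next.js internals.

```typescript
class ToyResponse {
  private headers = new Map<string, string>();
  private chunks: string[] = [];
  headersSent = false;

  setHeader(name: string, value: string): void {
    if (this.headersSent) {
      throw new Error(`Cannot set "${name}" after the first flush`);
    }
    this.headers.set(name.toLowerCase(), value);
  }

  getHeader(name: string): string | undefined {
    return this.headers.get(name.toLowerCase());
  }

  // Writing the first chunk commits the headers for good.
  write(chunk: string): void {
    this.headersSent = true;
    this.chunks.push(chunk);
  }

  body(): string {
    return this.chunks.join("");
  }
}

const res = new ToyResponse();

// Before the first flush: headers are still open, so the cookie sticks.
res.setHeader("Set-Cookie", "session=abc; HttpOnly");
res.write("<main><h1>Product</h1>"); // first flush

// After the flush: a slow check finishes and tries to set another header.
let lateError = "";
try {
  res.setHeader("Set-Cookie", "flag=1");
} catch (error) {
  lateError = error instanceof Error ? error.message : String(error);
}

console.log(res.getHeader("set-cookie")); // session=abc; HttpOnly
console.log(lateError); // Cannot set "Set-Cookie" after the first flush
```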
Measuring whether it worked
Don't trust your gut. After moving boundaries around, check:
- TTFB in the field (Chrome UX Report or RUM). Streaming should keep this low even when backend load spikes.
- LCP on the actual element you care about. Use the Performance panel's LCP marker — if it lands on a skeleton, your boundary is too high.
- CLS around each fallback. If it jumps when the real content arrives, your skeleton dimensions are wrong.
- INP for any interactive shell element. Streaming doesn't help here, but it shouldn't hurt; if it does, you're hydrating too aggressively.
Synthetic tests lie about streaming because they often wait for full load before measuring. RUM is the only honest signal.
Where we'd start
If you're auditing an existing App Router project, open one high-traffic route and ask three questions. What's the LCP element, and is it inside a Suspense? Which Suspense boundary wraps the slowest data, and is it sharing space with faster siblings? What happens if any single island throws?
Fix those three on your top route before touching anything else. You'll usually claw back a few hundred milliseconds of perceived load time without writing a single new fetch — just by moving brackets around. That's the whole game with streaming SSR: it's less about the technology and more about being honest with yourself about which parts of the page actually need to be there first.
If you'd rather not run that audit yourselves, it's the kind of thing our web engineering team does in a week.