Web Development · May 19, 2026 · 7 min read

Streaming Suspense Boundaries: Where to Put Them So TTFB Actually Drops

Suspense in the Next.js App Router is a TTFB lever, not a loading spinner. Here's how we decide where the boundaries go on real product pages — and where they backfire.

Most teams adopt the Next.js App Router, sprinkle a few loading.tsx files around, and call it streaming. Then they wonder why TTFB barely moved and LCP got worse. Suspense boundaries are a precision tool, and the default of "wrap everything that fetches" is almost always wrong.

This is a field guide to placing boundaries on real product pages — what we look for, what we avoid, and the gotchas that only show up under load.

What a Suspense boundary actually does

In the App Router, every <Suspense> boundary (and the loading.tsx convention, which is just sugar over one) is a flush point. The server sends the shell up to that boundary, with the fallback in place, then streams the rest as the inner Server Components resolve. The browser starts parsing HTML, running scripts, and painting whatever has already arrived before the slow stuff is ready.

Three consequences fall out of this:

TTFB is measured at the first byte of the shell, not the full document. The less the shell has to wait on before its first flush, the lower the TTFB.
LCP is the render time of the largest qualifying element in the viewport. If that element sits inside a Suspense boundary, it can't paint until the boundary resolves, so you've actively delayed it.
CLS is sensitive to fallback geometry. A fallback that doesn't match the resolved component's dimensions will shift the page when it swaps.

That's the whole mental model. Everything below is just applying it.
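
A toy model can make the flush-point idea concrete. The sketch below is not how React or Next.js implement streaming; streamPage, Slot, and the delays are hypothetical names and numbers chosen for illustration. It emits the shell with fallbacks first, then one chunk per slot in the order the slow work finishes.

```typescript
// Toy model of streamed rendering. NOT the React or Next.js implementation.
// Each Slot stands in for a Server Component wrapped in <Suspense>.
type Slot = { id: string; fallback: string; load: () => Promise<string> };

async function* streamPage(shell: string, slots: Slot[]): AsyncGenerator<string> {
  // First flush: the shell plus every fallback, before any slow work finishes.
  yield shell + slots.map((s) => `[${s.id}: ${s.fallback}]`).join("");

  // Start all slot work at once and tag each promise with its slot id.
  const pending = new Map<string, Promise<{ id: string; html: string }>>();
  for (const s of slots) {
    pending.set(s.id, s.load().then((html) => ({ id: s.id, html })));
  }

  // Later flushes: one chunk per slot, in completion order, not source order.
  while (pending.size > 0) {
    const done = await Promise.race(Array.from(pending.values()));
    pending.delete(done.id);
    yield `[${done.id}: ${done.html}]`;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Collects the chunks of a page with one slow slot and one fast slot.
async function demo(): Promise<string[]> {
  const chunks: string[] = [];
  const page = streamPage("<header/>", [
    { id: "reviews", fallback: "skeleton", load: async () => { await sleep(60); return "42 reviews"; } },
    { id: "cart", fallback: "skeleton", load: async () => { await sleep(10); return "3 items"; } },
  ]);
  for await (const chunk of page) chunks.push(chunk);
  return chunks;
}
```

In this model the first chunk never waits on reviews or cart, which is the TTFB win. It also shows the LCP risk: anything that lives inside a slot only appears in a later chunk.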

The decision tree we actually use

Before wrapping anything, we ask four questions in order:

1. Is this data on the critical render path for LCP?

If the slow component contains or sits above the LCP element, do not wrap it. You'll just push LCP later. Fetch the LCP data in the parent, or move the LCP element out of the boundary so it renders with the shell.

A common mistake: a product page where the hero image URL comes from the same query as the reviews. Teams wrap the whole product card in Suspense, the hero image is held back behind the fallback, and LCP regresses by 400–900 ms in our experience.

Fix: split the query. Fetch the hero data eagerly in the page component, and only wrap the reviews.

// app/product/[id]/page.tsx
import { Suspense } from 'react';
import { getProductHero } from '@/lib/product';
import { ProductHero } from './product-hero'; // assumed location of the hero component
import { Reviews, ReviewsSkeleton } from './reviews';

export default async function ProductPage({
  params,
}: {
  params: Promise<{ id: string }>;
}) {
  const { id } = await params;
  const hero = await getProductHero(id); // fast, indexed query

  return (
    <>
      <ProductHero data={hero} /> {/* contains the LCP image */}
      <Suspense fallback={<ReviewsSkeleton />}>
        <Reviews productId={id} /> {/* slow aggregation */}
      </Suspense>
    </>
  );
}

2. Is the slow work independent, or does it block later UI?

Suspense only helps if the work is genuinely independent. If component B needs data from component A, putting them in sibling boundaries doesn't parallelise anything, because A still has to finish first. In that case, hoist the shared fetch up and pass props down, or wrap the fetch in React's cache() so every caller shares one request.
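
The deduping idea can be sketched without React. The dedupe helper below is a hypothetical stand-in that mimics what React's cache() provides for Server Components. It is not React's implementation, and unlike cache() its Map is not scoped to a single request. getProduct, priceBlock, and sellerBlock are made-up names.

```typescript
// Hypothetical stand-in for per-request memoisation, for illustration only.
function dedupe<T>(loader: (key: string) => Promise<T>): (key: string) => Promise<T> {
  const inFlight = new Map<string, Promise<T>>();
  return (key) => {
    let promise = inFlight.get(key);
    if (!promise) {
      promise = loader(key);      // the first caller starts the fetch
      inFlight.set(key, promise); // later callers share the same promise
    }
    return promise;
  };
}

let dbCalls = 0;

// Hypothetical loader; a real one would query a database or an API.
const getProduct = dedupe(async (id: string) => {
  dbCalls += 1;
  return { id, name: `Product ${id}`, sellerId: "s-1" };
});

// Two "components" that both need the product record. Rendered in sibling
// boundaries, they share one underlying query instead of issuing two.
async function priceBlock(id: string): Promise<string> {
  const product = await getProduct(id);
  return `price for ${product.name}`;
}

async function sellerBlock(id: string): Promise<string> {
  const product = await getProduct(id);
  return `seller ${product.sellerId}`;
}

async function renderBoth(id: string): Promise<string[]> {
  return Promise.all([priceBlock(id), sellerBlock(id)]);
}
```

After renderBoth("7") resolves, dbCalls is 1 even though two blocks asked for the product. The shared promise lets sibling boundaries stay independent without doubling the query.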

3. Will the fallback match the final layout?

A fallback that's 200 px tall replacing content that's 600 px tall is a guaranteed CLS hit. We require skeletons to be dimensionally accurate, not "close enough." If you can't predict the height, reserve it with min-height based on the typical case and accept some empty space below.
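
To put a number on that mismatch: the Layout Instability API scores each shift as impact fraction times distance fraction. The sketch below applies that formula to a deliberately simplified case. It assumes a full-width skeleton at the top of the viewport, content below it that fills the rest of the viewport, and a fallback shorter than the viewport. The viewport size is an assumption, not a measurement.

```typescript
// Simplified single-shift model. All inputs are illustrative assumptions.
type ShiftInput = {
  viewportWidth: number;
  viewportHeight: number;
  fallbackHeight: number; // height of the skeleton, in px
  resolvedHeight: number; // height of the real component, in px
};

function layoutShiftScore(input: ShiftInput): number {
  const moved = Math.abs(input.resolvedHeight - input.fallbackHeight);
  if (moved === 0) return 0; // matching heights: nothing below the boundary moves

  // The content below the boundary is the unstable element. The union of its
  // old and new visible area runs from the higher start edge to the viewport bottom.
  const unionTop = Math.min(input.fallbackHeight, input.resolvedHeight);
  const impactFraction =
    Math.max(0, input.viewportHeight - unionTop) / input.viewportHeight;

  // Distance moved is normalised by the viewport's larger dimension.
  const distanceFraction =
    moved / Math.max(input.viewportWidth, input.viewportHeight);

  return impactFraction * distanceFraction;
}

const mismatch = layoutShiftScore({
  viewportWidth: 400,
  viewportHeight: 800,
  fallbackHeight: 200,
  resolvedHeight: 600,
}); // 0.375

const matched = layoutShiftScore({
  viewportWidth: 400,
  viewportHeight: 800,
  fallbackHeight: 600,
  resolvedHeight: 600,
}); // 0
```

Under these assumptions the single swap scores 0.375, well above the 0.1 threshold for a "good" CLS, while a dimensionally accurate skeleton scores 0.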

4. Is this above or below the fold?

Below-the-fold content that resolves quickly rarely benefits from a Suspense boundary. The user won't see it for a few hundred milliseconds anyway, so wrapping it just adds streaming overhead and a fallback that flashes during fast renders on good connections. For genuinely non-critical widgets, we tend to lazy-load below-the-fold sections via dynamic() with ssr: false, which in the App Router has to be called from a Client Component, and skip Suspense entirely.

The four placements that earn their keep

After a few dozen audits, the patterns that consistently help are narrow:

Personalised slots inside a mostly-static page

Marketing pages, product detail pages, blog posts: these have one or two personalised regions (cart count, recently viewed, recommendation rail) sitting in an otherwise cacheable shell. Wrap only the personalised slot. The static shell streams in immediately, often before the personalised query has returned.

Expensive aggregations below the fold

Dashboards with a header KPI strip and a long tail of charts. Boundary at the start of the chart grid, not around each chart. One boundary, one fallback, one flush — multiple small boundaries multiply overhead without helping perceived performance, because the user reads top-to-bottom.

Third-party data with unpredictable latency

Any call to an external API whose p99 you don't control (payment provider status, shipping estimates, social embeds). Always wrap. Always set a timeout inside the component too — Suspense doesn't save you from a 30-second hang.

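// ShippingDetails and ShippingFallback are assumed to live in a sibling module.
import { ShippingDetails, ShippingFallback } from './shipping-ui';

async function ShippingEstimate({ sku }: { sku: string }) {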
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 2500);

  try {
    const res = await fetch(`https://api.example.com/ship/${sku}`, {
      signal: controller.signal,
      next: { revalidate: 60 },
    });
    if (!res.ok) return <ShippingFallback />;
    const data = await res.json();
    return <ShippingDetails data={data} />;
  } catch {
    return <ShippingFallback />;
  } finally {
    clearTimeout(timeout);
  }
}

Authenticated regions on otherwise public pages

The "sign in" vs "hello, name" header slot. Wrapping this lets the public shell — and crucially, the LCP element below it — render without waiting on session lookup. This is often the single highest-ROI boundary on a site.

Places we've stopped putting boundaries

Around the whole page. That's just loading.tsx, and it defeats the point. The layout flushes, but none of the page's own content can paint while the entire route is suspended.
Around fast queries. If a fetch reliably finishes in under ~50 ms, the boundary adds more streaming and reconciliation overhead than it saves. Measure first.
Around forms. Server Actions and form state get awkward when the form itself is inside a fallback that may swap mid-interaction. Render forms eagerly.
Inside lists. One boundary per list item turns a single response into N flush points. Group them.

Measuring whether your boundaries actually help

Vibes won't tell you. We instrument three things:

Server timing on the route, broken down by which async work each boundary is waiting on. Next.js can log fetch calls and their durations in development via the logging.fetches option in next.config.js; we usually add our own Server-Timing entries.
Real-user TTFB and LCP segmented by route, via the web-vitals library reporting to whatever analytics endpoint you use. Synthetic Lighthouse runs hide tail latency.
A before/after diff on the streamed HTML. Run curl --no-buffer against the route and watch where the chunks land. If the whole document arrives in one chunk, the boundary isn't doing anything, usually because something upstream awaited the same data.

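# -s hides the progress meter; -N is the short form of --no-buffer.
# %3N (milliseconds) needs GNU date; on macOS install coreutils and use gdate.
curl -sN https://your-app.example.com/product/123 \
  | while IFS= read -r line; do
      printf '%s  %s\n' "$(date +%H:%M:%S.%3N)" "$line"
    done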

The timestamp in front of each line shows when it arrived, which makes it obvious whether you're streaming or just pretending to.
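
For the custom Server-Timing entries mentioned in the list above, a minimal sketch of the bookkeeping might look like this. The timed and toServerTiming helpers and the entry names are hypothetical. Only the header syntax (name;dur=…;desc="…", comma-separated) comes from the Server-Timing specification. Because response headers leave before streamed work finishes, timings for work inside a boundary may have to go to logs rather than the header.

```typescript
type TimingEntry = { name: string; durMs: number; desc?: string };

// Runs fn, records how long it took (success or failure), and returns its result.
async function timed<T>(
  entries: TimingEntry[],
  name: string,
  fn: () => Promise<T>,
  desc?: string,
): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    entries.push({ name, durMs: Date.now() - start, desc });
  }
}

// Formats entries as a Server-Timing header value.
function toServerTiming(entries: TimingEntry[]): string {
  return entries
    .map((entry) => {
      const parts = [entry.name, `dur=${entry.durMs.toFixed(1)}`];
      // Strip quotes and backslashes so the description stays a valid quoted string.
      if (entry.desc) parts.push(`desc="${entry.desc.replace(/["\\]/g, "")}"`);
      return parts.join(";");
    })
    .join(", ");
}

const headerValue = toServerTiming([
  { name: "hero", durMs: 12, desc: "hero query" },
  { name: "reviews", durMs: 843.26 },
]);
// hero;dur=12.0;desc="hero query", reviews;dur=843.3
```

Wrapping each data call in timed(entries, "reviews", () => getReviews(id)), where getReviews is whatever loader the route already uses, gives a per-boundary breakdown to compare against the chunk timestamps from curl.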

The gotchas that bite in production

Awaiting in a layout above the boundary. A layout that does await getUser() blocks every child's flush, boundary or not. Push the await down into a Server Component that the boundary wraps.
cookies() and headers() opt routes into dynamic rendering. That's fine, but it means the boundary's parent can't be static. If you're chasing static-first delivery, isolate dynamic APIs behind the boundary.
Middleware that rewrites or sets cookies runs before any streaming and adds to TTFB. Audit it. We've seen 200 ms middlewares completely mask the wins from careful boundary placement.
Edge runtime cold starts can dominate TTFB on low-traffic routes. Streaming doesn't help a 600 ms cold start. Either keep the route warm or move to the Node runtime if its cold start profile is better for your workload.
Error boundaries are separate. A thrown error inside a Suspense boundary doesn't render the fallback — it bubbles to the nearest error.tsx. Pair every meaningful Suspense boundary with an error boundary unless you've thought hard about what "this section failed" should look like.

Where we'd start

If you're inheriting an App Router app and want to make it faster this week: open the slowest route, identify the LCP element, and remove every Suspense boundary that sits above it. Then add exactly one boundary around the slowest below-the-fold region. Watch real-user TTFB and LCP for a week. That single pass usually recovers 100–300 ms of LCP in our experience, and it costs almost nothing.

Everything else — granular boundaries, parallel data loading, PPR — is worth it, but only after the boundaries you already have are pulling their weight. If you'd like a second pair of eyes on a route that isn't behaving, our team does these audits as part of our web development engagements.
