DevOps & Cloud · May 19, 2026 · 6 min read

The Sentry Bill That Tripled Overnight: A Quota Postmortem

A single deploy turned a calm Sentry account into a $4k surprise. Here's what happened, what we changed, and how to stop event floods before finance notices.

We upgraded a frontend SDK on a Thursday. By Monday, Sentry had ingested more events in four days than it usually does in a quarter, and someone in finance was asking polite but pointed questions. This is the postmortem we wrote internally, sanded down for public reading.

The shape of the incident

The app in question is a mid-sized B2B dashboard. Normal volume is somewhere between 40k and 90k error events per day across web and two mobile clients, which fits comfortably inside our paid plan. On the Thursday in question, we shipped a minor version bump to a popular framework SDK — nothing dramatic in the changelog, just "improved instrumentation."

By Saturday morning the dashboard was showing roughly 1.4 million events per day. We hit the on-demand spend cap before anyone looked at a graph, and the account silently started dropping events. The bill, when it landed, was about 3.2× our usual monthly Sentry spend. Not catastrophic, but enough to be a board-deck footnote.

The annoying part: nothing was broken. Users were fine. The product was healthy. We were paying to record a particular kind of noise at extreme resolution.

What actually changed

The SDK upgrade flipped two defaults we hadn't been tracking:

Automatic instrumentation of fetch failures now captured aborted requests as errors. Our app aggressively cancels in-flight requests when users navigate, which is normal behaviour.
Breadcrumbs from console.error were promoted to events under certain conditions, including output from a noisy third-party widget that logs on every page load in Safari.

Neither is a bug. Both are defensible defaults. But the combination, multiplied by our traffic, turned a quiet stream into a fire hose.
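To make the first change concrete: a cancelled fetch rejects with an error whose name is AbortError, while the message text differs between browsers. A minimal sketch of a check keyed on the name follows. The helper isAbortError is hypothetical, not part of any SDK, and the two error values are simulated rather than produced by a real request.

```typescript
// Hypothetical helper: recognise a cancelled request by the error's name.
// The message text differs between runtimes, so it is not a reliable key.
function isAbortError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "AbortError"
  );
}

// Simulated rejection values, shaped like what an aborted fetch produces.
const aborted = Object.assign(new Error("The user aborted a request."), {
  name: "AbortError",
});
const realFailure = new TypeError("Failed to fetch");

console.log(isAbortError(aborted)); // true
console.log(isAbortError(realFailure)); // false
```

Anything that passes this check is navigation noise in an app like ours; a TypeError from a request that genuinely failed is not.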

Why we didn't notice for three days

We had a Slack alert for "new issue type," not for "event volume anomaly." The new errors were grouped into two issue fingerprints, so Slack saw two new issues, shrugged, and moved on. Our weekly digest would have caught it. The bill caught it first.

The five-minute triage

Once we realised what was happening, the stop-the-bleeding phase was straightforward. In order:

Set a hard spend cap in the Sentry org settings (we'd had a soft one).
Add inbound filters for the two dominant fingerprints.
Drop the tracesSampleRate on the affected project from 0.2 to 0.02 while we investigated.
Add a beforeSend hook to discard AbortError and the third-party widget's warning class.

The beforeSend hook is the most reusable piece. Roughly:

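import * as Sentry from "@sentry/browser";

// Error names to drop. An aborted fetch is identified by its name;
// its message text varies between browsers.
const IGNORED_NAMES = ["AbortError"];

// Message patterns for known noise.
const IGNORED_MESSAGES = [
  /AbortError/i,
  /ResizeObserver loop/i,
  /Non-Error promise rejection captured/i,
];

// Stack frames that originate in browser extensions are not our code.
const IGNORED_SOURCES = [
  "chrome-extension://",
  "safari-extension://",
  "moz-extension://",
];

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 0.02,
  beforeSend(event, hint) {
    const error = hint?.originalException;
    const first = event.exception?.values?.[0];

    // Prefer the original exception; fall back to what the SDK recorded.
    const name = error instanceof Error ? error.name : first?.type ?? "";
    const message =
      typeof error === "string"
        ? error
        : error instanceof Error
          ? error.message
          : first?.value ?? event.message ?? "";

    if (
      IGNORED_NAMES.includes(name) ||
      IGNORED_MESSAGES.some((re) => re.test(message))
    ) {
      return null; // drop the event
    }

    const frames = first?.stacktrace?.frames ?? [];
    if (
      frames.some((f) =>
        IGNORED_SOURCES.some((src) => f.filename?.startsWith(src)),
      )
    ) {
      return null; // drop events raised from extension code
    }

    return event;
  },
});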

A few notes on this snippet, because the details matter:

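Filter in beforeSend, not just with ignoreErrors. The latter runs earlier, but it only matches strings or regular expressions against the error text, so it cannot inspect stack frames or the original exception object.
Always return null to drop, never undefined. The hook's contract is to return null or an event; an accidental undefined from a missing return is outside that contract, so don't rely on what the SDK does with it.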
Filter browser extension noise. It is almost never your bug, and on a popular site it can be 20–40% of raw events.
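One way to keep these rules honest is to pull the drop decision into a pure function that can be unit-tested without the SDK. The sketch below assumes a cut-down event shape (MiniEvent) and a hypothetical predicate named shouldDrop; neither comes from Sentry's API, and the pattern lists are trimmed for brevity.

```typescript
// Minimal local stand-ins for the SDK's event shape (an assumed subset).
interface Frame {
  filename?: string;
}
interface MiniEvent {
  message?: string;
  exception?: {
    values?: {
      type?: string;
      value?: string;
      stacktrace?: { frames?: Frame[] };
    }[];
  };
}

const NOISE_PATTERNS = [/ResizeObserver loop/i];
const EXTENSION_PREFIXES = [
  "chrome-extension://",
  "safari-extension://",
  "moz-extension://",
];

// Hypothetical pure predicate: true means "drop this event".
function shouldDrop(event: MiniEvent, original?: unknown): boolean {
  const first = event.exception?.values?.[0];
  const name = original instanceof Error ? original.name : first?.type ?? "";
  const text =
    typeof original === "string"
      ? original
      : original instanceof Error
        ? original.message
        : first?.value ?? event.message ?? "";

  if (name === "AbortError") return true;
  if (NOISE_PATTERNS.some((re) => re.test(text))) return true;

  const frames = first?.stacktrace?.frames ?? [];
  return frames.some((f) =>
    EXTENSION_PREFIXES.some((p) => f.filename?.startsWith(p) ?? false),
  );
}
```

With this in place, the hook itself shrinks to one expression: return shouldDrop(event, hint?.originalException) ? null : event. Every branch ends in an explicit null or an event, and the predicate can be exercised in ordinary tests.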

The real fix: a sampling policy, not a vibe

The deeper problem wasn't the SDK. It was that we'd never written down what we wanted Sentry to do for us. Sampling was set to whatever the quickstart suggested two years ago, and no one had revisited it.

We rewrote the policy as four rules:

1. Errors are not sampled. Transactions are.

Dropping error events to save money is a trap. You lose the long-tail bugs that only fire for one user in a thousand. Instead, be ruthless about filtering classes of non-errors: aborted requests, third-party noise, expected validation failures.

Transactions (performance traces) are where sampling actually belongs. We moved to dynamic sampling: 100% for slow requests, 100% for errors, ~1% for everything else.
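As a rough illustration of that split, the keep-or-drop decision can be written as a small function. This is a sketch under assumptions, not our production config: the 2000 ms threshold for "slow" is invented for the example, and because duration is only known once a request has finished, a rule like this has to run after the trace completes rather than at the moment the client starts it.

```typescript
interface TraceSummary {
  durationMs: number;
  hadError: boolean;
}

const SLOW_MS = 2000; // assumed threshold; pick one that fits your latency budget
const BASELINE_RATE = 0.01; // ~1% of everything else

// `roll` is a number in [0, 1); it is injected so the decision is testable.
function keepTrace(t: TraceSummary, roll: number = Math.random()): boolean {
  if (t.hadError) return true; // 100% of traces with errors
  if (t.durationMs >= SLOW_MS) return true; // 100% of slow traces
  return roll < BASELINE_RATE; // baseline sample of the rest
}

console.log(keepTrace({ durationMs: 2500, hadError: false }, 0.99)); // true
console.log(keepTrace({ durationMs: 50, hadError: false }, 0.5)); // false
```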

2. Per-route sample rates

A checkout page deserves more observability than a marketing landing page. We use tracesSampler to set rates per route:

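tracesSampler: (samplingContext) => {
  // Path at the time the transaction starts; empty string if unavailable.
  const url = samplingContext.location?.pathname ?? "";
  if (url.startsWith("/checkout")) return 1.0; // keep every checkout trace
  if (url.startsWith("/api/internal")) return 0.5;
  if (url.startsWith("/health")) return 0; // never trace health checks
  return 0.01; // baseline for everything else
},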

Health checks should never be sampled. Ever. We saw a non-trivial chunk of our previous spend going to traces of a Kubernetes liveness probe.
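It can help to keep the route rules as data and estimate what they will cost before shipping them. The sketch below mirrors the four rates above; rateForPath and expectedTraces are hypothetical helpers, and the traffic numbers are made up purely for illustration.

```typescript
// Prefix and sample rate pairs; the first matching prefix wins.
const ROUTE_RATES: [string, number][] = [
  ["/checkout", 1.0],
  ["/api/internal", 0.5],
  ["/health", 0],
];
const DEFAULT_RATE = 0.01;

function rateForPath(pathname: string): number {
  const hit = ROUTE_RATES.find(([prefix]) => pathname.startsWith(prefix));
  return hit ? hit[1] : DEFAULT_RATE;
}

// Expected traces kept per day, given request counts per path.
function expectedTraces(dailyRequests: Record<string, number>): number {
  return Object.entries(dailyRequests).reduce(
    (sum, [path, count]) => sum + count * rateForPath(path),
    0,
  );
}

// Hypothetical traffic, for illustration only.
const estimate = expectedTraces({
  "/checkout/pay": 2_000,
  "/healthz": 86_400, // one probe per second
  "/dashboard": 100_000,
});
console.log(estimate); // 2000 * 1 + 86400 * 0 + 100000 * 0.01 = 3000
```

In this made-up example the probe alone accounts for 86,400 requests a day, which is why the zero rate on /health matters more than any other line in the table.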

3. Release-gated quotas

For every new release, we reserve a small fraction of the monthly quota and watch the first 24 hours of event volume against a baseline. If a release drives more than 2× the rolling 7-day median event rate, we get a page. Not a Slack message — a page. This would have caught the Thursday deploy by Friday morning.
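The paging rule is simple enough to sketch in a few lines. The function names here are hypothetical, and the baseline values are illustrative numbers chosen from inside our normal 40k to 90k daily range, not real data.

```typescript
// Median of a non-empty list of numbers.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// True when the current daily rate exceeds `factor` times the median
// of the preceding days.
function shouldPage(
  previousDailyEvents: number[],
  current: number,
  factor = 2,
): boolean {
  if (previousDailyEvents.length === 0) return false; // no baseline yet
  return current > factor * median(previousDailyEvents);
}

// Seven baseline days (illustrative values).
const baseline = [42_000, 55_000, 61_000, 48_000, 90_000, 70_000, 58_000];
console.log(median(baseline)); // 58000
console.log(shouldPage(baseline, 1_400_000)); // true
console.log(shouldPage(baseline, 95_000)); // false
```

Using the median rather than the mean keeps one unusually busy day in the window from raising the threshold.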

4. Owner per project

Every Sentry project now has a named owner who gets the weekly volume digest. "Platform team" is not an owner. A person is an owner. This is dull and human and works.

What we'd do differently if starting fresh

If we were setting up error monitoring on a new product today, in roughly this order:

Start with beforeSend populated. Even a pass-through function that returns the event unchanged is a reminder it exists. Drop AbortError, extension noise, and ResizeObserver loop limit exceeded from day one.
Set the spend cap before you set the DSN. It is much easier to argue for raising a cap than for refunding an overage.
Tag events with release and environment aggressively. When something goes wrong, you want to be able to answer "is this new in v1.42?" in one click.
Treat SDK upgrades like dependency upgrades on a critical service. Read the changelog. Diff the defaults. Deploy to a canary environment and watch the volume for 24 hours.
Build a small dashboard outside Sentry. Pull the events-per-hour metric into whatever you already use (Grafana, Datadog, a Slackbot). Don't rely on the vendor's UI to surface anomalies in the vendor's billing.
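For the first and third items, a starting configuration might look like the sketch below. The option names dsn, release, environment and beforeSend are Sentry's own; everything else is an assumption for illustration: the local MonitoringOptions type, the buildOptions helper, the APP_VERSION and APP_ENV variable names, and the placeholder DSN.

```typescript
// Local subset of the init options used in this sketch.
interface MonitoringOptions<E> {
  dsn: string;
  release: string;
  environment: string;
  beforeSend: (event: E) => E | null;
}

// Hypothetical builder. The environment variable names are assumptions.
function buildOptions<E>(
  env: Record<string, string | undefined>,
): MonitoringOptions<E> {
  const dsn = env.SENTRY_DSN;
  const release = env.APP_VERSION;
  if (!dsn || !release) {
    // Fail at startup rather than ship untagged events.
    throw new Error("SENTRY_DSN and APP_VERSION must be set");
  }
  return {
    dsn,
    release,
    environment: env.APP_ENV ?? "development",
    beforeSend: (event) => event, // pass-through placeholder; add filters here
  };
}

const options = buildOptions<{ message?: string }>({
  SENTRY_DSN: "https://examplePublicKey@o0.ingest.sentry.io/0", // placeholder
  APP_VERSION: "my-app@1.42.0",
  APP_ENV: "production",
});
console.log(options.release, options.environment);
```

The resulting object would be passed to Sentry.init. Refusing to start without a release is a deliberate choice in this sketch: an untagged event cannot answer the "is this new in v1.42?" question.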

A note on the comparable services

We get asked whether switching to a competitor would have helped. Honestly: no. The same event flood would have shown up on Datadog Error Tracking, Honeycomb, Rollbar, or a self-hosted GlitchTip instance. The pricing models differ but the failure mode is the same — you pay for what you ingest, and a noisy SDK ingests a lot.

The self-hosted route trades a billing problem for a storage and ops problem. For a team under about fifteen engineers, that trade is usually worse, not better. Above that, it starts to make sense if you already run Postgres and object storage seriously.

What the incident actually cost

Beyond the bill, the real cost was three engineer-days: one to triage, one to write the sampling policy, one to retrofit beforeSend hooks across four services and update the deployment runbook. We also burned a small amount of trust with finance, which is worth more than the dollars.

The events we dropped during the cap-hit window are gone. If a real bug had landed in that window, we would have missed it. That's the part that keeps us honest about prevention.

Where we'd start

If you have a Sentry account that's been running for more than a year without anyone looking at the sampling config: open it this week. Check three things — your top 10 issues by event count, your tracesSampleRate, and whether you have a spend cap. If any of those look wrong, fix the cheap one first (the cap), then the medium one (beforeSend), then the structural one (per-route sampling). You don't need a project for it. You need an afternoon.
