SEO & GrowthJune 13, 2026 6 min read

GSC Bulk Data Export to BigQuery: A Practical Setup for Programmatic SEO Teams

The Search Console UI tops out at 1,000 rows and 16 months. If you run programmatic SEO, that's not enough. Here's how we wire GSC's BigQuery export into a query workflow that actually drives decisions.

If you run a programmatic SEO site with more than a few thousand URLs, the Search Console UI is a toy. You hit the 1,000-row cap on every report, the 16-month window quietly truncates your trend lines, and anonymized queries swallow your long tail. The fix Google shipped — the GSC Bulk Data Export to BigQuery — is the closest thing we have to raw search log access, and most teams still aren't using it well.

This is how we set it up on programmatic sites, what the schema actually looks like in practice, and the queries we keep coming back to.

Why the bulk export matters for programmatic SEO

The Search Analytics API is fine for dashboards. It breaks down the moment you want to answer questions like:

Which of our 80,000 template-generated pages got at least one impression in the last 30 days but zero clicks?
How did average position shift on /city/{slug}/jobs pages after we rewrote the H1 template?
Which queries are we ranking for that we never built a page for?

Those questions need joins, window functions, and a lot more than 1,000 rows. The bulk export drops daily partitions of raw performance data — including anonymized queries as aggregated rows — straight into a BigQuery dataset you own. No sampling caps. No 16-month wall once you start accumulating history.

The export is forward-only. The day you enable it is day zero. If you're reading this and haven't set it up, that's the one action worth taking before you finish the article.

Setting up the export

The official setup is documented by Google, but a few things bite teams in production. Here's the order we follow.

1. Create a dedicated project and dataset

Don't dump GSC data into your main analytics project. We use a separate GCP project per property family, with a single dataset like searchconsole. This keeps IAM tidy and makes cost attribution obvious when finance asks.

2. Grant the service account

GSC writes via a Google-managed service account: search-console-data-export@system.gserviceaccount.com. It needs:

BigQuery Job User on the project
BigQuery Data Editor on the dataset

If either is missing, the export silently fails for the day and you find out a week later when a partition is empty.

3. Enable the export in Search Console

In the property settings, under Bulk data export, paste the project ID and dataset name. Google validates permissions on save. If validation passes but data doesn't arrive within 48 hours, check the project's BigQuery audit logs — usually it's a billing issue on the destination project.

4. Set partition expiration immediately

This is the one most teams skip. The export creates date-partitioned tables. Without an expiration, you'll be storing every partition forever, and on a busy property that's not free.

ALTER TABLE `your-project.searchconsole.searchdata_site_impression`
SET OPTIONS (
  partition_expiration_days = 1095  -- 3 years, adjust to taste
);

ALTER TABLE `your-project.searchconsole.searchdata_url_impression`
SET OPTIONS (
  partition_expiration_days = 1095
);

If you want infinite history, fine — but make that a conscious decision, not a default.

The schema, briefly

You get three tables. The two that matter for daily work are:

searchdata_site_impression — aggregated at the property level. Good for trend lines and query-level analysis.
searchdata_url_impression — aggregated at the URL level. This is the one programmatic SEO teams live in.

Key columns on searchdata_url_impression:

Column	Notes
`data_date`	Partition column. Always filter on this.
`url`	Full URL, including protocol and trailing slash quirks.
`query`	NULL when the query was anonymized.
`is_anonymized_query`	Boolean. About 30–50% of rows on long-tail sites in our experience.
`country`, `device`, `search_type`	Useful for cohorting.
`impressions`, `clicks`, `sum_top_position`	Compute average position as `sum_top_position / impressions + 1`.

That last detail trips everyone up. The export doesn't store average position directly — it stores the sum of top positions, and you derive the average yourself. The + 1 is because the API uses zero-indexed positions internally.

Queries we actually use

These are the ones we keep saved in a shared workspace. They're not clever — they're useful.

Find template pages that get impressions but no clicks

SELECT
  url,
  SUM(impressions) AS impressions,
  SUM(clicks) AS clicks,
  SAFE_DIVIDE(SUM(sum_top_position), SUM(impressions)) + 1 AS avg_position
FROM `your-project.searchconsole.searchdata_url_impression`
WHERE data_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
  AND search_type = 'WEB'
  AND url LIKE '%/city/%/jobs%'
GROUP BY url
HAVING impressions > 50 AND clicks = 0
ORDER BY impressions DESC
LIMIT 500;

These are your title/meta rewrite candidates. The template is ranking — the SERP snippet just isn't earning the click.

Detect template-level position drift after a deploy

If you ship a change to a template on, say, March 12, you want to know whether average position on URLs matching that template moved against the rest of the site.

WITH daily AS (
  SELECT
    data_date,
    CASE WHEN url LIKE '%/city/%/jobs%' THEN 'template_a' ELSE 'other' END AS bucket,
    SUM(sum_top_position) AS sum_pos,
    SUM(impressions) AS imps
  FROM `your-project.searchconsole.searchdata_url_impression`
  WHERE data_date BETWEEN '2026-02-26' AND '2026-03-26'
    AND search_type = 'WEB'
  GROUP BY 1, 2
)
SELECT
  data_date,
  bucket,
  SAFE_DIVIDE(sum_pos, imps) + 1 AS avg_position
FROM daily
ORDER BY data_date, bucket;

Plot that in your BI tool of choice. If the template line diverges from the control line at the deploy date, you have a signal.

Find queries you rank for but don't have pages for

SELECT
  query,
  SUM(impressions) AS impressions,
  SUM(clicks) AS clicks,
  COUNT(DISTINCT url) AS ranking_urls
FROM `your-project.searchconsole.searchdata_url_impression`
WHERE data_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) AND CURRENT_DATE()
  AND query IS NOT NULL
  AND search_type = 'WEB'
GROUP BY query
HAVING impressions > 200 AND ranking_urls > 1
ORDER BY impressions DESC
LIMIT 1000;

Multiple URLs ranking for the same query usually means cannibalization or a missing canonical destination. On programmatic sites, it's often a sign your template taxonomy needs a new node.

Cost control

The export itself is free. Querying it is not. A few habits keep the bill sane:

Always filter on data_date. Unfiltered scans against years of url-level data are the single biggest cost trap. We've seen a junior analyst burn three figures on one accidental SELECT *.
Use clustering. The exported tables are partitioned by date but not clustered. On large properties, add a clustering key on url or query via CREATE TABLE ... AS SELECT into a derived table for repeated analysis.
Materialize daily rollups. If your dashboard hits the raw tables every refresh, you're paying for the same scan over and over. A scheduled query that builds a daily_url_summary table cuts dashboard costs by an order of magnitude.
Set a project-level query cost cap. BigQuery supports custom quotas. Use them.

Joining with GA4

If you also export GA4 to BigQuery, you can join sessions and engagement to the URLs that earned the impressions. Keep in mind GA4's page_location includes query strings and fragments — strip them before joining, or your match rate will be embarrassing.

What about GSC's existing API?

We still use the Search Analytics API for two things: ad-hoc lookups when we don't want to write SQL, and backfill before the export was enabled. For anything recurring — anomaly detection, template performance, cannibalization audits — BigQuery wins on every dimension that matters.

Where we'd start

If your programmatic site is on Search Console and you haven't turned the export on, do that today, even if you have no plan to query it for a month. Every day you wait is a day of data you can never get back. Then:

Set partition expiration on both tables.
Write one scheduled query that materializes a daily URL-level rollup.
Build one dashboard answering one question your team actually argues about — usually "did the last template change help or hurt?"

Everything else compounds from there. If you'd like a hand wiring this into a broader content engine, our data and analytics work is where we usually start that conversation.

#SEO#BigQuery#Search Console#Programmatic SEO#Analytics

Want a team like ours?

72Technologies builds production software for the kind of teams who actually read this blog.

Start a project

Keep reading

Content Freshness Signals at Scale: When to Actually Re-Publish Programmatic Pages

Bulk-updating dateModified on a million pages is a great way to get ignored — or worse. Here's how we decide which programmatic pages deserve a real refresh, and how to wire the signal cleanly.

June 21, 2026 7 min

Faceted Navigation on Programmatic SEO Sites: Rules That Keep Google Sane

Facets are where programmatic SEO sites quietly bleed crawl budget and rank signals. Here's the rule set we use to decide which combinations earn a URL, which get noindex, and which never see a link.

June 18, 2026 6 min

GA4 + GSC Joins in BigQuery: Building a Query-to-Revenue View for Programmatic SEO

Most programmatic SEO teams track impressions and revenue in separate silos. Here's how we stitch GA4 and GSC together in BigQuery to get a real query-to-revenue view that actually drives roadmap decisions.

June 15, 2026 6 min