SEO & GrowthJune 15, 2026 6 min read

GA4 + GSC Joins in BigQuery: Building a Query-to-Revenue View for Programmatic SEO

Most programmatic SEO teams track impressions and revenue in separate silos. Here's how we stitch GA4 and GSC together in BigQuery to get a real query-to-revenue view that actually drives roadmap decisions.

Every programmatic SEO team we've worked with hits the same wall around month six: GSC shows queries, GA4 shows revenue, and nobody can answer "which queries actually made us money this quarter." The keyword-not-provided era never really ended — it just moved to a different layer of the stack. Here's the BigQuery pattern we use to stitch the two sides back together without lying to ourselves about attribution.

Why the join is hard in the first place

Google deliberately doesn't give you a query field on the GA4 session. GSC has the query but not the user, the conversion, or the revenue. The two systems also disagree on almost everything: time zones, sampling, deduplication, what counts as a "click" versus a "session," and how late-arriving data gets reconciled.

So any join you build is a probabilistic bridge, not a primary key match. The honest goal is: for a given landing page on a given day, distribute GSC query impressions and clicks across the GA4 sessions and revenue that actually landed there. That's the model. Anyone promising you a true 1:1 query→user join is selling something.

What you need turned on

Before any of this works:

GA4 → BigQuery export enabled (daily, ideally streaming too)
GSC → BigQuery bulk data export enabled (this gives you searchdata_url_impression and searchdata_site_impression)
Both datasets in the same region, or you'll pay egress on every query
A dim_pages table you control, with canonical URL, template type, and content cluster

If you don't have GSC bulk export on yet, stop reading and go turn it on. The sampled API will not get you there at programmatic scale.

The data model: three layers, not one giant view

We build this as three stacked layers in BigQuery. Mixing them in one query is how you end up with 40-minute dashboards and wrong numbers.

Layer 1: normalised daily facts

Two tables, both keyed on (date, page_url):

fct_gsc_daily — impressions, clicks, avg position, and an array of top queries per URL per day
fct_ga4_daily — sessions, engaged sessions, conversions, revenue, by landing page per day

The trick is URL normalisation. GSC reports the URL Google indexed; GA4 reports page_location as the browser saw it. Trailing slashes, query strings, fragments, AMP variants, and locale prefixes all break the join. We run both through the same SQL UDF.

CREATE OR REPLACE FUNCTION `analytics.normalize_url`(u STRING) AS (
  REGEXP_REPLACE(
    REGEXP_REPLACE(
      LOWER(SPLIT(u, '#')[OFFSET(0)]),
      r'\?.*$', ''
    ),
    r'/$', ''
  )
);

Not fancy. It just needs to be the same function called on both sides. Version it, and never "improve" it without a backfill.

Layer 2: the query-to-page allocation

This is where the modelling happens. For each (date, page_url), GSC gives you N queries with their own click counts. GA4 gives you M sessions on that page. We allocate sessions and revenue across queries proportionally to GSC clicks for that page on that day.

WITH gsc AS (
  SELECT
    data_date AS date,
    `analytics.normalize_url`(url) AS page_url,
    query,
    SUM(clicks) AS gsc_clicks,
    SUM(impressions) AS gsc_impressions
  FROM `project.searchconsole.searchdata_url_impression`
  WHERE is_anonymized_query = FALSE
  GROUP BY 1,2,3
),
page_clicks AS (
  SELECT date, page_url, SUM(gsc_clicks) AS total_clicks
  FROM gsc GROUP BY 1,2
),
ga4 AS (
  SELECT date, page_url, sessions, revenue, conversions
  FROM `project.analytics.fct_ga4_daily`
)
SELECT
  g.date,
  g.page_url,
  g.query,
  g.gsc_clicks,
  g.gsc_impressions,
  SAFE_DIVIDE(g.gsc_clicks, p.total_clicks) AS click_share,
  ga.sessions * SAFE_DIVIDE(g.gsc_clicks, p.total_clicks) AS allocated_sessions,
  ga.revenue  * SAFE_DIVIDE(g.gsc_clicks, p.total_clicks) AS allocated_revenue,
  ga.conversions * SAFE_DIVIDE(g.gsc_clicks, p.total_clicks) AS allocated_conversions
FROM gsc g
JOIN page_clicks p USING (date, page_url)
JOIN ga4 ga USING (date, page_url);

A few things to flag here, because they bite people:

is_anonymized_query = FALSE — GSC hides low-volume queries. Those rows have no query value. If you include them, your click_share denominators get weird. Decide once: do you allocate only across known queries, or do you bucket the anonymised tail as (anonymized) and treat it as its own "query"? We prefer the second; it keeps the totals honest.
Direct/referral sessions on the same URL get included. If a page gets organic and paid traffic, you're over-attributing revenue to organic queries. Filter ga4 to session_source = 'google' and session_medium = 'organic' before the join. Most teams forget this and inflate SEO revenue by 20–40% in our experience.
Click share is a model, not a measurement. Document it. Put it in the table description.

Layer 3: the rollups people actually look at

Nobody opens BigQuery to scan a query-level table with 80M rows. Build the aggregates the team will use:

agg_revenue_by_query_cluster — group queries into intent clusters (transactional, informational, branded)
agg_revenue_by_template — join to dim_pages and roll up by template type
agg_query_opportunity — high impressions, low click share, decent allocated revenue per click → expansion targets

This third one is where programmatic SEO teams find their next 90 days of work.

Reconciling with the GA4 UI (so finance doesn't yell)

Your SUM(allocated_revenue) for organic Google will not exactly match GA4's "Organic Search" channel revenue. Expected reasons:

GA4 thresholding and modelling on small segments
Late conversions arriving after the daily export cut
Cross-device modelled sessions that don't have a page_location in the raw export
Your organic filter being stricter than GA4's channel grouping

We aim for within 3–5%. If you're off by more, the URL normaliser is usually the culprit. Log the unmatched URLs from both sides daily — that diff catalogue is the single most useful debugging tool in this whole pipeline.

What this unlocks

Once the table exists, the questions change. Instead of "which pages get traffic," the product team starts asking:

Which query clusters have a revenue-per-click above our content production cost?
Which templates convert organic clicks at half the site average — and should we deindex them?
Which queries are growing in impressions but flat in allocated revenue (intent mismatch)?
Which pages have one breakout query carrying 80% of the revenue (fragility risk)?

That last one matters more than people think. A single algorithm update on a single query can erase a quarter. The query-to-revenue view makes that concentration visible before it bites.

Things we got wrong the first few times

We allocated by impressions instead of clicks. Position 8 impressions don't generate sessions. Use clicks.
We didn't lock the URL normaliser. Someone "fixed" it, the join recall jumped, and dashboards moved 12% overnight with no real-world change. Version it like code.
We let the query table run unpartitioned. Costs spiralled. Partition by date, cluster by page_url. Set a 25-month expiration to match GSC's own retention.
We treated the allocation as truth in exec decks. Always show the model assumption inline. "Revenue allocated proportionally to GSC clicks per URL per day" — one sentence, every chart.

Where we'd start

If you're standing this up next week: turn on both BigQuery exports today, write the URL normaliser as a UDF before you write any SQL, and build Layer 1 with a hard organic filter on the GA4 side. Don't touch Layer 3 dashboards until Layer 2 reconciles within 5% of the GA4 UI for four consecutive weeks. The teams that skip that reconciliation step end up rebuilding the whole pipeline a quarter later — we've seen it three times, and we've done it once ourselves.

If you want a hand wiring this into an existing data warehouse, our data and analytics engineering team does this kind of work, and we've written more on the upstream pieces in the programmatic SEO posts on the blog.

#SEO#Analytics#BigQuery#GA4#Search Console#Programmatic SEO

Want a team like ours?

72Technologies builds production software for the kind of teams who actually read this blog.

Start a project

Keep reading

Content Freshness Signals at Scale: When to Actually Re-Publish Programmatic Pages

Bulk-updating dateModified on a million pages is a great way to get ignored — or worse. Here's how we decide which programmatic pages deserve a real refresh, and how to wire the signal cleanly.

June 21, 2026 7 min

Faceted Navigation on Programmatic SEO Sites: Rules That Keep Google Sane

Facets are where programmatic SEO sites quietly bleed crawl budget and rank signals. Here's the rule set we use to decide which combinations earn a URL, which get noindex, and which never see a link.

June 18, 2026 6 min

GSC Bulk Data Export to BigQuery: A Practical Setup for Programmatic SEO Teams

The Search Console UI tops out at 1,000 rows and 16 months. If you run programmatic SEO, that's not enough. Here's how we wire GSC's BigQuery export into a query workflow that actually drives decisions.

June 13, 2026 6 min