All articles
SEO & GrowthMay 17, 2026 6 min read

GA4 + GSC: Building a Query-Level Performance Pipeline That Actually Tells You What to Fix

GA4 hides intent. GSC hides revenue. Joining them at the query level is where the real SEO decisions live — here's how we wire it up without losing our minds.

GA4 + GSC: Building a Query-Level Performance Pipeline That Actually Tells You What to Fix

Every SEO dashboard we inherit has the same flaw: GA4 tells you what users did, GSC tells you what queries brought them, and nobody can answer the obvious question — which queries are making us money, and which pages are bleeding traffic we already paid to acquire? The fix isn't another Looker template. It's a small data pipeline.

This is how we build one for clients running programmatic SEO surfaces, content sites monetised with AdSense, and ecommerce catalogs. The shape is the same; the joins are what change.

Why the default integrations aren't enough

Google gives you two native options: link GSC to GA4 in the UI, or pull both into Looker Studio. Both are fine for a glance. Neither survives contact with a site that has more than a few thousand indexed URLs.

The specific problems we keep hitting:

  • GA4's GSC integration is page-level, not query-level. You see clicks per landing page, not which query drove which session.
  • Sampling. Once you cross a few million events in an exploration, GA4 silently samples. You won't notice until your numbers stop reconciling.
  • Query anonymisation in GSC. Roughly 30 – 50% of queries get hidden behind (other) for privacy. You have to design around it, not pretend it isn't there.
  • Date alignment. GSC reports in Pacific time with a 2 – 3 day lag. GA4 reports in your property timezone, near real-time. Naive joins produce phantom drops.

The answer for any site past the toy stage is the same: export both to BigQuery and do the joins yourself.

The pipeline at a glance

Here's the shape we end up with on most projects:

GSC ──► BigQuery (bulk export, daily)
                    │
                    ├──► staging.gsc_daily
                    │
GA4 ──► BigQuery (native export, daily + streaming)
                    │
                    ├──► staging.ga4_sessions
                    │
                    ▼
          marts.page_query_performance
                    │
                    ▼
        Looker Studio / internal tool

Three moving parts, all free or near-free up to reasonable volumes:

  1. GSC Bulk Data Export to BigQuery (set up once in Search Console settings).
  2. GA4 BigQuery linking (free for standard properties, with a daily export job and an optional streaming one).
  3. A scheduled query in BigQuery that joins them into a single mart table.

Cost reality check

For a site doing ~500k sessions/month and ~2M GSC impressions, our BigQuery bill typically runs in the single-digit USD per month range, mostly storage. The cost spikes come from poorly partitioned queries hitting full table scans — solvable with discipline, not budget.

Setting up the exports

The GSC export lives under Settings → Bulk data export in Search Console. You give it a BigQuery project and a dataset; it creates three tables, of which searchdata_url_impression is the one that matters. It contains a row per (date, URL, query, country, device, search type).

GA4's export is configured in the Admin panel under BigQuery Links. Pick "Daily" at minimum. If you can stomach the slightly higher cost, add streaming so you can debug tracking issues without waiting overnight.

A few non-obvious setup notes from production:

  • Turn on GSC export before you need the data. It only starts collecting from the day you enable it — there's no backfill.
  • Set BigQuery dataset location to match where your queries will run. Cross-region joins will cost you.
  • Add a _TABLE_SUFFIX filter to every query you write against GA4. The export creates one table per day (events_YYYYMMDD), and forgetting the suffix is how you scan a year of data by accident.

The join that actually matters

The useful unit of analysis is (page, query, date). From there you can roll up to page, query, cluster, or template. Here's a trimmed version of the mart query we use:

WITH gsc AS (
  SELECT
    data_date AS date,
    url,
    query,
    SUM(impressions) AS impressions,
    SUM(clicks) AS clicks,
    SAFE_DIVIDE(SUM(sum_top_position), SUM(impressions)) + 1 AS avg_position
  FROM `proj.searchconsole.searchdata_url_impression`
  WHERE data_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
                      AND DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
    AND is_anonymized_query = FALSE
  GROUP BY date, url, query
),
ga4 AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS date,
    CONCAT('https://www.example.com',
           (SELECT value.string_value FROM UNNEST(event_params)
             WHERE key = 'page_location')) AS url,
    COUNT(DISTINCT (SELECT value.int_value FROM UNNEST(event_params)
                     WHERE key = 'ga_session_id')) AS sessions,
    COUNTIF(event_name = 'purchase') AS purchases,
    SUM(IF(event_name = 'purchase', ecommerce.purchase_revenue, 0)) AS revenue
  FROM `proj.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN
        FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
    AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
  GROUP BY date, url
)
SELECT
  g.date, g.url, g.query,
  g.impressions, g.clicks, g.avg_position,
  a.sessions, a.purchases, a.revenue,
  SAFE_DIVIDE(a.revenue, g.clicks) AS revenue_per_click
FROM gsc g
LEFT JOIN ga4 a USING (date, url);

A few things to call out:

  • We lose query granularity on the GA4 side, because GA4 doesn't know which query drove a session. The join is (date, url), and the query column comes only from GSC. That's fine — we're attributing page-level revenue across the queries that drove impressions to that page, proportional to clicks downstream.
  • We drop anonymised queries in the GSC CTE. Keep them in a separate aggregate if you want to reconcile totals.
  • The + 1 on avg_position is because GSC's sum_top_position is zero-indexed. People forget this and end up with charts that say their best query is at position 0.4.

Schedule it, don't run it

Wrap the query in a BigQuery scheduled query that writes to marts.page_query_performance partitioned by date. Use WRITE_TRUNCATE on a rolling 90-day window if you can stomach the rewrites; otherwise MERGE on (date, url, query). Schedule it for ~04:00 in your timezone — late enough that yesterday's GA4 export has landed, early enough that your team sees fresh data at standup.

What you do with the mart

This is where most pipelines die: built, never used. Three views earn their keep on almost every project we run.

1. The "striking distance" view

Queries with impressions > 500/month, average position between 8 and 20, and zero or near-zero clicks. These are pages Google is willing to rank but where you're losing the SERP. Filter to commercial intent (we tag queries with simple regex rules in a sidecar table) and you have your content refresh backlog for the quarter.

2. The cannibalisation view

Group by query, count distinct URLs receiving impressions in the last 28 days. Anything with two or more competing URLs where neither cracks position 5 is a cannibalisation candidate. Surface the URLs side by side with their respective click and revenue numbers — the decision (consolidate, canonicalise, or differentiate) becomes obvious.

3. The decay view

Week-over-week delta on clicks, joined to publish date and last-modified date from your CMS. Pages older than 12 months losing more than 20% week-over-week are your refresh candidates. We've seen this single report justify entire content ops headcounts.

Honest limitations

A few things this pipeline won't do, no matter how clever the SQL:

  • It won't recover anonymised queries. Plan content around topics, not single keywords.
  • It won't give you true query-to-revenue attribution. GA4 doesn't see the query. Proportional attribution by clicks is the closest honest approximation.
  • It won't replace judgement. A page with high impressions and no clicks might be the wrong intent match, a weak title, a hostile SERP feature, or all three. The data narrows the search; humans still decide.

If you're running AdSense surfaces alongside this, layer AdSense's BigQuery export in too — (date, url) joins cleanly, and suddenly your revenue-per-click column tells you which content templates are worth scaling. We've written more about that monetisation side over on the 72Technologies blog.

Where we'd start

If you're staring at a dashboard that doesn't answer questions, don't rebuild it. Start smaller:

  1. Enable the GSC bulk export today. You can't backfill it, so every day you wait is a day of data you'll never have.
  2. Link GA4 to BigQuery, daily export only at first.
  3. Write one query — the striking-distance view above is the highest ROI in our experience — and run it manually for two weeks before you automate anything.
  4. Once the team is actually opening it, then build the mart, then the scheduled query, then the dashboard.

The pipeline is the boring part. The discipline of looking at it every Monday morning is what moves traffic. If you want help wiring this into a content ops workflow, that's the kind of work our growth engineering team does week in, week out.

#SEO#Analytics#GA4#Search Console#BigQuery

Want a team like ours?

72Technologies builds production software for the kind of teams who actually read this blog.

Start a project