How the UTM Audit Engine Works: From CSV to A–F Grade
How the UTM audit engine works: it parses your export, clusters duplicate values with the Claude API, grades it A–F, and ranks issues by traffic impact.
Updated Jun 30, 2026
You paste a GA4 source/medium export, and about a minute later you have an A–F grade and a ranked list of what is splitting your channels. This page opens that minute up: every stage the UTM audit engine runs, what the Claude API does and does not touch, and why a person approves the result before anything is treated as canonical.
The five stages of the audit engine
The engine is a short pipeline. Each stage hands a cleaner, more structured view to the next, and only one stage uses AI. Everything runs synchronously for up to roughly 500 links, so you watch it happen rather than wait on a queue. The first public scan has no signup wall.
- 1. Parse: split each row into source / medium / campaign
- 2. Cluster: Claude API groups duplicates → canonical tokens
- 3. Grade: A–F health score + one-sentence reason
- 4. Rank: issues sorted by traffic impact
- 5. Breakdown: by dimension + the Unassigned bucket
1. Parse the export
Each row is split into its `utm_source`, `utm_medium`, and `utm_campaign` values (plus term and content when present). The engine trims stray whitespace, decodes encoded characters, and flags rows with no UTM at all, which are the ones GA4 drops into Unassigned. See exporting the right GA4 report for the exact columns to include.
2. Cluster the duplicates with Claude
This is the one AI stage. The Anthropic Claude API reads the distinct values in each dimension and groups semantic duplicates (`fb`, `Facebook`, `facebook`, and `meta`), along with case variants and typos, into a single proposed canonical token. GA4 is case-sensitive, so `email`, `Email`, and `EMAIL` are three different mediums until something collapses them; the engine proposes the collapse, it does not assume it.
3. Grade it A–F
The engine scores how fragmented the values are and how many sessions are sitting in Unassigned, then returns a letter grade with a one-sentence reason. The grade is deliberately blunt, a single letter you can put on a slide, and it gives you a sense of how deep the cleanup goes before you spend time on it.
4. Rank issues by traffic impact
Every issue cluster is sorted by the share of sessions it touches, biggest first. A case duplicate splitting 30% of your paid social outranks a one-off typo on a dead campaign. You read down the list and stop when the curve flattens.
5. Break it down by dimension
Finally the engine separates the damage across `utm_source`, `utm_medium`, `utm_campaign`, and the Unassigned bucket, so you can see whether the drift is a few mis-cased sources or a campaign-naming free-for-all. Each dimension links straight to its clusters.
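To make the parse stage (step 1) concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the engine's actual code: the column names `source_medium` and `sessions`, the `ParsedRow` type, the ` / ` separator, and the rule that a row counts as untagged when both parts are empty or `(not set)` are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from urllib.parse import unquote

# Assumed placeholders for "no value"; adjust to match your own export.
EMPTY_MARKERS = {"", "(not set)"}


@dataclass
class ParsedRow:
    source: str
    medium: str
    sessions: int
    untagged: bool  # True when neither source nor medium carries a value


def clean(value: str) -> str:
    """Decode percent-encoded characters and trim whitespace, keeping case."""
    return unquote(value).strip()


def parse_export(csv_text: str) -> list[ParsedRow]:
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Assumes a combined "source / medium" column, split on " / ".
        raw_source, _, raw_medium = record["source_medium"].partition(" / ")
        source, medium = clean(raw_source), clean(raw_medium)
        untagged = source in EMPTY_MARKERS and medium in EMPTY_MARKERS
        rows.append(ParsedRow(source, medium, int(record["sessions"]), untagged))
    return rows
```

Case is deliberately left untouched here: collapsing `Facebook` and `facebook` is a decision for the cluster stage and the reviewer, not for the parser.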
Where the AI stops and you decide
The clustering step is bounded on purpose. The Claude API returns a proposal: here are the values it believes are the same channel, and here is the canonical token it suggests for each group. Nothing about that proposal is applied until you approve it, and approval is one click per group. Override anything that does not match how your team actually talks — if you say paid-social and the engine guessed cpc, you keep paid-social.
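One way to picture that approval gate is a proposal object that contributes nothing to the canonical map until it is approved. The sketch below is hypothetical: `ClusterProposal`, its fields, and `approved_map` are illustrative names, not part of the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterProposal:
    variants: list[str]
    suggested: str                  # canonical token proposed by the model
    approved: bool = False          # flipped by the reviewer, one click per group
    override: Optional[str] = None  # set when the reviewer prefers another token

    @property
    def canonical(self) -> str:
        return self.override or self.suggested


def approved_map(proposals: list[ClusterProposal]) -> dict[str, str]:
    """Build the variant -> canonical map from approved groups only."""
    mapping = {}
    for proposal in proposals:
        if not proposal.approved:
            continue  # unapproved proposals never reach the map
        for variant in proposal.variants:
            mapping[variant] = proposal.canonical
    return mapping
```

With this shape, a reviewer who says `paid-social` sets `override = "paid-social"` on the group the model labeled `cpc`, approves it, and every variant in that group then maps to `paid-social`.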
What the audit engine never does
It does not write to GA4, and it does not touch your ad platforms. There is no API call back into Google or Meta. The output is a variant→canonical map plus a grade — a diagnosis you act on in your reporting layer, not an automated change to your live data.
Why ranking by traffic impact matters
A raw list of every inconsistency is noise. The engine ranks clusters by the sessions they affect because that is what changes your numbers. When UTM data is fragmented, as much as 26% of conversions can be credited to the wrong channel, and 10–20% of GA4 sessions commonly land in Unassigned — but those losses concentrate in a few high-traffic clusters, not evenly across the export.
| Dimension | What the engine checks | Typical first finding |
|---|---|---|
| utm_source | Case + nickname duplicates of the same channel | Facebook and facebook split in two |
| utm_medium | Synonyms and casing (GA4 is case-sensitive) | Email, email, EMAIL read as three mediums |
| utm_campaign | Separators, dates, and typos in campaign names | spring_sale vs Spring-Sale vs springsale |
| Unassigned | Sessions with no UTM the engine can map | 10–20% of sessions arriving via untagged links |
Once you approve the canonical tokens, that map is what you carry into your reporting layer and into the governed taxonomy you lock for new links. Because the audit cannot rewrite GA4 history — past hits are immutable — the map is how Facebook and facebook finally sum to one channel where you report. The deeper mechanics of why a single capital letter forks a channel are covered in UTM case sensitivity in GA4.
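Applying the approved map in a reporting step can be as small as the following sketch. The function name `roll_up` and the dictionary inputs are assumptions for illustration; the same idea can be expressed as a lookup in a spreadsheet or a join in SQL.

```python
from collections import Counter


def roll_up(sessions_by_value: dict[str, int], canonical_map: dict[str, str]) -> Counter:
    """Sum sessions under each canonical token; unmapped values pass through as-is."""
    totals = Counter()
    for value, sessions in sessions_by_value.items():
        totals[canonical_map.get(value, value)] += sessions
    return totals


raw = {"Facebook": 400, "facebook": 600, "google": 900}
approved = {"Facebook": "facebook", "fb": "facebook"}
totals = roll_up(raw, approved)
```

Here `Facebook` and `facebook` land under one key, while `google`, which has no entry in the map, keeps its original value.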
Run it again later
The same engine powers automated drift detection: re-run the audit on a schedule, and the grade plus the ranked clusters become a trend you can watch instead of a one-time cleanup.
How does the UTM audit engine cluster duplicate sources?
The cluster stage sends the distinct values in each dimension to the Anthropic Claude API, which groups semantic duplicates — nicknames like fb and meta, case variants like Facebook and facebook, and typos — into one proposed canonical token per group. It is the only AI stage in the pipeline, and it returns a proposal, not a change. You approve each group in one click, and you can override any token that does not match how your team names things.
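The request and response handling around such a clustering call might look like the sketch below. The API call itself is omitted, and everything shown is an assumption: the prompt wording, the JSON reply shape (a list of objects with `canonical` and `variants` keys), and the checks that reject values the model was never given.

```python
import json


def build_cluster_prompt(dimension: str, values: list[str]) -> str:
    """Send only the distinct values of one dimension, never whole rows."""
    distinct = sorted(set(values))
    return (
        f"Group the following {dimension} values that refer to the same channel. "
        'Reply with JSON only: a list of objects with keys "canonical" and "variants".\n'
        + json.dumps(distinct)
    )


def parse_proposal(raw_reply: str, values: list[str]) -> dict[str, str]:
    """Turn the reply into a proposed variant -> canonical map, validating it."""
    known = set(values)
    mapping: dict[str, str] = {}
    for group in json.loads(raw_reply):
        canonical = group["canonical"]
        for variant in group["variants"]:
            if variant not in known:
                raise ValueError(f"reply contains a value not in the export: {variant!r}")
            if variant in mapping:
                raise ValueError(f"value assigned to two groups: {variant!r}")
            mapping[variant] = canonical
    return mapping
```

The result is still only a proposal. Validation of this kind guards against malformed replies; it does not replace the human approval step.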
Does the audit engine change my GA4 or ad platform data?
No. The audit is read-only. It parses the CSV you paste or upload and produces a grade plus a variant→canonical map; it never writes back to GA4 or any ad platform, and it makes no API calls into Google or Meta. GA4 hits are immutable anyway, so the practical fix is to apply the approved map where you report (Looker Studio, BigQuery, or a spreadsheet) and to route new links through a governed builder so the drift stops at the source.
How does the engine decide the A–F health grade?
The grade is deterministic math on top of the clusters you keep: it weighs how fragmented your source, medium, and campaign values are against how many sessions land in Unassigned. A clean account where the same value is always written the same way grades toward an A; a year-old shared account with case duplicates and un-tagged links commonly lands around a C. The grade is most useful as a trend — show that you moved from a C to a B and that the channel totals can now be trusted.
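The exact formula is not given on this page, so the sketch below only shows the general shape of a deterministic grade. The fragmentation measure, the equal weights, and the letter cut-offs are all invented for illustration and should not be read as the engine's real scoring.

```python
def fragmentation(variant_to_canonical: dict[str, str]) -> float:
    """Fraction of distinct raw values that are redundant duplicates."""
    raw_count = len(variant_to_canonical)
    if raw_count == 0:
        return 0.0
    canonical_count = len(set(variant_to_canonical.values()))
    return (raw_count - canonical_count) / raw_count


def health_grade(frag: float, unassigned_share: float) -> str:
    """Map two penalties in [0, 1] to a letter. Weights and cut-offs are hypothetical."""
    penalty = 0.5 * frag + 0.5 * unassigned_share
    score = 100 * (1 - penalty)
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"
```

Because the inputs are just the kept clusters and the Unassigned share, the same export and the same approvals always produce the same letter, which is what makes the grade usable as a trend.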
How are UTM issues ranked by traffic impact?
Each issue cluster is scored by the share of sessions it affects and sorted biggest-first. A case duplicate splitting a third of your paid social traffic ranks above a one-off typo on a campaign that drove a handful of clicks. This lets you fix the few clusters that move the channel totals and stop reading once the impact curve flattens, instead of grinding through every inconsistency in the export.
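The ranking described above reduces to a sort on session share. In this minimal sketch, `IssueCluster` and `rank_by_impact` are hypothetical names, and the session counts are made-up sample data.

```python
from dataclasses import dataclass


@dataclass
class IssueCluster:
    label: str
    sessions: int  # sessions touched by any variant in the cluster


def rank_by_impact(clusters: list[IssueCluster], total_sessions: int) -> list[tuple[str, float]]:
    """Return (label, share of sessions) pairs, biggest share first."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    ranked = sorted(clusters, key=lambda c: c.sessions, reverse=True)
    return [(c.label, c.sessions / total_sessions) for c in ranked]


clusters = [
    IssueCluster("typo on a dead campaign", 12),
    IssueCluster("paid social case duplicate", 3000),
    IssueCluster("campaign separator drift", 450),
]
ranked = rank_by_impact(clusters, total_sessions=10_000)
```

Reading `ranked` from the top, the case duplicate comes first and the one-off typo last, so the place to stop is wherever the shares become too small to move a channel total.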
Can I override what the AI suggests?
Yes — that is the point of the one-click approval. The Claude API proposes the groupings and a canonical token for each, but nothing is treated as canonical until you accept it. You can rename a suggested token, split a group the engine merged, or merge two it kept apart. Your decisions become the locked taxonomy that the governed link builder then enforces going forward.
See the engine run on your own data
Paste a GA4 export and get an A–F grade with ranked issue clusters in about a minute. No signup for the first scan.