UTM Canonical Taxonomy: From Dirty Variants to Clean

A canonical UTM taxonomy maps every dirty variant — Facebook, fb, meta — to one approved value you accept in one click, so GA4 reports stop fragmenting.

Updated Jun 30, 2026

You do not really have 40 source values — you have 8 channels typed 40 ways. Facebook, facebook, fb, and meta are one channel to you and four separate lines to GA4. A canonical taxonomy is the decision about which single spelling wins, written down once and enforced everywhere.

What a canonical value is

A canonical value is the single, approved way to write one utm_source, utm_medium, or utm_campaign token. Everything else that means the same thing — a typo, a trailing space, a capital letter, a synonym — is a variant of it. GA4 is case-sensitive: email, Email, and EMAIL are three different mediums, so picking one spelling is not a style preference. It is the difference between one channel and three.

Canonical vs variant

Canonical is the value you keep (facebook). A variant is any other string that resolves to it (Facebook, fb, meta, facebook ). Your taxonomy is just the full set of canonicals plus the variants each one absorbs.

Facebook, facebook, fb, meta → facebook
Email, e-mail, EMAIL → email
Paid_Social, cpc, social, paid-social → paid_social
BlackFriday, black-friday, bf2026 → black_friday_2026
Four channels, each typed several ways, collapse to one canonical token apiece.
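
To make the case-sensitivity point concrete, here is a minimal Python sketch. The rows, the session counts, and the CANONICAL map are made up for illustration and do not come from a real export. It sums sessions once per exact string, the way a case-sensitive report groups them, and once per canonical value.

```python
from collections import Counter

# Hypothetical export rows: (utm_medium exactly as typed, sessions).
rows = [("email", 120), ("Email", 45), ("EMAIL", 10), ("e-mail", 5)]

# Hypothetical variant -> canonical map for this one token.
CANONICAL = {"email": "email", "Email": "email", "EMAIL": "email", "e-mail": "email"}


def totals_by_raw(rows):
    """Sum sessions per exact string, as a case-sensitive report would."""
    totals = Counter()
    for value, sessions in rows:
        totals[value] += sessions
    return dict(totals)


def totals_by_canonical(rows, mapping):
    """Sum sessions per canonical value; unmapped strings are kept as typed."""
    totals = Counter()
    for value, sessions in rows:
        totals[mapping.get(value, value)] += sessions
    return dict(totals)


raw = totals_by_raw(rows)
merged = totals_by_canonical(rows, CANONICAL)
print(len(raw), "report lines before,", len(merged), "after")
```

The same traffic is there in both views; only the grouping key changes, which is why choosing one spelling matters more than it looks.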

How canonicals get suggested — and accepted in one click

You do not write the taxonomy by hand. Paste or upload your source/medium export and the audit groups the values for you. The Anthropic Claude API clusters semantic duplicates, case variants, and typos, then proposes one canonical token per cluster. The suggestion is bounded and reviewable — nothing changes until you approve it, and it never writes back to GA4 or your ad platforms.

  1. Run the audit

    Paste or upload a CSV of your source/medium values. No signup is needed for the first scan, which handles up to about 500 links synchronously.

  2. Read the suggested clusters

    Each cluster shows the variants found and the single canonical value proposed for them, ranked by how much traffic the merge affects.

  3. Accept in one click

    Approve the whole suggested taxonomy at once, or edit any canonical before you accept it — you stay in control of the final spelling.

  4. The map is saved

    Accepted canonicals, and the variants each one absorbs, are written to your taxonomy store and audit-logged for later review.

  1. Audit clusters variants: fb, Facebook, meta
  2. Claude suggests canonical: one token per cluster
  3. You accept in one click, or edit first
  4. Saved to taxonomy store: dirty → canonical
  5. Builder enforces it: dropdowns only
One pass: the audit suggests, you approve, the store remembers, and the builder enforces.
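
The suggest-then-approve flow can be sketched in a few lines of Python. This is a simplified stand-in, not the audit's actual method: the real clustering is done by a language model, whereas this sketch uses a normalization rule plus a small hypothetical SYNONYMS table. It also assumes the proposed canonical is the normalized key, and that "traffic affected" means the sessions sitting on non-canonical spellings. The function only returns suggestions; nothing is written anywhere.

```python
import re
from collections import defaultdict

# Hypothetical synonym table; a stand-in for the semantic clustering step.
SYNONYMS = {"fb": "facebook", "meta": "facebook", "e_mail": "email"}


def normalize(value: str) -> str:
    """Trim, lowercase, and turn runs of spaces or hyphens into underscores."""
    key = re.sub(r"[\s\-]+", "_", value.strip().lower())
    return SYNONYMS.get(key, key)


def suggest_clusters(rows):
    """Group (value, sessions) rows by normalized key and rank the clusters.

    Returns one suggestion per cluster that needs a merge, ordered by the
    number of sessions that would move to the canonical, largest first.
    """
    clusters = defaultdict(dict)
    for value, sessions in rows:
        key = normalize(value)
        clusters[key][value] = clusters[key].get(value, 0) + sessions

    suggestions = []
    for canonical, variants in clusters.items():
        moved = sum(s for v, s in variants.items() if v != canonical)
        if moved:  # a cluster holding only the canonical needs no merge
            suggestions.append(
                {"canonical": canonical, "variants": sorted(variants), "sessions_moved": moved}
            )
    return sorted(suggestions, key=lambda s: s["sessions_moved"], reverse=True)


rows = [
    ("Facebook", 300), ("fb", 40), ("facebook", 900), ("meta", 10),
    ("Email", 50), ("e-mail", 5), ("email", 200), ("newsletter", 70),
]
suggestions = suggest_clusters(rows)
for s in suggestions:
    print(s["canonical"], s["variants"], s["sessions_moved"])
```

A reviewer could edit any `canonical` in the returned list before approving it, which keeps the final spelling a human decision.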

The taxonomy store: a dirty-to-canonical map

Accepting a suggestion does more than relabel a report. It records a mapping: every dirty variant points to its canonical value. That map is the taxonomy store, and it is what makes the cleanup durable instead of a one-time edit. The next time a known variant shows up, the store already knows where it belongs.

How the store records each merge — every variant on the left resolves to one canonical on the right.
Facebook, fb, FB, meta, facebook → facebook
Email, e-mail, EMAIL → email
Paid_Social, cpc, social → paid_social
BlackFriday, black-friday, bf2026 → black_friday_2026
A few rows from a taxonomy store: many variants in, one canonical out.
Variants in your data | Canonical value | Why they merge
Facebook, fb, meta | facebook | One channel, different case and synonyms
Email, e-mail, EMAIL | email | GA4 reads each case as a separate medium
Paid_Social, cpc, social | paid_social | One medium written three ways
BlackFriday, bf2026 | black_friday_2026 | One campaign, two ad-hoc names

Variants found in your export

Facebook, facebook, fb, meta, Paid_Social, cpc, BlackFriday, bf2026

Canonical values the store keeps

facebook, paid_social, black_friday_2026
What the store ingests from a messy export versus the canonical set it keeps.
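
One way to picture the store is as a lookup table per UTM field plus an append-only log. The sketch below is an illustrative in-memory version in Python. The class name, method names, and log fields are all hypothetical, and a real store would persist to a database rather than a dict.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TaxonomyStore:
    """Hypothetical in-memory store: per UTM field, variant -> canonical."""

    mappings: dict = field(default_factory=dict)  # {"utm_source": {"fb": "facebook"}}
    audit_log: list = field(default_factory=list)

    def accept(self, utm_field, canonical, variants, approved_by):
        """Record an approved merge and log who approved it and when."""
        table = self.mappings.setdefault(utm_field, {})
        table[canonical] = canonical  # the canonical resolves to itself
        for variant in variants:
            table[variant] = canonical
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "by": approved_by,
            "field": utm_field,
            "canonical": canonical,
            "variants": list(variants),
        })

    def resolve(self, utm_field, value):
        """Return the canonical for a known value, or None if it is new."""
        table = self.mappings.get(utm_field, {})
        if value in table:
            return table[value]
        return table.get(value.strip())  # tolerate stray whitespace

    def canonicals(self, utm_field):
        """Return the approved values for one field, sorted."""
        return sorted(set(self.mappings.get(utm_field, {}).values()))


store = TaxonomyStore()
store.accept("utm_source", "facebook", ["Facebook", "fb", "FB", "meta"], "analyst@example.com")
print(store.resolve("utm_source", "fb"))
print(store.resolve("utm_source", "tiktok"))
```

Returning `None` for an unseen value, rather than guessing, keeps new variants visible so the next audit can cluster them.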

How the taxonomy powers the governed builder

Once the store exists, it stops being a report and becomes a guardrail. The governed link builder reads the canonical set and turns the source, medium, and campaign fields into dropdowns locked to approved values — auto-lowercased and audit-logged. The variants you just merged can no longer be typed, so the same drift does not quietly return next quarter.

One store, two jobs

The taxonomy is the contract between the audit, which finds drift, and the builder, which prevents it. The audit fills the store; the builder reads it. Keep them pointed at the same canonical set and clean links become the only links your team can make.
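
The enforcement side can be sketched as a function that refuses any value outside the canonical set. This is a rough Python illustration of the idea, not the builder's actual code. The ALLOWED sets and the `build_link` name are assumptions, and a dropdown UI would prevent free text entirely, so treat this as the equivalent server-side check.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical canonical sets, as a builder might read them from the store.
ALLOWED = {
    "utm_source": {"facebook"},
    "utm_medium": {"email", "paid_social"},
    "utm_campaign": {"black_friday_2026"},
}


def build_link(base_url, source, medium, campaign, allowed=ALLOWED):
    """Return base_url with UTM parameters, accepting only approved values."""
    params = {"utm_source": source, "utm_medium": medium, "utm_campaign": campaign}
    clean = {}
    for name, value in params.items():
        value = value.strip().lower()  # auto-lowercase before checking
        if value not in allowed[name]:
            raise ValueError(f"{name}={value!r} is not an approved value")
        clean[name] = value

    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))  # keep any existing query parameters
    query.update(clean)
    return urlunsplit(parts._replace(query=urlencode(query)))


link = build_link("https://example.com/sale", "Facebook", "paid_social", "black_friday_2026")
print(link)
```

Here `Facebook` passes because lowercasing turns it into an approved value, while `fb` would raise a `ValueError`: the builder only emits canonicals and never quietly accepts an unapproved string.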

If you have not run the audit yet, read how the audit works first: it walks through the A–F grade and the clusters. To agree on the canonicals as a team before you lock them, start from a UTM naming convention for teams so everyone signs off on the same spellings.

What is a canonical value in a UTM taxonomy?

A canonical value is the one approved spelling for a UTM token — for example facebook for the source, or email for the medium. Every other string that means the same thing (Facebook, fb, meta) is a variant that resolves to it. Because GA4 is case-sensitive, choosing one canonical per token is what keeps a single channel from splitting into several lines in your reports.

How do I accept suggested canonical values in one click?

Run an audit on your source/medium export. The audit clusters every variant and proposes one canonical per cluster, ranked by traffic impact. You can approve the entire suggested taxonomy at once, or edit any individual canonical first. Accepting writes the mapping to your taxonomy store — nothing changes in GA4, and you keep control of the final spelling.

What does the taxonomy store actually save?

It saves a dirty-to-canonical map: each canonical value plus the list of variants that should resolve to it. That mapping is what makes a cleanup durable. When a known variant appears again, the store already knows its canonical, and the governed builder reads the same set so new links only use approved values.

Will accepting a canonical taxonomy change my historical GA4 data?

No. The taxonomy store maps dirty variants to canonical values for your reporting and for the builder going forward; it never writes back to GA4 or your ad platforms. Existing links stay live exactly as they were. You standardize what you create from here, and the audit reconciles the history so reports line up.

How does the taxonomy connect to the governed link builder?

The builder reads the canonical set from your taxonomy store and turns the source, medium, and campaign fields into dropdowns locked to those values, auto-lowercased and audit-logged. So the variants you merged in the audit cannot be re-typed, and every new link matches the taxonomy by construction rather than by reminder.

Build your canonical taxonomy from your own data

Paste or upload your GA4 source/medium export, see the suggested canonicals in about a minute, and accept them in one click — no signup for the first scan.

Run a free audit