ProductsDocsBlogConsultingAboutContactGet Started
Back to BlogIdentity resolution engine merging customer records from Whop, FanBasis, and Stripe into a single master profile in Google Sheets for accurate commission tracking
6 min readMageSheet Team

Solving Customer Identity & Alias Pollution in Digital Commerce

ecommercedata-integrityautomation

Selling digital products across diverse platforms like Whop, FanBasis, and Stripe is an excellent strategy for maximizing reach. However, it introduces a massive operational headache: Alias Pollution.

When a customer subscribes via Whop using their personal email, purchases an add-on via FanBasis using a work email, and later uses their spouse's credit card through Stripe, your database treats them as three entirely disjointed customers.

For businesses relying on sales representatives or affiliate models, this identity fragmentation makes it nearly impossible to accurately track lifetime value (LTV) or calculate fair commissions without hours of miserable, manual spreadsheet checking—a prerequisite for anything we build in our B2B sales commission tracking pillar.

This post covers the true operational cost of alias pollution, the identity-resolution pattern that actually works, and how to build a minimum viable version in Google Sheets.

The True Cost of Alias Pollution

When customer data is fragmented across platforms:

  1. Commission Disputes: Sales reps lose credit for their leads because the payment email didn't match the CRM email exactly. One sales rep brought in a customer who later bought through a different platform under a different email, and the commission went to nobody — or worse, to the wrong rep. Trust in the payout system quickly deteriorates, and top sales reps disengage or leave.
  2. Reconciliation Nightmares: Operations managers spend days at the end of every month cross-referencing names, transaction IDs, and IP addresses just to run payroll. A 3-person ops team can lose 40-60 hours per month to this single task.
  3. Inflated Acquisition Costs: Your marketing metrics skew heavily. You might think you acquired three new customers when, in reality, it was one highly engaged super-user using multiple aliases. Your CAC calculation is wrong; your LTV calculation is wrong; your lookalike audiences on Meta and Google are targeting phantom segments.
  4. Support Confusion: A customer contacts support saying "I can't access my account" and your agent finds three partial records. Resolution time triples, and the customer loses faith in your brand.
  5. Legal and Compliance Risk: GDPR and CCPA deletion requests need to resolve all records for a person, not just the one email they mentioned. If you can't consolidate identities, you can't cleanly comply with "delete my data" requests — which is an increasing legal exposure.

The pollution is usually invisible until you look — which is why most teams underestimate it and then discover it the hard way at the end of a quarter.

Enter the Auto-Match Identity Engine

The solution is moving away from basic "VLOOKUP" formulas and implementing algorithmic Identity Resolution.

A proper Auto-Match engine acts as a digital detective. Instead of demanding perfect 1:1 email matches, it profiles customers dynamically across multiple signals:

  • Email (primary signal) — exact match and fuzzy variants (john.doe vs johndoe).
  • Phone number — normalized to E.164 format, matched across platforms.
  • Name — fuzzy-matched with normalization (trim, lowercase, accent-strip).
  • Shipping address — normalized and matched at the postal code + street level.
  • Payment method fingerprint — the last-4 and BIN of a card hint at family-sharing.
  • IP address on login — persistent IP overlap across "different" customers is a strong signal.
  • Purchase timing patterns — someone who buys every Tuesday across three platforms is likely the same person.

The engine computes a confidence score (0-100) per candidate match. Above 90: auto-merge. Between 70-89: queue for human review. Below 70: leave separate. Each decision is logged with the signals it used, so a human can audit any merge that looks questionable.

When john.doe@gmail.com makes a purchase, the system not only logs the transaction but checks its Alias Database. If it detects that john.corp@business.com is a known alias linked to John's master profile (matching phone + shipping address + IP pattern), it intelligently maps the revenue to the correct original entity and the correct sales representative.

Secure Consolidation within Google Workspace

While enterprise Customer Data Platforms (CDPs) charge thousands of dollars per month for this capability, modern Google Apps Script solutions like the Sales Commission Tracker V3 bring this advanced matching engine directly into your secure Google Workspace.

By executing these complex identity resolutions at the spreadsheet level, businesses can eliminate alias pollution instantly, secure accurate payouts, and reclaim countless hours of monthly administrative bloat. The Sheet is the interface finance and sales ops teams already know; the engine lives underneath without asking them to learn a new tool.

Minimum Viable Identity Engine in Sheets

If you're building this from scratch, the minimum viable version has four tabs:

  1. Raw Transactions — one row per raw event from any platform (Whop, FanBasis, Stripe, Gumroad). Columns: platform, platform_transaction_id, email, name, phone, shipping_address, payment_fingerprint, timestamp, amount.
  2. Identity Map — one row per discovered alias. Columns: alias_email, master_customer_id, confidence_score, signals_matched, resolved_at.
  3. Master Customers — one row per unique person. Columns: master_customer_id, canonical_email, canonical_name, first_seen, total_ltv, commission_owner.
  4. Unresolved — flagged rows where confidence was 70-89 and a human needs to confirm or reject the merge.

Apps Script runs the matching logic on a scheduled trigger (daily is usually enough), populates the Identity Map, and updates the Unresolved queue. A human reviews Unresolved weekly — most entries resolve in under 30 seconds with the right tooling.

Roughly 2-3 days of development for a competent Apps Script engineer gets you from zero to working. The ongoing maintenance cost is minimal after the initial tuning period.

Common Pitfalls

  1. Over-aggressive auto-merge. If your confidence threshold is too low, you'll merge distinct customers (spouses, business partners, shared family accounts). Err toward queuing for review, not auto-merging.
  2. No audit trail. Merges should be reversible. Keep a log of every merge decision with the signals used, so when someone disputes it you can roll back.
  3. Ignoring cross-platform timezones. A purchase at 23:00 UTC from one platform and 17:00 EST from another are the same time. Normalize to UTC before comparing timestamps.
  4. Treating email as the master ID. Emails change; people change jobs. Use a stable internal master_customer_id that persists across canonical email changes.
  5. Not communicating the new system to sales reps. If commission attribution changes after identity resolution, sales reps need to understand why. Blindsiding them creates more disputes than alias pollution did.

Getting Started

Identity resolution is one of those problems that gets worse the longer you ignore it. Every month of pollution compounds into harder cleanup later. If you have two or more sales channels and any form of revenue-based commissions, this is worth solving now rather than quarter-end.

The Sales Commission Tracker V3 handles the identity-resolution layer as part of its broader commission engine — you get accurate payouts and clean attribution without building the identity piece yourself.

Further Reading

Frequently Asked Questions

What actually causes alias pollution in the first place?

Four main drivers: (1) customers using different emails by context — personal email for free trials, work email for paid subscriptions, spouse's card for gifts, (2) sales platforms that don't share a unified customer ID (Whop, FanBasis, Stripe, Gumroad all have their own), (3) typos and variant formats (John.Doe vs john.doe vs john doe@gmail.com), and (4) family sharing on payment methods. Each one is benign individually; combined across thousands of customers, they fragment your view of who actually bought what.

Can't I just match on email and call it a day?

You can, and most stores do until it breaks. Email matching works for ~70-80% of cases but misses the 20-30% where customers use different emails across platforms. On a 10,000-customer base, that's 2,000-3,000 misattributed transactions — roughly the size of one sales rep's entire commission pool. It usually doesn't fail loudly; it fails silently as quiet commission disputes and slowly drifting CRM accuracy.

How does a proper auto-match identity engine decide two records are the same person?

Multi-signal matching. A proper engine looks at email (primary), then phone number, name (fuzzy-matched with normalization), shipping address, IP address on login, payment-method fingerprint, and purchase timing patterns. Any one signal is weak; three or more matching signals with no conflicting signal is typically treated as the same person. The engine outputs a confidence score (0-100) and auto-merges above 90, queues for human review between 70-89, and leaves separate below 70.

Is this worth building if I only have one sales channel?

Probably not. If you only sell through Shopify or only through Stripe, you have one customer ID system and alias pollution is a small problem. This becomes worth solving the moment you have two or more channels (Shopify + Gumroad, Magento + Stripe subscriptions, Whop + FanBasis), or when you start paying sales commissions based on revenue attribution. Below that threshold, manual reconciliation monthly is fine; above it, an auto-match engine pays for itself in weeks.

What's the minimum viable version of this in Google Sheets?

Four tabs: (1) Raw Transactions — one row per raw event from any platform, (2) Identity Map — one row per discovered alias with a master customer ID, (3) Master Customers — one row per unique person with their master ID and canonical profile, (4) Unresolved — flagged rows where confidence was too low for auto-merge. Apps Script runs the matching logic on schedule, populates the Identity Map, and a human reviews the Unresolved queue weekly. Roughly 2-3 days of development for a competent Apps Script engineer.

Stay Updated

Get the latest insights on AI, e-commerce, and Magento delivered to your inbox.