
Data Sources & Methodology

This page documents exactly where every number on the funnel dashboard comes from, how it's computed, and what its limitations are. It is deliberately blunt about caveats — this dashboard gets more accurate as you point out what's wrong with it. If a number looks off, check it against the relevant section below and tell us which assumption is broken.

1. What the dashboard shows (the cohort model)

It is a cohort funnel. A "cohort" = all leads that were created on a given date (their signup date). Conversions are attributed back to the cohort the lead was created in — so a lead created in May who pays in July still counts toward the May cohort, not July. This is the right way to judge an acquisition channel or month: you measure what the leads you acquired then eventually did.

The conversion window (D3 / D7 / D30 / Lifetime) asks: "what % of the cohort converted within N days of signup?" — letting you see how fast each cohort matures and compare recent vs older cohorts fairly.
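The window logic can be sketched in a few lines of Python. This is an illustrative sketch, not the dashboard's actual query: the function name `conversion_rates`, the `WINDOWS` mapping, and the reading of "within N days" as `days_to_convert <= N` (inclusive) are all assumptions.

```python
from datetime import date

# Hypothetical window lengths in days; None stands for "Lifetime" (no cut-off).
WINDOWS = {"D3": 3, "D7": 7, "D30": 30, "Lifetime": None}


def conversion_rates(cohort):
    """Return {window label: share of the cohort converted within that window}.

    cohort: list of (signup_date, first_payment_date or None) pairs,
    all belonging to the same cohort.
    """
    total = len(cohort)
    rates = {}
    for label, max_days in WINDOWS.items():
        converted = 0
        for signup, first_payment in cohort:
            if first_payment is None:
                continue  # this lead never paid
            days_to_convert = (first_payment - signup).days
            # Assumption: "within N days" is inclusive of day N.
            if max_days is None or days_to_convert <= max_days:
                converted += 1
        rates[label] = converted / total if total else 0.0
    return rates


# Made-up May cohort of four leads: one pays next day, one after 7 days,
# one in July, one never.
may_cohort = [
    (date(2024, 5, 1), date(2024, 5, 2)),
    (date(2024, 5, 1), date(2024, 5, 8)),
    (date(2024, 5, 1), date(2024, 7, 10)),
    (date(2024, 5, 1), None),
]
rates = conversion_rates(may_cohort)
```

With this made-up cohort, the July payer counts only toward Lifetime, which is why a recent cohort should be compared with an older one at the same window rather than at Lifetime.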

2. Where each field comes from

Dimension | Source | Exact definition
Lead + cohort date | Learner DB (Aurora Postgres) "User" | One lead = one User registration. Cohort date = createdAt converted to IST (UTC + 5:30). This is the complete, real signup date for everyone in the app.
Conversion (₹) | Learner DB "Payment" | A lead "converted" on the date of their first successful payment (status = 'responseReceivedSuccess'). Revenue = sum of their successful payments (lifetime value). days_to_convert = first_payment_date − signup_date.
Exam year | DB "UserProfile".boardExamYear | Class-12 board year, chosen because it's stable (a student writes boards once, but may write NEET several times). ~84% of users have it; the rest are bucketed as Unknown.
Acquisition channel | GA4 → BigQuery export analytics_181916006 | The user's GA4 first-touch source/medium/campaign, joined to the DB by GA4 user_id (Firebase setUserId) = DB User.id. See §3 for the exact rule + taxonomy.
Telesales vs self-serve (not yet shown; available) | DB Payment.paymentDesc | LIKE 'SALES%' = telesales/assisted; else self-serve.
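Two of the definitions above are easy to get subtly wrong, so here is a minimal Python sketch of them. The helper names `cohort_date` and `sales_mode` are hypothetical, and the sketch assumes createdAt is stored in UTC; the real pipeline may do this in SQL instead.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed offset of UTC + 5:30 (no daylight saving).
IST = timezone(timedelta(hours=5, minutes=30))


def cohort_date(created_at):
    """Return the IST calendar date for a createdAt timestamp."""
    if created_at.tzinfo is None:
        # Assumption: naive timestamps coming out of the DB are in UTC.
        created_at = created_at.replace(tzinfo=timezone.utc)
    return created_at.astimezone(IST).date()


def sales_mode(payment_desc):
    """Mirror paymentDesc LIKE 'SALES%': a case-sensitive prefix match."""
    if payment_desc is not None and payment_desc.startswith("SALES"):
        return "telesales"
    return "self-serve"


# A signup at 19:00 UTC on 31 May is already 00:30 on 1 June in IST,
# so it belongs to the 1 June cohort.
late_signup = cohort_date(datetime(2024, 5, 31, 19, 0, tzinfo=timezone.utc))
```

The example value shows why the timezone conversion matters: without it, late-evening UTC signups would land in the previous day's (and at month end, the previous month's) cohort.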

3. Channel attribution (the part to scrutinise most)

Per-lead channel is not reliably stored in the DB (signupSource is empty, and UTM values cover only ~0.7% of leads) or in HubSpot (its hs_analytics_source was contaminated by a one-off manual bulk upload, and its createdate reflects the upload date, not the signup date). So channel comes from GA4 first-touch.

The join

Your app sets the Firebase user_id = the DB User.id. So GA4's first-touch acquisition for a user maps straight to that DB lead.

The anti-bias rule (important)

We attribute a channel only when the lead's signup date is on or after their GA4 first-touch date (createdAt ≥ first_touch_date). If signup pre-dates the recorded first touch, it means GA4 only saw that user later (a surviving app user) — that "first touch" is not their real acquisition, so the lead is marked Unattributed rather than mis-tagged. This is conservative on purpose: ~42% of leads are Unattributed, and we'd rather say "unknown" than claim a wrong channel.
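As a minimal sketch, the rule reduces to one date comparison. The function name `attributed_channel` and its arguments are hypothetical; it assumes both dates are already expressed in the same timezone.

```python
from datetime import date


def attributed_channel(signup_date, first_touch_date, ga4_channel):
    """Apply the anti-bias rule to one lead.

    signup_date:      cohort date of the lead (from createdAt)
    first_touch_date: GA4 first-touch date, or None if GA4 has no match
    ga4_channel:      channel bucket derived from GA4, or None
    """
    if first_touch_date is None or ga4_channel is None:
        return "Unattributed"  # no GA4 match at all
    if signup_date >= first_touch_date:
        return ga4_channel  # GA4 saw the user on or before signup
    # GA4 only saw this user after they had signed up, so the recorded
    # "first touch" is not the real acquisition.
    return "Unattributed"


ok = attributed_channel(date(2024, 5, 10), date(2024, 5, 8), "Organic Search")
late = attributed_channel(date(2024, 5, 10), date(2024, 6, 1), "Direct")
```

Here `ok` keeps its GA4 channel, while `late` is marked Unattributed because its recorded first touch falls three weeks after signup.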

Channel taxonomy

Bucket | GA4 rule (source / medium / campaign)
Paid App-Install | medium cpc + campaign name contains appdownload / app promotion / app- / ig4a / ig_appdownload
Organic App (Play) | source google-play
Paid Search/PMax | medium cpc (any other campaign)
Organic Search | medium organic (google, bing, etc.)
Direct | source (direct)
Social / Other-owned | instagram / youtube / facebook, or medium bio / paid_social / banner
Referral | medium referral
Unattributed | no GA4 match, or signup pre-dates first touch (see rule above)
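The taxonomy can be read as an ordered list of rules, sketched below in Python. Several details are assumptions rather than confirmed behaviour: that rules are evaluated top to bottom (so Paid App-Install wins over Paid Search/PMax), that matching is case-insensitive, that instagram / youtube / facebook refer to the GA4 source, and that anything matching no rule falls back to Unattributed. The function name `classify_channel` is hypothetical.

```python
APP_INSTALL_MARKERS = ("appdownload", "app promotion", "app-", "ig4a", "ig_appdownload")
SOCIAL_SOURCES = {"instagram", "youtube", "facebook"}
OWNED_MEDIUMS = {"bio", "paid_social", "banner"}


def classify_channel(source, medium, campaign):
    """Map a GA4 first-touch (source, medium, campaign) to a channel bucket."""
    if source is None and medium is None:
        return "Unattributed"  # no GA4 match for this lead
    # Assumption: comparisons ignore case.
    source = (source or "").lower()
    medium = (medium or "").lower()
    campaign = (campaign or "").lower()

    # Assumption: first matching rule wins, in the table's order.
    if medium == "cpc" and any(m in campaign for m in APP_INSTALL_MARKERS):
        return "Paid App-Install"
    if source == "google-play":
        return "Organic App (Play)"
    if medium == "cpc":
        return "Paid Search/PMax"
    if medium == "organic":
        return "Organic Search"
    if source == "(direct)":
        return "Direct"
    if source in SOCIAL_SOURCES or medium in OWNED_MEDIUMS:
        return "Social / Other-owned"
    if medium == "referral":
        return "Referral"
    # Assumption: the table names no catch-all bucket, so unmatched
    # combinations are treated as Unattributed here.
    return "Unattributed"


bucket = classify_channel("google", "cpc", "NEET_AppDownload_May")
```

The signup-date check from the anti-bias rule is a separate step and is not repeated in this function.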

4. Known limitations & open questions (please pressure-test these)

5. Coverage snapshot

Spot something wrong, or an assumption that doesn't hold? That's the point — flag it and we'll correct the methodology.
