Why averages hide your worst onboarding cohort

Your mean TTV looks fine. Your p75 tells a different story. A short read on why distributions beat averages — always.

March 28, 2026 · 5 min read

The comfort of the mean

It is Wednesday standup. The PM flips to a slide and reads the number out loud: "Our average time to value is 5.2 days." Nobody asks a follow-up. The CEO writes it in a doc. The next slide comes up. The meeting moves on.

Meanwhile, inside the product, thirty percent of last month's signups are still stuck somewhere between day twelve and day eighteen. They are not in the 5.2-day number. They are underneath it. The arithmetic mean took a thousand different user journeys, added them up, divided by a thousand, and produced a single reassuring scalar that has almost nothing to say about any individual user. Nobody in the standup is lying. They are just watching the wrong number.

The mean is comforting for three reasons. It is one number. It is easy to compute. It fits on a slide. Those are the exact three reasons it cannot be trusted as a summary of user behavior. A metric that is cheap to report is almost always cheap because it throws away the information that would make it inconvenient.

What the mean misses

[Chart: two TTV distributions. Healthy (tight distribution) vs. broken (40% stuck). Same mean (5.2 days). Different reality.]

Here is the failure mode in one chart. Two cohorts. Same population size. Same mean. One is a healthy product. The other is a product that is quietly bleeding forty percent of its new signups. The mean does not tell you which is which. The shape does.

The first failure is bimodality. Some of your users reach value fast because the product works for them on day one. Some take weeks because the product barely works for them at all. When you average the two groups together, the mean lands somewhere in the empty space between the two peaks, describing a user who does not exist. A 5.2-day mean in a bimodal distribution is not a typical user. It is an arithmetic ghost.
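
Here is a minimal sketch of that ghost, with made-up numbers: a 60/40 mixture built so the mean lands at 5.2 days while neither peak is anywhere near it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bimodal cohort: 60% reach value around day 2,
# 40% are stuck around day 10. All numbers are illustrative.
fast = rng.normal(loc=2.0, scale=0.5, size=600)
slow = rng.normal(loc=10.0, scale=1.5, size=400)
ttv = np.clip(np.concatenate([fast, slow]), 0.1, None)

print(f"mean: {ttv.mean():.1f} days")              # ~5.2, in the empty space between peaks
print(f"p50:  {np.percentile(ttv, 50):.1f} days")  # lands inside the fast cluster
```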

The second failure is the long tail. A cohort with a tight cluster around day three and five percent of users stuck out near day twenty-one will produce a mean that looks only slightly elevated. You read that slightly elevated number, shrug, and move on. What you missed is that the five percent out in the tail are four times more likely to churn by day thirty than the users who reached value before the median. Those are real users with real credit cards, and the mean hid them.
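
The same trap, sketched with invented numbers: a tight cluster near day three plus a five percent tail near day twenty-one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative cohort: 95% cluster near day 3, 5% stuck near day 21.
core = rng.normal(loc=3.0, scale=0.6, size=950)
tail = rng.normal(loc=21.0, scale=2.0, size=50)
ttv = np.concatenate([core, tail])

print(f"mean: {ttv.mean():.1f} days")                 # ~3.9 -- reads as "slightly elevated"
print(f"stuck past day 14: {(ttv > 14).mean():.0%}")  # the 5% the mean hid
```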

The third failure is cohort fragmentation. Your paid-search users and your referral users live in completely different TTV distributions. Pooling them into one weekly mean is the statistical equivalent of averaging a sprinter and a marathoner and reporting the result in seconds. You get a number. It is well-typed. It is also telling you nothing about either group. The channel that is breaking is invisible until you segment.
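
In code, the pooled number versus the segmented one, with invented values:

```python
import pandas as pd

# Hypothetical per-user TTV by channel, in days. Values are illustrative.
df = pd.DataFrame({
    "channel": ["referral"] * 4 + ["paid_search"] * 4,
    "ttv_days": [1.5, 2.0, 2.5, 3.0, 8.0, 9.0, 11.0, 14.0],
})

print(f"{df['ttv_days'].mean():.1f}")              # 6.4 -- describes neither group
print(df.groupby("channel")["ttv_days"].median())  # 2.25 vs 10.0 -- the real story
```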

The fourth failure is the silent drift. Your mean stays flat for six weeks, then moves from 5.2 to 5.6 days. A four-tenths shift looks like noise. Underneath, your p95 has moved from nineteen days to twenty-seven, because a specific cohort broke in a specific way after a release. The mean cannot see a p95 problem. By definition, it smooths it out.
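
The back-of-envelope version, with illustrative numbers (the method argument to np.percentile needs numpy 1.22 or newer):

```python
import numpy as np

# 95 users hold steady near day 4.5; the slowest 5 slip
# from ~day 19 to ~day 27 after the release. Numbers are illustrative.
before = np.concatenate([np.full(95, 4.5), np.full(5, 19.0)])
after = np.concatenate([np.full(95, 4.5), np.full(5, 27.0)])

# The mean barely flinches: 0.05 * (27 - 19) = 0.4 days.
print(f"mean shift: {after.mean() - before.mean():.1f} days")  # 0.4

# The tail takes the full hit.
p95_before = np.percentile(before, 95, method="higher")
p95_after = np.percentile(after, 95, method="higher")
print(f"p95 shift:  {p95_after - p95_before:.1f} days")        # 8.0
```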

Three numbers that tell the truth

If you are going to compress the distribution into a handful of numbers, three is the right count. Not one. Three.

p50: the typical user
p75: the slow half
p95: the churn zone

The p50 is the median. Half of your users reach value in this time or less. It is the honest version of the number most people meant when they said "average." It is not skewed by the tail, because the tail lives past it. If your p50 is two days, you can put that on a landing page and mean it.

The p75 is the slow half. It answers "how bad is it for the users who are not already in a hurry?" If p75 is three times your p50, a quarter of your users are still working on it well past the point where a typical user has moved on. That gap is where your onboarding is breaking for the people who needed the most help and got the least.

The p95 is the churn zone. This is the time it takes your slowest five percent to reach value, and those users are the ones quietly deciding the product is not for them. They do not tell you. They just stop logging in. A p95 twenty times higher than your p50 is not a statistical artifact. It is a population. It is a real group of users who are about to churn on a schedule you can predict.

The rule is simple: if the distance between your p50 and p95 is more than 4x, you have a hidden cohort problem. The mean will not show it. The percentiles will. You need to find that cohort before the end of the month.
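
As a reusable check, here is one way to write that down; the function name and return shape are ours, but the threshold is just the rule above:

```python
import numpy as np

def ttv_summary(ttv_days: np.ndarray) -> dict:
    """One-line honest summary of a TTV distribution."""
    p50, p75, p95 = np.percentile(ttv_days, [50, 75, 95])
    return {
        "p50": round(float(p50), 1),           # the typical user
        "p75": round(float(p75), 1),           # the slow half
        "p95": round(float(p95), 1),           # the churn zone
        "hidden_cohort": bool(p95 > 4 * p50),  # the 4x rule
    }
```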

The three numbers together are also the cheapest honest summary of a distribution that exists. You can write p50, p75, p95 on one line of a weekly report and communicate more about your product's onboarding health than a month of mean-based dashboards ever could. You do not need a chart. You need three numbers and the discipline to look at all three.

What to do tomorrow

Three concrete moves, none of which requires a data engineer.

First, pull p75 for your top three acquisition channels separately. Not the mean. The p75. In most products I have looked at, the worst channel is at least 2x the best one, and it is almost always the one you spend the most money on. If you cannot pull p75 by channel in one click, that is the first operational thing to fix.
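
A sketch of that pull, assuming a per-user table with signup and first-value timestamps; the file name and column names are placeholders for whatever your schema calls them:

```python
import pandas as pd

# Placeholder schema: one row per user, with signup and first-value timestamps.
users = pd.read_csv("signups.csv", parse_dates=["signed_up_at", "first_value_at"])
users["ttv_days"] = (
    users["first_value_at"] - users["signed_up_at"]
).dt.total_seconds() / 86400

p75_by_channel = (
    users.groupby("channel")["ttv_days"]
    .quantile(0.75)
    .sort_values(ascending=False)
)
print(p75_by_channel)  # worst channel on top -- check it against your ad spend
```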

Second, plot a histogram, not a line chart. A line chart of weekly mean TTV is the single least useful visualization in product analytics. It smooths over the only thing that matters. A histogram of per-user times for the last thirty days of signups tells you in one glance whether your distribution is unimodal and tight, unimodal and long-tailed, or actively bimodal. That shape is the diagnostic. Print it. Put it on a wall.
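
Continuing with the hypothetical users frame from the sketch above:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Per-user times for the last 30 days of signups.
recent = users[users["signed_up_at"] >= users["signed_up_at"].max() - pd.Timedelta(days=30)]

plt.hist(recent["ttv_days"].dropna(), bins=40)
plt.xlabel("days to first value")
plt.ylabel("users")
plt.title("TTV distribution, last 30 days of signups")
plt.show()  # one glance: tight, long-tailed, or bimodal
```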

Third, segment the slowest decile by plan, company size, and first-session behavior. Not the whole population. The bottom ten percent. This is the most valuable ten minutes of analysis work you can do on a PLG product, and almost nobody does it. The tail always has something in common, and the thing it has in common is almost always a product fix that costs less than a week of engineering time.
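
And the decile cut, again on the hypothetical users frame; "plan" and "company_size" stand in for whatever your own segment columns are:

```python
# Isolate the slowest 10% of users by TTV.
cutoff = users["ttv_days"].quantile(0.90)
slow_tail = users[users["ttv_days"] >= cutoff]

for col in ["plan", "company_size"]:
    # Over-representation ratio: which segments show up in the slow tail
    # far more often than in the population at large?
    ratio = (
        slow_tail[col].value_counts(normalize=True)
        / users[col].value_counts(normalize=True)
    )
    print(ratio.dropna().sort_values(ascending=False), "\n")
```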

The longer version of this argument, including the audit checklist, lives in our piece on how most SaaS companies measure TTV wrong. The operational layer that pulls the distribution, runs the percentiles, and segments the tail for you without SQL lives in the Tivalio product.

Stop reporting the mean. Start reporting the shape. The difference between a healthy product and a broken product is not the average of their user times. It is the tail, and the mean was never going to show it to you.
