
How to find which user attributes predict churn (without a data scientist)

The 'it's the free users' myth is wrong half the time. Here's how to find the real drivers of churn in your product — no data science degree needed.

April 5, 2026 · 13 min read

The "it's the free users" myth

You are in a growth review on a Tuesday. The head of growth puts a churn chart on the screen. It is going the wrong direction. A senior PM leans back and says the sentence that ends every productive conversation about churn before it starts. "Yeah, but most of that is free users. We do not need to worry about it."

The room moves on. The chart stays broken. The quarter ends with the same number.

That sentence is the most expensive thing said in a growth meeting. Not because it is always wrong. Because it is wrong about half the time, and the half where it is wrong is the half where your revenue is leaking. The PM is not lying. The PM is guessing, and the guess is wearing the costume of a fact.

Here are three ways the sentence fails.

The first is conflation. "Free users" is two populations wearing the same label: free-trial users in a paid funnel, and free-forever users who never intended to pay. A trial user who churns is a paid conversion you lost. A free-forever user who churns is top-of-funnel you lost. The remediation is different. The severity is different. Pooling them into "the free users" is the analytical equivalent of saying "we have a weather problem." Which weather. Where.

The second is that the claim hides the paid-plan users who churn early. In every PLG product, there is a cohort of paying customers who signed up on the first of the month and were gone by day nine. They do not show up in a pooled churn rate because they are rare in absolute terms. They show up clearly once you rank by plan, because they are the population with the highest revenue per user and the highest probability of never returning. Dismiss churn as "the free users" and you hide those paid-plan early churners under a label that tells the room they are safe to ignore. They are not. They are the ones who will make next quarter's forecast look bad.

The third is that free-user behavior is not a disposable signal. Free users are the population your paid conversions are drawn from. Their first-session behavior predicts the paid cohort they graduate into. If your free users churn in a characteristic shape at day five, your paid users will churn in a similar shape at day fifty, and the day-five pattern was your warning. A team that drops free users from churn analysis is dropping the single best leading indicator they have.

The point is not that the PM is wrong. The point is that intuition about churn attributes is almost always wrong, not because growth teams are careless, but because the problem is structurally too large to hold in a human head. You need to rank attributes by actual impact, not by how salient they felt in a Tuesday meeting.

What attribute analysis actually is

For every user attribute you observe, you want to know one thing: does it predict whether the user reaches value, and how strongly.

Your product records between twenty and fifty attributes per user. Plan. Country. Timezone. Channel. Referrer. Company size. Job role. Signup day. First completed event. Invited-by flag. Trial status. Most of those are already in your event stream whether you have looked at them or not, which is the point of your Amplitude integration.

Attribute analysis asks, for each of those attributes, how much it moves your target metric. The target is usually time to value, activation rate, or retention at a chosen horizon. You are not asking whether an attribute is "interesting." You are asking: if I knew only this one attribute about a new user, how much better could I predict their outcome than if I knew nothing.

This is not segmentation. Segmentation is "let me look at all French users." It is a lookup. It assumes you already know which slice to pull. Attribute analysis is the opposite: it assumes you do not know which attribute matters, and it ranks all forty so you find out. Segmentation answers questions you already have. Attribute analysis generates the questions that turn out to matter.

The distinction is load-bearing. Most growth teams do segmentation and call it attribute analysis. They slice by the two or three attributes they already care about, find differences, ship a dashboard, and miss the attribute driving eighty percent of the variance because it was not on their shortlist. The whole point of ranking is to catch the attribute you would not have thought to check.

Why the manual approach fails

The problem is combinatorial. Twelve attributes, four values each. Single-attribute slices: forty-eight. Pairwise: over a thousand. Three-way interactions: over fourteen thousand. No human is going to inspect fourteen thousand slices. What actually happens is the analyst picks the three attributes they already suspect, runs those slices, finds something, ships it, and goes to lunch. The analyst has done real work. The analyst has also confirmed their priors, which is what they were trying not to do.
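If you want to sanity-check those counts, the arithmetic fits in a few lines of Python. The attribute and value counts are the illustrative ones from the paragraph above, not measurements from any real product.

```python
from math import comb

n_attrs, n_values = 12, 4  # the illustrative setup from the paragraph above

def slice_count(order: int) -> int:
    # Pick `order` attributes, then one value for each: C(n_attrs, order) * n_values^order
    return comb(n_attrs, order) * n_values ** order

for order in (1, 2, 3):
    print(f"{order}-way slices: {slice_count(order):,}")
# 1-way slices: 48
# 2-way slices: 1,056
# 3-way slices: 14,080
```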

The anti-patterns are easy to spot. Pattern one is the obvious-attribute trap: the team slices by plan, country, and channel every week because those are the three the CEO asks about, and those three become the universe of things they ever check. The attribute driving churn is first-session-event-type, but nobody has looked at it in nine months.

Pattern two is the averaging trap: the team compares segment means, which is a mistake for all the reasons in the TTV measurement piece. Two segments with identical means can have radically different distributions, and the mean flattens the difference to zero.

Pattern three is interaction blindness. Even if the team ranks single attributes correctly, they miss the case where no single attribute moves the metric but the combination of two does. A weekend signup is fine. A free-trial user is fine. A weekend free-trial signup has a three-times worse TTV. Neither single attribute shows up as a driver. Manual slicing will never find it.
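Finding that kind of interaction by eye is unlikely; checking a suspected one is cheap. A minimal sketch, assuming a hypothetical one-row-per-user table with signup_day, plan, and a precomputed ttv_days column (the file and column names are placeholders, not anything your analytics tool ships):

```python
import pandas as pd

# Hypothetical export: one row per user with signup_day, plan, and a precomputed ttv_days
users = pd.read_parquet("users_with_ttv.parquet")

users["is_weekend"] = users["signup_day"].isin(["Saturday", "Sunday"])
users["is_trial"] = users["plan"].eq("free_trial")

# Each attribute on its own: the view manual slicing already gives you
print(users.groupby("is_weekend")["ttv_days"].median())
print(users.groupby("is_trial")["ttv_days"].median())

# The joint view: the cell that never shows up when you slice one attribute at a time
print(users.groupby(["is_weekend", "is_trial"])["ttv_days"].median())
```

If the weekend-and-trial cell is far worse than either marginal, you have found an interaction that no single-attribute view would have surfaced.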

The honest summary: manual attribute analysis scales poorly, confirms priors, and misses interactions. It is not a matter of working harder. It is the wrong shape for a human spreadsheet.

What you actually need: attribute impact ranking

Instead of slicing by one attribute at a time and comparing segments, compute, for every attribute you have, a single impact score. The score answers one question: if I know only this attribute about a new user, how much better can I predict their TTV than if I knew nothing? Rank by the score. Look at the top five.

That is the whole move. You are trading "look at segments" for "rank attributes by their effect on the target metric." Think of the score as variance explained, information gain, or the reduction in prediction error when the attribute is added to a model. The exact math matters less than the ordering. You need a sorted list with the biggest movers at the top, so you can stop debating which attribute to investigate and start investigating the one the data already pointed to.
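Here is what that looks like as a minimal sketch, using mutual information as the impact score. It assumes you have already built a one-row-per-user table with your attributes and a ttv_days column; the file and column names are placeholders, and mutual information is one reasonable choice of score, not the only one.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Hypothetical export: one row per user, attribute columns plus a precomputed ttv_days
users = pd.read_parquet("users_with_ttv.parquet")
users = users.dropna(subset=["ttv_days"])  # users who never reached value need their own treatment

attrs = [c for c in users.columns if c not in ("user_id", "ttv_days")]

scores = {}
for attr in attrs:
    # Encode the attribute as integer codes so it can be treated as a discrete feature;
    # the ordering of the codes does not matter for mutual information.
    codes = users[attr].astype("category").cat.codes.to_frame()
    scores[attr] = mutual_info_regression(
        codes, users["ttv_days"], discrete_features=True, random_state=0
    )[0]

ranking = pd.Series(scores).sort_values(ascending=False)
print(ranking.head(5))  # the attributes that move TTV the most, biggest first
```

Variance explained or the error reduction from a single-feature model would give a similar ordering; as above, the sort matters more than the decimals.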

Attribute impact on TTV, ranked

Attributes on the y-axis. Impact on the target on the x-axis. You are looking for the top three bars. The top three is where your effort belongs for the next month. Everything below is noise relative to the top, and spending time on it is the same mistake as optimizing p99 latency before fixing p50. Biggest thing first.
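Rendering that chart is the easy part once the scores exist. The numbers below are arbitrary placeholders so the snippet runs on its own; in practice you would pass in the ranking Series from the sketch above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Arbitrary placeholder scores; replace with the ranked Series computed earlier
ranking = pd.Series({
    "channel": 0.30,
    "company_size": 0.18,
    "country": 0.12,
    "signup_day": 0.07,
    "plan": 0.04,
})

ranking.sort_values().plot.barh()  # ascending sort puts the biggest bar at the top
plt.xlabel("Impact on TTV (impact score)")
plt.tight_layout()
plt.show()
```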

The ranking has one more property that matters. It is invariant to your priors. It does not care which attributes the CEO keeps asking about. It does not care which are easy to slice in your dashboard. It computes impact and sorts. That is why growth teams resist it: the output will occasionally tell you that the attribute you have been talking about for six months is the fifth most important one. Present that slide anyway.

A worked example with mock data

A fictional SaaS product. A project management tool, fifty thousand MAU. The starting event is signup. The value event is "first project shared with a teammate," because a user who shares a project is four times more likely to retain at day thirty. TTV is the time from signup to that shared-project event.
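Computing TTV for that value event is a small transformation over the raw event stream. A sketch under assumed column names (user_id, event, timestamp) and assumed event names; your instrumentation will differ:

```python
import pandas as pd

# Hypothetical raw event export with user_id, event, and timestamp columns
events = pd.read_parquet("events.parquet")
events["timestamp"] = pd.to_datetime(events["timestamp"])

signup = events[events["event"] == "signup"].groupby("user_id")["timestamp"].min()
shared = events[events["event"] == "project_shared"].groupby("user_id")["timestamp"].min()

# Days from signup to first shared project; NaN means the user never reached value
ttv_days = (shared - signup).dt.total_seconds() / 86400
print(ttv_days.quantile([0.5, 0.75]))  # p50 and p75 across the whole signup base
```

Users who never reach the value event come out as NaN. Whether you exclude them, cap them at the observation window, or treat them as their own population is a decision to make explicitly, not a default to inherit.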

The team's working assumption is the standard one: "most churn is from free users." Plan tier is the attribute they expect to matter. It is the attribute the quarterly board deck already has a slide for.

Run the ranking across twenty attributes.

The top driver is not plan. It is first-event-type. Users whose first event was "create project alone" had a p50 TTV of eleven days and a p75 of twenty-six. Users whose first event was "invite a teammate" had a p50 of two and a p75 of five. More than five times apart at p50. First-event-type explains more than a third of the variance in TTV by itself, and nobody was looking at it because it was not on the board slide.

The second driver is acquisition channel. Organic search users hit p50 at three days. Paid social users hit p50 at nine. The gap is persistent across plan tiers, which tells you it is not a plan artifact. Paid social brings a population that looks the same on paper but behaves differently.

The third driver is company size. Five-to-fifty-person companies hit the shared-project event at a p50 of 2.5 days. Solo accounts hit it at a p50 of 8 days. The effect is almost entirely mediated by having a teammate to share with, which is the value event itself.

The fourth driver is signup day of week. Weekend signups have a p75 TTV thirty percent worse than weekday signups. Users who sign up on Saturday do not come back on Sunday, and by Monday they have forgotten why they signed up.

The fifth, finally, is plan. Paid and free differ by twelve percent at p50. That is the real gap. It is less than a third of the first-event effect and less than half the channel effect. "It is the free users" was technically correct in that free users are slightly worse and catastrophically wrong in that plan is fifth, not first. The team was optimizing the wrong lever for the whole quarter.

The working assumption was "it is the free users." The actual top driver was first-event-type: users whose first action was "create project alone" had a TTV more than five times worse at p50 than users whose first action was "invite a teammate." Plan tier was fifth, behind first-event, channel, company size, and signup day. If the team had shipped the ranking in week one instead of guessing, the quarter's onboarding investment would have gone to first-run experience, not to plan paywall experiments.

The 5 attributes that usually matter in SaaS

The example is fictional. The ordering is not unusual. In most SaaS products the same five attributes show up in the top of a TTV impact ranking. Not always in the same order, but almost always in the same set.

The first is first-event-type. What the user did in their first session predicts whether they reach value better than anything else you can observe. A user who clicks around in the empty state is a different user from one who creates an object. A user who creates alone is different from one who invites someone. First-event-type is a behavior, not a demographic, and behaviors predict future behaviors better than demographics do. If you have one slot, spend it here; a short derivation sketch follows the fifth attribute below.

The second is acquisition channel. Channels bring different populations, not just different counts. Organic search, paid search, paid social, referral, content, and sales-led all have distinctive TTV distributions in every product you look at. The cleanest signal is referral: a user invited by an existing customer has a TTV a quarter of the average, because they arrive with context. The dirtiest signal is paid social, because paid social brings users who were surprised into signing up, and surprised users do not onboard.

The third is company size, or the B2C equivalent of user role. For B2B, company size mediates everything about how fast a user can reach a value event that involves other humans. A solo user in a collaboration product is structurally slower than one with teammates on hand.

The fourth is geography, or more precisely, timezone. Activation happens when humans are awake and have twenty minutes to pay attention. A user eight hours off from your business hours onboards slower, because your emails arrive while they are asleep and your support team is off. Small at the median, large at the tail. Not a bug you fix; a reality you plan around.

The fifth is signup day of week. Most growth teams skip it. They should not. Weekend signups have worse TTV than weekday signups in every PLG product, because a user who signs up on Saturday does not come back on Sunday, and by Monday the intent has cooled. The fix is not to stop accepting weekend signups; it is a weekend-aware nurture sequence, which you will not build if you have never looked at the attribute.
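Both first-event-type and signup day of week are derived attributes; neither arrives as a ready-made column in your CRM. A minimal sketch of deriving them from the same hypothetical event export used in the earlier sketches:

```python
import pandas as pd

# Same hypothetical raw event export: user_id, event, timestamp
events = pd.read_parquet("events.parquet")
events["timestamp"] = pd.to_datetime(events["timestamp"])

signups = events[events["event"] == "signup"].groupby("user_id")["timestamp"].min()

derived = pd.DataFrame({
    # First non-signup event recorded for the user: their first real action
    "first_event_type": (events[events["event"] != "signup"]
                         .sort_values("timestamp")
                         .groupby("user_id")["event"]
                         .first()),
    # Day of week the account was created, e.g. "Saturday"
    "signup_day": signups.dt.day_name(),
})
print(derived.head())
```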

What to do once you know the top driver

Ranking is useless without a decision. If attribute X has three times the impact of attribute Y on your TTV, what do you actually do on Monday morning?

You have four options, and you pick by cost.

One: change the onboarding path for the worst cohort of the top driver. If first-event-type is your top driver and "create project alone" is the worst first event, redesign the first-run experience so the default path nudges users toward inviting a teammate. You are not blocking the alone path. You are moving the default. Defaults move behavior more than anything you can ship in a quarter, and they cost one week of design time.

Two: restrict acquisition from the worst channel. If the top driver is channel and paid social is the worst, reduce spend there until you have a variant of the onboarding that converts the paid-social population. You are pausing the leak while you fix the pipe.

Three: add friction that filters out users who were never going to succeed. Counterintuitive and it works. A one-minute intent question or an industry picker in signup removes the doomed cohort before they generate noise. Aggregate TTV improves because the denominator got cleaner, and the users you lose were users you were going to lose in week two anyway.

Four: change the value definition itself. Nuclear option, and the right call about ten percent of the time. If the top driver reveals a segment with a fundamentally different journey, ask whether the current value event is the right one for that segment. Painful and slow, but sometimes the only option that repairs the metric.

Pick one. Ship it. Re-rank in two weeks. That is the whole loop. One experiment on the top driver, measure the impact, move to the second. If you pull all the levers at once you cannot tell which one moved the metric. Top of the list first. That is how the Watch, Understand, Custom Research loop is supposed to run, and it is the thing growth teams drop the first time a deadline shows up.

How to automate this

Everything above is doable by hand. It is also the kind of work where, the second time you do it, you realize you will do it every week, and the tooling question becomes real.

This is what the Drivers gallery inside Tivalio is for. The research template ingests your existing event stream, computes the impact ranking across every attribute you record, and returns the sorted bar chart you saw earlier. You do not write the query. You do not pick which attributes to include. The template does the combinatorial work and shows you the distributions for each top cohort, so you see the shape, not just the score.

The research is deterministic: same inputs always return the same output, and the methodology is visible on the card. Every number is computed from the raw events you already send to Amplitude or Mixpanel. If you run the growth review loop on Monday mornings, this is the panel you open first.

Stop guessing which attribute matters. Rank them.
