Canonicalizing User Journeys: Removing the Noise
Raw user data is messy. Humans are fidgety. They click buttons twice. They refresh pages. They hit the back button.
Raw Stream:
Home -> Signup -> Signup -> Pricing -> Signup -> Dashboard -> Dashboard -> Dashboard -> Value
If you feed this raw stream into a Time To Value (TTV) calculation, you get noise: every repeat and backtrack is counted as if it were a step. You need to Canonicalize the journey.
What is Canonicalization?
Canonicalization is the process of stripping noise to reveal the Logical Path.
Rule 1: De-duplication (A -> A -> B becomes A -> B)
If a user fires Dashboard Viewed 50 times in a row, it doesn't mean they achieved 50 steps of progress. For TTV modeling, we treat this as one Dashboard Viewed state.
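As a minimal sketch of Rule 1, assuming a journey is simply a Python list of event names (the function name collapse_repeats is hypothetical, not part of any Tivalio API), consecutive repeats can be collapsed like this:

```python
from itertools import groupby

def collapse_repeats(events):
    """Collapse each run of identical consecutive events into one event."""
    # groupby() groups *adjacent* equal items, so A -> A -> B becomes A -> B,
    # while a later return to A (A -> B -> A) is kept.
    return [name for name, _run in groupby(events)]

raw = ["Home", "Signup", "Signup", "Pricing", "Signup",
       "Dashboard", "Dashboard", "Dashboard", "Value"]

print(collapse_repeats(raw))
# ['Home', 'Signup', 'Pricing', 'Signup', 'Dashboard', 'Value']
```

Note that only adjacent repeats are removed here. The second visit to Signup survives, because it is a loop rather than a duplicate.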
Rule 2: First Occurrence Priority
When measuring TTV, we usually care about the First time something happened.
- First Signup.
- First Value.
If a user deletes their account and signs up again next year, that's a new journey. But if they refresh the signup page 5 times, that's just UI noise.
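One way to express Rule 2 in code, assuming each event is a (name, timestamp_in_seconds) pair within a single journey (the event shape and the function name first_occurrences are assumptions made for this illustration):

```python
def first_occurrences(events):
    """Map each event name to the timestamp of its first occurrence."""
    first_seen = {}
    for name, ts in events:
        # setdefault() only stores the value if the key is not present yet,
        # so later repeats of the same event are ignored.
        first_seen.setdefault(name, ts)
    return first_seen

# Hypothetical journey: the signup page is refreshed and revisited.
events = [("Signup", 0), ("Signup", 4), ("Pricing", 30),
          ("Signup", 55), ("Value", 300)]

first = first_occurrences(events)
ttv_seconds = first["Value"] - first["Signup"]
print(ttv_seconds)  # 300
```

Splitting journeys at account deletion and re-signup is not handled in this sketch. It assumes the events passed in already belong to one journey.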
Rule 3: Removing Loops (A -> B -> A -> C)
This is tricky.
- If the user goes back to A, did they restart?
- Or did they just check something?
Tivalio's engine usually looks for the Shortest Successful Path in the graph to determine benchmarks, while preserving the "Wall Clock Time" of the session.
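The sketch below shows one simple loop-removal policy, not Tivalio's actual engine: whenever the user returns to a state already on the path, everything after the earlier visit is cut. It assumes (name, timestamp_in_seconds) pairs, and the function name erase_loops is hypothetical. The wall-clock time is taken from the raw events, so it is preserved even though the path is shortened.

```python
def erase_loops(events):
    """Return (loop-free path, wall-clock seconds) for one journey."""
    path = []
    for name, _ts in events:
        if name in path:
            # Revisit: drop the detour taken since the earlier visit.
            del path[path.index(name) + 1:]
        else:
            path.append(name)
    # Elapsed time comes from the raw stream, not the shortened path.
    wall_clock = events[-1][1] - events[0][1] if events else 0
    return path, wall_clock

# Hypothetical timestamps for A -> B -> A -> C
journey = [("A", 0), ("B", 10), ("A", 25), ("C", 40)]
print(erase_loops(journey))
# (['A', 'C'], 40)
```

This policy treats every return to A as "just checking something". If a return should count as a restart instead, the rule would need extra signals (for example, a long gap between events), which this sketch does not attempt.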
Why This Matters for TTV
If you don't clean your data, metrics such as your P90 TTV can end up reflecting "Fidgety Users" rather than "Struggling Users."
- Fidgety User: Fast mental processing, lots of clicks. High noise. Real TTV: Fast.
- Struggling User: Slow mental processing, few clicks. Low noise. Real TTV: Slow.
By Canonicalizing, we ensure that we are measuring the Process Speed, not the Click Speed.
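To make the contrast concrete, here is a small illustration with two made-up users. The data, the event shape of (name, timestamp_in_seconds), and the function name summarize are all invented for this example:

```python
from itertools import groupby

def summarize(events):
    """Compare raw click volume with canonical steps and elapsed time."""
    names = [name for name, _ts in events]
    # Collapse consecutive repeats to get the canonical step count.
    canonical = [name for name, _run in groupby(names)]
    return {
        "raw_events": len(names),
        "canonical_steps": len(canonical),
        "elapsed_seconds": events[-1][1] - events[0][1],
    }

# Fidgety user: many clicks, little time.
fidgety = [("Signup", 0), ("Signup", 1), ("Signup", 2), ("Dashboard", 5),
           ("Dashboard", 6), ("Dashboard", 7), ("Value", 20)]
# Struggling user: few clicks, a lot of time.
struggling = [("Signup", 0), ("Dashboard", 400), ("Value", 1800)]

print(summarize(fidgety))
# {'raw_events': 7, 'canonical_steps': 3, 'elapsed_seconds': 20}
print(summarize(struggling))
# {'raw_events': 3, 'canonical_steps': 3, 'elapsed_seconds': 1800}
```

After canonicalization both users show the same three-step path, and the difference between them is the elapsed time rather than the number of clicks.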
