Building Funnels Made Of Multiple Paths Using dbt
Funnels are an important part of product analytics. As we investigate behavioural patterns, we explicititely map out or have mental representations of paths that will measure how different users are journeying through our product.
May they be event conversions, feature adoptions, or other sorts of multi-steps engagements - a common challenge to funnel analysis is how to configure them to capture the complexity of multiple journeys while still viewing them as a whole, which should provide valuable indicators as to how well we are guiding users.
Most analytics services have their own way of dealing with funnels. Some are pretty good, some are too rigid and others look like a meaningless hairball. But they all share one major limitation: they need to be configured within their own tool.
What if you wanted to extract funnels directly in your data warehouse? This would allow for analyses outside of it. You could use Tableau, R, Python or whatever and always access the same funnels with the same results.
Our “engine” is being built with dbt, but the concepts should probably be applicable to any other ETL platform you might have.
Here’s an example of a funnel with multiple paths.
Funnels can have multiple, clearly definable paths. And all those paths can be evaluated independently or as a whole to see how you’re dealing with that specific part of a journey, and for a specific segment of your users.
A Simple Registration Funnel Example
For now, I only have two main requirements for this project:
- Flexibility - different users might take different paths through a funnel
- Associativity - multiple paths can relate to a single funnel
I’ll work from a simple example: a newsletter’s registration funnel. In this example, I expect there are 2 main paths to subscription:
- Visitors who landed on the website, visited the newsletter and subscribed
- Visitors who landed directly on the newsletter and then subscribed
Of course there could be more paths and steps involved, but for simplicity’s sake, this is how I will configure this funnel.
This funnel has paths for 2 different types of visitors: those who knew nothing of the newsletter but stumbled upon it by chance; those who knew about it but were looking for validation. They go through the same funnel (registration), but navigate through it differently.
Building A Funnel Engine
Here’s what I’d like to see in my data warehouse:
- Funnel configurations - list of funnels and paths associated to them
- Attributions - the paths taken by users who entered a funnel
- Funnel completions - steps completed by users who entered a funnel
For now, our example funnel’s configurations is in a csv file which is seeded to dbt. Of course you could get way fancier here, but this is only a proof of concept, so let’s keep it simple.
Here’s what our funnel_paths configuration file looks like…
As you’ve noticed, we have an associated metric_name for each path’s step. It’s a bit beyond the point how that is setup, but for example newsletter_subscription obviously represents an event where a user subscribed to the newsletter.
Now, here’s the core of our engine which consists of 5 files:
create_funnel.sql which is the actual macro where events / metrics are mapped to a user’s path and transformed into a funnel object.
f_newsletter_registration.sql which is the implementation of that funnel macro for newsletter registrations.
f_newsletter_registration_path_attributions.sql which associates users to the correct path based on their behaviour. For this example, attribution is based on which page the user landed on during a session.
funnel_paths.sql which creates a table of all funnel and path configurations.
and finally funnels_table.sql which creates a table of all funnels taken by users.
I won’t go into the details of that code, but here are a few key points to take into consideration:
- The general idea is that this macro transforms events associated to a user session, maps it out to its designated path and returns its funnel instance as a json object.
- This is super-specific to my own setup (for example, I’m using Snowflake functions in there), but the general ideas are there and could be adapted to other environments.
- It’s definitely work in progress. The code is not optimal and there is sure room for more flexibility in this.
Once our funnel paths are configured and assigned to users, we can query our funnels_table this way:
In this case, I’m only interested in the newsletter_registration funnel instances and only in the instances that reached at least the second step of the funnel.
And here’s what a funnel instance would look like…
From that point on, there’s a lot that can be done in terms of analysis. Here’s just an overview of an analysis I did in Tableau.
Here’s what you’re looking at:
- Top-left: A distribution of all paths taken
- Top-right: A tree of all path permutations taken
- Bottom: A drill-down tree that shows completion rates between steps based on the path taken.
From This Point Forward
The reason why I got into this whole project was because of a client that use funnels extensively but uses a BI platform that does not allow for funnel configuration out of the box. We wanted to bring more flexibility and cohesion to how they could define funnels and their associated paths.
In the end, I think this avenue is promising as it does allow to easily configure new funnels with their associated paths. That said, this “engine” has only been used in that context for now, so there’s certainly a need for more various scenarios.
Also, there’s still evidently room for improvements. For now, the attribution of paths is left to be built differently for each funnel. It would be great for it to be a set of rule-based configurations that would take care of attribution automatically. For example, a path should have event X as a first step and event Z as a final step, but whatever in the middle. Or have event X as a first step, event Y as step 2, and event Z as final step
In regards to performance, the advantage of this approach as we mentioned earlier, is that it can act as a single source of truth for your funnels and results. But, you wouldn’t be able to use that within analytics tools such as Google Analytics - you’d still need to use their own engine.
Are there better alternatives? I think that if you do funnel analysis in Amplitude already, that seems like a really good way of doing it right now. And maybe Google Analytics is perfect for your needs. What I’m proposing here is just another approach to be considered if ever you want funnels to be analysed outside of those tools.
So what’s next? Definitely apply this to more scenarios and fine-tune the engine, especially for attribution. If you are interested in learning more and better yet, use what I’ve done for your own project, please drop me a line at firstname.lastname@example.org.