Data Quality First Principles + Survival Analysis + Census + More (PAN #35)
Good morning product analytics friends 👋
I thought I would start with a piece a bit outside this newsletter's usual scope, but one I hope you will enjoy. As with everything written by Tim O’Reilly, it’s thought-provoking, raising larger questions that put underlying trends into perspective and suggest how we can act on them.
Welcome to the 21st Century - How To Plan For The Post-Covid Future
As the author argues, Covid might just be the event that throws us into the 21st century, changing the course history has followed since the major wars of the 20th century. That forces us to consider how we should rebuild our economy, but even better.
We’re all responsible for how we’re shaping the future, and I believe readers here do have an impact on how we build it. So even though things are crazy, what do we do about it now? Things aren’t going back to normal. So how will you contribute to building back the new, better normal?
With that, on with the 35th edition of the Product Analytics newsletter!
What has been my highlight?
Data Quality From First Principles
Holistics.io by @ejames_c
There have been quite a few circumstances lately where data quality was at the heart of what I worked on, discussed or read about: a project where we needed to surface erroneous entries in a transactional database, discussions with colleagues about using a screening layer in our data warehouse projects, this really amazing answer from Maxime Lavoie in dbt’s Slack channels about how to deal with bad data (read it now before it’s lost forever), etc, etc…
And then I stumbled upon this article by Cedric Chin on how to frame, surface and deal with data quality issues by reasoning from first principles. Essentially, errors are rooted in PPT: people, processes and technologies. As you identify data quality issues and categorize each one as a people, process or technology issue, a data team can come up with fixes, address the underlying PPT problems, and become a more mature data organization.
Data quality is a never-ending topic to explore, but thinking about these issues from first principles is foundational. It provides a solid framework for addressing not just the technological side of a problem, but also the processes at fault, or the fact that the people responsible for producing the data lack adequate training.
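To make the framing concrete, here is a minimal sketch of tagging data quality issues with a people/process/technology root cause and counting where they cluster. The issues and categories below are made up for illustration; they are not from the article.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class QualityIssue:
    description: str
    root_cause: str  # one of "people", "process", "technology"

# Illustrative issues, each tagged with a PPT root cause.
issues = [
    QualityIssue("reps free-type country names in the CRM", "people"),
    QualityIssue("no review step before source schema changes", "process"),
    QualityIssue("nightly load silently drops rows on bad encoding", "technology"),
    QualityIssue("orders entered under a shared test account", "people"),
]

def summarize(issues):
    """Count issues per PPT root cause to see where fixes should focus."""
    return Counter(i.root_cause for i in issues)

print(dict(summarize(issues)))  # {'people': 2, 'process': 1, 'technology': 1}
```

A tally like this is obviously simplistic, but it turns a vague "our data is bad" feeling into a prioritized view: two people problems here suggest training or tooling at the point of entry, not more downstream patches.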
Growing your product with the help of data.
Octalysis – the complete Gamification framework
YukaiChou.com by Yu-kai Chou
I’ve been working on a co-presentation lately about the feedback loop between qualitative and quantitative data, and a point we want to drive home is that you need to know what you want to measure before even gathering data and coming up with metrics. This is a bit cliché, but I’ve seen and participated in projects where we lose focus on the business questions and just play with the data for the sake of playing with data.
This piece fits into that category. What’s presented here is a nicely laid out and quite rich framework for thinking about user engagement. If you’re ever unsure what constitutes engagement, gamification has a lot of depth and will certainly fuel your thinking.
Don’t let the title fool you: it’s not a framework for games only. There are examples in there for Facebook and Twitter that are definitely insightful if you want to think about engagement in the context of your product’s analytics.
Factory operations to transform data into analytics.
Blue-Green Deployment with dbt and BigQuery
Youtube.com by @calogica
In my experience, dbt has enabled analytics projects to be deployed in a more structured way. You can test the freshness of your sources, build your warehouse, test schemas, test data, etc., all by pulling from a git repository, which in itself brings all the advantages of building from code. And if your tests ever fail, you’re automatically notified and can act before your users find out.
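As a rough illustration of the kinds of checks mentioned above, a dbt `schema.yml` can declare both source freshness thresholds and column-level tests. The source, table and column names here are invented; consult dbt's docs for the exact options your version supports.

```yaml
version: 2

sources:
  - name: app_db            # hypothetical source database
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders

models:
  - name: fct_orders        # hypothetical model built from the source
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

With something like this in place, `dbt source snapshot-freshness` and `dbt test` become the automated gates that catch stale or broken data before anyone opens a dashboard.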
That’s already leaps and bounds better than how BI tools previously pulled from half-baked data, where you’d live in fear of the dreaded emails from users saying they’re noticing weirdness in the data. Or that a report is broken. Or worse, that some key metrics are just completely off.
Deploying with confidence makes your life more enjoyable and keeps end users’ trust in your data high. That’s just as important, if not more so.
But how do you go even further? How do you keep bad data from ever being published to your dashboards? That’s the central question of the first presentation in this meetup, where Claus Herther shares how his team adopted blue-green deployment ideas from devops (known as Write-Audit-Publish, or WAP, in the data world) and implemented them with dbt and BigQuery.
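The core of the Write-Audit-Publish idea can be sketched in a few lines, with the dbt and BigQuery specifics stubbed out. All the function names and data here are hypothetical; the talk covers the real implementation.

```python
def build_into_staging():
    """Write: build all models into a staging area users never query
    (in the real setup, something like `dbt run --target staging`)."""
    return {"fct_orders": [{"order_id": 1, "amount": 42.0}]}

def audit(tables):
    """Audit: run checks against staging (in the real setup, `dbt test`).
    Returns True only if every check passes."""
    return all(
        row.get("order_id") is not None and row.get("amount", 0) >= 0
        for rows in tables.values()
        for row in rows
    )

def publish(tables, production):
    """Publish: swap the audited staging tables into the dataset users query."""
    production.clear()
    production.update(tables)

production = {}
staging = build_into_staging()
if audit(staging):
    publish(staging, production)  # users only ever see audited data
else:
    print("audit failed; production left untouched")
```

The key property is that the audit step sits between write and publish, so a failed test leaves production exactly as it was, instead of leaving users staring at bad numbers.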
Deriving insights from your product’s data.
Using Survival analysis to Calculate Your User’s LTV
Strong.io by @jwdink
That one’s stretching the limits of my stats skills. But oh is it a good read. I’ve actually started working on LTVs for two projects, and a colleague suggested I do some reading on survival analysis. I knew of it, but can’t say I had any clue about how to implement it.
In the context of a user’s lifetime value, the survival function gives the probability that a user has not churned by a given time period.
If lifetime value is the expected profit to be made from a user over their lifetime’s use of your product, then you cannot just multiply average profit by average observed lifetime. That’s wrong because it ignores that some users are still active and still generating revenue: their lifetimes are censored, so a naive average undercounts them. This is where survival analysis comes in, to properly estimate the expected lifetime of users.
If this is all pretty new to you, just as it is for me, then this article is a very solid intro to the subject.
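To see how a survival curve feeds an LTV number, here is a toy discrete-time version that assumes a constant monthly churn probability. The numbers are made up, and the article fits real survival curves rather than assuming constant churn; this only illustrates the "sum revenue weighted by survival probability" idea.

```python
def ltv(arpu, monthly_churn, horizon_months=1200):
    """Expected revenue = sum over months of ARPU * P(user still active)."""
    survival = 1.0  # P(active) in month 0
    total = 0.0
    for _ in range(horizon_months):
        total += arpu * survival          # expected revenue earned this month
        survival *= (1 - monthly_churn)   # probability of surviving to next month
    return total

# With constant churn, this converges to the familiar closed form ARPU / churn:
print(round(ltv(arpu=10.0, monthly_churn=0.05), 2))  # 200.0, i.e. 10 / 0.05
```

Swapping the geometric `survival` update for an empirically fitted survival function (e.g. a Kaplan-Meier estimate) is exactly the step the article walks through.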
What’s happening in the product analytics market.
Census to Move Data From Your DW
Techcrunch.com by @borisjabes
I’m super late to this: when the story broke that Census had raised seed money, I was a bit distracted by… you know, world craziness. But I finally had a look at it, and oh, I think I get it.
Here’s my take on what Census is. You move data to your cloud data warehouse. You stage, transform and model that data. You derive some cool new data points from that data, such as channel attribution. And finally you send those shiny new data points to external apps such as Amplitude.
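In rough pseudocode-ish Python, the flow described above looks something like this. Everything here is hypothetical (function names, field names, the attribution example); Census's actual product does this without you writing code.

```python
def read_from_warehouse():
    """Stand-in for querying a modeled table in the data warehouse,
    e.g. per-user channel attribution derived during transformation."""
    return [
        {"user_id": "u1", "first_touch_channel": "organic"},
        {"user_id": "u2", "first_touch_channel": "paid_search"},
    ]

def sync_to_external_app(rows, send):
    """Push each derived data point to an external app (e.g. Amplitude)
    via a caller-supplied `send` function."""
    for row in rows:
        send({
            "id": row["user_id"],
            "properties": {"channel": row["first_touch_channel"]},
        })

# Example: collect payloads in a list instead of calling a real API.
outbox = []
sync_to_external_app(read_from_warehouse(), outbox.append)
print(len(outbox))  # 2
```

The value proposition, as I read it, is exactly this last mile: the warehouse already holds your best-modeled data, and Census moves those data points back out to the operational tools where people act on them.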
Well, at least that’s how I would use it. I actually have a demo lined up with the founder, as he confirmed that the above scenario is top of mind for them. So 🤞 I’ll be able to execute this with Census in the near future, because I know exactly which project I’d use it on first!