Discovering Amundsen + Meltano's Roots + More (PAN #33)
Edition #33 - May 18, 2020 Originally sent via Mailchimp
Good morning product analytics friends 👋
I’m assuming that most of us work hard every day to build efficient data infrastructures that reflect truths about digital products. As organizations adopt data-driven cultures, our stakeholders become even more demanding. And so we generate, transform and expose even more data.
But as datasets proliferate, they can easily get buried and forgotten. And as business questions pile up, failing to quickly find the right dataset pushes us to just start from scratch… again.
Documentation and discovery are serious issues for mature data-driven organizations. Making data discoverable is a real challenge when that data is produced by so many modular, interconnected tools. That’s where Amundsen comes in, and it’s the focus of our first story below.
With that, on with the 33rd edition of the Product Analytics newsletter!
What has been my highlight?
Amundsen — Lyft’s data discovery & metadata engine
Lyft.com by @mark_grover
In our last edition, I shared an article on data governance and discovery and ended with a to-do: look into Amundsen. Well, here we are.
As stated above, a lot of work goes into making rich data available to analysts. So how can we make sure they find the right datasets when a pressing business question emerges?
Lyft’s Amundsen catalogues the metadata associated with datasets and makes it discoverable and intelligible. What is that metadata? Here are its ABCs:
- A - Application Context: “information needed by humans or applications to operate.”
- B - Behaviour: “information about how the data is created and used over time.”
- C - Change: “information about how the data is changing over time.”
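To make the ABCs a bit more concrete, here is a minimal Python sketch of how one dataset’s metadata could be grouped along those three axes and then searched. All class and field names (`DatasetMetadata`, `produced_by`, `schema_history`, etc.) and the `rides_daily` entry are hypothetical illustrations; this is not Amundsen’s actual data model or API, which is far richer than a list scan.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ApplicationContext:
    # A: what humans or applications need in order to operate on the data
    description: str
    owners: list[str]
    schema: dict[str, str]  # column name -> column type


@dataclass
class Behaviour:
    # B: how the data is created and used over time
    produced_by: str  # e.g. the job or pipeline that writes the dataset
    downstream_reports: list[str] = field(default_factory=list)


@dataclass
class Change:
    # C: how the data is changing over time
    last_updated: date
    schema_history: list[dict[str, str]] = field(default_factory=list)


@dataclass
class DatasetMetadata:
    name: str
    application_context: ApplicationContext
    behaviour: Behaviour
    change: Change

    def matches(self, term: str) -> bool:
        """Naive case-insensitive search over name, description and column names."""
        term = term.lower()
        fields_to_search = [
            self.name,
            self.application_context.description,
            *self.application_context.schema,  # unpacks the column names
        ]
        return any(term in text.lower() for text in fields_to_search)


# Hypothetical catalogue entry, for illustration only.
catalogue = [
    DatasetMetadata(
        name="rides_daily",
        application_context=ApplicationContext(
            description="Daily aggregated rides per city",
            owners=["analytics-team"],
            schema={"city": "string", "ride_date": "date", "rides": "int"},
        ),
        behaviour=Behaviour(produced_by="rides_daily_etl",
                            downstream_reports=["City KPIs"]),
        change=Change(last_updated=date(2020, 5, 17)),
    ),
]

print([d.name for d in catalogue if d.matches("city")])  # ['rides_daily']
```

The point of grouping by A, B and C is that a search hit can immediately tell an analyst who owns a dataset, what feeds it and how fresh it is, instead of just returning a table name.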
What’s interesting is how broad the catalogued metadata is: we’re not only talking about data sources, but also BI reports, streams, data processing jobs, etc. Today, that metadata is scattered across many layers of your data stack, so it’s really hard to get an overview of everything available within an organization. Amundsen catalogues all of it and makes it easier to discover and understand.
Want to learn even more? Watch their latest community meeting, which shows real-life implementations of Amundsen. Really interesting stuff!
Growing your product with the help of data.
Why customer retention is the ultimate growth strategy
Segment.com by @geoffreykeating
There are rarely breakthroughs about which metrics deserve your full attention when growing a product. We all know that customer retention is key: the math behind leaky buckets rarely leads to solid growth. But how you analyze customer retention and how you act on it is what matters, and that is where breakthroughs can happen.
This guide from Segment is their own tactical approach to understanding and acting on customer retention.
The first thing to explore is not churn, but what makes your best customers stick. Segment proposes an Activation score that measures how engaged a user is.
Another angle is to split retention into the early, middle and late stages of a user’s journey. Users have different objectives in each phase, so retention itself should be understood differently in each one.
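Splitting retention by stage can be sketched in a few lines of Python. This is a minimal illustration under clearly hypothetical assumptions: the stage windows (days 1 to 7, 8 to 30 and 31 to 90 after signup) and the rule that a user counts as retained if active at least once in a window are arbitrary choices for the example, not Segment’s definitions.

```python
from datetime import date

# Hypothetical stage windows, in days since signup (not Segment's definitions).
STAGES = {
    "early": (1, 7),
    "middle": (8, 30),
    "late": (31, 90),
}


def retention_by_stage(signups: dict[str, date],
                       activity: dict[str, list[date]],
                       as_of: date) -> dict[str, float]:
    """Share of eligible users active at least once in each stage window.

    A user is only eligible for a stage once that stage's window has fully
    elapsed by `as_of`, so recent signups don't drag the rate down.
    """
    rates = {}
    for stage, (start, end) in STAGES.items():
        eligible = retained = 0
        for user, signup in signups.items():
            if (as_of - signup).days <= end:
                continue  # this user's window is not over yet
            eligible += 1
            active_days = {(d - signup).days for d in activity.get(user, [])}
            if any(start <= day <= end for day in active_days):
                retained += 1
        rates[stage] = retained / eligible if eligible else 0.0
    return rates


# Hypothetical example data.
signups = {"u1": date(2020, 1, 1), "u2": date(2020, 1, 1), "u3": date(2020, 1, 1)}
activity = {
    "u1": [date(2020, 1, 3), date(2020, 1, 20), date(2020, 3, 1)],
    "u2": [date(2020, 1, 5)],
    "u3": [],
}
rates = retention_by_stage(signups, activity, as_of=date(2020, 5, 18))
print(rates)  # early ≈ 0.67, middle ≈ 0.33, late ≈ 0.33
```

Reporting one rate per stage, rather than a single retention curve, makes it easier to ask a different question of each phase (did users get started? did they build a habit? are they still getting value?).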
This article also covers other tactics such as measuring an account’s vitality, preventing accidental churn, etc.
Factory operations to transform data into analytics.
Meltano is Returning to its Roots
Meltano.com by @DouweM
Here’s a lengthy update from a project I love following and have talked about before [here and here]: Meltano. They’re ambitious, and even though this is a GitLab project, it is run like a startup trying to find product-market fit.
In the first of two posts about the pivot they’re going through, they talk about their journey so far and offer an assessment of their current situation.
“The most significant conclusion came pretty quickly: As an open source project, Meltano’s scope is simply too broad and ambitious.”
In other words, the Meltano team cannot do it all themselves; they need to nurture a community of developers willing to invest their time in building an “open source self-hosted platform for running data integration and transformation (ELT) pipelines”.
The second post goes deeper into how they want to achieve that objective. I’ll come back to it in a later edition, as it is packed with ideas that I can’t summarize in one line. Stay tuned.
Deriving insights from your product’s data.
The Analyst as Cartographer: Customer Journey Mapping
For the podcast listeners out there, here’s an episode about customer journey mapping. I’ve discussed this topic with a friend, and the struggle with a purely data-driven approach is that you have no perspective on what drives behaviour. That is (as I understand it) the essence of the Jobs To Be Done framework: what drives an individual to behave the way they do.
Looking only at data, you can build personas based on behaviour, but that is a rigid, limited, “after-the-fact” way of looking at users. Conversely, if you only talk to users without using data, you lose perspective and things become anecdotal.
Funnels are a good example: they are usually rigid and only look at one specific behavioural pattern. But users might take detours, and only once you start understanding the “why” can you see that a single funnel may have multiple paths.
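One way to loosen a rigid funnel is to list every path users actually took between an entry event and a conversion event, detours included. Below is a minimal Python sketch assuming each user’s events are already available as an ordered list of event names; the function name and all event names (`signup`, `read_docs`, `upgrade`, etc.) are hypothetical.

```python
from collections import Counter


def conversion_paths(sessions: dict[str, list[str]],
                     start: str, goal: str) -> Counter:
    """Count the distinct event sequences users took from `start` to `goal`.

    A rigid funnel checks one fixed sequence; this keeps every route
    (including detours) from the first `start` to the first `goal` after it.
    """
    paths = Counter()
    for events in sessions.values():
        if start not in events:
            continue  # user never entered the funnel
        i = events.index(start)
        if goal not in events[i:]:
            continue  # user never converted after entering
        j = events.index(goal, i)
        paths[tuple(events[i:j + 1])] += 1
    return paths


# Hypothetical event streams, one ordered list per user.
sessions = {
    "u1": ["signup", "create_project", "invite_team", "upgrade"],
    "u2": ["signup", "read_docs", "create_project", "invite_team", "upgrade"],
    "u3": ["signup", "create_project", "upgrade"],
    "u4": ["signup", "create_project"],
    "u5": ["signup", "create_project", "invite_team", "upgrade"],
}

for path, n in conversion_paths(sessions, "signup", "upgrade").most_common():
    print(n, " -> ".join(path))
```

Comparing path counts shows whether the “official” funnel is the dominant route or just one among several. The data still won’t tell you why users took a detour; that part needs the qualitative side discussed in the episode.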
If you are interested in that aspect of product analytics, this episode is very conversational and might lead you to consider different perspectives the next time you look at your users’ journeys.
What’s happening in the product analytics market.
Market News Overload
There are too many market stories this week. So why choose, when I can just overload you with all of them? You’re welcome 😬
Fivetran has started releasing dbt packages that make it easier to integrate their sources into dbt projects. So far, they have released packages for NetSuite, Mailchimp and Salesforce. I know of other upcoming projects that will also ease source integration in dbt. This is an interesting layer that is starting to take shape.
Tableau released version 2020.2 and it seems like a good one! Relationships let you define how tables relate to one another, which gives you more flexibility in your analysis. The other big announcement is about metrics: you can define them as objects that live outside dashboards and data sources (at least, that’s my understanding), and track only them if that’s all that matters to you. I haven’t tried the new version yet, but it seems like a good step up!
What, you haven’t had enough? Fine, here’s an interview with Snowflake’s CEO, where he talks about the strategic alliance with Salesforce, how Covid (I had managed not to mention it until now…) is impacting the business, and more. Nothing really jaw-dropping in here, but still interesting.