Going Remote + Attribution Modeling + Testing Stack (PAN #30)
Edition #30 - March 23, 2020 Originally sent via Mailchimp
Good morning product analytics friends 👋
How is everybody? Are you all hanging in there? It’s been a crazy few weeks to say the least and I hope you are all taking care of yourselves and your loved ones.
I want to take this opportunity to thank everyone who is stepping up, sharing their knowledge, helping out, forging communities and just being there. I’ve seen it offline and online, and all of those small actions add up to make this drastic change bearable and even enjoyable.
Also, and this might just be stating the obvious, with kids, family and friends taking priority for the next couple of weeks, I might send this newsletter a bit more irregularly.
With that, keep calm and stay home, and on with the 30th edition of the Product Analytics newsletter!
What has been my highlight?
GitLab’s Company Handbook
Gitlab.com by @gitlab
GitLab is one of the largest (if not the largest) 100% remote companies out there. They have some really smart employees and a strong culture of openness and sharing knowledge with the community (which is greatly appreciated btw 🙏). And in this time of abrupt change for a lot of companies, their company handbook is a gold mine.
I’ve had the chance to work with multiple companies, some of them 100% on-site, some 100% remote and a lot somewhere in the middle. They all have their challenges, but I can’t say that the degree of “remoteness” has any impact on the success of a team. Culture is way more important, and it shows when a company is high-functioning vs one that’s just barely scraping by.
That said, how you treat and organize remote work does affect morale, since those choices shape how a team works together and whether it stays cohesive or breaks down into a bunch of “silos” with limited interaction between them. So being remote or not does not explain success, but how you deal with being remote might have an impact.
This handbook is to be consumed slowly. I’ve bookmarked it and caught myself referring to it often throughout the week to see how GitLab handles specific processes I know could be dealt with more effectively. There’s just so much to absorb and so many great ideas to make all teams more efficient and cohesive, and especially the ones that are currently moving to 100% remote. Enjoy!
Growing your product with the help of data.
Solid as a User Data Protocol
Schneier.com by @schneierblog
This is a bit removed from what I traditionally share in this section. It stems from the idea that users should own their data. That idea is starting to be enforced by law, but how can those privacy principles be taken further to empower users and make them the owners of their data?
The whole conversation that surrounds that piece is probably more informative/interesting than the article. For example, how to compel companies such as Facebook to adopt a common and more responsive protocol that would substantially cut into their revenues. Or just how to design the data pods to be easily manageable by users. Or how to enforce limited access to data and prevent it from being copied. Etc, etc.
Essentially, we’re far away from a good solution for this, but these are important issues that are worth taking the time to think through. Especially in product analytics, there’s a balance in how we use data to empower users without breaching their trust. A fine line to walk.
Factory operations to transform data into analytics.
Great Expectations Pipeline Tutorial
Github.com by @expectgreatdata
As I’ve shared in the past couple of issues, I’ve started experimenting with Great Expectations and although I did hit a few roadblocks along the way, I’ve started to get a handle on it. That means being able to connect to Snowflake, pull a sample of data and write expectations on top of that sample.
The next step is to automate it, and that’s where things get a bit fuzzier. The testing stack I want to put together would look something like the following:
- dbt to test source integrity and freshness
- dbt to build the data warehouse
- dbt to test the integrity of the data warehouse
- Great Expectations to test what the business expects the data to look like
This tutorial from Great Expectations is a welcome addition as it goes through a stack that does include dbt. I haven’t had a chance to experiment with it yet, but if you are interested in building a testing stack similar to what I described above, driven by Airflow, this could be a helpful resource. I would also really appreciate it if you could share the results of your experimentation with me.
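To make the ordering of that stack a bit more concrete, here is a minimal Python sketch of the steps run in sequence, stopping at the first failure. It is not taken from the tutorial. The dbt commands assume current CLI syntax (older releases name some of them differently), `run_expectations.py` is a hypothetical script standing in for whatever triggers the Great Expectations validation, and `run_stack` and `shell_runner` are names made up for this example.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

Step = Tuple[str, List[str]]

# Ordered steps mirroring the stack described above.
# Assumptions: the dbt commands follow current CLI syntax, and
# run_expectations.py is a hypothetical script that runs the
# Great Expectations checks.
STEPS: List[Step] = [
    ("source freshness", ["dbt", "source", "freshness"]),
    ("source integrity", ["dbt", "test", "--select", "source:*"]),
    ("build warehouse", ["dbt", "run"]),
    ("warehouse integrity", ["dbt", "test"]),
    ("business expectations", ["python", "run_expectations.py"]),
]


def shell_runner(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_stack(
    steps: Sequence[Step] = STEPS,
    runner: Callable[[Sequence[str]], int] = shell_runner,
) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; stop at the first non-zero exit code.

    Returns the names of the steps that passed and the name of the
    step that failed (None if everything passed).
    """
    completed: List[str] = []
    for name, command in steps:
        if runner(command) != 0:
            return completed, name
        completed.append(name)
    return completed, None
```

In an Airflow setup, each entry in `STEPS` would more likely become its own task with the same dependencies between them. The point of the sketch is only the order: nothing gets built on stale or broken sources, and the business-level expectations only run against a warehouse that already passed its integrity tests.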
Deriving insights from your product’s data.
The Attribution Modeling Handbook
Here’s an excellent one from dbt community manager, Claire Carroll. It tackles the issue of attribution modeling. Frustration and anxiety (and pricey black box tools) usually accompany this problem. But with a modern analytical stack, event data and SQL (and a modeling layer such as dbt), you can experiment with attribution models, be completely transparent with business users as to what the model is, and easily change its parameters if needed.
We’re going to use an approach called positional attribution. This means, essentially, that we’re going to weight the importance of various touches (customer interactions with a brand) based on their position (the order they occur in within the customer’s lifetime).
I won’t go into the details of that approach here, but if you want a deep dive on the subject and an elegant modeling approach to solving that attribution problem, this handbook just might be the sole resource you’ll need. And it comes out at the perfect time as I’m about to start a project with a client that will tackle that specific issue. Now I’m really looking forward to it! :)
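For a quick feel of what weighting touches by position means, here is a small Python sketch. It is an illustration under assumptions, not the handbook’s SQL: the 40/20/40 split (40% to the first touch, 40% to the last, the rest shared evenly by the middle touches) is just one possible weighting, and the function names and channel names are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def positional_weights(
    n_touches: int, first: float = 0.4, last: float = 0.4
) -> List[float]:
    """Return one weight per touch, in order, summing to 1.

    Assumption: the first and last touches get fixed shares and the
    middle touches split whatever is left evenly.
    """
    if n_touches <= 0:
        return []
    if n_touches == 1:
        return [1.0]
    if n_touches == 2:
        # No middle touches: rescale the two shares so they sum to 1.
        total = first + last
        return [first / total, last / total]
    middle = (1.0 - first - last) / (n_touches - 2)
    return [first] + [middle] * (n_touches - 2) + [last]


def attribute_revenue(touches: Sequence[str], revenue: float) -> Dict[str, float]:
    """Credit revenue to channels based on the position of each touch.

    `touches` is the ordered list of channels a customer interacted
    with before converting.
    """
    credit: Dict[str, float] = defaultdict(float)
    weights = positional_weights(len(touches))
    for channel, weight in zip(touches, weights):
        credit[channel] += weight * revenue
    return dict(credit)
```

With touches `["paid_search", "email", "organic", "paid_search"]` and 100 in revenue, paid search ends up with roughly 80 and the two middle channels with roughly 10 each. Changing `first` and `last` is all it takes to try a different model, which is the kind of transparency the handbook argues for.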
What’s happening in the product analytics market.
Fivetran New Pricing Model
Fivetran.com by @frasergeorgew
I’ve never had a chance to work on a project that used Fivetran, but have done a trial in the past for a client and was really impressed with its variety of sources and richness of data. That said, when it came time to get a quote, it didn’t take long for that option to be discarded and for us to explore other avenues.
But now they’ve announced a change in their pricing model, which finally brings in a bit of transparency about how much it costs to use their product and also makes its use way more affordable. The pricing model is based on consumption, which is “defined as the number of monthly active rows”.
Even though their pricing model is more transparent, some questions remain, such as what exactly a credit is and whether there are “hidden costs” (a platform fee). Those questions came up in the Locally Optimistic Slack group. The Credit Consumption page has a table that explains what a credit amounts to in terms of rows processed, but it also states that there is a minimum amount of credits you need to purchase per month, which amounts to a minimum cost of $500.
I still can’t say it’s super clear how you could anticipate your monthly cost without going through a few months, but at least there is now some transparency in the model. And even at a minimum of $500 a month, this is becoming a more accessible alternative to other data loaders.