This is a compilation of the daily posts I published on LinkedIn during the week of December 4, 2017.
What Is Tableau’s Project Maestro?
Tableau is in the data visualization business, but they’re well aware that for most of us, the actual visualization is only the final 20% of the journey. The other 80% of the work revolves around the mundane job of data preparation.
It’s no big surprise that Tableau wants to target that space by providing a visual and flexible environment to do and automate your data preparation. And I agree: the more we can do directly within Tableau, the more time we save on creating and maintaining sometimes complex data wrangling scripts. Although I love doing some SQL, R and Python programming, I prefer to reserve that for tasks other than data preparation.
So when Project Maestro was announced, I signed up for the beta program, and I finally got my access last week. That is what I plan on talking about this week.
Tableau provides a guided tour of Maestro where we’re joining multiple datasets of flights and of wildlife strikes. This is frequently used in demos and there’s a story about it here.
The first few steps are quite straightforward, not unlike opening datasets in Tableau today and joining them together. And although Tableau already provides some data wrangling capabilities, Maestro takes it a mile further. For each source, you can control certain settings (such as encoding), filter the data, rename and select fields, and even sample large datasets (cool!).
Once you’ve loaded your data sources and shaped them to your liking, you can start adding steps to your flow. Adding a step provides a 3-tier view with a description of the source’s content. You can explore data by clicking on a value in a field (such as “Southwest Airlines” in the attached image), and it will highlight related values in other fields. You can even explore actual rows in the bottom pane.
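For comparison, here is roughly what those per-source settings look like when scripted by hand. This is only a minimal sketch in plain Python, not how Maestro works internally; the `load_source` function, its parameters, and the sample columns are all hypothetical.

```python
import csv
import io
import random


def load_source(raw_bytes, encoding="utf-8", keep=None, renames=None,
                row_filter=None, sample_size=None, seed=0):
    """Decode CSV bytes, then filter rows, select fields, rename them, and sample."""
    reader = csv.DictReader(io.StringIO(raw_bytes.decode(encoding)))
    # Filter rows first, while the original field names are still in place.
    rows = [row for row in reader if row_filter is None or row_filter(row)]
    if keep is not None:
        rows = [{field: row[field] for field in keep} for row in rows]
    if renames:
        rows = [{renames.get(k, k): v for k, v in row.items()} for row in rows]
    # Sample only when the source is larger than the requested size.
    if sample_size is not None and sample_size < len(rows):
        rows = random.Random(seed).sample(rows, sample_size)
    return rows


# Made-up data standing in for a flights file.
raw = "carrier,origin,flights\nWN,DAL,10\nDL,ATL,7\nWN,HOU,4\n".encode("latin-1")
southwest = load_source(
    raw,
    encoding="latin-1",
    keep=["carrier", "flights"],
    renames={"carrier": "Carrier"},
    row_filter=lambda row: row["carrier"] == "WN",
)
```

Every one of those keyword arguments is a setting you would otherwise have to remember and maintain in a script, which is exactly the chore a visual tool is trying to remove.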
Continuing the Maestro Tour
Continuing on our tour of Tableau’s Maestro project, a data wrangling addition to their powerful BI suite, we now look at how to clean up data to eventually join data sources together.
A Union requires all sources to have similar fields. Your sources might be structured a little differently, and this is where Maestro adds functionality that you wouldn’t have in Tableau directly. For example, a field might contain information that could be split into distinct fields of its own. In the attached image, we split the Airlines Description field into 2 fields that match the structure of all the other data sources we’d like to combine.
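To give an idea of what such a split replaces, here is a small hand-written equivalent in Python. It is a sketch under assumptions: I am assuming the description looks like "Name: CODE", and the `split_description` helper and the output field names are hypothetical.

```python
def split_description(description, separator=": "):
    """Split 'Name: CODE' into (name, code); code is None when no separator is found."""
    name, found, code = description.rpartition(separator)
    if not found:
        return description.strip(), None
    return name.strip(), code.strip()


# Made-up rows using the assumed "Name: CODE" layout.
rows = [
    {"Airlines Description": "Southwest Airlines Co.: WN"},
    {"Airlines Description": "Delta Air Lines Inc.: DL"},
]
for row in rows:
    # Replace the combined field with two separate fields.
    row["Airline Name"], row["Airline Code"] = split_description(
        row.pop("Airlines Description")
    )
```

Splitting on the last separator keeps names that themselves contain a colon intact.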
It’s also worth noting that Project Maestro provides automations to accelerate your data preparation, such as automatic date parsing of a field that has multiple date formats (yeah!) and automatic data split. From the documentation, it does seem like this is the sort of magic Maestro wants to keep on sprinkling on your data preparation process. I love magic! 🙂
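Parsing a field with mixed date formats is the kind of thing I would normally script myself. A minimal sketch of the usual approach follows; it is not Maestro's actual logic, and the list of candidate formats is my own assumption.

```python
from datetime import datetime

# Candidate formats are an assumption; extend the tuple to match your data.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text):
    """Return the first successful parse as a date, or None if no format fits."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


mixed = ["2017-12-04", "12/04/2017", "04.12.2017", "not a date"]
parsed = [parse_date(value) for value in mixed]
```

The order of the formats matters for ambiguous values such as 04/05/2017, which is one more reason to be happy when a tool handles this for you.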
Completing Our Tour
Alright, so we’ve gone through what Tableau’s Project Maestro is, and how to load data and clean it up. Now we want to combine sources and package a final dataset to be used in Tableau.
As the image shows, we can do a bunch of Joins and Unions on our data sources to effectively combine our data and package our final dataset.
A cool feature when doing Unions is that Maestro will match fields from the different data sources, but you can also highlight the misaligned fields and drag one over the other if they are meant to be merged (this happens when the column names do not match).
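Scripted by hand, that drag-and-merge step amounts to a rename before stacking the rows. Here is a rough Python sketch of the idea; the `union` function, the `renames` mapping, and the sample fields are hypothetical.

```python
def union(sources, renames=None):
    """Stack rows from several sources, aligning fields by name.

    `renames` maps a misaligned field name to the name it should merge into.
    Fields missing from a source are filled with None.
    """
    renames = renames or {}
    combined = [
        {renames.get(key, key): value for key, value in row.items()}
        for rows in sources
        for row in rows
    ]
    # Collect every field name in first-seen order.
    fields = []
    for row in combined:
        for key in row:
            if key not in fields:
                fields.append(key)
    return [{field: row.get(field) for field in fields} for row in combined]


flights_a = [{"Carrier": "WN", "Flights": 10}]
flights_b = [{"Airline Code": "DL", "Flights": 7}]

misaligned = union([flights_a, flights_b])
merged = union([flights_a, flights_b], renames={"Airline Code": "Carrier"})
```

Without the rename, the two names end up as separate, half-empty fields, which is the misalignment Maestro highlights.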
Doing joins requires more information about how data is matched between data sources. As you can see in the attached image, Maestro not only lets you easily set the parameters of your join clause, it also shows a summary of that join, as well as its individual results.
Once your data is combined, you can add aggregates to your flow, as seen in the image, where you’d group rows together, do a Sum of values, etc. And then, tadam, you can finally add an output to all that beautiful data, run your flow and export a TDE file that can be used by Tableau itself.
The only real problem I encountered when messing around with Maestro came while I was having fun doing multiple Joins and Unions on data sources. Maestro didn’t seem to care much for it. It ran and ran and ran, and even after I removed all joins and unions, it was still painfully slow to resume. So I had to close Maestro and start again from where I left off.
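To show why that summary is useful, here is a small Python sketch of an inner join that also reports how many rows matched and how many were left out on each side. It only illustrates the idea, not Maestro's implementation; the `inner_join` function and the sample fields are hypothetical.

```python
def inner_join(left, right, left_key, right_key):
    """Inner-join two lists of dicts and return (joined_rows, summary)."""
    # Index the right side by its key for quick lookups.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    joined, left_only = [], []
    matched_keys = set()
    for row in left:
        matches = index.get(row[left_key], [])
        if not matches:
            left_only.append(row)
            continue
        matched_keys.add(row[left_key])
        for match in matches:
            # Left values win if both sides share a field name.
            joined.append({**match, **row})

    right_only = [row for row in right if row[right_key] not in matched_keys]
    summary = {
        "joined": len(joined),
        "left_only": len(left_only),
        "right_only": len(right_only),
    }
    return joined, summary


flights = [{"Carrier": "WN", "Flights": 10}, {"Carrier": "DL", "Flights": 7}]
strikes = [{"Operator Code": "WN", "Strikes": 3}, {"Operator Code": "AA", "Strikes": 1}]
joined, summary = inner_join(flights, strikes, "Carrier", "Operator Code")
```

The `left_only` and `right_only` counts are the numbers worth checking first when a join returns fewer rows than expected.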
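The aggregate step is the familiar group-and-sum. A minimal Python sketch, with a hypothetical `sum_by` helper and made-up rows:

```python
from collections import defaultdict


def sum_by(rows, group_field, value_field):
    """Group rows by one field and sum another, returning {group: total}."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


strikes = [
    {"Carrier": "WN", "Strikes": 3},
    {"Carrier": "WN", "Strikes": 2},
    {"Carrier": "DL", "Strikes": 1},
]
strikes_per_carrier = sum_by(strikes, "Carrier", "Strikes")
```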
Besides that, it was a very pleasant and promising experience!
One promise that I’m looking forward to, and that’s from their blog announcement, is that Maestro will “automatically show errors and outliers in data and use fuzzy clustering to help you with the common, repetitive tasks like fixing spellings errors or reconciling entities across data sources.” I did see a bit of that, but will try to focus more on that with the next release.
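I don't know what algorithm Maestro uses for this, but the general idea of grouping near-identical spellings can be sketched with Python's standard `difflib` module. The `cluster_spellings` function and the 0.85 threshold below are my own assumptions, purely for illustration.

```python
from difflib import SequenceMatcher


def cluster_spellings(values, threshold=0.85):
    """Greedily group values whose similarity to a cluster's first member meets the threshold."""
    clusters = []
    for value in values:
        for cluster in clusters:
            ratio = SequenceMatcher(None, value.lower(), cluster[0].lower()).ratio()
            if ratio >= threshold:
                cluster.append(value)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([value])
    return clusters


names = ["Southwest Airlines", "Southwest Airlnes", "Delta Air Lines", "southwest airlines"]
clusters = cluster_spellings(names)
```

Once values are grouped, every member of a cluster can be replaced by a single canonical spelling.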
Finally, one big question looming is whether there will be an extra fee to use Maestro. My bet is yes, and it would make sense, but hopefully it will be an affordable addition to Tableau.
Looking forward to the next beta release (which I heard today is coming soon).
- Project’s official webpage and beta sign up form – https://www.tableau.com/project-maestro
- Blog release – https://www.tableau.com/about/blog/2017/11/now-beta-visual-data-prep-project-maestro-78601