Comment on 'Tuning Whatnot's Data Platform for Speed and Scale'
On starting with the problem and not the solution
Dropped a professional post this week: Tuning Whatnot’s Data Platform for Speed and Scale. If you only want one dose of me, go read that. This post is a brief comment on that one.
In previous gigs, I was the princeps civitatis of the data platform, either because I built it, inherited it, or rightfully won it in ritual combat. And because I felt ownership over it, I wanted to build The World’s Best Data Stack. If I could refactor the platform for a 5% performance gain, you better believe there was going to be a refactor.
At Whatnot, I'm no princeps or senator — I’m a plebe. And although the data world has its own complexities, every person I work with understands data at scale, often in ways I don’t. So I find myself being more critical than ever of which initiatives I want to push, and which to leave behind.
When you want to prioritize only the highest-impact work, toy projects and test drives don’t move the needle. In fact, they’re an affront to the hard work of others.
So instead of starting from the solution, I’m seeing the value of iterating on top of the problem. Iteration is when you stop and ask, “Where are we now, and how do we get one step better?” It’s the opposite of playing bingo with data stack diagrams and ideating around the latest feature that dropped in the monthly release notes. You just have to know where you are, where you want to go, and a few paths to take. Then start walking.
The expected outcome of improving a stack — say, implementing CI/CD, adding staging environments, onboarding a new tool, hardening conventions, or adopting infrastructure as code — is that over an extended period, the rate of value creation will increase. This can happen through unlocking new use cases (in the case of adding new tools), safely bringing on more stakeholders (conventions, processes), or preventing future slowdowns (infrastructure as code, proactive compliance).
In Whatnot data team lore, the platform has now gone through three iterations:
1. A rapid push to a minimum viable platform, to get machine learning models into production and rigorously define the core metrics for the business.
2. An investment in platform tooling, setting up reusable patterns for ingestion and data transformation.
3. A hardening of processes around onboarding new users and data products, through a system of data marts, clearer ownership, and automation.
Each iteration was motivated by a particular need — a need to prove value, a need for reusable workflows, a need for consistency and automation. As someone who loves thinking about tooling, it’s been refreshing to see the world in terms of problems to tackle, and not solutions to implement.