Joe Reis has been doing some great stuff over at his Practical Data Modeling Substack lately. My favorite is his recent call for data professionals to start training up in the Mixed Modeling Arts.
This is a much-needed metaphor revamp from the “managing 7/11s” motif of data marts and a useful step away from the ivory tower of formal modeling frameworks. My (perhaps outdated) impression of this fine art is that of a toxic academic niche: competing schools claiming access to the Platonic forms of data modeling but all agreeing that the real world we live in, the one where bills are paid, is a sweaty, corrupted, shadowy swamp-cave.
On the last part, I agree. Data is a mess, and a lack of formal knowledge contributes to it. But so is everything digital; if the target is an elegant virtual world, then we live in the end times.
As for there being one “right” way to model any given dataset, I’m not so sure. I’ve seen gross models power a business and clean models collect dust on the shelf. One of the skills of data modeling must be knowing when to stop. That is, it should be treated with the appropriate respect. Neither too much nor too little.
That can be a hard balance to strike, though, because “well-modeled” is an abstract concept. Here, I’d like to try to make the stakes concrete.
A data model is a structured representation that organizes and standardizes data to enable and guide human and machine behavior, inform decision-making, and facilitate actions.
The most ubiquitous of these structures is the Table. This is literally true for relational databases, but it’s also true in spirit for spreadsheets, NoSQL collections, search indexes, and data lakes, even if the details diverge. Tables are collections of records that share a more or less consistent set of attributes, which makes it easy for machines or humans to access specific values.
The “data” sub-discipline exists because tables have fascinating properties. They are at once static and fluid, fixed and flexible. Consider:
They have a fixed structure, but they aren’t infrastructure.
They are atomic, but they can be joined together to make new surfaces.
They are static by default, but actors can load content onto them or unload it.
People organize themselves around tables.
People buy tables, make tables, sell tables.
Their usage is unbounded, and they are frequently reconfigured for new purposes outside their original scope.
Do you know what other object that describes?
Tables!
Physical tables, I mean. Maybe you’ve seen one before?
Kitchen tables are great for eating. Coffee tables are great for mugs and legs. Dining tables are a great place for a four-year-old to practice writing their name with scissors.
Tables are furniture. They furnish, or provide, a surface for people-like beings to do people-like activities, like eating, resting legs, or writing with scissors. No law mandates that every house has at least one table, but most do, and I’d be uneasy if I entered a home without one.
Furniture is neither the room nor the people, but it is a strong determinant of how the people can use the room.
“Room modeling” isn’t just about the objects themselves, either. The arrangement, the spacing, and the minimalism (or lack thereof) all afford actions. Furniture is a critical aspect of the larger design decision about a physical space.
Digital tables furnish digital spaces. They matter because many of us spend more conscious time in our digital offices than in our physical homes.
Optimizing a data model (or a set of them), then, is as important as optimizing office furniture. Its value derives from the activity that happens in the space. No activity, no value.
That is not a knock. Feng Shui can be a serious business. The furniture in a supply warehouse (shelves, belts, stairs, slides, desks, chairs, trolleys, lockers, and, yes, tables) is meticulously arranged to facilitate operations. Placing furniture haphazardly or inconveniently will have massive implications for productivity, even if the toll is more chronic than acute. I’d expect systemic accidents and arthritis rather than an abrupt stoppage of operations.
A model always looks worse in practice than in theory, because in theory it never has to support real activity. Just as an Ikea showroom is a great way to get inspiration but a terrible place to live, idealized data models work best in classroom settings and interview tests. Real bedrooms will get messy.
Expert modelers own this. Of the classic Kimball books, probably 50% of the pages are devoted to specific, practical use cases rather than theoretical principles. Agile Data Warehousing involves iteration, whiteboarding, and talking to users. There are technical decisions based on use case requirements, but these are more of the sort solved by event planners: “Where do we put the sign-in tables? The buffet? The tip jar?”
The room (i.e., the database) sets the constraints. The people (i.e., the users) define the purpose. The furniture (i.e., the model) ought to be arranged as sensibly as possible to support that purpose.
Just as there are myriad ways to arrange furniture in a room, many magnificent tables to choose from, and many great events to be had over time, so too are there many ways to properly model data.
One room and one set of furniture can fit many purposes, but it can’t fit them all at once, no matter the “ideal” arrangement.
The upshot is that there’s no room for righteousness in the world of furniture moving or data modeling. Companies don’t need a Table Guy to move the desks for the office party; everyone rolls up their sleeves and does it.
Returning to the MMA analogy—the natural first question is, well, what are the specific data modeling arts? Dimensional modeling? Machine learning? RAG? Activity schema? One big table?
Not only is that a never-ending question (there’s always more), but it misses the best part of the analogy: the sport. The octagon. The athletes. The fight.
Modeling the world is the art of bending information to win an argument, build an efficient system, or quell dissent. It’s a struggle with tradeoffs. To do it well, you’ve got to want to win. To do it well, you’ve got to hate having your face in someone’s armpit.
This is the real reason Excel is immortal. The Excel user is a fighter, not an interior designer. He uses an Excel table not for dining, resting, or writing—but to slam it down on his opponent’s head.
You know the move. He swings the laptop around. Here are the numbers.
You got a problem, bro?
Duplicate column names? Manual color formatting? 0NF modeling?
Want to say that to my face?
I like this mental exercise of representing data tables as physical tables and, by extension, furniture. Tables, and tabular data more generally, are a great representational layer for most people when we reason about data. This is probably because we were indoctrinated early and the format is easy to grasp. Is it the be-all and end-all for modeling the world, or even an organization? I think it has limits there, but it is such an entrenched way to think about data that it becomes hard to distinguish the furniture from the house (the frame). When realtors want to sell a house (reality), I think they call furnishing it staging (furniture). It sells the house! OK, I’ve taken this as far as I can …
Great read. Mixed Model Arts treats tables as one form of data among many others (semi-structured data, unstructured data such as text and images, metadata, ML artifacts, and graphs), across different use cases: apps, analytics, and ML/AI. So I think tables are perfectly fine for most analytical use cases, as you indicate. And tables exist in many variations for analytics: OLAP database tables, spreadsheets, dataframes, etc. And now you can use Python (and, I think, TypeScript?) in Excel, so Excel is now an IDE.
They're probably not well suited for many others, but that depends on what you're trying to do. The core of MMA is to open people's minds to the variety and use cases of data out there, as that's the world we live in.