Data management is a moldy problem

Rinse that before you query it!

Mar 10, 2025

I have this recurring nightmare where I step on a bathroom tile, and it sinks a little, just enough for the caulking to come undone and a ceramic corner to shift up at me like a knife. I can’t leave it like that, so I get on my knees and pry it up a little, to see what’s going on in there, you know, to see why it would pop up like that?

A musty stench hits me like a sodden log. Under the tile is a jungle of rot. A black ecosystem of mold and moisture that horrifies me, both for what it is, and worse, for what it portends. In the dream, I run outside and set fire to the house. The reality would be much worse: months on months of a dreaded “house project.”

Mold problems are different than other homeowner issues. To fix a hole in the wall, I can patch it up; to repair an appliance, I can jigger a whatsit. Mold is systemic and silent, and once it has a foothold, you’ve got to eradicate it entirely, which means a miniature armageddon. You can only “fix” mold by not having it in the first place or containing it and forfeiting the space.

A company’s communication channels, and particularly its data channels, are a lot like the wooden scaffolding and nether spaces in houses. Mold, if not prevented, can be a major problem. But it’s uncommon to think this way.

Instead, most people view information systems through a lens of “rust logic.” Rust logic, according to Timber Stinson-Schroff in Mold, Rust and the Risk Society, addresses risk through routine maintenance (e.g., changing the oil) or wholesale replacement (e.g., changing the engine). Rust problems form from an “interaction between the nature of an organization’s machinery and its environment.” You have to deal with rust before it affects operations, and well before it affects structural integrity.


In my experience, this is one of the major differences between software engineers and data professionals. Software engineers tend to see the world as full of rust problems, while data professionals see mold spores.

The tension is between rust logic’s focus on maintenance or replacement versus mold logic’s requirements for controlled sanitization or managed overgrowth. Refactoring code is rust logic. Data quality tests are mold logic. Microservices are rust logic. A single source of truth is mold logic. Performance optimization is rust logic. Dimensional modeling is mold logic.

Neither view is “right,” nor are they incompatible. One of the things I look for as a mark of a great data professional is awareness of the consequences of bad data infiltrating the system. Bad data is like mold spores; flexible schemas are like unsealed surfaces. Once the spores get in, you don’t know where they might end up; it is actually unknowable. Dashboard graveyards are bad not only because of the clutter they create, but because they decompose into a feeding trough for parasites.

Timber says that, “mold logic tends to be about ideas, stories, people, memes, cultures… rather than structures and processes.” These are the soft spots in an organization. You would expect failures of these systems to show up in a boardroom conversation, not the boiler room.

When a database falls over, we use words like break, crash, fail. Legacy code becomes harder to work with, and less interoperable—but fundamentally, it works or it doesn’t. Legacy data goes sour. It elicits disgust long before it triggers alarms—the strawberry that’s more green than red. It may have once been ripe, but no more. It’s molded over.

Interestingly, a natural solution to the mold problem in agriculture is siloing. Keeping goods in a sealed, dry storehouse protects against infestation and limits contamination. Similarly, when data is contained within an application, the mold problem can mostly be ignored. While misuse is possible, it is less likely. The problems of bad data modeling will most likely manifest as mechanical issues around performance.

It’s only when our focus shifts from the farmer to the market that we start thinking in terms of the data biosphere. Silos trade off protection against distribution and reusability; reusability increases the likelihood of infection. Once data leaves the silo, it needs to be sanitized through modeling, accountability, and quality guarantees.
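To make “sanitized” a little more concrete, here is a minimal sketch of rinsing data at the boundary, before anything downstream can query it. Everything in it is an assumption for illustration: the `Order` record, its fields, and the `parse_order` and `rinse` helpers are hypothetical names, not part of any particular tool. The idea is only that the shared space accepts a sealed shape, and anything that does not fit is set aside with a reason attached.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Order:
    """A hypothetical record shape; the fields are made up for illustration."""
    order_id: str
    amount_cents: int
    placed_on: date


def parse_order(raw: dict) -> Order:
    """Seal the surface: accept only the expected fields and types."""
    unexpected = set(raw) - {"order_id", "amount_cents", "placed_on"}
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    order_id = raw["order_id"]
    amount = raw["amount_cents"]
    if not isinstance(order_id, str) or not order_id:
        raise ValueError("order_id must be a non-empty string")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount < 0:
        raise ValueError("amount_cents must be a non-negative integer")
    return Order(order_id, amount, date.fromisoformat(raw["placed_on"]))


def rinse(rows: list[dict]) -> tuple[list[Order], list[tuple[dict, str]]]:
    """Split rows into clean records and quarantined (row, reason) pairs."""
    clean, quarantined = [], []
    for raw in rows:
        try:
            clean.append(parse_order(raw))
        except (KeyError, TypeError, ValueError) as err:
            # Keep the bad row and the reason so someone can be accountable for it.
            quarantined.append((raw, str(err)))
    return clean, quarantined
```

Quarantining rather than silently dropping is a deliberate choice in this sketch: the bad rows stay visible and attributable, but they never reach the shared tables where they could spread.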

Even then, it’s only possible to do so much. As more data gets created, the mold problem becomes akin to forest management, or gardening. Too much sanitization leads to sterility, and in an ecosystem, sterility means death. According to Timber, an alternative approach is to “manage overgrowth”:

Managed overgrowth is like gardening. Or in medical terms, palliative care. You allow the mold to persist, but limit its growth rate or the area in which it grows. Decay is treated as an inevitable but natural part of the process, where molded-over species or subsystems will eventually have to be replaced by more hardy varieties.

Managing overgrowth is as apt a description of data management as I’ve come across yet. It’s a skillset that is hard to test for—it certainly isn’t screened for in a Leetcode interview. When I look at the information ecosystem like a garden, Big O Complexity is just one species of weed to control; I still need to fence out the rabbits and use chili spray to deter pests. It’s not just dead plants that may cause issues, but new invasive ones as well.

There’s nothing quite like biting into a fresh, garden-sourced tomato, though, to make one feel at one with the world. Nor anything so alarming as finding a fuzzy grey core in the center.

Data People Etc.
