Note: This is part 3 in a series: Knowledge isn’t power. Pt. 1 | Pt. 2
Knowledge transfer is the fundamental problem for data professionals.
Absent knowledge, data is bits and bytes, grist for the computational mill. The cloud data warehouse is loved by data engineers but built by regular ones.
Absent transfer, data is a linguistic crutch, an incidental aspect of doing-business, like how classes taught in English are not therefore “English” classes. Amongst the business, obsessing about data is a sign of disorder, not virtue.
But if you aim to turn data into knowledge and disseminate it broadly — now, you enter the proper domain of data.
The problem becomes neither the technology nor its application, but its lineage: its ancestry, context, reliability, fidelity, centrality. Those who work in this area learn that data is never wrong — only misunderstood.
It’s like a riverside village: everyone drinks the water, but no one thinks about where it flows from or where it goes, only whether they have enough. But for the data professional, the river, not the water, is the mystery.
Every data problem is a knowledge transfer problem, and every knowledge transfer problem can be formalized as a graph. Therefore, every data problem can be formalized as a graph.
Extract-load-transform is a graph. The multi-layer frameworks of Kimball and Inmon are graphs. dbt projects are graphs. Data mesh is a graph. Tableau dashboards are graphs. DAGs, semantic webs, data fabrics — graphs, graphs, graphs.
Each node in these graphs holds knowledge — it’s more than simply information. Each edge is the flow of that knowledge elsewhere. Knowledge comes from somewhere. It’s expressed. It’s perpetuated.
Knowledge is borne not just in datasets, but in brains, objects, and processes. People are knowledge hubs: they are taught, think, and teach. Frameworks are knowledge scaffolds: they are imagined, tested, and applied. Reports are knowledge vessels: they are written, read, and shared.
The problem of data, then, is the problem of optimizing this graph.
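To make the "nodes hold knowledge, edges carry it" picture concrete, here is a minimal sketch in Python. The node names (`raw_orders`, `orders_model`, and so on) are invented for illustration and come from no real project. The traversal is a plain breadth-first search, one of many ways to ask where a piece of knowledge ends up.

```python
from collections import deque

# Hypothetical edges: (a, b) means knowledge flows from a to b.
# Nodes can be datasets, models, reports, or people.
EDGES = [
    ("raw_orders", "orders_model"),
    ("analyst", "orders_model"),
    ("orders_model", "revenue_dashboard"),
    ("revenue_dashboard", "executive"),
]

def build_graph(edges):
    """Turn an edge list into an adjacency map: node -> set of direct successors."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # make sure sink nodes appear as keys too
    return graph

def downstream(graph, start):
    """Return every node that knowledge leaving `start` can eventually reach."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph[node]:
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen

graph = build_graph(EDGES)
reach = downstream(graph, "raw_orders")
```

Here `reach` contains the model, the dashboard, and the executive: everything that quietly depends on `raw_orders` being right. The analyst is not in it, because nothing in this toy graph flows back to them.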
Knowledge graphs aren’t contained within organizational walls.
Remember that simple thought experiment about determinism? First, you assume everything has a physical cause. Then you take some action you did — say, failing a test — and trace it back to the ancient past — say, the Big Bang. (Optionally, you may then absolve yourself of all moral responsibility.)
The thinking goes, if every effect has a cause, and every cause has a cause, then there must be only one original cause — the Prime Mover.
Knowledge has a similar dynamic. An omniscient being could trace an impulse of information from node to node along the web, for eternity.
The web has structure, but it is not static and it is not acyclic. It’s complex but not deterministically so: patterns emerge then fade. The web is self-referential, the web has history. Every node has a relationship with its ancestors and successors; to the past, present and future; to itself; to others’ expectations; to the limits of our physical world.
Let’s call this boundless graph of graphs the Übergraph.
Every community owns a slice of the Übergraph. It manifests as tribal knowledge, as trusted relationships, as direct messages and inside jokes. It is, basically, its own little übergraph.
Your corporate übergraph encompasses every table, model, chart, analyst, pipeline, quality check, API endpoint, fumbling intern, third-party data provider, browser plugin, careless executive, Google Sheet, etc., etc. in the company. You, if you are knowledgeable and influential, are a key node in your company’s übergraph.
A data lineage graph is a modest approximation of its übergraph. It won’t capture social and emotional dynamics, but it reveals logic, authority and domains quite well.
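As a rough illustration of what a lineage graph can reveal, the sketch below walks upstream from one node and collects everything it depends on. The table and sheet names are hypothetical, and the `PARENTS` mapping is an assumed toy format, not the output of any particular lineage tool. It includes one deliberate loop, since the web is not acyclic.

```python
# Hypothetical lineage: each key lists the nodes it reads from directly.
PARENTS = {
    "revenue_dashboard": ["orders_model", "targets_sheet"],
    "orders_model": ["raw_orders", "raw_customers"],
    "targets_sheet": ["revenue_dashboard"],  # targets get set by looking at the dashboard: a loop
    "raw_orders": [],
    "raw_customers": [],
}

def ancestors(parents, node):
    """Collect every node upstream of `node`, tolerating cycles via a seen-set."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def origins(parents, node):
    """Upstream nodes with no recorded parents of their own."""
    return {a for a in ancestors(parents, node) if not parents.get(a)}

lineage = ancestors(PARENTS, "revenue_dashboard")
roots = origins(PARENTS, "revenue_dashboard")
```

Because of the loop, `lineage` includes `revenue_dashboard` itself: the dashboard is, indirectly, one of its own sources. `roots` holds only `raw_orders` and `raw_customers`, the places where this approximation runs out and the rest of the übergraph begins.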
Structured information flow indicates organizational integrity.
In the übergraph, trust is authority: as soon as a group becomes less trusted, information will flow from elsewhere. The waters get muddy. The group may still be powerful, but it is not authoritative, in the knowledge sense. This is true whether the organization looks like a slime mold or a military.
Accountability requires history. We expect leaders to be “data-informed”, not because of math, but because data is traceable to a person, process, or place. We want to know that the head is not detached from the body.
The Übergraph tends towards entropy.
Like a garden left untended, the übergraph is naturally overrun by unprincipled communication. As the network expands, there are more nodes from which knowledge can flow. Questions of reliability and authority emerge organically, like weeds.
Source-of-truth management, then, is the natural entrypoint into übergraph cultivation. The first commandment: “Thou shalt love me, the Lord your God, with all thy mind.” It’s not such a hard thing to implement, but it’s very challenging to maintain.
For example, there’s a push to make data producers more responsible for the data they emit. Excellent. The domain creating the data has the most context, authority, and accountability structure for ensuring data is high quality before anyone uses it.
At the same time, there’s a push to make data consumers more capable of fast analysis and decision-making. Bravo. The domain making decisions has the most context, authority and accountability for leading strategically.
So you implement the data mesh, get rid of silos, abolish ELT and implement trusted point-to-point interfaces. You still can’t stop time, which means you can’t stop change, which means you can’t stop history or miscommunication or the teenagers from co-opting language for their subversive purposes.
You can’t stop the Übergraph. You can only hope to ride it.
The night sky — that’s what the Übergraph is like.
There’s no question the night sky exists. It’s a physical thing that can be studied and cataloged and explained. It has stars, and rocks, and gravity wells, and black holes, and all of that. It has an app. It’s so obvious and so ubiquitous it’s boring.
But the night sky is also a mystery.
It’s an infinite expanse of space that gets projected through our atmosphere and onto our retinas. It’s galaxies stacked on top of galaxies. It’s ordered, musical: we learn from it, even as we can’t know it, not in any real sense. The night sky is fiery spheres that we perceive as minor blips of information — that our ancestors perceived in the exact same way.
We make stories up about the night sky — stories we know are artificial. Yet, we smile at them, at each other. We make up our own stories. We name the stars after ourselves. It’s something we share, even though we don’t know each other.
The night sky is not unreachable or unchangeable. We’ve launched rocks, dogs, humans up into orbit. And we dream of doing more, of terraforming pieces of it. Of managing it, like we do our lives, our houses, our cities. But I’m not sure we’ll ever be free of the mystery, for those who are open to it.
The Übergraph, too, is physical: you can point to specific things that taught you something dear — an influential book, a watershed moment, a loving parent. You can make up stories connecting these things — toy models, if you want to be an engineer about it — and get others excited about them.
But behind each of these stories presses an always-expanding mass of conversations and knowledge and consciousness that can only be appropriately treated with wonder.