Note: This is part 1 of 4 of a series: Knowledge Isn’t Power.
So they want us to be lawyers now.
We tried being scientists, but they were scared of the equations. We tried to be engineers, but they scoffed at our tests (both of them). We tried to be purple hybrid unicorn people, but payroll got confused, and so did our parents. We tried to be decision-makers, but nope, actually, that’s fine, they can keep that one to themselves.
So now here we are, data lawyers. Writing data contracts. Drawing up domain deeds of trust. Litigating semantic disputes.
It’s a good idea: complex businesses, like complex societies, have complex conflicts. There ought to be a pool of beautiful, talented, loquacious, well-paid mercenaries that the company can lean on to mediate those conflicts. Enter data people.
And the similarities between real lawyers and us are striking: both groups worship proper evidence and procedures. Both straddle an arcane body of knowledge and dynamic political landscapes. Both are rife with masochistic liberal arts majors who write copiously for work and play.
And data people have plenty of experience arbitrating disputes. For example, here’s a case I litigated back in the day, even before we were officially lawyers:
Case 1337-801: Head of Data vs Everyone In The World, et al.
On September 30, Head of Data (henceforth PLAINTIFF) was accused of harboring erroneous data by the Marketing team (henceforward DEFENDANT A). Upon investigation, PLAINTIFF discovered that Engineering (hereon DEFENDANT B) had pushed a change to a critical user analytics event. While addressing the issue, PLAINTIFF was notified that the Finance team (thencetowards DEFENDANT C) had already bypassed said event directly in their analytics tool, and Sales (whenceabout DEFENDANT D) had created an Excel lookup table to fetch the “real” values already. DEFENDANT D attested they never knew a warehouse existed.
Root cause analysis determined that a developer (quid pro quo DEFENDANT E) had made the initial commit in error, when a gust of wind (quod erat demonstratum DEFENDANT F) blew open his window, forcing him to drop his computer, and merging the pull request. The breeze itself was traced back to a Kenyan butterfly’s first flight (c’est la vie DEFENDANT G). With the commit merged, repercussions felt, and new quarterly priorities, the CEO (cogito ergo sum DEFENDANT H) asked all parties to get on with new initiatives.
PLAINTIFF seeks damages in the amount of mandatory subscriptions to her new Substack for all members of accused teams.
Street’s full of cases like this. Usually it’s a simple solution: sit everyone down, hammer out a data contract, enroll ‘em all in a 6-month online course on Protobuf and Thrift servers, and send them on their way. Boom. Yacht time until the next case.
I’ll be honest: I’ve seen the lawyer thing coming for a while now, but the contract angle caught me by surprise. I always figured the community’s push into legal battles would come from, you know, landmark international laws mandating strict controls on the collection, storage, management, and proper use of data. But I guess the real lawyers have that one under control…?
But it’s the money, of course. We are spending a whole lot of money on data, and that means a whole lot less tolerance for error. But why data contracts?
One interpretation is that data producers need to be more aware of how data consumers use data, and a contract sets up a formal interface for both parties to rely upon. Coupled with efficient CI/CD processes, this allows the producer to be confident that changes are made efficiently and that downstream consumers won’t break.
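To make that first interpretation concrete, here is a minimal sketch of the kind of check a CI step might run. It assumes a contract is nothing more than a list of field names and types; the `Field` class and `breaking_changes` function are hypothetical names invented for illustration, not any real tool's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One column in a (hypothetical) data contract."""
    name: str
    dtype: str
    required: bool = True


def breaking_changes(old: list[Field], new: list[Field]) -> list[str]:
    """List the ways `new` breaks what consumers were promised in `old`."""
    new_by_name = {f.name: f for f in new}
    problems = []
    for f in old:
        candidate = new_by_name.get(f.name)
        if candidate is None:
            problems.append(f"field removed: {f.name}")
        elif candidate.dtype != f.dtype:
            problems.append(f"type changed: {f.name} {f.dtype} -> {candidate.dtype}")
        elif f.required and not candidate.required:
            problems.append(f"field became optional: {f.name}")
    # Fields that exist only in `new` are additive, so they are not flagged.
    return problems


# The contract consumers rely on, and a producer's proposed change (both made up).
contract = [Field("user_id", "string"), Field("event_ts", "timestamp")]
proposed = [
    Field("user_id", "int"),
    Field("event_ts", "timestamp"),
    Field("plan", "string", required=False),
]

problems = breaking_changes(contract, proposed)
print(problems)
```

A CI job could fail the build whenever the returned list is non-empty, which is roughly what "confident that downstream consumers won’t break" amounts to here. Note that the check only reports the breach; what happens to the person who caused it is a separate question.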
So far, so good. But under this interpretation, the contract really boils down to Very Serious Documentation that indicates to all parties this is Very Serious Data we are working with and we should treat changes to it Very Seriously.
But let’s take another, less techno-centric interpretation of the contract. Let’s cast ourselves back to middle school, back to English class, back to the first — gulp — group assignment of the year. Consider:
Four students — Star, Alice, Bob, and Chad — have a group assignment to turn in on Friday. Work is allocated evenly, but Star is ambitious: she knows that anything less than a 102% plus smiley face will wreck her shot at State U. She trusts Alice and Bob to pull their weight, but Chad — that lumbering oaf of a football player has been practicing his signature all class period.
Star needs Chad to do his part in the group presentation, unless — unless she can convince the teacher that Chad willfully shirked his duties as team member. If Chad can be held demonstrably in violation of the group norms, then she may be able to sever her own destiny from his.
So Star writes up a schedule of duties, along with due dates, and gets everyone to sign. She then coyly runs it by the teacher, “Here’s what I’m thinking, Ms. Smith, is there anything missing from this plan?”
And there she has it: an objective plan, registered with the authorities, that has Chad on the hook. If Chad delivers, she’s a leader; if he doesn’t, Star has recourse. She may lose the smiley face, but she can effectively lobby for the 102%.
The contract is not an instruction manual. It is a yoke. Its Seriousness flows not only from its detail, but from its consequences. It is not only, “This is the agreement we are entering into,” but also, “If I fail to meet its requirements, I concede my house, or my car, or my dog, or my grades, or my dignity.”
Here we see the difference between our task and that of real lawyers.
Real contracts have punitive consequences: reputational harm, crushing fines, jail time. The law has a long arm, and it inflicts pain. That pain is in service of a greater goal: justice.
When a data contract is breached, we data lawyers wring our hands and update downstream dbt models to account for it. If we can take action, it is with a borrowed authority — an executive looking to “clean up the Sheets” or a product manager who can herself succeed only with more convincing data — not a systemic one.
It’s an unsatisfying situation for all involved. Everyone, I think, from producers to consumers to stewards to executives, senses that somewhere out there, some great goal is being violated, some data goddess is weeping. But we don’t know her well enough yet. She has no temple, no shrines, no name. She is some cousin to Reason, to Truth, to Science. Intelligence, maybe.
And while the formality of a contract is new, data people have been prosecuting these issues for years. Single sources of truth, statistical significance, data-driven decision-making — all of these are rituals invoking this goddess that we have tried to bring to the business. But how many of them are able to take hold?
Say you could prove that someone made a contract change or decision in bad faith — what would we do? If the confrontation was in a crowded boardroom, perhaps the data lawyer could pin them down dramatically, get them to squirm and admit guilt. But then what — do we fire the guy? No — more likely we kick off a quarterly project to fix the issue.
We lack the enforcement, we lack the courts, we lack the constitution. No — if we want data to permeate the human world of our businesses in the same way the law permeates our societies, we would do well to equip our own goddess like Justice: scales in the left hand, sword in the right.
I don’t mean to get mystical here, setting up deities and all. I’m just saying that if we’re going to be lawyers, we need to think hard about the economy we operate in. Do we expect to go door-to-door soliciting potential clients, because no one realizes they could benefit from our services, or do we expect clients to seek us out, because not having a data lawyer is a known existential risk?
There’s a great story from Steve Yegge, an early Amazon and then Google employee, where he shares with his Google colleagues the key decision that separated AWS from its rivals. Observe how Bezos, as described here, doesn’t just set a course for the company, but upends the entire incentives structure.
So one day Jeff Bezos issued a mandate. He's doing that all the time, of course, and people scramble like ants being pounded with a rubber mallet whenever it happens. But on one occasion -- back around 2002 I think, plus or minus a year -- he issued a mandate that was so out there, so huge and eye-bulgingly ponderous, that it made all of his other mandates look like unsolicited peer bonuses.
His Big Mandate went something along these lines:
1. All teams will henceforth expose their data and functionality through service interfaces.
2. Teams must communicate with each other through these interfaces.
3. There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team's data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
4. It doesn't matter what technology they use. HTTP, Corba, Pubsub, custom protocols -- doesn't matter. Bezos doesn't care.
5. All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
6. Anyone who doesn't do this will be fired.
7. Thank you; have a nice day!
Ha, ha! You 150-odd ex-Amazon folks here will of course realize immediately that #7 was a little joke I threw in, because Bezos most definitely does not give a shit about your day.
#6, however, was quite real, so people went to work.
This is the context a data lawyer wants to work in: “If you break backwards compatibility, you do not get to cut that release.” Or, “If you add a new table to the warehouse, you must support it forever.” Or, “If you copy and paste data into a Google Sheet, you’re fired.” Under those conditions, people need consultants and expertise, advisors and best practices. Contracts are an implementation detail, not an end in themselves.
It’s an uncomfortable thought for nice people like us. We data people are great at relationships. We are welcome visitors, traveling from domain to domain, picking up the local languages, telling stories of far-off lands and leaving candies for the kids. We want to be welcomed, not feared or slandered.
But much as we love the domains we visit, we also recognize that the liminal spaces we travel are lawless frontiers. There’s little oversight, no process, bare ownership. If we want them to turn around, to be safe for the average family, it will take more than a gold star, a six-shooter, and an unsigned data contract.
So yeah, the data legal system is feeble, and I think our contracts are doomed to fail. But those aren’t reasons for us to stop this lawyer thing.
First, representing domains is easy. If engineering wants to bring me in to decrease risk exposure, I’ve got the perfect advice: emit all data as strings, impose no formatting guarantees. Make as few schema decisions as possible. Shift that liability downstream, baby. Make the consumer hold the bag. People, we can build whole schools on top of this schema risk mitigation stuff.
Second, we have an endless supply of work. We all know that data’s a hydra: every time a schema is codified, two new crop up to replace it. We tried to support the business directly by building the middle layer, and they pointed fingers at us. If domains think they can take ownership directly and they just need mediators — well, I’m happy to make the case for why they’re right. For a price, of course.
Finally, and best of all, there’s no Bar to pass. If you can compile a dbt project, you’re licensed!
So data people: buckle up. Take your hourly rates and double them. Print out some glossy new business cards. Primp up for your billboard headshots. And please, tell your colleagues: if they or anyone they know have been injured by schema issues, they may be eligible for recompense. Just have them call my office at any number conforming to the schema {"type": "string", "format": "itu-e123", "permitted-values": {"regex": "1-888-816-6474"}}.
I thought this couldn’t get any more perfect until the last line 😂😂😂 A certain lawyer named after a resurrecting bird was a key component of my childhood and illustrates (see what I did there) this point perfectly. Notably, he did have a psychic to help him out. Wonder if that’ll be the next expected evolution for Data People. 😉
Delightful. I had already started googling what you call the predatory lawyers mechanising flight delay compensation for my comments contribution, but alas I give away my reading style.
> DEFENDANT D attested they never knew a warehouse existed.
Cuts deep.
Another thought - this legal battle only can occur later in the lifecycle of the pioneer-settler-planner paradigm, once the Courthouse is built (ie the "Data Centre of Excellence" round table)
ps - feels like I need a Loom style emoji reaction for sections that I 😂'd at