Discover more from Data People Etc.
Symposium hangovers, Substack Notes, etc.
HAIR OF THE DOG
And so, my friends, we close the first symposium. We have reached no consensus. We have solved no problems. We have advanced our field in no appreciable way.
But, we have — hup — consumed the wine, and we have celebrated the vices of chaos, boredom, and laziness. Now, let me try to piece the night back together.
The orchestrator is boring. In down times, boring is good — boring is basic, and basic is essential, like bread and water. The data ecosystem has been noisy lately. If you dial it all down, you’re left with the essentials: a database, an orchestrator, and a BI tool. Simple. Effective.
But boring can also leave you forgotten. We heard our share of grievances with DAG management and with the total lack of value that orchestration delivers to the business. The orchestrator is boring in the right ways, and the wrong ones, too.
The current class of orchestrators has gaps. On the business side, users ask — “What is the fuss about? A whole system designed to run things on a schedule or on demand? Sounds like an intern — let’s spend as little as we can on that, please and thank you.”
Catalog tools are happy to pick up the coffee at the curb and cozy up with the decision-makers. They might be onto something, too, if they can ever push the BI tool off the slide deck.
Even if the business doesn’t get it, coordinating these systems is no easy task. Maybe there’s still enough magic there; maybe the orchestrator can be a pipeline-granting genie that converges our wished-for world onto the real one. Data applications could be deployed like Terraform: dagster apply, and let the system figure it out. That would reduce the data engineer’s ETL burden and free up plenty of time for naps.
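The Terraform-style workflow wished for here rests on a reconcile step: diff the declared pipelines against what is actually deployed, produce a plan, then converge. What follows is a minimal sketch in plain Python. It assumes a pipeline can be reduced to a name and a schedule. `PipelineSpec`, `plan`, and `apply` are hypothetical names invented for illustration, and `dagster apply` is treated as a wished-for command, not an existing Dagster API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical declared pipeline: a name plus a cron-style schedule."""
    name: str
    schedule: str


def plan(desired: dict[str, PipelineSpec],
         actual: dict[str, PipelineSpec]) -> dict[str, list[str]]:
    """Diff desired state against actual state, Terraform-plan style."""
    create = sorted(desired.keys() - actual.keys())
    delete = sorted(actual.keys() - desired.keys())
    update = sorted(
        name for name in desired.keys() & actual.keys()
        if desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}


def apply(desired: dict[str, PipelineSpec],
          actual: dict[str, PipelineSpec]) -> dict[str, PipelineSpec]:
    """Execute the plan against a copy of the actual state and return it."""
    steps = plan(desired, actual)
    new_state = dict(actual)
    for name in steps["delete"]:
        del new_state[name]  # a real backend would tear down the deployment
    for name in steps["create"] + steps["update"]:
        new_state[name] = desired[name]  # ...or create/redeploy it
    return new_state


desired = {
    "orders_daily": PipelineSpec("orders_daily", "0 6 * * *"),
    "events_hourly": PipelineSpec("events_hourly", "0 * * * *"),
}
actual = {
    "orders_daily": PipelineSpec("orders_daily", "0 5 * * *"),
    "legacy_export": PipelineSpec("legacy_export", "0 0 * * *"),
}

print(plan(desired, actual))
# {'create': ['events_hourly'], 'update': ['orders_daily'], 'delete': ['legacy_export']}
converged = apply(desired, actual)
print(plan(desired, converged))
# {'create': [], 'update': [], 'delete': []}
```

Running `plan` a second time on the converged state returns empty lists. That idempotence is what makes an apply-style workflow attractive: you declare the world you want and rerun until nothing changes.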
There are technical constraints here, though. With more teams bringing on at least some event-driven workloads, the orchestrator’s reach necessarily shrinks, and the batch architecture feels less and less appropriate. Can the current class of tools ever meaningfully orchestrate off an event stream? Can it find an ergonomic way to support asynchronous processing?
To support these workloads, engineers have been migrating applications to serverless architectures keyed off message queues. These can run complex directed processes and scale out reactively with data volume. Cloud data warehouses are no stranger to this, either: tools like Snowpipe load files as soon as they land, with lower latency than a scheduled orchestrator run, and are gobbling up premier use cases. If the orchestrator wants to compete, it will need to get faster and, perhaps, embrace chaos, not control.
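The event-driven pattern above can be sketched with an in-process queue: workers pick up each event the moment it arrives rather than waiting for a scheduled run. This is a minimal sketch. It uses Python's `asyncio.Queue` as a stand-in for a real message queue such as SQS, Pub/Sub, or Kafka. `handle`, `worker`, and `run` are hypothetical names, and the fixed `n_workers` only approximates the reactive scaling a serverless platform would provide.

```python
import asyncio


async def handle(event: dict, results: list) -> None:
    # Stand-in for real work (load a file, transform a record, ...).
    await asyncio.sleep(0)
    results.append(event["id"])


async def worker(queue: asyncio.Queue, results: list) -> None:
    # Each worker pulls events as soon as they land instead of on a schedule.
    while True:
        event = await queue.get()
        try:
            await handle(event, results)
        finally:
            queue.task_done()


async def run(events: list[dict], n_workers: int = 3) -> list[int]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list[int] = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    for event in events:
        queue.put_nowait(event)  # events "arrive" on the queue
    await queue.join()  # wait until every event has been handled
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    processed = asyncio.run(run([{"id": i} for i in range(5)]))
    print(sorted(processed))  # [0, 1, 2, 3, 4]
```

Events may finish out of order across workers. That is the asynchronous trade-off the batch DAG model avoids, and the reason the result is sorted before printing.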
All of this leaves the status of the orchestrator still in question. As we descend further into the hell of ever-replicating microservices, there are fewer and fewer tools positioned to help us make sense of the world. The data orchestrator could be that tool.
But, if all the data orchestrators vanished today, life would find a way. There are other options.
There is this stubborn fact that orchestration, the coordination of multiple computer systems, is not going anywhere for the foreseeable future. The business demands new data, the business demands new services, the business demands new innovations: streaming ingestion, real-time databases, external data shares, data mesh, synthetic data, LLMs.
Though different, they are all connected, bound together by the constraints of time and lineage. Thus, the capability that is orchestration is an essential, undeniable one. It must and will live on. The only question is where.
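One way to see the lineage constraint is as a dependency graph. The graph must be run in topological order, and independent steps are free to run concurrently. The sketch below uses Python's standard-library `graphlib.TopologicalSorter` (3.9+) on a hypothetical lineage. The asset names are invented, and the time dimension (schedules, freshness) is left out.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each asset maps to the set of assets it depends on.
lineage = {
    "raw_orders": set(),
    "raw_customers": set(),
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "orders_mart": {"stg_orders", "stg_customers"},
    "llm_features": {"orders_mart"},
}


def waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves; everything in one wave can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the lineage has a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)  # a real orchestrator calls this as each task finishes
    return result


for i, wave in enumerate(waves(lineage)):
    print(i, wave)
```

Each wave is a group of steps whose upstreams have all finished. The time constraint would layer schedules or freshness checks on top of this ordering.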
Thanks for coming everyone. Be careful not to step onon your way out the door.
I expected the symposium to last a week and to consist of me frantically writing three pseudonymous screeds to save face. Instead, it lasted 6 weeks and enticed 8 guest authors to submit outstanding and original essays.
It’s rare that experiments like this are fun, and it’s even rarer that they are successful. This symposium was both. I plan to do it again.
There will be tweaks: compression, mainly. While the editorial process was smooth due to outstanding contributors — really, why are you all so professional, it makes me look bad — I heard that the reader experience was challenging, given the length of time, the volume of writing, and the unclear duration. All of these are fixable.
I’m targeting early July for the next symposium. I’ll limit the run to 3-5 essays and publish it in a week.
While I won’t make promises, I’d love to hear suggestions on the next topic. What are you interested in reading and writing about?
A NOTE ON NOTES
I can’t quite figure out what sort of purpose Substack Notes is going to serve for writers and readers, but I can tell you what I’d like to start using it for.
One of my goals this year is to share more in-progress writing. (calls this "beta" work in various essays.) Working in a sort of public beta does two things, one for me and one for you.
Publishing early gets me feedback on ideas before I waste time refining bad ones. This only really matters insofar as I have more ideas than I can refine, but so far, at least, I generate about three essay ideas for every one I publish. Notes can be a great way to give these some life and to decide whether they should become something longer.
For example, with 28 Dogs Later, I wrote entire sections that I thought were funny and interesting: Mad Max at a BMV, John Connor in Driver’s Ed, that type of stuff. But they didn’t fit in the end, so they got discarded. Maybe Notes can give those passages life.
Anyway, here’s a first test of the format coming out of a short discussion on orchestration I led with some CoRise students (shout out to Dennis!):