<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Data People Etc.: Symposia]]></title><description><![CDATA[Community-sourced essays on selected topics]]></description><link>https://stkbailey.substack.com/s/symposia</link><image><url>https://substackcdn.com/image/fetch/$s_!w6ij!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db0c302-f8d1-4f24-a275-c062cdec3a55_1024x1024.png</url><title>Data People Etc.: Symposia</title><link>https://stkbailey.substack.com/s/symposia</link></image><generator>Substack</generator><lastBuildDate>Mon, 01 Jun 2026 08:25:37 GMT</lastBuildDate><atom:link href="https://stkbailey.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Stephen Bailey]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[stkbailey@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[stkbailey@substack.com]]></itunes:email><itunes:name><![CDATA[Stephen Bailey]]></itunes:name></itunes:owner><itunes:author><![CDATA[Stephen Bailey]]></itunes:author><googleplay:owner><![CDATA[stkbailey@substack.com]]></googleplay:owner><googleplay:email><![CDATA[stkbailey@substack.com]]></googleplay:email><googleplay:author><![CDATA[Stephen Bailey]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Orchestration isn’t going anywhere]]></title><description><![CDATA[Whether it resides in many silos or a single plane, orchestration is inescapable]]></description><link>https://stkbailey.substack.com/p/orchestration-isnt-going-anywhere</link><guid 
isPermaLink="false">https://stkbailey.substack.com/p/orchestration-isnt-going-anywhere</guid><dc:creator><![CDATA[Nick Schrock]]></dc:creator><pubDate>Mon, 24 Apr 2023 14:34:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!V867!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is the 9th and final essay in the Symposium on <a href="https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator">Is the Orchestrator Dead or Alive</a>? You can read more from Nick and the Dagster team on the <a href="https://dagster.io/blog">Dagster blog</a>.</em></p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V867!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V867!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png 424w, https://substackcdn.com/image/fetch/$s_!V867!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png 848w, https://substackcdn.com/image/fetch/$s_!V867!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png 1272w, 
https://substackcdn.com/image/fetch/$s_!V867!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V867!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png" width="1456" height="1017" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4738571,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V867!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png 424w, https://substackcdn.com/image/fetch/$s_!V867!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png 848w, https://substackcdn.com/image/fetch/$s_!V867!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png 1272w, 
https://substackcdn.com/image/fetch/$s_!V867!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a95ca9-4d29-4d62-91e1-5c047b07ea74_2388x1668.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3><strong>This symposium asks the wrong question</strong></h3><p>When Stephen asked me to participate in this symposium, my first reaction was &#8220;Are you sure?&#8221; After all, I am the founder of a company whose sole purpose is to build a system that many would call an &#8220;orchestrator.&#8221; I'm not exactly impartial.</p><p>The question assumes 
there is an agreed-upon definition for the <em>orchestrator</em>. If that definition is &#8220;the system whose sole responsibility is scheduling and ordering of tasks in production and nothing else,&#8221; then my answer is: it&#8217;s alive, but no one likes it, and it probably deserves to die.</p><p>If that&#8217;s my answer, then why am I doing what I am doing with my life? Worry not: this symposium has not caused some sort of existential crisis. Rather, I think &#8220;Is the orchestrator dead or alive?&#8221; asks the wrong question. We ought to ask about <em>orchestration</em>, not the <em>orchestrator</em>.</p><p><em>Orchestration</em> is an essential capability and isn&#8217;t going anywhere. The right question is: what is the future of orchestration?</p><h3><strong>Orchestration versus orchestrator</strong></h3><p>Modern organizations build data products and assets to power analytics, ML, and their production applications. To do this, data practitioners use a variety of technologies to create data pipelines. From the first pipeline to the full data platform, you need orchestration.</p><p>It&#8217;s worth defining orchestration:</p><blockquote><p>Orchestration is the coordination and management of multiple computer systems, applications and/or services, stringing together multiple tasks in order to execute a larger workflow or process. &#8212; <em><a href="https://www.databricks.com/glossary/orchestration">Databricks</a></em></p></blockquote><p>Data will, for the foreseeable future, be stored and computed in many storage systems and runtimes. Not all the data within an organization is going to live in a single cloud data warehouse, executing on a single compute substrate. 
Organizational dynamics, economic realities, and technical constraints will not allow it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bZTd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0612745d-9004-42a6-a247-4b3925c81610_600x389.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bZTd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0612745d-9004-42a6-a247-4b3925c81610_600x389.png 424w, https://substackcdn.com/image/fetch/$s_!bZTd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0612745d-9004-42a6-a247-4b3925c81610_600x389.png 848w, https://substackcdn.com/image/fetch/$s_!bZTd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0612745d-9004-42a6-a247-4b3925c81610_600x389.png 1272w, https://substackcdn.com/image/fetch/$s_!bZTd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0612745d-9004-42a6-a247-4b3925c81610_600x389.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bZTd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0612745d-9004-42a6-a247-4b3925c81610_600x389.png" width="416" height="269.70666666666665" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0612745d-9004-42a6-a247-4b3925c81610_600x389.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:389,&quot;width&quot;:600,&quot;resizeWidth&quot;:416,&quot;bytes&quot;:65356,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bZTd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0612745d-9004-42a6-a247-4b3925c81610_600x389.png 424w, https://substackcdn.com/image/fetch/$s_!bZTd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0612745d-9004-42a6-a247-4b3925c81610_600x389.png 848w, https://substackcdn.com/image/fetch/$s_!bZTd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0612745d-9004-42a6-a247-4b3925c81610_600x389.png 1272w, https://substackcdn.com/image/fetch/$s_!bZTd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0612745d-9004-42a6-a247-4b3925c81610_600x389.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://twitter.com/mullinsms/status/1615910353204047872">Twitter</a></figcaption></figure></div><p>But that definition of orchestration is too narrow in the data domain. Even with a single system, there is a dependency graph of data assets that constitute a data pipeline, as all data must come from somewhere and go somewhere. Computations to produce that data must be ordered and scheduled.</p><p>If you are a data practitioner, you are orchestrating whether you call it that or not. If you are writing a data notebook that processes a file dropped in S3 and manually running it once a day, you&#8217;re orchestrating. If you are using dbt on a single warehouse, you are orchestrating: somewhere in the bowels of that codebase, there is a topological sort determining the order of model execution and launching compute into the data warehouse. 
If you have set up cron jobs in Fivetran, Snowflake Tasks, and Hightouch to flow data through your platform, you are orchestrating.&nbsp;</p><p>Overlapping cron jobs in the modern data stack SaaS tools has become a popular way to avoid orchestration tools.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rXpN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rXpN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png 424w, https://substackcdn.com/image/fetch/$s_!rXpN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png 848w, https://substackcdn.com/image/fetch/$s_!rXpN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png 1272w, https://substackcdn.com/image/fetch/$s_!rXpN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rXpN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png" width="412" height="309.8202502844141" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:661,&quot;width&quot;:879,&quot;resizeWidth&quot;:412,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rXpN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png 424w, https://substackcdn.com/image/fetch/$s_!rXpN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png 848w, https://substackcdn.com/image/fetch/$s_!rXpN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png 1272w, https://substackcdn.com/image/fetch/$s_!rXpN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cacc3c6-13d3-4c68-80ff-ec2931162d72_879x661.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>However, it&#8217;s just distributed orchestration, and that leads to operational silos. Good luck debugging an upstream data pipeline that breaks the ML team in a totally separate stack. It results in an operationally fragile data platform that leaves everyone in a constant state of confusion about what ran, what's supposed to run, and whether things ran in the right order.</p><h3><strong>Control plane, not an orchestrator</strong></h3><p>We at <a href="https://dagster.io/">Dagster</a> conceptualize ourselves as a data management tool. All production data assets in an organization are represented, in software, within Dagster. As they are software artifacts, change management is done with software engineering processes. Keeping those data assets up to date is an essential responsibility of Dagster, and so orchestration is a core capability.</p><p>This is a team- and tool-spanning <em>layer</em>, not a <em>silo. 
</em>Keeping a graph of assets up to date for stakeholders is the core function of any data, analytics, or ML engineering team. And those teams are stakeholders with respect to each other. Sharing a control plane, while bringing their own transformation and domain-specific tooling, is proper and natural.</p><p>This control plane also has an active metadata layer. Dagster streams information about the assets into an immutable, structured event log. Users can plug in their own metadata as well. This serves as a ledger for the data platform, usable for many purposes, such as versioning and data quality. Directly within the system, users can schedule based on activity in this ledger. This can take the form of policy or explicit scheduling.</p><p>By its nature, a system of record of production assets combined with metadata should and will incorporate lineage, data quality, cataloging, observability, and governance. With well-defined APIs, this will integrate an entire ecosystem of tools that provide specialized, higher-level functionality in all of those domains.</p><h3><strong>Here comes the iPhone/iOS analogy</strong></h3><p>What I am describing here is a rebundling dynamic. Try as I might, I cannot help but reach for the analogy to the iPhone &#8211; it is too apt.</p><p>Sending emails on the go, digital photography, contact management, texting, and voice communication are essential <em>capabilities</em>. That does not mean that BlackBerrys, mass-market digital cameras, PDAs, and flip phones deserved to survive as standalone <em>devices</em>.</p><p>Consolidating those capabilities into a single device was a watershed moment in personal computing, on the order of the broad adoption of the original PC.</p><p>And it wasn&#8217;t just the iPhone; it was iOS. All of those capabilities needed organization, coherence, and rules, or else the user would live in an untrusted, chaotic world. iOS did that. 
The grid of applications on your home screen is the manifestation of that ordered heterogeneity. The user can organize and catalog myriad capabilities within a single, trusted, coherent experience.</p><p>In our vision, the asset graph is data&#8217;s in-product manifestation of that ordered heterogeneity. Assets computed by any runtime and stored in any system conform to a common protocol. The graph they reside in is not a post hoc observation of your assets, but a system of record that is alive.</p><p>So, just as the iPhone is still a &#8220;phone,&#8221; we might still end up calling the new type of orchestrator an orchestrator. But it will in no way resemble the orchestrators of the past. And any system that claims to be an orchestrator without these capabilities will be viewed as woefully deficient.</p><h3><strong>Conclusion</strong></h3><p>While the standalone orchestrator might be going out of fashion, orchestration is alive and well.</p><p>The &#8220;orchestrator&#8221; will fade in the same way that BlackBerrys, digital cameras, Palm Pilots, and flip phones did. But capabilities and tools are very different things. The capability that is orchestration is an essential, undeniable one. It must and will live on. The only question is <em>where</em>. 
And the answer to that question will be one of the most consequential in data infrastructure in the next decade.</p>]]></content:encoded></item><item><title><![CDATA[28 Dags Later]]></title><description><![CDATA[Orchestration in the end times]]></description><link>https://stkbailey.substack.com/p/28-dags-later</link><guid isPermaLink="false">https://stkbailey.substack.com/p/28-dags-later</guid><dc:creator><![CDATA[Stephen Bailey]]></dc:creator><pubDate>Tue, 18 Apr 2023 18:38:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5u3c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is essay #8 in the Symposium on <a href="https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator">Is the Orchestrator Dead or Alive</a>? </em></p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5u3c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5u3c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png 424w, https://substackcdn.com/image/fetch/$s_!5u3c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png 848w, 
https://substackcdn.com/image/fetch/$s_!5u3c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png 1272w, https://substackcdn.com/image/fetch/$s_!5u3c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5u3c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png" width="1456" height="889" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:889,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5279381,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5u3c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png 424w, https://substackcdn.com/image/fetch/$s_!5u3c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png 848w, 
https://substackcdn.com/image/fetch/$s_!5u3c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png 1272w, https://substackcdn.com/image/fetch/$s_!5u3c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb02ad3-7fec-4660-818f-13c6af728cab_2379x1453.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">survivors</figcaption></figure></div><p>Hell is other services.</p><p>If you&#8217;re building a system today, 
you live in hell, and you know it. Your eyes water at the scorched wasteland of internal applications. Your face burns as you stumble through a miasma of cloud infrastructure. Your skin sloughs off when you sift through some startup&#8217;s Swagger sewage.</p><p>There is no god in hell. I miss him, that emergent phenomenon of server racks, impenetrable protocols, and Cheeto-fingered priests. He used to dwell among us. We placed him in arks in office basements, in rooms 20 cubits high by 14 cubits deep. You could beseech God for assistance, and he would reply: &#8220;You are a sinner, full of error. Repent and resubmit.&#8221;</p><p>But he&#8217;s gone now, stuffed into data centers, and squeezed through tubes as we write letters through his stewards. Any chance at a personal relationship is gone, forever.</p><p>It&#8217;s a virus, this services thing. It dehumanizes everything and everyone it touches. Your face is an endpoint, your organs a module, your appendages a series of plugins to interface with supported integrations.</p><p>I have my own theories on how we got here: a secret AWS lab, a slipped hand, a cracked beaker, a drop of serum, an exposed toe. Regardless: everything is now unbundled, a hive mind of services backed by the same four companies, connected by cables to a room inside the earth&#8217;s molten core, where demons plug and unplug them intermittently to make humans dance between Twitter, Slack, and stale status pages.</p><p>We are survivors of a plague. This wasteland is our new home.</p><p>It&#8217;s changed us. If there&#8217;s one thing survivors don&#8217;t count on, it&#8217;s tomorrow. We need our infrastructure to rise and fall on demand, a virtual FEMA camp. We don&#8217;t get attached: yesterday&#8217;s architecture is today&#8217;s scrap heap, salvaged for only the most essential parts.</p><p>Survivors depend on the services they hate. 
They ambush them in alleys, plunge a needle into their intracranial space, and withdraw as much value as they can. Stitching services together into a soulless gesture at our cultural past is the only way to survive, in these end times.</p><div><hr></div><p>The orchestrator, though, is a special kind of service.</p><p>Humans fashioned the orchestrator in their own image: it is an extension of the human mind, like another set of hands, like an infinite set of hands. The orchestrator, true to its name, is about turning the ideas of one into the actions of many. Orchestration is how psychological energy becomes kinetic energy.</p><p>The orchestrator&#8217;s essential role is this: <em>run the thing. </em>Run it now, run it later, run it after that, run it again if it fails, run it tomorrow, run it like this, run it over here, run it, run it now, run it again, again, again.</p><p>Yet as things have fragmented, the data orchestrator has found itself becoming not more important, but less. Like civilization itself, the data orchestrator was unbundled &#8212; the catalog-as-orchestrator, the warehouse-as-orchestrator, the sql-formatter-as-orchestrator. Instead of asking the survivors to <em>run the things</em>, the <em>things</em> are running themselves.</p><p>Data survivors, therefore, are even more adrift than others. One response has been to call for <em>more</em> <em>control</em> over the hellscape. Services are out of control. Data sprawl is unmanageable. We need better ways to coordinate. We need stoplights. Air traffic controllers. Bus schedules. Zoning ordinances. Crossing guards. Traffic lines. Segues.</p><p>These calls make sense &#8212; if you live in a civilization. But survivors do not sit on town councils. They do not go to the BMV and get driver&#8217;s licenses. They do not slow down for children at play.</p><p>Getting all these services, all this activity, to <em>make sense</em>, is a daydream of a time lost. It will not happen &#8212; not here, not today. 
But that doesn&#8217;t mean all hope is lost.</p><p>What if we built an orchestrator, not for architects, but for survivors?</p><div><hr></div><p>Chaos. We would embrace chaos.</p><p>We would build tooling that is, above all, versatile and fast. It could be triggered from anywhere at any moment. It would have no opinions. It would require no thinking, no planning, no setup. It would be dangerous.</p><p>Hellscape tooling <em>is</em> dangerous. We have plenty of examples.</p><p>Adopting <em><strong>Segment </strong></em>is like handing out an assault rifle to every man, woman, and child in your cult. Without warning, you can go from having no user behavior events to having millions, billions of events flowing into Segment and thence into seven other systems. The startup cost is next to zero: you click some buttons, turn off the safety, and press the trigger whenever a user touches your product.</p><p>Using Segment <em>will</em> result in pain, agony, and analysts writhing on the floor, their legs shorn off. Segment does not protect you. Segment is not your friend.&nbsp;</p><p>But Segment works, because Segment is fast. <em><strong>Datadog</strong></em>, similarly, is fast, and it also penetrates your systems, nestling into their squishy insides to feed and grow. Datadog is valuable with just a couple metrics, but every additional sacrifice provides more new correlations, more value within the same interface.</p><p>Datadog <em>will</em> balloon into an unmanaged mess of millions of metrics. It doesn&#8217;t matter, though, not really &#8212; the first concern is to survive, and to survive, you need to hit your target. 
You don&#8217;t say, &#8220;I wish Datadog had fewer metrics,&#8221; you say, &#8220;I wish Ripley never learned Kubernetes.&#8221;</p><p>On the orchestration front itself, we have examples too. <em><strong>GitHub Actions</strong></em> is not a control plane &#8212; the only useful part of its UI is a button that generates a status snippet you can embed in the README. It has no local development story. If you are creating a workflow from scratch, you will experience nothing but pain.</p><p>GitHub Actions thrives because it eschews any meta-narrative around what it <em>should</em> be used for, who <em>should</em> use it. It&#8217;s just Actions. It runs things. And with a marketplace of over 18,000 actions, chances are that you won&#8217;t even know how awful the developer experience is. It&#8217;s just ready for you, ready for action, ready to <em>run code</em>.</p><p>Here&#8217;s the through line for hellscape tools: they cater to the individual, not the organization. They spread out across survivor clans and create long-distance dependencies between people who don&#8217;t know each other. And they are fast, requiring no thinking or training, just an initial thirty-second trial-and-error period to find the most ergonomic way to fire the thing.</p><div><hr></div><p>The system emerges from activity; it&#8217;s a bottom-up phenomenon. But this also means that the system&#8217;s survival is tied to its penetration into the culture. Its goal is not to evolve a single perfect organism, but to spawn an infectious alien growth that occasionally metastasizes into massive new fauna.</p><p>Simplicity and speed. This is where current data orchestrators fail. 
They make you <em>think</em>, not just about their opinionated frameworks &#8212; directedness, acyclicity, node/edge graph distinctions &#8212; but about whether they are even the right way to do a task.</p><p>Data ingestion &#8212; one process an orchestrator <em>must</em> own &#8212; illustrates this perfectly. Managed tools are easier for the simple case; managed resources such as Snowpipe are more effective in the advanced case.</p><p>The Snowpipe case is informative because it highlights the architectural limits of current orchestrators. A Snowpipe executes SQL whenever a new file lands in storage. It uses a message queue to trigger the compute, which updates the target table within seconds.</p><p>The Snowpipe doesn&#8217;t have good ergonomics. The monitoring sucks, it requires substantial infra work, and troubleshooting it is awful. But it is, above all, fast and simple. If a Snowpipe works for a use case, it&#8217;s probably the best solution &#8212; bypass the orchestrator.</p><p>There is a broader post-apocalyptic trend toward serverless pipelines. The DAG itself has been stuffed into a little box that&#8217;s run on ephemeral compute: event-driven, cross-platform, single-step executions, with fast code execution.</p><div><hr></div><p>This leaves the orchestrator in a dissatisfying middle ground &#8212; neither dominant enough to bring every workflow to heel, nor flexible enough to spread virulently across teams.&nbsp; But what if the <em>Snowpipe</em> model was the core of the platform? What if, instead of DAGs, we littered the world with even more tiny services mediated by a message bus &#8212; pouring fuel on the fire, rather than trying to put it out?</p><p>It would be a push towards a distributed, business-level task manager. The chief adaptation would be to move to an entirely activity-driven model, and a consequence would be that the orchestrator would become flat. 
Not tall buildings that form permanent infrastructure, but a flotilla of boats drifting on a sea of activity. Nimble, mobile, discardable.</p><p>How many applications in the unbundled data stack could you power with a message bus, a Lambda function, a database, and access to a shared resource configuration?</p><p>All of them?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y56n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y56n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png 424w, https://substackcdn.com/image/fetch/$s_!Y56n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png 848w, https://substackcdn.com/image/fetch/$s_!Y56n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png 1272w, https://substackcdn.com/image/fetch/$s_!Y56n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y56n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png" width="662" height="462.39972527472526" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:662,&quot;bytes&quot;:528714,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Y56n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png 424w, https://substackcdn.com/image/fetch/$s_!Y56n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png 848w, https://substackcdn.com/image/fetch/$s_!Y56n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png 1272w, https://substackcdn.com/image/fetch/$s_!Y56n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf4e5302-b345-46f0-b061-e6240aa0d793_2388x1668.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The orchestrator that&#8217;s useful is the one plugged into everything. That&#8217;s the interesting play from the data catalog &#8212; in a reactive environment, <em>awareness</em> is as important as skill. The orchestrator is losing today because it&#8217;s not fast enough to onboard the use cases the engineers want, and it&#8217;s not simple enough to justify displacing or running managed services through it. It needs to do both.</p><p>I&#8217;ve sketched out aspects of this <a href="https://stkbailey.substack.com/p/product-sketch-segment-for-metadata">chimeric product</a> before &#8212; it&#8217;s part Segment, part Zapier, part Airflow, part Lambda function, and part Datadog. It doesn&#8217;t have to manage the compute itself &#8212; who cares whose EC2 instance it is &#8212; but it needs to have its tentacles in everything.</p><p>Here are the essential elements:</p><p><em>Event Bus: </em>Ingest every potentially relevant triggering event. 
Every single object change in S3, every Snowflake user query, every test failure, every new hire, every first kiss, every lame icebreaker. It all goes in, or at least essential derivatives do. This is your potential energy, the reptile sensory brain, the primordial soup from which every resulting process emerges.</p><p><em>Environment</em>: What a bespoke orchestrator has over a cloud provider is its ability to adapt to a <em>particular</em> context. The orchestrator speeds up development by making the environment easy to manipulate. It should take two minutes &#8212; literally two minutes &#8212; to write a task that queries Snowflake and sends a Slack message with a summary of the results every hour.</p><p><em>History &amp; Context</em>: The orchestrator embraces chaos, but only of a certain kind: ordered chaos, not wasteful chaos. If you tell it to fire at anything in its periphery, it will do that &#8212; then stop. This requires some meta-awareness, retries, task IDs, and the like.</p><p><em>Artifact Catalog: </em>Passing data between tasks is one of the key elements that drive DAGs. But artifacts are simply a subset of the environment. They are, essentially, an address and an object type. That&#8217;s not to say they&#8217;re simple &#8212; but as dbt has shown, if they are properly cataloged, passing them between operations is simply a matter of having the right context and name available.</p><p><em>Results and Monitoring: </em>The orchestrator feeds itself, an Ouroboros of data processes. It&#8217;s directed, but it&#8217;s not deterministic, not necessarily. As each process completes, it adds results to the activity stream, which triggers downstream processes. But that&#8217;s not all: high-level systemic monitoring must be a primary concern to support all the chaos that&#8217;s being generated.</p><p>In the event-driven world, the graph exists, but it emerges from the architecture, rather than being the primary concern. 
You don&#8217;t <em>control</em> it, you create it, much in the same way you don&#8217;t truly control the activity patterns of users or the correlations of metrics in your infrastructure. You build funnels and piece together the processes after the fact.</p><p>And that&#8217;s the wedge the orchestrator can latch onto: the more chaotic the landscape, the more useful intelligence is. Compute is the content; the orchestrator is the feed.</p><div><hr></div><p>Chaos or control?</p><p>It boils down to activity or graphs.</p><p>A desire to control and unify leads you to <em>dbt</em>: a comprehensive description of the ideal state, an approximation of the <a href="https://stkbailey.substack.com/p/in-search-of-the-ubergraph">&#252;bergraph</a>. This is a convergence problem. You have an idea of the world, and the challenge is to make it a reality.</p><p>There&#8217;s a place for this in our world, even today: graphs are thunderdomes, the last remnant of real human culture, a sealed-off sphere where contests can be fought and arguments can be settled.</p><p>But thunderdomes are tiny, and the wasteland is enormous. There are so many teams out there who want to <em>run so many things</em>. Many of these are terrible ideas that might result in catastrophe.</p><p>The orchestrator should make catastrophe easy.</p><p>Our descent into services hell is the perfect opportunity for the orchestrator to become the authority on the environment, to connect &#8212; and disconnect &#8212; services, to arm survivors with the intelligence they need to react to new threats and to monitor the sliver of reality they care about.</p><p>The orchestrator does not have to settle arguments or broker consensus. It doesn&#8217;t have to choose between truth and lies, monoliths or meshes, good or bad quality &#8212; why not do all of it? 
Its natural role is as an arms dealer, distributing compute, secrets, data.</p><p>Existing tools are moving towards a less controlled model, but not fast enough: Astronomer&#8217;s push-button deployments, Dagster&#8217;s declarative scheduling, Prefect&#8217;s UI-driven flow registry, DataHub&#8217;s stream-based metadata ingestion, Atlan&#8217;s Zapier-like metadata actions.</p><p>How much of this boils down to an event stream, trigger logic, and a Lambda function? Make it lighter! Make it faster! Make it simpler!</p><p>But don&#8217;t make it weak, leaving engineers in the cold. Give survivors the tools they need to do ludicrous things. They don&#8217;t want to wait at bus stops, they want to jerry-rig ATVs and sky-hook onto passing choppers and wave chainsaws at zombies.</p><p>Man&#8217;s descent into chaos positions the orchestrator to become one of the few services we trust, a bulwark against the never-ending hordes of point solutions. The orchestrator should applaud every new category, every new cloud provider, every bit of fragmentation, because each one adds to the cacophony that can only be managed by a service that welcomes it.</p><p>There is no longer any direction, any acyclicity, any distinction between nodes and edges. Not anymore, not for us who live in these dark times. 
We only have the present, the finger on the trigger, the running of things, an infinite number of things.&nbsp;</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://stkbailey.substack.com/p/28-dags-later?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">import orchestrator; while True: orchestrator.run(&#8220;spread_the_virus&#8221;)</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://stkbailey.substack.com/p/28-dags-later?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://stkbailey.substack.com/p/28-dags-later?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div>]]></content:encoded></item><item><title><![CDATA[Data Materialization is a Convergence Problem]]></title><description><![CDATA[We've spent years shoving a square peg into a round hole]]></description><link>https://stkbailey.substack.com/p/data-materialization-is-a-convergence</link><guid isPermaLink="false">https://stkbailey.substack.com/p/data-materialization-is-a-convergence</guid><dc:creator><![CDATA[Alex Rasmussen]]></dc:creator><pubDate>Mon, 10 Apr 2023 23:57:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ojpS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is essay #7 (of 10) in the Symposium on <a href="https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator">Is the Orchestrator Dead or Alive</a>? 
You can read more from Alex on his professional website, <a href="https://bitsondisk.com">Bits On Disk</a>.</em></p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ojpS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ojpS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ojpS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ojpS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ojpS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ojpS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg" width="1456" height="920" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:920,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ojpS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ojpS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ojpS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ojpS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffeadc9e-47c2-4f9f-9d0d-e95e1b85303a_1600x1011.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Image by <a href="https://www.flickr.com/photos/johnwilliamsphd/4313022022">johnwilliamsphd on Flickr</a>; used under the CC BY-NC-SA 2.0 license</figcaption></figure></div><p>If you&#8217;ve already got a workflow orchestrator, it&#8217;s very tempting to treat data materialization &#8211; the act of turning abstract data models, typically defined in SQL, into relational tables in your data warehouse / lake / lakehouse / swamp / outhouse / etc. &#8211; as an orchestration problem. After all, the problem sounds like a straightforward set of related, imperative tasks: make a job DAG where every task represents a model and edges represent data dependencies between models, long-poll data sources until they become &#8220;fresh&#8221;, freshen models in data-dependent order, repeat frequently enough to meet your recency SLOs.</p><p>Easy, right?&nbsp;</p><p>Unfortunately, models are rarely that well-behaved. 
Models change all the time, data dependencies between models can be complex, and SLOs change as business demands shift. Backfills, data deletion requests, and run-of-the-mill bugs in query logic happen all the time. The more complex and dynamic your data becomes, the more shoehorning is required to fit all that dynamism into the workflow orchestrator&#8217;s imperative, scheduled, task-oriented abstraction. All too often, the result is a giant shambling mess of interdependent job DAGs that are tedious to operate, difficult to reason about, and dangerous to change.</p><p>I&#8217;ve become increasingly convinced that we&#8217;ve been trying to shove a square peg into a round hole by treating data materialization as a workflow orchestration problem. Data materialization isn&#8217;t an orchestration problem, it&#8217;s a <strong>convergence</strong> <strong>problem</strong>, and we need a new system to handle it.&nbsp;</p><h1>Why Convergence?</h1><p>I&#8217;m far from the first person to have longed for a convergence-based solution to data materialization. Benn Stancil wrote a post on his blog in August of last year called <a href="https://benn.substack.com/p/down-with-the-dag">&#8220;Down with the DAG&#8221;</a>, where he laments the current state of workflow orchestration in the (ugh) Modern Data Stack. The entire post is worth a read, but this excerpt sums up his main point:</p><blockquote><p>I don&#8217;t actually want to think about when to run jobs, how to define DAGs, or to manually orchestrate anything. I just want my data to be fresh&#8212;where I can declare what fresh means&#8212;and to know when it&#8217;s not.</p></blockquote><p>Earlier in this symposium, <a href="https://stkbailey.substack.com/p/nobody-should-write-etl">Vinnie</a> echoed the same sentiment:</p><blockquote><p>Let&#8217;s all just stop writing ETL altogether. Let&#8217;s declare the outputs of our pipelines and let our systems figure out the rest. 
If humans don&#8217;t wanna deal with it, let the machines do it, at least while we can still command them.</p></blockquote><p>In a perfect world, we&#8217;d tell the system what models we have, when we need them, and maybe what queries we&#8217;re running on them, and let some kind of data auto-materialization system do the grunt work of turning those models into tables in our warehouse. Instead, we&#8217;re left telling a workflow orchestrator both what we want and how to get it, and having to do a lot of undifferentiated toil when either of those things change.</p><p>In the abstract, the job of this hypothetical data auto-materialization system is to continually compare the state of the warehouse against the desired state of the models and their SLOs and make modifications to the warehouse until the models are materialized and the SLOs remain satisfied. We want the reality of the data in our warehouse to continuously <em>converge</em> toward our model definitions as those definitions change.</p><p>The software world is littered with solutions to convergence problems. Infrastructure-as-Code tools like Terraform and CloudFormation are convergence-based tools that read a representation of the desired end state, compare that desired state to the current state of the infrastructure, and construct a plan to converge the current state with the desired one. Kubernetes is also a convergence-based tool: its controller-manager compares the set of declarative resource specifications with the current state of the cluster and makes changes to the cluster (adding and removing cluster nodes, starting and stopping containers, etc.) until all the desired resources are running.</p><p>In a convergence-based view of data materialization, a &#8220;materialization controller&#8221; could periodically evaluate the state of all models, inspect their materialized counterparts, and issue queries to the warehouse to partially or completely (re-)materialize them as necessary. 
If a few of a model&#8217;s partitions need to be recomputed, the user could simply drop the impacted partitions and allow the controller to notice their absence and re-populate them. The controller could choose how to materialize a model by comparing the cost of each materialization method, and could switch methods in the background as workloads or model sizes change. It could also handle prioritizing materialization queries based on the warehouse&#8217;s current load and the models&#8217; SLOs, automatically shifting costly materializations that can tolerate some delay to off-hours. All the incidental complexity that was once the domain of human operators disappears, leaving data teams to focus more on the data and less on the nitty-gritty technical minutiae of getting it there.</p><p>Applying convergence to data materialization appears to be an idea whose time has come. Dagster is already taking a crack at this idea with their notion of software-defined assets. What I&#8217;m calling &#8220;convergence&#8221; here, they&#8217;re calling &#8220;reconciliation&#8221;, but we&#8217;re both essentially hitting the same high points. I haven&#8217;t played with Dagster&#8217;s implementation enough to form an opinion on it yet, but what I&#8217;ve seen so far looks promising (though defining assets as Python functions makes the declarative purist in me a little itchy).</p><p>If the benefits of this approach are so apparent and work is already underway, you might wonder why we&#8217;re not already swimming in competing materialization controller implementations. I think it&#8217;s because building one is going to be really hard.</p><h1>The Trouble with Convergence</h1><p>The industry has enough collective production experience and battle scars with convergence-based systems to know that they&#8217;re not as easy to build or operate as they might first appear. 
In particular, we need to be wary of three big problems with building and operating convergence-based systems: explainability, over-eagerness, and drift.</p><p>The first big problem with convergence-based systems centers around explainability. Everything looks like magic in a convergence-based system when things are going well, but if something goes wrong it can be difficult to understand why. If you&#8217;ve ever gotten Kubernetes stuck trying to launch a pod, you&#8217;ll know what I&#8217;m talking about. Detailed machine- and human-readable logs would go a long way toward making the system easier to troubleshoot and could also serve as a rich piece of lineage information that could be used elsewhere. Those logs would need to be built into the system as a first-order architectural concern to be truly useful, though. Bolting logging onto the side once the system is already built won&#8217;t cut it.</p><p>The second big problem with convergence-based systems is that it&#8217;s easy for an over-eager controller to spend a lot of money and/or resources either doing too much or doing it too fast. The controller will need some guardrails to prevent it from trying to rebuild the universe from scratch in reaction to a misconfiguration or a hastily done refactor. These guardrails aren&#8217;t just necessary for emergencies or controller meltdowns, however. Even in normal operation, operators will need to be able to tell the controller to either stop or slow down sometimes, if only to keep the system from live-locking itself when a lot of models change at the same time.</p><p>The third and arguably thorniest problem with convergence-based systems is drift. If the controller were the only thing changing the warehouse, it could keep a cache of the warehouse&#8217;s state and use that cache to decide what actions to take. 
With that local cache as its source of truth, it could iterate faster and lighten the query load on the warehouse itself.&nbsp; Unfortunately, the controller is rarely the only thing manipulating the warehouse, and its state cache can get stale very quickly. Even if you could prevent anything but the controller from modifying the warehouse, you probably wouldn&#8217;t want to; you need some way to break the proverbial glass and bypass the controller in an emergency.&nbsp;&nbsp;</p><p>There are a few ways you might be able to keep the controller&#8217;s cache warm in the presence of drift. The controller could hook into some kind of notification system so that it&#8217;s aware of changes relatively quickly. Data warehouses are good at exposing what&#8217;s being done to them via query logs, and we can attach post-commit hooks to version control systems, so both systems are ready sources of notifications. Even with that notification system in place, there will still be some latency between a change happening and the controller becoming aware of that change. To account for that latency,&nbsp; you&#8217;ll need to assume some amount of staleness and limit the controller to operations that are &#8220;safe&#8221; if the controller&#8217;s view of the world is out-of-date. Restricting the controller to either idempotent or reversible actions seems like a good way of doing this, but that feels like it&#8217;ll be one of the trickier parts to get right.</p><h1>I, For One, Welcome our Convergent Overlords</h1><p>While there are a lot of challenges facing the implementers of a materialization controller, I&#8217;m cautiously optimistic that it can be done. We&#8217;re in such an early stage that we can and should be bringing all the lessons that we&#8217;ve learned from prior convergence-based systems into the development of this one. 
I&#8217;m hopeful that we&#8217;re gaining clarity on not just what&#8217;s breaking with workflow orchestrators, but why it&#8217;s breaking, and that we&#8217;ll use that clarity and a healthy dose of past experience from adjacent parts of software engineering to build a new generation of even better tools.</p>]]></content:encoded></item><item><title><![CDATA[Life After Orchestrators ]]></title><description><![CDATA[Orchestration is a time killer.]]></description><link>https://stkbailey.substack.com/p/life-after-orchestrators</link><guid isPermaLink="false">https://stkbailey.substack.com/p/life-after-orchestrators</guid><dc:creator><![CDATA[Benjamin Djidi]]></dc:creator><pubDate>Thu, 06 Apr 2023 15:23:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AEjL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is essay #6 in the Symposium on <a href="https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator">Is the Orchestrator Dead or Alive</a>? 
You can read more essays from Ben</em> <em>on <a href="https://medium.com/@bdjidi">Medium</a>.</em> </p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AEjL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AEjL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png 424w, https://substackcdn.com/image/fetch/$s_!AEjL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png 848w, https://substackcdn.com/image/fetch/$s_!AEjL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png 1272w, https://substackcdn.com/image/fetch/$s_!AEjL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AEjL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png" width="947" height="366" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:366,&quot;width&quot;:947,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AEjL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png 424w, https://substackcdn.com/image/fetch/$s_!AEjL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png 848w, https://substackcdn.com/image/fetch/$s_!AEjL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png 1272w, https://substackcdn.com/image/fetch/$s_!AEjL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd31458-94fe-4f93-b0f3-3be2fa2bd086_947x366.png 1456w" sizes="100vw" loading="lazy" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Does this look like a value-added job?</figcaption></figure></div><p>How much time have you spent managing tasks on an orchestrator? Hundreds of hours? Thousands? 
Whichever it was, those were some low-value activities, and you probably felt that way too.</p><p>The root problem is that the orchestrator tries to control the data operations, rather than letting the operations stem from the data.</p><h3><strong>Why do we have orchestrators?</strong></h3><p><em>tl;dr: legacy.</em></p><p>As <a href="https://stkbailey.substack.com/p/will-active-metadata-eat-the-orchestrator">Louise wrote</a>, the data orchestrator is &#8220;a software solution or platform responsible for automating and managing the flow of data across different systems, applications, and storage locations.&#8221; I would add that, in the end, an orchestrator is a stateful solution for managing the execution of scripts.</p><p>Looking back, the need for data orchestration stemmed from the complexity of triggering logic, where simple script scheduling failed to account for dependencies. One script depended on another, and there needed to be a stateful piece of software in the middle, not only to trigger the entire process but also to ensure individual steps were started only after all their upstream dependencies had successfully completed.</p><p>As a standalone brick, the orchestrator serves multiple purposes, the main ones being:</p><ul><li><p><strong>Keep state:</strong> keeping tabs on what has happened and what hasn&#8217;t is likely the core feature of the orchestrator.</p></li><li><p><strong>Trigger stuff:</strong> the actual mechanism of executing the script, often using a set of parameters set at the task or job level.</p></li><li><p><strong>Enforce dependency constraints (DAG):</strong> sort out which script should be executed before which and make sure there is no loop.</p></li><li><p><strong>Surface logs:</strong> provide a comprehensive view of task statuses that serves as a troubleshooting entry point.</p></li></ul><h3><strong>What&#8217;s the issue?</strong></h3><p><em>Hint: it&#8217;s the orchestration.</em></p><p>I could ramble on 
about the maintenance time spent managing deep dependency graphs (high number of successive tasks), the latency they introduce by forcing workflows to adjust to the slowest task, the cost they add to an already resource-hungry practice, their poor interoperability with modern tools (or utter incompatibility, as with stream processing), the burden they put on data consumers who are forced to depend on engineering work for the tiniest updates&#8230;</p><p>But, in the interest of keeping it efficient, the problem with orchestrators boils down to two things: they&#8217;re just another brick in the stack, and, most importantly, new data technologies don&#8217;t need them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0JlG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0JlG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0JlG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0JlG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!0JlG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0JlG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg" width="1456" height="546" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:546,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0JlG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0JlG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0JlG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!0JlG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b03a55-38ad-458f-85dd-86204f9890b4_1456x546.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Definitely over-simplified.</figcaption></figure></div><h2><strong>What are the post-orchestrator people doing?</strong></h2><p><em>Just-in-time data ops.</em></p><p>Displacing the orchestrator means removing synchronization overhead, and the chief way to do this is to try and make the time of execution matter less. 
Today, this means implementing either Asynchronous Processing or High-Frequency batches.</p><p><em><strong>High-frequency batches</strong></em></p><p>Let&#8217;s cover high-frequency batches first because I see them as an aberration.</p><p>High-frequency batching is the idea of running processing steps at very high frequency (by batch standards) &#8211; for instance, every 5 minutes or so. In doing so, one can <em>almost</em> treat every independent task as asynchronous. This helps make individual tasks run independently, without the need for a stateful orchestrator at the helm, but it forces you down the darker path of one of the hardest trade-offs: agility for cost.</p><p>Anyone who has tried running high-frequency batches at scale has seen two challenges. First, latency adds up fast: it doesn&#8217;t take a deep dependency graph before high-frequency tasks stop correlating with high-availability datasets (ten successive steps that each run every 5 minutes can leave the final table up to roughly 50 minutes behind). And second, pricing hurts. Most data solutions aren&#8217;t designed for this usage for a simple reason: running a LEFT JOIN every 5 minutes means redundantly scanning a ton of data.</p><p><em><strong>Asynchronous processing</strong></em></p><p>Asynchronous processing systems, or continuous systems &#8211; think incremental engines like Stream Processors, Real-Time Databases, and to a certain extent HTAP Databases &#8211; do not require an orchestrator since they are continuous in nature and data is self-updating: a change is instantly reflected in consumer systems and downstream views/tables.</p><p>For anyone who isn&#8217;t too familiar with streaming yet, that&#8217;s the difference between Pull and Push data operations: the former requires you to constantly trigger the actions you would like performed (as in the ETL / ELT model, for instance) while the latter computes and propagates data points <em>passively</em> and <em>incrementally</em>. No orchestration! 
Each task is almost like a microservice, a data service of sorts.</p><p>Unfortunately, <a href="https://stkbailey.substack.com/p/good-data-engineers-are-lazy">Benoit already beat me to quoting</a> the &#8220;<a href="https://airflow.apache.org/docs/apache-airflow/stable/index.html#why-not-airflow">Why not Airflow</a>&#8221; documentation, so I&#8217;ll spare you that much. But as mentioned earlier, one of the core difficulties of orchestrators is interoperability: you simply can&#8217;t orchestrate everything. 
Orchestration introduces complexity and latency into your data operations and is generally a very efficient way to sink many hours a week into an activity that ultimately shouldn&#8217;t exist.</p><p>Imagine if everything you run in prod <em>had</em> to be orchestrated: that would be the death of agile. Really, why is this still a thing in the data world? (Rhetorical question; please refer to the first paragraphs.)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XRp9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XRp9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XRp9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XRp9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XRp9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!XRp9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg" width="1456" height="546" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:546,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XRp9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XRp9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XRp9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XRp9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d81032e-d606-410f-be58-502c88177838_1456x546.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Surprisingly efficient.</figcaption></figure></div><h3><strong>What&#8217;s the new toolbox then?</strong></h3><p><em>It&#8217;s already here, sort of.</em></p><p>If you&#8217;re spending time fixing an Airflow DAG, sorry for saying this but that&#8217;s unlikely to be your greatest contribution to your company. Fortunately, the alternatives are out there, so it&#8217;s already down to your organization&#8217;s techno-political leadership to make that choice. 
With regard to spinning up data services, all the asynchronous processing options mentioned earlier are already live and running at scale in an enterprise near you.</p><p>What&#8217;s really missing is some form of reusable Control Plane with basic enforcements: ensuring that the Acyclic and Directed aspects of dependency rules are met, and surfacing operational commands and statuses at a central level for convenience. This is more of a generic DataOps tool that inherits some of the Orchestrator responsibilities, especially around metadata and operational commands, albeit without meddling in the task lifecycle &#8211; hence no longer a mandatory piece of the stack. That&#8217;s where the &#8220;<a href="https://stkbailey.substack.com/p/will-active-metadata-eat-the-orchestrator">active metadata</a>&#8221; trend could be an interesting play by acting as a passive observer of the pieces involved, rather than intruding on the data operations by having an active role.</p><p>At Popsink we have no extra brick for orchestration, and it saves us time and money. We did end up building our own control plane for convenience, partly because some of the open standards today are incompatible with the idea of not having an orchestrator (like OpenLineage, which is built on the premise that tasks have a &#8220;start&#8221; and an &#8220;end&#8221;).</p><p>Yet it works great: jobs are now asynchronous constructs with passive consumers that do not need to pull the data when predefined conditions are met. To use a metaphor I like: we&#8217;re working with pipes instead of buckets. There&#8217;s a lot less lifting, so you can sit back and watch the data flow.</p><h3><strong>Final words</strong></h3><p><em>You made it!</em></p><p>In many organizations, data products are still the output of a long data manufacturing line that requires a plethora of orchestration. 
But it doesn&#8217;t have to be this way: the solutions are already out there, like the <a href="https://stkbailey.substack.com/i/106968790/streaming-workflow-builders">Streaming Workflow Builders</a> Hubert mentioned or what we&#8217;re up to at <a href="https://www.popsink.com/">Popsink</a>.</p><p>With the ongoing shift to continuous operations, it&#8217;s worth exploring what the post-orchestrator world feels like, if not for your sanity then for your wallet. I also grew pretty fond of the term &#8220;data services&#8221; as it sits well with the idea of on-demand data subscription as opposed to off-the-shelf &#8220;data products&#8221;.</p><p>It&#8217;s going to take a bit more time to reach full feature parity and for good reusable standards to materialize, especially as current active metadata and lineage systems are deeply rooted in orchestrators, but the capabilities are already here.</p><p>It&#8217;s time to move on with your life.</p>]]></content:encoded></item><item><title><![CDATA[Limits of the Event-Driven Orchestrator]]></title><description><![CDATA[Don't use them for stream 
processing]]></description><link>https://stkbailey.substack.com/p/limits-of-the-event-driven-orchestrator</link><guid isPermaLink="false">https://stkbailey.substack.com/p/limits-of-the-event-driven-orchestrator</guid><dc:creator><![CDATA[Hubert Dulay]]></dc:creator><pubDate>Mon, 03 Apr 2023 14:01:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZWWN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is essay #5 in the Symposium on <a href="https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator">Is the Orchestrator Dead or Alive</a>? You can read more posts from </em>Hubert <em>on his Substack,</em> <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Hubert&#8217;s Substack&quot;,&quot;id&quot;:1265530,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/hubertdulay&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2dd7210-d9c1-4002-b66c-ab22dcd47635_718x718.png&quot;,&quot;uuid&quot;:&quot;66fe3eb7-7d11-4dbb-b97e-dc3d2e6406b9&quot;}" data-component-name="MentionToDOM"></span> </p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZWWN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZWWN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png 424w, 
https://substackcdn.com/image/fetch/$s_!ZWWN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!ZWWN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!ZWWN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZWWN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png" width="1456" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZWWN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png 424w, 
https://substackcdn.com/image/fetch/$s_!ZWWN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!ZWWN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!ZWWN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e6f463-be87-44d6-9154-f01bcb3ddda9_1600x800.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In a <a href="https://open.substack.com/pub/hubertdulay/p/openlineage-with-streaming-data?r=46sqk&amp;utm_campaign=post&amp;utm_medium=web">previous post</a>, I talked about how data lineage can be difficult for stream processing workflows. The <em><strong>asynchronous</strong></em> tasks in streaming are hard to map onto existing data lineage solutions. That&#8217;s because streaming tasks don&#8217;t have a start or an end. Lineage tools are mainly built for batch processing, which expects <em><strong>synchronous</strong></em> tasks that do have a defined start and end. This same issue exists for workflow orchestrators.</p><p>Workflows are everywhere. If you&#8217;ve ever viewed a <a href="https://spark.apache.org/docs/3.3.2/web-ui.html#jobs-detail">Spark DAG</a> or a <a href="https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/execution/execution_plans/">Flink plan</a>, you&#8217;ll know that those are workflows. If you&#8217;ve ever run an explain plan for a <a href="http://www.postgresonline.com/journal/index.php?/archives/27-Reading-PgAdmin-Graphical-Explain-Plans.html">SQL statement</a> in a popular database, you&#8217;ve also seen a workflow. When you implement a <a href="https://microservices.io/patterns/data/saga.html">SAGA</a> pattern for your microservices, that is also a workflow. These workflows contain business logic that is constantly being updated to save cost and improve user experience. Workflows need to be agile. We need tools that enable this agility so that we can react quickly to changes in the business.</p><h2><strong>Event-driven applications and (a)synchronous tasks</strong></h2><p>To better understand asynchronous applications, we&#8217;ll need to understand Event-Driven Architecture (EDA). EDA describes a category of solutions that are triggered by events. 
An example of an event is a file appearing in a directory or a message appearing in a queue. EDA applications are always running and listening for an event to occur. When one does, they act on it by either running a task or processing data.</p><p>EDA applications do not require the producer of an event to know about its consumer. The producer and consumer do not communicate synchronously, as they would in a request-response RESTful API. Therefore, EDA applications are asynchronous.</p><h2><strong>Event-driven orchestration vs stream processing</strong></h2><p>Orchestration involves using a tool like <a href="https://airflow.apache.org/">Airflow</a> or <a href="https://dagster.io/">Dagster</a> to execute tasks in a workflow. These tools make it easy to build, schedule, and monitor complex workflows. Workflows in these tools are called DAGs (directed acyclic graphs) and are typically executed on a schedule. Event-driven orchestration follows EDA semantics and can be done with these same tools: instead of running on a schedule, the workflows are triggered by an event.</p><p>Stream processing is also an EDA. It involves listening for events that have occurred in a data store or messaging system and processing those events while the data is in motion.</p><p>The main difference is that the events in event-driven orchestration don&#8217;t hold business data, while the events in a stream processor do. 
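</p><p>As a rough sketch of that distinction (the class and field names below are hypothetical and not taken from any of the tools mentioned), an orchestration event only points at data, while a stream event carries the record itself:</p><pre>
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NotificationEvent:
    """Hypothetical orchestration event: it says something happened, nothing more."""
    kind: str       # e.g. "file_created"
    location: str   # pointer to data at rest, not the data itself


@dataclass(frozen=True)
class RecordEvent:
    """Hypothetical stream event: the business record travels inside the event."""
    key: str
    payload: dict = field(default_factory=dict)


def describe(event) -> str:
    """Summarize what a consumer would do with each kind of event."""
    if isinstance(event, NotificationEvent):
        return f"trigger a task that reads {event.location}"
    return f"process record {event.key} in flight: {event.payload}"


notification = NotificationEvent(kind="file_created", location="/landing/accounts.csv")
record = RecordEvent(key="account-42", payload={"owner": "Ada", "balance": 100})

print(describe(notification))
print(describe(record))
```
</pre><p>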
Examples of business data would be account records, user records, product records, etc.</p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:1265530,&quot;name&quot;:&quot;Hubert&#8217;s Substack&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2dd7210-d9c1-4002-b66c-ab22dcd47635_718x718.png&quot;,&quot;base_url&quot;:&quot;https://hubertdulay.substack.com&quot;,&quot;hero_text&quot;:&quot;\&quot;Streaming Data Mesh\&quot; OReilly - Supporting Monolithic Data Engineers&quot;,&quot;author_name&quot;:&quot;Hubert Dulay&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#ffffff&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://hubertdulay.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!yMXU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2dd7210-d9c1-4002-b66c-ab22dcd47635_718x718.png" width="56" height="56" style="background-color: rgb(255, 255, 255);"><span class="embedded-publication-name">Hubert&#8217;s Substack</span><div class="embedded-publication-hero-text">"Streaming Data Mesh" OReilly - Supporting Monolithic Data Engineers</div><div class="embedded-publication-author-name">By Hubert Dulay</div></a><form class="embedded-publication-subscribe" method="GET" action="https://hubertdulay.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button 
primary" value="Subscribe"></form></div></div><h2><strong>Can orchestrators do stream processing?</strong></h2><p>Both Airflow and Dagster can subscribe to an event to trigger a workflow. In both tools, the components that do this are called <a href="https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/sensors.html">sensors</a>. Sensors allow the orchestrator to listen for events wherever they may occur. Examples include listening for a file to appear, waiting for an HTTP request, or even consuming from a Kafka topic.</p><p>Stream processing can also read from Kafka. So can event-driven orchestration support stream processing? To answer this question, we need to understand the differences between batch processing and stream processing.</p><p>Older ETL/ELT processes required writing their data into a database because they had no other way to perform complex transformations like joins and aggregations. When you persist your data in a database, you&#8217;re forced into batch processing semantics, which is why most older ETL/ELT processes needed scheduling tools like cron. Airflow and Dagster are similar to cron, with additional enhancements.</p><p>With this understanding of orchestrators, I&#8217;ll try out another event-driven orchestrator called Prefect.</p><h3><strong>Prefect</strong></h3><p><a href="https://www.prefect.io/opensource/">Prefect</a> is an open-source orchestration platform that looks promising. It natively has a way to consume events. In the code below, line 42 is a decorator applied to a function. When this Python script is run, it is submitted to a running Prefect server for execution. Line 9 contains the configuration to connect to Kafka. 
Line 21 contains the code that consumes data from Kafka and then processes the data by just printing it out to the console.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zXPU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zXPU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png 424w, https://substackcdn.com/image/fetch/$s_!zXPU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png 848w, https://substackcdn.com/image/fetch/$s_!zXPU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!zXPU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zXPU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png" width="1201" height="1600" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1600,&quot;width&quot;:1201,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zXPU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png 424w, https://substackcdn.com/image/fetch/$s_!zXPU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png 848w, https://substackcdn.com/image/fetch/$s_!zXPU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!zXPU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc99a4c0-f1f0-4e4a-a35c-1b6ef01e598a_1201x1600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>At first glance, this code looks like it may be the solution we need, but can it replace the popular stream processing platforms available today? Let&#8217;s see what we actually need for stream processing.</p><h2><strong>Stream processing requirements</strong></h2><p>Stream processing is an alternative way of transforming data that keeps your data in motion. This maintains real-time semantics, but an orchestrator would need to meet a few requirements to handle stream processing. It would need to:</p><ul><li><p>Hold state.</p></li><li><p>Distribute tasks so that they execute in parallel.</p></li><li><p>Shuffle data between tasks.</p></li></ul><p>State enables stream processors to perform aggregations like counting records. Without it, a processor cannot remember the current count from one event to the next. 
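</p><p>To make the role of state concrete, here is a minimal sketch in plain Python (not Flink, Spark, or Prefect code; the function name and sample events are made up for illustration). A running count per key only works because the dictionary survives from one batch of events to the next:</p><pre>
```python
from collections import defaultdict


def count_by_key(events, state=None):
    """Fold a batch of (key, value) events into a running count per key.

    `state` is the count carried over from earlier events. A real stream
    processor would keep it in a fault-tolerant state store; a plain dict
    stands in for that here.
    """
    counts = defaultdict(int, state or {})
    for key, _value in events:
        counts[key] += 1
    return dict(counts)


# Two micro-batches arriving one after the other.
first = [("account", {"id": 1}), ("user", {"id": 7})]
second = [("account", {"id": 2})]

state = count_by_key(first)
state = count_by_key(second, state)
print(state)  # {'account': 2, 'user': 1}

# Without carried-over state, the second batch starts again from zero.
print(count_by_key(second))  # {'account': 1}
```
</pre><p>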
These orchestrators do not have this ability.</p><p>Stream processors also need to distribute their tasks evenly across multiple workers so that they can process large amounts of data in parallel. Processing terabytes of data on a single machine takes a very long time. In a high-throughput use case, the Kafka topic may have many partitions to allow for horizontal scaling. These orchestrators don&#8217;t take advantage of those partitions: they have no way to distribute the data across multiple workers to execute parallel tasks. They deploy only one task runner, which has to subscribe to all partitions and process them itself, one after another.</p><p>Shuffling is how distributed systems move data between workers when performing a join. Together, shuffling and state give stream processors their ability to perform complex transformations. Since these orchestrators neither distribute the data nor hold state, they cannot perform such transformations.</p><p>So no, orchestrators cannot replace today&#8217;s popular stream processors. What, then, is event-driven orchestration supposed to do? The answer is notifications. Event-driven orchestrators are really notification consumers: they only trigger tasks and do not define the task itself. A task that an orchestrator invokes could be a batch process that handles large volumes of data, but that data wouldn&#8217;t come from Kafka. Batch data comes from data stores that hold data at rest.</p><h2><strong>Drivers are like DAGs</strong></h2><p>In Spark and Flink, you define your workflow logic in a driver just as you would in an Airflow DAG. Drivers are written in Java, Scala, or Python and follow a functional programming (FP) paradigm. FP is beyond the scope of this post, but in short, it treats functions as first-class citizens. You can serialize a function and pass it to other workers in a distributed system to run in parallel. These drivers define the logic in the workflow without actually running the logic locally. 
Instead, the functions get executed remotely to distribute the workload and scale out horizontally.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c8xf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d286d40-e551-4002-992b-8fd876e50d79_996x617.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c8xf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d286d40-e551-4002-992b-8fd876e50d79_996x617.png 424w, https://substackcdn.com/image/fetch/$s_!c8xf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d286d40-e551-4002-992b-8fd876e50d79_996x617.png 848w, https://substackcdn.com/image/fetch/$s_!c8xf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d286d40-e551-4002-992b-8fd876e50d79_996x617.png 1272w, https://substackcdn.com/image/fetch/$s_!c8xf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d286d40-e551-4002-992b-8fd876e50d79_996x617.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c8xf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d286d40-e551-4002-992b-8fd876e50d79_996x617.png" width="566" height="350.62449799196787" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d286d40-e551-4002-992b-8fd876e50d79_996x617.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:617,&quot;width&quot;:996,&quot;resizeWidth&quot;:566,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c8xf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d286d40-e551-4002-992b-8fd876e50d79_996x617.png 424w, https://substackcdn.com/image/fetch/$s_!c8xf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d286d40-e551-4002-992b-8fd876e50d79_996x617.png 848w, https://substackcdn.com/image/fetch/$s_!c8xf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d286d40-e551-4002-992b-8fd876e50d79_996x617.png 1272w, https://substackcdn.com/image/fetch/$s_!c8xf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d286d40-e551-4002-992b-8fd876e50d79_996x617.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>These drivers aren&#8217;t orchestrators in the sense that Airflow, Dagster, and Prefect are. The main difference is that DAG orchestrators define task workflows but do not implement the logic inside each task, whereas drivers do implement the workflow logic.</p><h2><strong>Streaming workflow builders</strong></h2><p>There are nice streaming workflow builders on the market today, but they are not orchestrators. Confluent&#8217;s <a href="https://www.confluent.io/product/stream-designer/">Stream Designer</a> and <a href="https://streamsets.com/">StreamSets</a> are examples. These builders can assemble a streaming data pipeline from source to sink.</p><p>These builders put tasks together like puzzle pieces using schemas, which define the shape of the data. A stream builder connects two tasks only when their schemas agree. 
If the schemas don&#8217;t match between the tasks, then the builder cannot put them together.</p><h2><strong>Is closing the gap possible?</strong></h2><p>I really liked using Prefect as a way to trigger a workflow. It has a beautiful dashboard for monitoring your event-driven workflows. Plus, it&#8217;s open source, which is even better. I&#8217;d say it&#8217;s a &#8220;prefect&#8221; orchestration tool. But it&#8217;s event-driven orchestration, not stream processing. If you need to process data in real time, you&#8217;ll still need one of the stream processing tools available today, such as Apache Flink, Apache Spark, or ksqlDB.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ri6W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ri6W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png 424w, https://substackcdn.com/image/fetch/$s_!ri6W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png 848w, https://substackcdn.com/image/fetch/$s_!ri6W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png 1272w, https://substackcdn.com/image/fetch/$s_!ri6W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ri6W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png" width="524" height="347.96875" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:850,&quot;width&quot;:1280,&quot;resizeWidth&quot;:524,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Toward Dataflow Automation - Prefect&quot;,&quot;title&quot;:&quot;Toward Dataflow Automation - Prefect&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Toward Dataflow Automation - Prefect" title="Toward Dataflow Automation - Prefect" srcset="https://substackcdn.com/image/fetch/$s_!ri6W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png 424w, https://substackcdn.com/image/fetch/$s_!ri6W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png 848w, https://substackcdn.com/image/fetch/$s_!ri6W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png 1272w, https://substackcdn.com/image/fetch/$s_!ri6W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd96dfbb9-b807-4057-8ee6-9edde086aafa_1280x850.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>But can orchestrators do stream processing?</p><p>Not really. Orchestrators assemble discrete tasks into a workflow. To the orchestrator, tasks are like square blocks that can be arranged in any order. Tasks in stream processing are more like puzzle pieces: each task&#8217;s input must fit the output of the task before it. Think of a stream processor as a large SQL statement with many <a href="https://www.sqlshack.com/sql-server-common-table-expressions-cte/">CTEs</a> that feed into one another sequentially. At the end of the statement, there is one data set that is stored somewhere. 
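</p><p>The puzzle-piece idea can be sketched in plain Python (the stage names and sample data are invented for illustration, and this is not how Flink or Spark are actually programmed). Each stage consumes exactly what the previous stage emits, and only the last stage produces the one data set that gets stored:</p><pre>
```python
def parse(lines):
    """Stage 1: raw text lines in, (user, amount) pairs out."""
    for line in lines:
        user, amount = line.split(",")
        yield user, int(amount)


def keep_large(pairs, minimum):
    """Stage 2: consumes stage 1's output shape and keeps the larger amounts."""
    for user, amount in pairs:
        if amount >= minimum:
            yield user, amount


def total_per_user(pairs):
    """Stage 3: consumes stage 2's output and produces the one final data set."""
    totals = {}
    for user, amount in pairs:
        totals[user] = totals.get(user, 0) + amount
    return totals


raw = ["ada,120", "bob,15", "ada,80", "bob,300"]

# Each stage only fits after the one that produces its input, much like
# CTEs that feed into one another.
result = total_per_user(keep_large(parse(raw), minimum=50))
print(result)  # {'ada': 200, 'bob': 300}
```
</pre><p>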
That is simply not what workflow orchestrators do.</p>]]></content:encoded></item><item><title><![CDATA[Good data engineers are lazy]]></title><description><![CDATA[Airflow's neighborhood must be razed]]></description><link>https://stkbailey.substack.com/p/good-data-engineers-are-lazy</link><guid isPermaLink="false">https://stkbailey.substack.com/p/good-data-engineers-are-lazy</guid><dc:creator><![CDATA[Benoit Pimpaud]]></dc:creator><pubDate>Thu, 30 Mar 2023 13:38:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pQ6j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is essay #4 in the Symposium on <a href="https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator">Is the Orchestrator Dead or Alive</a>? You can read more posts from </em>Benoit <em>on his Substack, </em><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;From An Engineer Sight&quot;,&quot;id&quot;:256742,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/fromanengineersight&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/aa1fb566-11bb-440e-8967-9104b1b75049_256x256.png&quot;,&quot;uuid&quot;:&quot;496eba03-a72c-44d4-b1f2-7dd7c4f4c807&quot;}" data-component-name="MentionToDOM"></span> .</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pQ6j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!pQ6j!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png 424w, https://substackcdn.com/image/fetch/$s_!pQ6j!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png 848w, https://substackcdn.com/image/fetch/$s_!pQ6j!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!pQ6j!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pQ6j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png" width="437" height="646.8085106382979" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1600,&quot;width&quot;:1081,&quot;resizeWidth&quot;:437,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!pQ6j!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png 424w, https://substackcdn.com/image/fetch/$s_!pQ6j!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png 848w, https://substackcdn.com/image/fetch/$s_!pQ6j!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!pQ6j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ba6c94-ee95-443a-a180-e1deb61ebcd0_1081x1600.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">The Airflow House</figcaption></figure></div>
<blockquote><p><em>The goal of engineers is to be as lazy as possible. If your system doesn&#8217;t allow you to sit and wait for completion, you&#8217;re missing something.</em></p></blockquote><p>That was probably the day I knew I wanted to be an engineer.</p><p>The IT teacher was writing some wizardry on a very desaturated terminal projected on a big white wall. He said this just as I was about to get a "wow effect" from one of my first Python code snippets. It&#8217;s now carved in my memory.</p><p>Though a bit sarcastic, I really think he highlighted something that day.</p><p>What is our job as data engineers, if not to reach end-to-end automation?</p><p>I once wrote that <a href="https://towardsdatascience.com/you-dont-need-an-orchestrator-6517b243dece">we don&#8217;t need orchestrators</a>. We do need orchestrators, <a href="https://news.ycombinator.com/item?id=32317558">but not ones that end up in spaghetti</a>, and not ones that need Italian data engineer chefs to maintain their codebases.</p><p>The orchestrator is often our best friend when it comes to automation, but I think we still lack something. <a href="https://medium.pimpaudben.fr/data-engineer-is-a-transitional-job-ed0074c89646">We are to a certain extent just a replacement for old BI tools... We should drive greater value</a>.</p><p>Data engineers are not used to their full potential. We write the same Python code over and over, deal with the same data issues, migrate systems to new ones, and so on. 
This is not really an engineering job, is it?</p><p>The outcomes we should look for are automation at its full potential, laziness at its climax, and optimized engineering costs. Our main duty should be to design architecture and build systems that allow us to be lazy, not to manage all the stuff we have built.</p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:256742,&quot;name&quot;:&quot;From An Engineer Sight&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa1fb566-11bb-440e-8967-9104b1b75049_256x256.png&quot;,&quot;base_url&quot;:&quot;https://fromanengineersight.substack.com&quot;,&quot;hero_text&quot;:&quot;A periodic about data, engineering and design&quot;,&quot;author_name&quot;:&quot;Benoit Pimpaud&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#ffffff&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://fromanengineersight.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!5ENi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa1fb566-11bb-440e-8967-9104b1b75049_256x256.png" width="56" height="56" style="background-color: rgb(255, 255, 255);"><span class="embedded-publication-name">From An Engineer Sight</span><div class="embedded-publication-hero-text">A periodic about data, engineering and design</div><div class="embedded-publication-author-name">By Benoit Pimpaud</div></a></div></div><h2>How to clean up our city?</h2><p>As Antoine de Saint-Exupéry said, "a designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."</p><p>We definitely missed something with the modern data stack. <a href="https://hbr.org/2014/06/how-to-succeed-in-business-by-bundling-and-unbundling">Unbundling and bundling cycles, some would say</a>. Or we just got blinded by marketing and hoped that our problems could be solved by a myriad of new tools.</p><p>The orchestrator&#8217;s neighborhood is not a pretty sight. It&#8217;s like a succession of gray buildings trying to grow wherever they can while one wonderful house keeps standing there in the middle of the street. There are shadows on this house. (<a href="https://mad.firstmark.com/card#recjmMtyR7rIP6L1F">There are actually several houses</a>.)</p><p>What we might just want is to bring some sunshine back to the city. To remove some of those ugly buildings and keep the ones that really support us. And why not at the same time raze some big houses nobody really wants to buy anymore?</p><p>Yes, the Airflow house is great. I loved Airflow&#8217;s garden, like any front-end developer who loved jQuery lighting at some point.
But now the house is too big to keep all our furniture and decorations tidy.</p><p><a href="https://airflow.apache.org/docs/apache-airflow/stable/index.html">&#8220;Airflow was not built for infinitely-running event-based workflows&#8221;</a>.</p><p>I&#8217;m playing with words here, but even if that myriad of tools didn&#8217;t replace Airflow, it still highlighted something: we need a new central control plane to deal with our event-based reality and its infinitely running data flows.</p><p>Fortunately, we are starting to catch up.</p><p>Yes, we have and use better tools: dbt, <a href="https://github.com/kestra-io/kestra">new declarative orchestrators</a>, Terraform, DuckDB, CI/CD, etc. The declarative paradigm is nudging every part of the data stack.</p><p>But as disruptive as those new tools are, we still need vision, automation, and architecture design. We need the mayor to wake up and stop wondering if building a new pool will allow him to be re-elected.</p><p>Tools and codebases are often the focus of the debate. And while they support our concrete daily work, we rarely take the same amount of time to think about the underlying system architecture they call into question and the legacy they will create.</p><p>One key to uncovering some vision and paving the way toward lasting roads is to ask ourselves probing questions:</p><ul><li><p>Should my orchestrator sit at the center of my data stack? Does it fit with my business needs?</p></li><li><p>Do I really need to pay for a tool to move data from one place to another?</p></li><li><p>Should I ask pricey engineers to write the same low-value Python code over and over?</p></li><li><p>What's my codebase vision? Does my codebase need a vision?
Or can I trash some code easily?</p></li><li><p>Is maintaining custom abstractions on top of abstractions a good idea (we often see custom Airflow code in companies, but we have to remember that Airflow is already a layer of abstraction)?</p></li><li><p>Do my high-level execs understand the game we are playing?</p></li><li><p>Do my data analysts focus on business intelligence while understanding the need for good software engineering practices?</p></li><li><p>Do they consider themselves "coders" even if their tools tell them to drag and click?<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p></li></ul><p>Again, no straight answers. These questions are only here to suggest a way, to bring forward that idea, that small trick we are looking for while solving our technological and organizational issues.</p><p>Like a city, a data system is not something neat. It's growing and changing constantly.</p><p>Our orchestrator is like the city's traffic controller, responsible for coordinating the movement of data through the city's streets and ensuring that everything is running smoothly. But we have to be at the city council level, where we create a vision for our city's future and design the infrastructure to support it.</p><p>And like any city council, we have to be lazy. Don&#8217;t get me wrong, this is about being lazy in the right way: we have to be bored in advance at the thought of dealing with angry citizens, solving traffic jams, finding resources from the government, etc.</p><p>There will always be unforeseen challenges, changes in the environment, and new technologies to integrate.
And while the traffic controller has a major role to play here, it's our role to build solid architecture and a clear vision to guide our entire city's development.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Every click, every drag is translated to code at some point. Declarative isn't just a trend, it's the realization that consistency and efficiency can't be done without a proper DSL. Drag &amp; drop isn't bad at all, on the contrary. It allows speed and a great user experience. But it fails in automation and consistency. The declarative paradigm somewhat tackles this issue.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Will active metadata eat the orchestrator?]]></title><description><![CDATA[Leave orchestration to the engineers, and let the business feast on its metadata]]></description><link>https://stkbailey.substack.com/p/will-active-metadata-eat-the-orchestrator</link><guid isPermaLink="false">https://stkbailey.substack.com/p/will-active-metadata-eat-the-orchestrator</guid><dc:creator><![CDATA[Louise de Leyritz]]></dc:creator><pubDate>Mon, 27 Mar 2023 13:01:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!whFw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!whFw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!whFw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png 424w, https://substackcdn.com/image/fetch/$s_!whFw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png 848w, https://substackcdn.com/image/fetch/$s_!whFw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png 1272w, https://substackcdn.com/image/fetch/$s_!whFw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!whFw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png" width="727" height="382.4739010989011" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:766,&quot;width&quot;:1456,&quot;resizeWidth&quot;:727,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!whFw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png 424w, https://substackcdn.com/image/fetch/$s_!whFw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png 848w, https://substackcdn.com/image/fetch/$s_!whFw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png 1272w, https://substackcdn.com/image/fetch/$s_!whFw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013921df-f1c9-47e6-9c86-c102c31984cd_1600x842.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is essay #3 in the Symposium on <a href="https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator">Is the Orchestrator Dead or Alive</a>? You can read more posts from Louise on <a href="https://medium.com/@louise.de.leyritz">Medium</a>.</em></p><div><hr></div><p>The world of data management is evolving rapidly, and one of the latest trends gaining traction is <strong>active metadata</strong>.</p><p>Active metadata refers to a dynamic approach to managing and using metadata, allowing it to flow across the different parts of the data stack, reaching people in the tools they are already using. Active metadata is actively updated, processed, and integrated into various tools and workflows. This notion is often contrasted with static metadata, the traditional practice of storing metadata in a static data catalog.</p><p>Now, I&#8217;ve heard the rumor that <strong>active metadata is a play by data catalogs to replace the orchestrator.</strong></p><p>Where is this coming from?</p><p>Well, a data orchestrator is a software solution or platform responsible for automating and managing the flow of data across different systems, applications, and storage locations. Sounds similar to the description of active metadata? Indeed. Is active metadata trying to steal the spotlight?</p><p>I think not. Why is that? Active Metadata has better things to do. It&#8217;s hungry for something else.</p><p>In this piece, I&#8217;ll explore the extent to which Active Metadata could eat the orchestrator. It turns out that Active Metadata can eat half of the orchestrator, but it cannot eat it entirely.
But more importantly, Active Metadata should not want to eat the orchestrator. In fact, it&#8217;s hungry for business users.</p><h2><strong>What Exactly is Active Metadata Eating?</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N-8d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N-8d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png 424w, https://substackcdn.com/image/fetch/$s_!N-8d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png 848w, https://substackcdn.com/image/fetch/$s_!N-8d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png 1272w, https://substackcdn.com/image/fetch/$s_!N-8d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N-8d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png" width="599" height="366.9697802197802" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:892,&quot;width&quot;:1456,&quot;resizeWidth&quot;:599,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N-8d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png 424w, https://substackcdn.com/image/fetch/$s_!N-8d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png 848w, https://substackcdn.com/image/fetch/$s_!N-8d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png 1272w, https://substackcdn.com/image/fetch/$s_!N-8d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4b84806-e25c-448b-91a9-91b1c04b9952_1600x980.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Data orchestration involves managing complex workflows and ensuring that the right data is available, transformed, and integrated as needed.</p><p>This process can be broken down into two interconnected components: the &#8220;<strong>triggering mechanism</strong>&#8221; and the "<strong>optimization intelligence</strong>."</p><p>The triggering mechanism is in charge of starting data orchestration workflows based on factors like time, events, or changes in the data.</p><p>The optimization intelligence, on the other hand, is responsible for optimizing the triggering mechanism.
This includes deciding which workflows to execute, the order of execution, and how to transform and route the data by analyzing the metadata.</p><p>Active metadata, with its real-time updates and integration into various tools and workflows, has the potential to <strong>replace the triggering mechanism</strong> aspect of data orchestrators.</p><p>It can automate the initiation of workflows, provide real-time updates for the efficient triggering of data processes, and create context-aware workflows that dynamically adapt to changes in the data landscape.</p><p>By replacing the triggering mechanism component of data orchestrators, active metadata can offer enhanced automation, better integration with existing tools, and more efficient collaboration between teams. Isn&#8217;t this lovely?</p><p>However, active metadata cannot completely replace the second component of the orchestrator: <strong>optimization intelligence</strong>.</p><p>This is because data orchestrators still play a vital role in handling complex data processing, transformation, and integration tasks that require their expertise and capabilities.</p><p>Optimization intelligence is essential in determining how to process and move data between different systems and applications, ensuring that the right data is available in the right format at the right time.</p><p>Active metadata, while powerful, cannot fully replicate the sophisticated decision-making and processing abilities of the orchestrator.</p><p>Optimization intelligence remains an integral part of the orchestrator's role, as it handles the more complex and nuanced tasks of data processing, transformation, and integration.</p><p>This means that while active metadata can streamline certain aspects of data orchestration, the orchestrator continues to be crucial in managing the overall data workflows and transformations.</p><p>So, active metadata cannot fully replace the orchestrator. But the real question is: should it want to?
The answer is no.</p><h2><strong>Active Metadata is hungry for something else</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tIGr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tIGr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png 424w, https://substackcdn.com/image/fetch/$s_!tIGr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png 848w, https://substackcdn.com/image/fetch/$s_!tIGr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png 1272w, https://substackcdn.com/image/fetch/$s_!tIGr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tIGr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png" width="727" height="360.0048076923077" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:721,&quot;width&quot;:1456,&quot;resizeWidth&quot;:727,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tIGr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png 424w, https://substackcdn.com/image/fetch/$s_!tIGr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png 848w, https://substackcdn.com/image/fetch/$s_!tIGr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png 1272w, https://substackcdn.com/image/fetch/$s_!tIGr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5cc4f0f-6b00-43e5-97ec-49ca53a84abf_1600x792.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Active metadata is a technology that can tackle different use cases. And we do not think its true value lies in replacing the orchestrator.</p><p>At<a href="https://www.castordoc.com"> Castor</a>, we believe that active metadata should pursue a much more rewarding goal: <strong>empowering business users.</strong> And it can do so by providing them with data context right where they are working.</p><p>Active metadata shouldn&#8217;t seek to eat the orchestrator.
Instead, it should strive to become a <strong>personal data assistant</strong> that enhances collaboration, decision-making, and overall data literacy within the organization.</p><p>Active metadata can help business users make more informed decisions by <strong>offering necessary data context</strong> within their existing tools and workflows.</p><p>This context includes data definitions, lineage, relationships, and quality indicators, enabling users to better understand the data and make decisions based on accurate and up-to-date information.</p><p>Furthermore, active metadata can foster better <strong>collaboration and communication</strong> across the organization by integrating metadata into the tools and platforms that stakeholders already use as part of their workflow.</p><p>This seamless integration can <strong>bridge the gap between technical and non-technical team members</strong>, creating a more efficient and cohesive data-driven culture where everyone has access to the information they need to work effectively.</p><p>Finally, active metadata can play a crucial role in supporting <strong>data governance and compliance</strong>.
By automating tasks such as data lineage tracking and data quality monitoring, active metadata can help organizations maintain control over their data and reduce the risk of non-compliance with regulatory requirements.</p><p>For these reasons, we maintain that the true value of active metadata lies in fostering innovation and growth within organizations.</p><p>By making it easier for employees to access, understand, and analyze data, active metadata empowers them to identify new opportunities, optimize processes, and develop innovative solutions to business challenges.</p><p>With a better understanding of data context and more efficient collaboration, organizations can leverage active metadata to drive innovation, stay competitive, and achieve their strategic objectives.</p><p>In summary, even though active metadata can potentially influence the triggering mechanism aspect of data orchestrators, its real value is in its ability to act as a personal data assistant that enhances data literacy, collaboration, and decision-making within the organization.</p><p>By focusing on these more valuable goals, active metadata can bring tangible business value by empowering users to make the most of their data assets. We think the end game of active metadata is far more powerful than simply replacing orchestration, as it helps organizations unlock the full potential of their data and drive meaningful results.</p><h2><strong>Conclusion</strong></h2><p>In the ever-evolving landscape of data management, active metadata has emerged as a powerful force, offering new possibilities for streamlining data processes and empowering business users.
While it may appear that active metadata has the potential to replace certain aspects of the orchestrator, particularly the triggering mechanism component, it should be put at the service of a different use case.</p><p><strong>Active metadata's ultimate goal should be to serve as a personal data assistant, enhancing data literacy, collaboration, and decision-making across the organization.</strong> By focusing on these valuable objectives, active metadata not only complements the role of the orchestrator but also brings tangible business value by enabling users to maximize the potential of their data assets.</p><p>By embracing the possibilities offered by active metadata and leveraging its unique strengths, businesses can stay ahead of the curve, foster a data-driven culture, and achieve their strategic objectives in an increasingly competitive landscape.</p><p>So, while active metadata might have an appetite for some aspects of data orchestration, <strong>its true hunger lies in empowering business users</strong> to make better use of data.</p>]]></content:encoded></item><item><title><![CDATA[Nobody Should Write ETL]]></title><description><![CDATA[Let our systems figure it out, while they still listen to us]]></description><link>https://stkbailey.substack.com/p/nobody-should-write-etl</link><guid isPermaLink="false">https://stkbailey.substack.com/p/nobody-should-write-etl</guid><dc:creator><![CDATA[Vinnie Dalpiccol]]></dc:creator><pubDate>Thu, 23 Mar 2023 14:00:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_4vN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_4vN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_4vN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!_4vN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp 848w, 
https://substackcdn.com/image/fetch/$s_!_4vN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!_4vN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_4vN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp" width="449" height="449" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:449,&quot;bytes&quot;:104148,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!_4vN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!_4vN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp 848w, 
https://substackcdn.com/image/fetch/$s_!_4vN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!_4vN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda9a200c-ac14-4a91-b296-1ce1fb5d8c5c_1024x1024.webp 1456w" sizes="100vw" loading="lazy" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Courtesy of DALL-E, of course</figcaption></figure></div><p><em>This is essay #2 in the 
Symposium on <a href="https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator">Is the Orchestrator Dead or Alive</a>? You can read more posts from Vinnie on his Substack, </em><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ask Vinnie&quot;,&quot;id&quot;:1261833,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/askvinnie&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/9d0012da-9a30-440a-9e27-f1557ec9ac80_1280x1280.png&quot;,&quot;uuid&quot;:&quot;6425df71-db99-4423-82b0-49e8410ac5f4&quot;}" data-component-name="MentionToDOM"></span>.</p><div><hr></div><p>We&#8217;ve already accepted that <a href="https://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/">engineers shouldn&#8217;t write ETL</a>. &#8220;It&#8217;s a hot potato,&#8221; we said, so let&#8217;s &#8220;give people end-to-end ownership of the work they produce.&#8221; Let Data Scientists write ETL, let Data Analysts ingest their own data. Engineers don&#8217;t have time for that; they&#8217;re busy writing vendor glue and fixing the SaaSpool known as the &#8220;modern data stack&#8221; after dbt published a breaking change in its most recent release, taking down all 37 connectors.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>I say this is a good step, but it&#8217;s a band-aid solution. All of the problems inherent in ETL workflows are still there; they&#8217;re just someone else&#8217;s responsibility. Why don&#8217;t we just do away with ETL altogether?</p><p>Maybe I am taking the argument in the article too literally after the wine I was contractually obligated to drink for this symposium (thanks for fuelling my alcoholism, Stephen), but let&#8217;s run with it for a second. 
Let&#8217;s assume that the platform team is just keeping packages updated and writing more decorators we can all import into our scripts. We can not only <code>@time</code> our functions, we can also <code>@count</code> how many times they&#8217;re called, automatically <code>@send_telemetry_to_infra_team</code> and beg them to <code>@accept_this_pr</code>. I promise this time the code is <code>@tested</code> and <code>@stable</code>.</p><p>We have our connectors. We have our Fivetrans, Stitchs,<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> Airbytes, and that one pesky SDK some team wrote to access that one internal API.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p>The Data Analytics team is happy: they can click here and there and get metrics from Snowplow into Snowflake. They don&#8217;t even have to model the data themselves, <a href="https://hub.getdbt.com/dbt-labs/snowplow/latest/">it&#8217;s already done</a>.</p><p>And let the Data Science team do the same. End-to-end ownership means end-to-end silos. Those analysts could never begin to understand the sheer magnitude of <code>import tensorflow as tf</code>. Why use their data? It&#8217;s probably badly modeled anyway. <a href="https://github.com/dbt-labs/snowplow">We found our own, better package</a> and can hook it up ourselves. 
Now if only the Airflow instance doesn&#8217;t crash when we push our new DAGs to prod.</p><p>Is it just me, or does this seem terribly inefficient? End-to-end ownership sounds great and the autonomy <em>feels</em> great, but even the author of the original piece says that &#8220;[they] are sacrificing technical efficiency for velocity and autonomy. It is important to recognize this as a deliberate trade-off.&#8221; The piece makes no mention of unnecessary duplication.</p><p>I guess I&#8217;ve been in one too many organizations where multiple teams were building the exact same thing without knowledge of one another (well, it&#8217;s only happened once, but I think it&#8217;s once too many) to accept this as a good solution. I may be too lost in abstractionland, but at a meta-level, most data work really isn&#8217;t that unique (for the sake of the argument, we&#8217;re not talking real-time or streaming processes), and as a matter of fact, it&#8217;s only the shape of the final exposure (yes, this could be another dbt pun) that really matters. Everything else going into it can, and should, be reused.</p><p>But if engineers aren&#8217;t writing ETL, and nobody <em>really likes to do it</em>, do we just accept this duplication? Do we just keep filling Snowflake&#8217;s pockets?</p><p>I think there&#8217;s a better way, and orchestration can be the answer. But not just any orchestration: asset-aware orchestration.</p><p>Let&#8217;s all just stop writing ETL altogether. Let&#8217;s declare the outputs of our pipelines and let our systems figure out the rest. If humans don&#8217;t wanna deal with it, let the machines do it, at least while we can still command them.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><p>We declare our data assets in code. Our data platform knows how to create and maintain them. The engineers already wrote fancy abstracted vendor glue anyway, so they might as well write a YAML config for it. 
The consumers, be they data scientists or analysts, can stop importing the <code>@fancy_new_decorator</code> and instead import the source asset into their project. That&#8217;s materialized once and always kept up to date. They tell the system their downstream use case depends on it, and then go to sleep. Or to a bar, or to the beach, and everyone lives happily ever after.</p><p>Maybe the very notion of ETL is wrong. In 2023, our data warehouse is not the final destination; it&#8217;s just another step in the journey of the data. ETL, and its cousin Reverse ETL (AKA <code>&#8220;ETL&#8221;[::-1]</code>), were helpful as concepts a few years ago, but it&#8217;s time we start thinking about processes differently.</p><p>The world I wanna live in is one in which some team, be it the engineers or the data producers themselves, declares source assets, including metadata and contracts about their shape, and consumers can hook up to them, with the system intelligently tracking where the assets are saved and how often they&#8217;re updated, and keeping all downstream dependencies up to date. Maybe this whole article is just about solving the technical part of &#8220;data mesh&#8221;,<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> but when we're still trying to hold our Data Scientist's hand (they might understand <code>import tensorflow as tf</code>, but they would never understand the Liskov Substitution Principle),<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a> we might as well go all the way and take back ownership of the sources. 
We all know they weren&#8217;t writing unit tests for them anyway.</p><p>And yes, this whole article is just another way for me to shill for <a href="https://askvinnie.substack.com/p/now-youre-thinking-with-assets">Dagster&#8217;s software-defined assets and declarative scheduling</a>. Guilty as charged.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Any similarity to related incidents in your organization is purely coincidental.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Or &#8220;Stitches,&#8221; one of the many unanswered questions in the modern data stack.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Which may or may not have been updated since they first launched it in 2017.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Yes, I am scared about GPT-4.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Which happens to be the easier part, of course.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Not that I have, to be 
fair.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Data in a Downturn]]></title><description><![CDATA[A return to boring]]></description><link>https://stkbailey.substack.com/p/data-in-a-downturn</link><guid isPermaLink="false">https://stkbailey.substack.com/p/data-in-a-downturn</guid><dc:creator><![CDATA[Boring Dan]]></dc:creator><pubDate>Mon, 20 Mar 2023 16:55:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1Qks!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Say what you want about the data space, but it is never boring. Or at least it hasn&#8217;t been boring lately.</p><p>I remember how very green I was when I joined my first data team. I was excited to be given the opportunity because&#8230; well, it was boring and no one else wanted that job. These were the days when the scope of data teams was minimal, before data scientist was called the sexiest job of the 21st century. I remember reading that headline at the time and feeling confused. My day-to-day responsibilities were decidedly not sexy. Our main responsibility was to ensure everything ran in a predictable fashion with as little excitement as possible. But that was over a decade ago! Jumping forward to the end of 2022, I had almost forgotten those unsexy times.</p><p>The last few months have been tough for tech. Luckily, data teams have not been hit too hard (based on my entirely anecdotal evidence). But I can&#8217;t say the same about the data stack. Teams are abandoning many of the tools they had added. We have all found out that the complexity of the modern data stack does not mirror the maturity of the business or the number of people on the team as much as it reflects the economy.</p><p>From my conversations, it is interesting to see what tools have been making the cut. 
At a high level, it seems to have been a lot of last in, first out. Not many people are giving up on their data warehouse, orchestration or ETL (turns out databases are recession-proof). But some of the new tools have been sidelined or put on pause.</p><p>There are a few reasons some tools have not been able to establish themselves as critical components. First of all, the closer you get to the datastore, the harder it is to pivot. Once companies get their data in one place, they generally don&#8217;t like to move it somewhere else unless they have a very good reason. Even moving to similar technologies risks unintended changes and cascading issues that can add months of development time.</p><p>Databases are still the bread and butter of the data stack, around which everything else revolves. It follows that there is a lot less need for metadata analysis tooling without an actual database to point it at.</p><p>Also, data teams, not surprisingly, have a better understanding of the systems they have been dealing with for most of their careers. dbt&#8217;s biggest contribution to the community was not an overly elegant framework but a common language for people on small teams to communicate with other data folks working on similar projects. Everyone working in the industry before dbt had built their own version of it or tried to build one. They were rarely that successful, but analysts at least understood what they were trying to accomplish.</p><p>As the original responsibilities of data teams became &#8220;solved problems&#8221;, the community moved on to other challenges, especially as it became more insular. But here it became a little bit trickier. What should come next? Do you build a feature store? (I don&#8217;t mean to pick on this one example.) There are many legitimate use cases for a feature store, but not as many smaller data teams have experience building one or needing one. 
Luckily, there was a growing number of offerings allowing you to jumpstart that journey. But jumping into the deep end of a complex project with a complex solution does not solve much of anything. Yes, it makes it easier to get started, but it quickly showcases the deficiencies of data teams.</p><p>A recent trait of the data community is to put a new name to an existing software practice and proclaim something new has just been discovered. As the goals of data applications became more ambitious and their scope expanded, there were more needs to support. For example, you now need someone or some tool to help with safe and consistent deployment. You might be thinking DevOps would fill that role, but that would be wrong. The correct answer is you need DataOps&#8230; or MLOps&#8230; or AIOps&#8230; the point is it never existed before and it was up to the data team to develop this competency.</p><p>The increase in specialization in data teams, while ignoring expertise from existing non-data teams, led to more and more pressure to layer in new, increasingly specific tooling. At its worst, data teams would just build parallel engineering orgs. 
This could help the agility of pure data applications but started to betray the goal of more complicated applications.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Qks!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Qks!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png 424w, https://substackcdn.com/image/fetch/$s_!1Qks!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png 848w, https://substackcdn.com/image/fetch/$s_!1Qks!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png 1272w, https://substackcdn.com/image/fetch/$s_!1Qks!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Qks!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png" width="1456" height="1017" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2775977,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Qks!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png 424w, https://substackcdn.com/image/fetch/$s_!1Qks!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png 848w, https://substackcdn.com/image/fetch/$s_!1Qks!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png 1272w, https://substackcdn.com/image/fetch/$s_!1Qks!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb4502bf-8fb8-427f-a31f-43288cd9d902_2388x1668.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In a lot of ways, the new tools added to the data stack were an excuse to not communicate outside of data. Data teams have always had an inconsistent home. Is it in Product, Eng, DevOps? But ambiguity is not a reason to start doubling up on responsibilities and tools. The truth is if you treat data applications as entirely separate from engineering applications on the critical path, those data applications will never be trusted outside of data.</p><p>Building any level of trust means going beyond yourself. It may mean holding the data team to some of the standards of engineering and not just doing everything that makes the data team feel the most comfortable. Not every member of the team needs to become a software engineer but it would be good to try and learn some SWE habits. One of which is the importance of working within parameters. You may not always make the perfect decision but you need to know enough to not paint yourself into a corner. 
Part of maintaining a system means you can&#8217;t rewrite it whenever you hit a wall.</p><p>So what will the next year look like in data tools? I think we are going to go back to behaving like boring practitioners who work within confines. We may have to use tools not entirely catered to us. There will still be analytics tools, but I think they will behave in ways that are recognizable to both data and engineering teams. This will allow a number of applications to be built on top of them and help break down silos.</p><p>We should encourage looking across teams to see what we can leverage. We can also try to return the favor. On data teams, I have often found the most successful tools draw interest from outside of data. When we rolled out orchestration, the work was done not when the analytics jobs had been migrated but when engineers started to move their own jobs onto it. When data applications start to resemble any other application, you can build some really interesting things and some things you didn&#8217;t intend at the beginning. 
You may also find that when something has the approval of the entire organization, it is much harder to cut when times get tough.</p><div><hr></div><p><em>This is essay #1 in the DPE Symposium on <a href="https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator">Is the Orchestrator Dead or Alive?</a></em></p>]]></content:encoded></item><item><title><![CDATA[Symposium: Is the orchestrator dead or alive?]]></title><description><![CDATA[Longform techno-philosophical jabberwocky? 
Hold my wine.]]></description><link>https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator</link><guid isPermaLink="false">https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator</guid><dc:creator><![CDATA[Stephen Bailey]]></dc:creator><pubDate>Mon, 13 Feb 2023 15:48:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OI7z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This page features running commentary and introductions of Symposium posts, and it will be continuously updated throughout the symposium.</em></p><div><hr></div><h1>Essay Index</h1><p>Symposium posts were released between March and April 2023.</p><ul><li><p><a href="https://open.substack.com/pub/stkbailey/p/data-in-a-downturn?r=a3d32&amp;utm_campaign=post&amp;utm_medium=web">Data in a Downturn</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Boring Dan&quot;,&quot;id&quot;:1508912,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/boringdan&quot;,&quot;photo_url&quot;:null,&quot;uuid&quot;:&quot;9b0ed064-ec88-4e1d-ab23-035733987d6d&quot;}" data-component-name="MentionToDOM"></span> </p></li><li><p><a href="https://stkbailey.substack.com/p/nobody-should-write-etl">Nobody should write ETL</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Vinnie Dalpiccol&quot;,&quot;id&quot;:56389672,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f284fd7-2cb7-4e67-822e-f83625786fcc_3273x2697.jpeg&quot;,&quot;uuid&quot;:&quot;0b2fbb5f-360f-44b4-bac2-3fb044711398&quot;}" 
data-component-name="MentionToDOM"></span> </p></li><li><p><a href="https://open.substack.com/pub/stkbailey/p/will-active-metadata-eat-the-orchestrator?r=a3d32&amp;utm_campaign=post&amp;utm_medium=web">Will active metadata eat the orchestrator</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Louise from Castor&quot;,&quot;id&quot;:38218705,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Febdfcc3b-535e-4d11-b7db-779534e281e8_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;043dced6-1ba5-4861-9aa1-84a81152a724&quot;}" data-component-name="MentionToDOM"></span> </p></li><li><p><a href="https://open.substack.com/pub/stkbailey/p/good-data-engineers-are-lazy?r=a3d32&amp;utm_campaign=post&amp;utm_medium=web">Good data engineers are lazy</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Benoit Pimpaud&quot;,&quot;id&quot;:23621089,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1700bc2d-b494-49e4-b282-f061f189382a_2883x2883.jpeg&quot;,&quot;uuid&quot;:&quot;9c0a7f52-ccfd-4afc-b4c1-0c1bd81b37df&quot;}" data-component-name="MentionToDOM"></span> </p></li><li><p><a href="https://stkbailey.substack.com/p/limits-of-the-event-driven-orchestrator">Limits of the Event-Driven Orchestrator</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Hubert 
Dulay&quot;,&quot;id&quot;:7035644,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F913363b8-e9b2-45f8-b5e5-456e5b022ea4_572x574.jpeg&quot;,&quot;uuid&quot;:&quot;86a2ba9c-9317-400d-9e83-1beba5a82fd4&quot;}" data-component-name="MentionToDOM"></span> </p></li><li><p><a href="https://stkbailey.substack.com/p/life-after-orchestrators">Life After Orchestrators</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Benjamin Djidi&quot;,&quot;id&quot;:8759244,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56e36be9-7d69-4ba8-aaf7-e1e0703e9a32_636x636.jpeg&quot;,&quot;uuid&quot;:&quot;e0edcb28-f44e-4ffe-a888-a3883dc7960d&quot;}" data-component-name="MentionToDOM"></span> </p></li><li><p><a href="https://stkbailey.substack.com/p/data-materialization-is-a-convergence">Data Materialization is a Convergence Problem</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alex Rasmussen&quot;,&quot;id&quot;:4434805,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc67696b8-39e7-4626-ab78-5377af8bb601_3794x3794.jpeg&quot;,&quot;uuid&quot;:&quot;b69b212d-4f59-46ef-9de0-14189ea45004&quot;}" data-component-name="MentionToDOM"></span> </p></li><li><p><a href="https://stkbailey.substack.com/p/28-dags-later">28 Dags Later</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Stephen 
Bailey&quot;,&quot;id&quot;:16953086,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6f80e7a3-c16e-476d-ba81-a6b7daa85547_1348x1348.jpeg&quot;,&quot;uuid&quot;:&quot;628347d6-36ac-46b9-a9ec-c1445365c05a&quot;}" data-component-name="MentionToDOM"></span> </p></li><li><p><a href="https://stkbailey.substack.com/p/orchestration-isnt-going-anywhere">Orchestration isn&#8217;t going anywhere</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Nick Schrock&quot;,&quot;id&quot;:3107935,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36ef010d-5387-4039-b04b-6dccf5822e09_824x824.png&quot;,&quot;uuid&quot;:&quot;efad1796-a2b0-44a5-a29c-2368a6e39888&quot;}" data-component-name="MentionToDOM"></span> </p></li></ul><div><hr></div><h1>The Symposium</h1><blockquote><p>For my own part, indeed, I take an immense delight in philosophic discourses, whether I speak them myself or hear them from others: whereas in the case of other sorts of talk -- especially that of your wealthy, money-bag friends -- I am not only annoyed myself but sorry for dear intimates like you, who think you are doing a great deal when you really do nothing at all.</p><p>From your point of view, I daresay, I seem a hapless creature, and I think your thought is true. I, however, do not think it of you: I know it for sure. </p><p>&#8212; Apollodorus, in Plato&#8217;s <em>Symposium</em></p></blockquote><p>Too many people approach writing with undue reverence. True, you must learn the craft and treat the audience (and their time) seriously. But I&#8217;ve found a lot of joy this past year in rebalancing between <em>constructive</em> and <em>expressive</em> modes of writing, and I&#8217;d like to create a space for others to try that out.</p><p>Enter the symposium: the drinking party. 
What&#8217;s not to love? If a <em>convention</em> is a coming-together of people for a certain purpose, a symposium is a <em>coming-apart</em> over the same. The drinking frames the discourse, the discourse drives the drinking. Cheers to that!</p><p>So, this year, I plan to experiment with hosting a few symposia &#8212; an open invitation to submit essays on a particular prompt. What do you know that others don&#8217;t? What irritations do you need to get off your chest? What&#8217;s worth putting your glass down and moving the conversation towards?</p><p>This is writing as basic philosophical art: bullshitting, seriously. Whoever you are, consider grabbing a pen, paper, and beverage, and writing in on the topic of:</p><div><hr></div><h1><strong>Is the orchestrator dead or alive?</strong></h1><p><strong>Complete your </strong><em><strong><a href="https://forms.gle/soh1QwgBHhDPLF2PA">Intent to Submit</a></strong></em><strong> by March 3, 2023</strong></p><p>To &#8220;unbundle&#8221; is to destroy something essential. The unbundled country is scarred by civil war. The unbundled house is empty from looting. The unbundled person is, well, luckier to be dead than undead.</p><p>So what is the unbundled orchestrator?</p><p>The orchestrator certainly has a <em>claim</em> to prominence in the data platform. It is the left-most tool, apart from the cloud provider. It knows your secrets. It has its tentacles everywhere. By the time it&#8217;s set up, it becomes wedged in a way comparable only to the database.</p><p>Yet, it&#8217;s squeezed on the left by simpler schedulers: Kubernetes cron jobs, Snowflake pipes, AWS Step Functions. It&#8217;s pressed on the right by good-enough SaaS features: an unknowable wave of cron jobs.</p><p>Is there enough room for the orchestrator? Is it solving the right problems in 2023?
Or should it wither and die, ceding its territory to the unbundled wilds?</p><p>I want to hear from others who are as obsessed with this question as I am. You don&#8217;t need authority to join a symposium &#8212; only spirit (and spirits). </p><p>And if <em>Dead or Alive</em> doesn&#8217;t summon your muse, remember it&#8217;s just the symposium&#8217;s opening salvo. Try out any of these other orchestrator-adjacent prompts:</p><ul><li><p>What sins did you commit to land in orchestration hell?</p></li><li><p>Have you built your own orchestrator? Why? Are you okay?</p></li><li><p>Is &#8220;active metadata&#8221; a play by the data catalog to eat the orchestrator?</p></li><li><p>Is the &#8220;data platform OS&#8221; just a data-aware orchestrator?</p></li><li><p>Is ML orchestration yet-another-pipeline, or is it sufficiently different to deserve its own niche?</p></li><li><p>Are data engineers condemned to forever be ritually sacrificed to Jeff Bezos&#8217; pet minotaur in the labyrinth of network policies, Terraform, Python, Docker, and managed services? Is the orchestrator the hero they need?</p></li><li><p>Are cron schedules real? Is time an illusion?</p></li></ul><p>Whatever your thoughts, I want to read them. And so do others, probably. And if they don&#8217;t, they at least want to drink while they act like they&#8217;re reading them.</p><p>So follow the instructions below, so you can tell your grandchildren you were a part of the first Data People Etc.
symposium.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Or, guilt a friend into writing!</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://stkbailey.substack.com/p/symposium-invitation-is-the-orchestrator?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OI7z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OI7z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png 424w, https://substackcdn.com/image/fetch/$s_!OI7z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png 848w, https://substackcdn.com/image/fetch/$s_!OI7z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png 
1272w, https://substackcdn.com/image/fetch/$s_!OI7z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OI7z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png" width="1456" height="969" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:969,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1709172,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OI7z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png 424w, https://substackcdn.com/image/fetch/$s_!OI7z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png 848w, https://substackcdn.com/image/fetch/$s_!OI7z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png 1272w, 
https://substackcdn.com/image/fetch/$s_!OI7z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de17545-75d3-4a24-8d03-5ea3e1853f09_2045x1361.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h1>How To Submit</h1><p>The authors behind Data People Etc. value doing as little work as possible. That means every submission will be reviewed, but not deeply edited for content and quality.
It will be accepted if it meets the bar of a professional making a good-faith effort to write an interesting essay.</p><p>There are two routes to publication:</p><p><strong>Published directly on Data People Etc.</strong></p><ol><li><p>You fill out the <a href="https://forms.gle/soh1QwgBHhDPLF2PA">Intent to Submit.</a></p></li><li><p>You complete your essay by the due date.</p></li><li><p>You coordinate with Stephen to create a guest author profile and a draft post.</p></li><li><p>Stephen publishes the essay on the arranged date.</p></li></ol><p><strong>Cross-posted to Data People Etc.</strong></p><ol><li><p>You fill out the <a href="https://forms.gle/soh1QwgBHhDPLF2PA">Intent to Submit</a>.</p></li><li><p>You write your essay on your Substack.</p></li><li><p>You publish your essay on the arranged date.</p></li><li><p>Stephen cross-posts your essay to Data People Etc.</p></li></ol><div><hr></div><h1>What to Submit</h1><p>The literary symposium resembles the newspaper&#8217;s Op-Ed section.
Articles should contain both argument and spirit &#8212; crankiness is an asset here. Other analogs: the folksy article on the last page of special interest magazines; the newspaper funnies in their golden age, circa 1992; scathing internal memos from early-stage hires who have finally had enough.</p><p>All you need is <em>one</em> <em>good idea</em>. It doesn&#8217;t have to be a <em>great</em> idea, and it doesn&#8217;t have to be a <em>big</em> idea. If you can breathe life into one good idea &#8212; and I know that you already have one, somewhere in you &#8212; then you can drive an essay worth reading.</p><p>Symposium essays can be as long as you want. Some DPE articles are long-form, registering 2500 words or more. They can also be short. Personally, I&#8217;d love to feature some truly weird writing &#8212; fiction, poetry, or songs. Whatever serves the idea.</p><p>Anyone can submit, including people affiliated with vendors. In that case, though, the author should respect the venue &#8212; present the ideas and the arguments, not the implementation details.</p><p>I&#8217;m also open to pseudonymous submissions. Is the orchestrator dead, but you are the CEO of a company that sells an orchestrator? Tell us more, Mr. Harry Flow.</p><p>The symposium is an opportunity to highlight <em>divergent</em> thinking. Be contrarian. Be elusive. Be metaphorical.</p><p>Or don&#8217;t. I don&#8217;t care, c&#8217;est la vie. I&#8217;ll be enjoying the wine.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://stkbailey.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">CHUG! CHUG!
CHUG!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>