synchronized sources of truth >> single sources of truth
Fivetran, Stitch, and Airbyte don't provide connectors for every niche SaaS tool, so I eventually end up writing a script for some tools, while for most we use an ELT tool. So I guess ETL/ELT scripts are here to stay. What do you think?
Thanks for the comment! I think the idea of writing code that extracts data from somewhere, transforms it (either minimally or significantly) along the way, and saves it somewhere else is not going away; after all, that's what all data work is. My argument relates more to the idea of task chaining vs. maintaining data assets and letting the system figure out the workflows for us.
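To make the distinction concrete, by "task chaining" I mean the familiar script style, roughly like this (a minimal sketch; the endpoint, field names, and output path are all made up):

```python
import json
import urllib.request

# Hypothetical source endpoint and destination path, purely for illustration.
SOURCE_URL = "https://example.com/api/orders"
DEST_PATH = "orders.json"

def extract():
    # Pull raw records from the (made-up) source API.
    with urllib.request.urlopen(SOURCE_URL) as resp:
        return json.load(resp)

def transform(rows):
    # A minimal transformation: drop cancelled orders.
    return [r for r in rows if r.get("status") != "cancelled"]

def load(rows):
    # Save the result somewhere else.
    with open(DEST_PATH, "w") as f:
        json.dump(rows, f)

if __name__ == "__main__":
    # The workflow is spelled out step by step: extract -> transform -> load.
    load(transform(extract()))
```

Here I'm the one wiring the steps together in a fixed order; the orchestrator (or cron) just runs what I chained.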
Thanks for correcting my understanding.
I have some clarifying questions. What do you mean by
1. task chaining vs. maintaining data assets?
2. letting the systems figure out the workflows for us? Could you give an example, maybe?
Sure! As I wrote in the final paragraph, this article turned out to be a thinly veiled excuse for me to shill for Dagster's declarative approach. Instead of chaining tasks into DAGs, I've started thinking about my pipeline in terms of the outputs it produces. This is "maintaining data assets" instead of chaining tasks.
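A minimal sketch of what that looks like in Dagster (the asset names and sample data here are made up; recent Dagster versions expose this via the @asset decorator):

```python
from dagster import asset, materialize

@asset
def raw_orders():
    # In a real pipeline this would pull rows from the source system.
    return [
        {"id": 1, "status": "shipped"},
        {"id": 2, "status": "cancelled"},
    ]

@asset
def cleaned_orders(raw_orders):
    # Dagster infers that cleaned_orders depends on raw_orders from the
    # parameter name; there is no explicit "run this task after that one".
    return [o for o in raw_orders if o["status"] != "cancelled"]

if __name__ == "__main__":
    # Ask for the assets; Dagster derives the execution order itself.
    materialize([raw_orders, cleaned_orders])
```

The point is that I declare the assets and their dependencies, and the system works out the workflow needed to keep them up to date.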
As an example of the system figuring out the workflow, this great write-up by Sandy explains it better than I ever could: https://dagster.io/blog/declarative-scheduling
Thanks for explaining!