Aug 2, 2022 · Liked by Stephen Bailey

I remember having this feeling a few years ago. What I realized is that Airflow has taught us a few bad habits, and has also brought forward an interesting paradigm: the vertical workflow engine.

I agree that Airflow is old and legacy, and ideally folks should not use it. The reality, sadly, is that a lot of pipelines are already built with it. I think as a community we have to start moving away from it for more complicated problems.

Disclaimer: I created Flyte.org and strongly believe in decentralized development of DAGs and centralized management of infrastructure.

Aug 10, 2022 · edited Aug 10, 2022 · Liked by Stephen Bailey

Context: I wrote https://towardsdatascience.com/apache-airflow-in-2022-10-rules-to-make-it-work-b5ed130a51ad

Yes, Airflow is NOT an ETL tool, but a scheduling tool.

Yes, Airflow 1 was buggy and super slow.

Yes, Airflow 2.3 is still not 100% stable.


We should never confuse the Airflow operators with Airflow itself.

So many OSS operators are shitty and run transformations directly in Airflow itself (unless you use the KubernetesExecutor or CeleryKubernetesExecutor).

Aug 1, 2022 · edited Aug 7, 2022 · Liked by Stephen Bailey

You should probably look into Flyte as well, as a remedy to all the Airflow-esque problems.


In the end, I still don't understand which features the author missed: dynamic DAGs, metadata management, data quality?


This post got a lot of attention! I would encourage all readers to check out the conversation on Hacker News, which has a lot of great insights: https://news.ycombinator.com/item?id=32317558


The first three of the mentioned problems can be solved with the official Airflow Helm chart: https://airflow.apache.org/docs/helm-chart/. The fourth one (the control plane can ingest metadata from across workspaces via a separate service) I did not understand tbh, but there is an API to change connections, variables, etc. Yes, there are not enough developer tools, but if the rest of the system was designed to have Airflow as a scheduler, it should not be a problem to do CI/CD; for example, see https://medium.com/@FunCorp/practical-guide-to-create-a-two-layered-recommendation-system-5486b42f9f63 (disclaimer: I'm the author of the article).
