When your organization grows to 20+ data science teams and 500+ ML engineers, vanilla Airflow quickly becomes a bottleneck rather than a solution. In this talk, I’ll share how we transformed Airflow into a scalable, secure, and user-friendly MLOps platform — reducing time to market and making data scientists actually enjoy working with orchestration.
I’ll cover why we chose Airflow as the foundation, and how we designed an architecture that launches pipelines across multiple Kubernetes clusters from a single UI, using KubernetesPodOperator and per-cluster worker deployments. You’ll hear how we built a custom Vault integration that keeps secrets out of Connections and Variables, enabled real-time logging that persists to S3 while a task is still running, and created a custom SparkSubmitOperator capable of running jobs on any Spark or Hadoop cluster inside K8s with Kerberos authentication. On top of that, we designed a streamlined developer experience: our users can generate a ready-to-use GitLab repository from a template and deploy a versioned, tag-based pipeline into production, all in under five minutes. The short sketches below give a flavor of a few of these building blocks.
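For the multi-cluster setup, here is a minimal sketch, not our production code, of how the stock KubernetesPodOperator (import path as in recent versions of the cncf.kubernetes provider) can target different clusters by giving each task its own Kubernetes connection; the DAG ID, connection IDs, namespace, and image are all hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Hypothetical Airflow connection IDs, one per target Kubernetes cluster.
CLUSTERS = ["k8s_cluster_a", "k8s_cluster_b"]

with DAG(
    dag_id="multi_cluster_demo",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    for conn_id in CLUSTERS:
        KubernetesPodOperator(
            task_id=f"train_on_{conn_id}",
            kubernetes_conn_id=conn_id,  # selects which cluster the pod runs in
            namespace="ml-jobs",
            image="python:3.11-slim",
            cmds=["python", "-c", "print('hello from the cluster')"],
            get_logs=True,
        )
```

Routing each task through a named connection keeps cluster credentials out of DAG code, so the same pipeline definition can fan out to new clusters by configuration alone.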
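Our Vault integration is custom, but the core idea, resolving secrets at runtime rather than storing them in the metadata database, can be approximated with Airflow’s built-in HashiCorp Vault secrets backend; the mount point, paths, URL, and secret names below are assumptions for illustration.

```python
# With the HashiCorp Vault secrets backend enabled, e.g. via environment
# variables (mount point, paths, and URL are assumptions for this sketch):
#   AIRFLOW__SECRETS__BACKEND=airflow.providers.hashicorp.secrets.vault.VaultBackend
#   AIRFLOW__SECRETS__BACKEND_KWARGS={"connections_path": "connections",
#       "variables_path": "variables", "mount_point": "airflow",
#       "url": "https://vault.example.com"}
#
# DAG code stays unchanged: lookups check the secrets backend first, so nothing
# sensitive has to live in Airflow Connections or Variables.
from airflow.hooks.base import BaseHook
from airflow.models import Variable

conn = BaseHook.get_connection("my_postgres")  # resolved from airflow/connections/my_postgres
token = Variable.get("ml_api_token")           # resolved from airflow/variables/ml_api_token
```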
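Likewise, the Kerberos side of our custom SparkSubmitOperator is easiest to picture through the stock operator’s principal/keytab parameters; the connection ID, application path, principal, and keytab location here are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="spark_kerberos_demo",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    SparkSubmitOperator(
        task_id="score_model",
        conn_id="spark_k8s",                     # hypothetical connection holding the Spark master URL
        application="/opt/jobs/score_model.py",  # hypothetical PySpark application
        principal="svc-airflow@EXAMPLE.COM",     # Kerberos principal (assumption)
        keytab="/etc/security/keytabs/svc-airflow.keytab",
        conf={"spark.executor.instances": "4"},
    )
```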
Whether you’re scaling Airflow or building an MLOps platform for a growing data science community, this talk offers practical takeaways, lessons learned, and architecture patterns you can apply right away.