Internship work for Apache Airflow - My third blog for Outreachy

Hello everyone! This is my third blog post during the Outreachy internship. In this post, I will break down the details of the project I am currently working on for Apache Airflow: "Converting Airflow Local Development environment (Breeze) from Bash-based to Python-based".

What is Apache Airflow?

Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. You can write workflows, schedule them, and use loops to generate tasks, all in pure Python. Anyone with Python knowledge can write their own workflows for many kinds of work, such as managing infrastructure, building ML models, or transferring data. Airflow also provides plug-and-play operators for integrating with Google Cloud Platform, AWS, Microsoft Azure, and many other third-party services. It also has a useful UI where you can check the progress and status of your workflows and their ongoing tasks.

What is Breeze?

Breeze is an easy way to set up the development and testing environment for Apache Airflow. Apache Airflow can run on different Python versions and with different backend databases, such as MySQL, Postgres, and SQLite, to hold all its data. With so many combinations available, setting up a development environment to contribute to the project is not easy. It involves pulling the packages needed to support the chosen backend, as well as the packages for the many providers supported in Airflow. At the time I am working on this project, there are 75 providers. Apache Airflow also runs multiple processes: the triggerer, the scheduler, the web server, and the front end. For a beginner, setting all of this up before starting to contribute would be a difficult process. To make it a breeze, the developers created Breeze, an app that runs in the shell and, with a few simple commands, performs all the steps needed to bring up the Airflow environment in Docker containers under the hood. A setup that would otherwise have taken multiple days becomes simple with Breeze and its easy-to-follow instructions.

Problem statement in current Breeze:

The Apache Airflow codebase is written in Python, while Breeze is written as Bash scripts. This makes debugging and fixing issues in Breeze difficult. Converting Breeze to Python will also make it platform-independent.

What's my current work?

I started by exploring the current Breeze app and the many command-line options it provides to start the local development environment. Many of these options set a flag, start the Docker build, and execute the command inside the Docker container (making it robust across all platforms). My project is broken into small modules, and I help migrate each unique command from Bash to Python. Currently, I am working on a module that runs all the static checks in the local environment before the code is pushed to a branch (there are over 80 static checks supported in Apache Airflow). By converting it to Python, we can enable autocompletion, which makes it easy for users to see the supported options by pressing the Tab key. A few other modules that I have worked on include building the image for Airflow CI, toggling the ASCII art and cheatsheet, and updating the cache from the command line (these are not supported in Bash, but were added as new features in Python).
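The pattern described above, where command-line options set some flags and the command is then executed inside a Docker container, can be sketched roughly like this. It is only a small illustration written with the standard library's argparse, not the actual Breeze code (Breeze itself is built on click). The option names, environment variable names, default Python version, and image name are all assumptions I made up for the example.

```python
import argparse
import shlex

# Hypothetical values for illustration only; these are not Breeze's real
# option names, environment variables, or image name.
BACKENDS = ["sqlite", "mysql", "postgres"]
IMAGE = "example/airflow-ci:latest"


def build_parser() -> argparse.ArgumentParser:
    """Define the command-line options that set the flags."""
    parser = argparse.ArgumentParser(prog="mini-breeze")
    parser.add_argument("--python", default="3.8",
                        help="Python version to use inside the container")
    parser.add_argument("--backend", choices=BACKENDS, default="sqlite",
                        help="Database backend to run against")
    parser.add_argument("command", nargs="+",
                        help="Command to execute inside the container")
    return parser


def build_docker_command(args: argparse.Namespace) -> list:
    """Translate the parsed options into a `docker run` argument list."""
    return [
        "docker", "run", "--rm",
        "-e", f"PYTHON_VERSION={args.python}",
        "-e", f"BACKEND={args.backend}",
        IMAGE,
        *args.command,
    ]


# Print the command instead of running it, so the sketch works without Docker.
example = build_parser().parse_args(["--backend", "postgres", "pytest", "tests"])
print(shlex.join(build_docker_command(example)))
```

Because the work happens inside the container, the same command behaves the same way on any machine that can run Docker. A real tool would pass the list to something like subprocess.run instead of printing it.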

Halfway through my internship, I am slowly getting the hang of working on an open-source project. One interesting thing about this migration project is that I get to learn how the Bash scripts are written while also working on Python scripts. I have come to appreciate the power of Docker and how much easier it can make our development work. Another interesting lesson is the use of other open-source libraries such as rich and click, and how many open-source projects depend on each other and together create a rich ecosystem.

Open-source journey:

Though the internship is meant to last 3 months, I would like to keep volunteering on the project I am working on. Contributing to one project helps us understand and appreciate how this ecosystem works, and encourages us to contribute more. Outreachy is the first milestone in my journey into open source. I am looking forward to many more! Thanks to my mentor Jarek Potiuk, Elad, and Nasser Kaze.