Create Dynamic Workflow in Apache Airflow

Problem

For a long time I searched for a way to properly create a workflow where the tasks depend on a dynamic value: a list of tables contained in a text file.

Context explained through a graphical example

A schematic overview of the DAG’s structure.

                |---> Task B.1 --|
                |---> Task B.2 --|
 Task A --------|---> Task B.3 --|--------> Task C
                |       ....     |
                |---> Task B.N --|
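
A minimal sketch of that structure in Airflow (not the exact code from this project): the DAG id, dates, and N = 3 are placeholders, and DummyOperator just stands in for the real work.

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

with DAG(
    dag_id="dynamic_fanout_example",   # hypothetical DAG id
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
) as dag:
    task_a = DummyOperator(task_id="task_a")
    task_c = DummyOperator(task_id="task_c")

    # The middle layer is built in a loop, so N can change
    # without touching the DAG code itself.
    for i in range(1, 4):
        task_b = DummyOperator(task_id=f"task_b_{i}")
        task_a >> task_b >> task_c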

The problem is to automatically import tables from an IBM Db2 database into HDFS / Hive through Airflow, an open-source tool for orchestrating complex computational workflows and data processing pipelines. The import itself is done with Sqoop, a powerful tool designed for efficiently transferring bulk data from a relational database to HDFS.

The imported data are then aggregated daily by Spark jobs, but I'll only cover the import part here, where I schedule the Sqoop jobs to dynamically import data into HDFS.
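Here is a hedged sketch of that import DAG, assuming the table names sit one per line in a text file and that a plain sqoop import per table is enough; the file path, JDBC URL, credentials, and HDFS target directory are all made up for illustration.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

TABLES_FILE = "/path/to/tables.txt"   # hypothetical location of the table list


def read_tables(path):
    """Return the non-empty, stripped lines of the table list file."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


with DAG(
    dag_id="db2_to_hdfs_import",       # illustrative DAG id
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
) as dag:
    start = DummyOperator(task_id="start")   # Task A
    done = DummyOperator(task_id="done")     # Task C

    # One Sqoop import task per table; the file is read every time the
    # scheduler parses the DAG, so it must be reachable from there.
    for table in read_tables(TABLES_FILE):
        import_table = BashOperator(
            task_id=f"import_{table}",       # Task B.i
            bash_command=(
                "sqoop import"
                " --connect jdbc:db2://db2-host:50000/MYDB"   # placeholder JDBC URL
                " --username $DB2_USER --password $DB2_PASS"  # placeholder credentials
                f" --table {table}"
                f" --target-dir /data/raw/{table}"            # placeholder HDFS path
            ),
        )
        start >> import_table >> done

Because the table list is read at parse time, adding a line to the file is enough to get a new import task on the scheduler's next pass over the DAG.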
