Apache Airflow packaged by Data Science Dojo

Data Science Dojo

Monitor and manage data pipelines and complex workflows with Apache Airflow.

Data Science Dojo delivers data science education, consulting, and technical services to harness the power of data.

Trademarks: This software listing is packaged by Data Science Dojo. The respective trademarks mentioned in the offering are owned by the respective companies, and use of them does not imply any affiliation or endorsement.

About the offer:

Apache Airflow is a powerful open-source platform for authoring, scheduling, and monitoring data and computing workflows. It is a highly scalable distributed system that connects to a wide range of sources, making it flexible enough to orchestrate complex workflows and data pipelines efficiently. Users represent each workflow as a Directed Acyclic Graph (DAG), where each node of the graph is a task. Airflow pipelines are defined in Python, allowing for dynamic pipeline generation. Many organizations use Apache Airflow to monitor their workflows.
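
The DAG-as-Python idea above can be sketched as follows. This is a minimal illustration, assuming Apache Airflow 2.x is installed; the DAG id, task ids, and commands are placeholders, not part of this offering:

```python
# Minimal DAG sketch: two tasks connected as nodes of a Directed Acyclic Graph.
# All names here (example_pipeline, extract, transform) are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def transform():
    # Placeholder transformation step.
    print("transforming data")


with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = PythonOperator(task_id="transform", python_callable=transform)

    # The >> operator declares the edge: extract runs before transform.
    extract >> load
```

Because the pipeline is ordinary Python, tasks and dependencies can be generated dynamically, for example inside a loop.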

Who benefits from this offer:

The following can benefit from Apache Airflow:

  • Teams of Developers
  • Data Scientists
  • Data Engineers
  • Machine Learning Engineers
  • DevOps
  • And anyone else interested in data science tools

What is included in this offer:
  • Python 3.8: Use the Python programming language for dynamic pipeline generation
  • Rich User Interface: Monitor, manage, and visualize your workflows
  • Executor: CeleryExecutor
  • Database: PostgreSQL
  • Message Broker: Redis
  • Robust Integrations
  • Code console: View the source code of your DAGs
  • Visualization: Tree, Graph, and Gantt chart views for DAGs
  • Security and Accessibility: Role-based access control

Technical Specifications:

Apache Airflow throughput improves with more CPU cores, more RAM, and faster disks.

  • Minimum memory: 8GB RAM
  • Minimum vCPU: 2 vCPUs
  • Operating System: Ubuntu 20.04

Apache Airflow supports the following Databases and Executors:

  • SQLite
  • MySQL
  • PostgreSQL
  • Sequential Executor
  • Local Executor
  • Celery Executor
  • Kubernetes Executor
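
The executor and metadata database are selected in Airflow's configuration file. The excerpt below is a sketch only: the connection strings are placeholders, and key locations vary by version (for example, sql_alchemy_conn moved to a [database] section in Airflow 2.3):

```ini
; airflow.cfg (excerpt) -- all hosts, ports, and credentials are placeholders
[core]
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://airflow:airflow@localhost:5432/airflow
```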

The default port Apache Airflow listens on is 8080. You can access the web interface at http://yourip:8080 using the credentials:

  • username: airflow
  • password: airflow
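
The same credentials can be used against Airflow's stable REST API (Airflow 2.x). The sketch below is a hypothetical helper, assuming basic authentication is enabled in your deployment; the host and port are the same placeholders as above:

```python
# Sketch: query the Airflow 2.x stable REST API with HTTP Basic auth.
# basic_auth_header and list_dags are illustrative helpers, not Airflow APIs.
import base64
import json
import urllib.request


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def list_dags(base_url, username="airflow", password="airflow"):
    """Return the DAGs known to Airflow as a parsed JSON dict."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/dags",
        headers=basic_auth_header(username, password),
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `list_dags("http://yourip:8080")` would list every DAG registered with the scheduler, provided the REST API and basic auth backend are enabled.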

You can view the current configuration of Airflow from the 'Admin > Configurations' tab in the top menu bar. (Note: your Airflow admin may disable this view for security reasons.)