Extract data from an API and load it to Postgres using Python, Git and Docker (passing custom environment variables to the container)
Source
```yaml
id: postgres-s3-python-git
namespace: company.team

tasks:
  - id: wdir
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: clone_repository
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/scripts
        branch: main

      - id: get_users
        type: io.kestra.plugin.scripts.python.Commands
        taskRunner:
          type: io.kestra.plugin.scripts.runner.docker.Docker
        containerImage: ghcr.io/kestra-io/pydata:latest
        warningOnStdErr: false
        commands:
          - python etl/get_users_from_api.py

      - id: save_users_pg
        type: io.kestra.plugin.scripts.python.Commands
        beforeCommands:
          - pip install pandas psycopg2 sqlalchemy > /dev/null
        warningOnStdErr: false
        commands:
          - python etl/save_users_pg.py
        env:
          DB_USERNAME: postgres
          DB_PASSWORD: "{{ secret('DB_PASSWORD') }}"
          DB_HOST: host.docker.internal
          DB_PORT: "5432"
```
About this blueprint
Tags: pip, Postgres, Python, Data, Git
This flow clones a Git repository containing two Python scripts:
- The first script extracts data from an API and stores the results in a file called `users.json` (see the extraction sketch below).
- The second script reads that raw data file and loads it into Postgres using Pandas (a sketch of the loading script follows the list of benefits).
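The snippet below is a minimal, hypothetical sketch of what the extraction script could look like; the actual `etl/get_users_from_api.py` in the kestra-io/scripts repository may differ, and the API URL shown here is an assumption used purely for illustration.

```python
# Hypothetical sketch of an extraction script such as etl/get_users_from_api.py.
# It calls a REST API and writes the raw response to users.json in the shared
# working directory, where the next task can pick it up.
import json

import requests

API_URL = "https://gorest.co.in/public/v2/users"  # assumed endpoint for illustration


def main() -> None:
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    users = response.json()

    with open("users.json", "w") as f:
        json.dump(users, f)

    print(f"Extracted {len(users)} users to users.json")


if __name__ == "__main__":
    main()
```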
The benefits of this approach:
- your orchestration logic (YAML) is decoupled from your business logic (here: Python), so different tasks can be maintained by different team members without stepping on each other's toes
- you can update your business logic without touching any of your orchestration code (unless you want to, e.g. to modify the Docker image or pip package versions)
- each task can run in a separate Docker container, which makes dependency management easier; for instance, the second task needs additional libraries for Postgres and Pandas, and those can be installed at runtime using the `beforeCommands` property.
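To illustrate how the environment variables passed to the container and the libraries installed via `beforeCommands` come together, here is a hypothetical sketch of the loading script; the real `etl/save_users_pg.py` may differ, and the target table name and `DB_NAME` fallback below are assumptions.

```python
# Hypothetical sketch of a loading script such as etl/save_users_pg.py.
# It reads the connection details from the environment variables defined in the
# flow (DB_USERNAME, DB_PASSWORD, DB_HOST, DB_PORT) and loads users.json into
# Postgres with Pandas and SQLAlchemy (installed at runtime via beforeCommands).
import os

import pandas as pd
from sqlalchemy import create_engine


def main() -> None:
    username = os.environ["DB_USERNAME"]
    password = os.environ["DB_PASSWORD"]
    host = os.environ["DB_HOST"]
    port = os.environ["DB_PORT"]
    database = os.getenv("DB_NAME", "postgres")  # assumed default database name

    engine = create_engine(
        f"postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}"
    )

    df = pd.read_json("users.json")
    df.to_sql("users", engine, if_exists="replace", index=False)  # assumed table name
    print(f"Loaded {len(df)} rows into the users table")


if __name__ == "__main__":
    main()
```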