Manage Python Dependencies
Manage your Python dependencies inside of Kestra.
Managing Python dependencies can be frustrating. There are several ways to manage them in Kestra.
Install with pip using beforeCommands
In `Script` and `Commands` tasks, you can add a list of commands under the `beforeCommands` property. These commands run before the main script or commands. This works well for installing packages with `pip` or setting up a virtual environment:
```yaml
id: beforecommands
namespace: company.team

tasks:
  - id: code
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    beforeCommands:
      - python3 -m venv .venv
      - . .venv/bin/activate
      - pip install pandas kestra
    script: |
      import pandas as pd
      from kestra import Kestra

      df = pd.read_csv('https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv')
      total_revenue = df['total'].sum()
      Kestra.outputs({"total": total_revenue})
```
The Process task runner speeds up execution. The task runs as a process directly on the Kestra worker, so it does not have to pull a container image and start a container first.
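In the flow above, the `kestra` package is installed only so the script can call `Kestra.outputs`. If you'd rather keep that dependency out of `beforeCommands`, you can print the outputs yourself. This assumes the `::{...}::` standard-output notation described in Kestra's script-outputs documentation. The `emit_outputs` helper below is hypothetical and is not part of the `kestra` package. Convert numpy scalars, such as the result of `df['total'].sum()`, with `float(...)` first, because `json.dumps` can't serialize every numpy type.

```python
import json


def emit_outputs(outputs: dict) -> str:
    """Print outputs in the ::{"outputs": {...}}:: notation Kestra parses from stdout.

    Hypothetical helper (assumption): a stand-in for Kestra.outputs that
    needs only the standard library.
    """
    line = "::" + json.dumps({"outputs": outputs}) + "::"
    print(line)
    return line


if __name__ == "__main__":
    # In the real script this would be float(df['total'].sum()).
    total_revenue = 1234.5
    emit_outputs({"total": total_revenue})
```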
Set Container Image with Docker Task Runner
If we prefer to run our task inside a container, we can set the task runner to Docker and specify a container image that already bundles the dependencies we need. Our previous example used `pandas`, which is bundled into the `ghcr.io/kestra-io/pydata:latest` image. This is one of the ready-to-use images available on Kestra's GitHub.
```yaml
id: container_image
namespace: company.team

tasks:
  - id: code
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    script: |
      import pandas as pd
      from kestra import Kestra

      df = pd.read_csv('https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv')
      total_revenue = df['total'].sum()
      Kestra.outputs({"total": total_revenue})
```
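Before you rely on a prebuilt image, it can help to confirm which versions of your dependencies it actually ships. As a sketch, the hypothetical `installed_versions` helper below uses the standard library's `importlib.metadata`. You could run it as the script of a task that uses the same `containerImage`:

```python
from __future__ import annotations

from importlib import metadata


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: dict[str, str | None] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run inside the container image to see what it provides.
    for name, version in installed_versions(["pandas", "kestra"]).items():
        print(f"{name}: {version or 'not installed'}")
```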
Build Docker Image and set it with Docker Task Runner
If no readily available image includes the dependencies we need, we can build our own with the `docker.Build` task.

We can write a Dockerfile that uses the `python:3.10` image as its base and installs our specific dependencies on top of it. In the example below, `pip install` installs both `kestra` and `pandas`. Once the image has been built, the Python task references it through the `{{ outputs.build.imageId }}` expression. The image is built locally and is not pushed to a registry, so the task runner sets `pullPolicy: NEVER`. Docker then uses the local image instead of trying to pull it:
```yaml
id: container_image_build
namespace: company.team

tasks:
  - id: build
    type: io.kestra.plugin.docker.Build
    dockerfile: |
      FROM python:3.10
      RUN pip install --upgrade pip
      RUN pip install --no-cache-dir kestra pandas
    tags:
      - python_image

  - id: code
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
      pullPolicy: NEVER
    containerImage: "{{ outputs.build.imageId }}"
    script: |
      import pandas as pd
      from kestra import Kestra

      df = pd.read_csv('https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv')
      total_revenue = df['total'].sum()
      Kestra.outputs({"total": total_revenue})
```
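The Dockerfile in this flow follows a simple pattern: a base image, a `pip` upgrade, then one `pip install` of the dependencies. If your dependency list changes often, you could generate that text from a list instead of editing it by hand. The `render_dockerfile` helper below is a hypothetical sketch. It quotes each requirement with `shlex.quote`, because specifiers such as `pandas>=2` would otherwise be read as shell redirections in a `RUN` line. Pinning exact versions (e.g. `pandas==2.2.2`) is an optional choice for more repeatable builds.

```python
from __future__ import annotations

import shlex


def render_dockerfile(base_image: str, requirements: list[str]) -> str:
    """Render a Dockerfile that pip-installs the given requirements on a base image.

    Hypothetical helper: paste its output into the Build task's `dockerfile` property.
    """
    if not requirements:
        raise ValueError("at least one requirement is needed")
    # Quote each requirement so characters like '>' are not interpreted by the shell.
    packages = " ".join(shlex.quote(req) for req in requirements)
    lines = [
        f"FROM {base_image}",
        "RUN pip install --upgrade pip",
        f"RUN pip install --no-cache-dir {packages}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the Dockerfile used in the flow above.
    print(render_dockerfile("python:3.10", ["kestra", "pandas"]), end="")
```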
Build Custom Packages
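You can also build packages directly inside Kestra and then share them across different flows in the same namespace. This works for zip files and wheels.

Here's an example that generates a `.tar.gz` package. It syncs the package source from a Git repository into the flow's namespace files, builds it with `python -m build`, and uploads the resulting archive to the `company.sales` namespace: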
```yaml
id: build_tar_gz
namespace: company

tasks:
  - id: sync_code_to_kestra
    type: io.kestra.plugin.git.SyncNamespaceFiles
    disabled: true # already synced files
    namespace: "{{ flow.namespace }}"
    gitDirectory: .
    url: https://github.com/anna-geller/python-in-kestra
    branch: main
    username: anna-geller
    password: "{{ kv('GITHUB_ACCESS_TOKEN') }}"

  - id: build
    type: io.kestra.plugin.scripts.python.Commands
    namespaceFiles:
      enabled: true
    beforeCommands:
      - pip install build
    commands:
      - python -m build
    outputFiles:
      - "**/*.tar.gz"

  - id: upload
    type: io.kestra.plugin.core.namespace.UploadFiles
    namespace: company.sales
    filesMap:
      "etl-0.1.0.tar.gz": "{{ outputs.build.outputFiles['dist/etl-0.1.0.tar.gz'] }}"
```
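The `upload` task can look up the archive under the key `dist/etl-0.1.0.tar.gz` because `python -m build` writes its artifacts to a `dist/` directory, and the `**/*.tar.gz` pattern in `outputFiles` matches files in subdirectories. The sketch below mimics that matching locally with `pathlib`. `find_sdists` is a hypothetical helper, and Kestra's own glob matching may differ in edge cases:

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_sdists(root: str) -> list[str]:
    """Return POSIX-style paths, relative to root, of every *.tar.gz file under it."""
    base = Path(root)
    return sorted(p.relative_to(base).as_posix() for p in base.glob("**/*.tar.gz"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate the dist/ folder that `python -m build` produces.
        dist = Path(tmp, "dist")
        dist.mkdir()
        (dist / "etl-0.1.0.tar.gz").touch()
        (dist / "etl-0.1.0-py3-none-any.whl").touch()
        print(find_sdists(tmp))  # ['dist/etl-0.1.0.tar.gz']
```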
A separate flow in the `company.sales` namespace can then enable namespace files and install the package with `pip`:
```yaml
id: install_from_zip
namespace: company.sales

inputs:
  - id: date
    type: STRING
    defaults: 12/24/2024
    displayName: Delivery Date

tasks:
  - id: python
    type: io.kestra.plugin.scripts.python.Script
    namespaceFiles:
      enabled: true
    beforeCommands:
      - pip install etl-0.1.0.tar.gz
    script: |
      import etl.utils as etl

      out = etl.standardize_date_format("{{ inputs.date }}")
      print(out)
```
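The source of the `etl` package isn't shown on this page. As a sketch, assume `standardize_date_format` turns the `MM/DD/YYYY` default input into an ISO date. Then `etl/utils.py` might contain something like the hypothetical implementation below, shipped alongside packaging metadata such as a `pyproject.toml` so that `python -m build` can build it:

```python
from datetime import datetime


def standardize_date_format(value: str) -> str:
    """Convert a US-style MM/DD/YYYY date string to ISO 8601 (YYYY-MM-DD).

    Hypothetical implementation: the real etl package may accept other formats.
    Raises ValueError if the string does not match MM/DD/YYYY.
    """
    return datetime.strptime(value.strip(), "%m/%d/%Y").date().isoformat()


if __name__ == "__main__":
    print(standardize_date_format("12/24/2024"))  # 2024-12-24
```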