Clone a Git repository with PySpark code and run a Spark job using the Spark Submit CLI

Source

```yaml
id: git-spark
namespace: company.team

tasks:
  - id: working_directory
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: clone_repository
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/scripts
        branch: main

      - id: spark_job
        type: io.kestra.plugin.spark.SparkCLI
        commands:
          - spark-submit --name Pi --master spark://localhost:7077
            etl/spark_pi.py
```

About this blueprint

Tags: CLI, Data, Git

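This flow clones a Git repository into a shared working directory and then runs a Spark job from it with `spark-submit`. The job connects to the Spark master at `spark://localhost:7077`, so port 7077 on your Spark master must be exposed and reachable from the task for this flow to work.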
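As an illustration of exposing port 7077, here is a minimal Docker Compose sketch for a standalone Spark master. It is not part of the blueprint: the service name `spark-master`, the `apache/spark:3.5.1` image tag, and the optional web UI port 8080 are assumptions, so adapt them to however your cluster is actually deployed.

```yaml
# Hypothetical docker-compose.yml; names and image tag are assumptions, not part of the blueprint.
services:
  spark-master:
    image: apache/spark:3.5.1
    # Start a standalone Spark master process inside the container.
    command: /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
    ports:
      - "7077:7077" # master port that spark-submit connects to
      - "8080:8080" # master web UI (optional)
```

If the Spark task itself runs inside a container, `localhost` refers to that container rather than the host, so you may need to point `--master` at an address the task can actually reach.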

Working Directory

Clone

Spark CLI
