Run R script in a Docker container and output downloadable artifacts

Source

yaml

id: r-script
namespace: company.team

tasks:
  - id: r_script
    type: io.kestra.plugin.scripts.r.Script
    warningOnStdErr: false
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/rdata:latest
    outputFiles:
      - women.parquet
      - women.csv
    script: |
      library(dplyr)
      library(arrow)

      data(women)

      women <- women %>%
        mutate(height_cm = height * 2.54,
              weight_kg = weight * 0.453592)

      print(head(women, 2))

      women_clean <- na.omit(women)
      df <- women_clean %>%
        summarise(mean_height_cm = mean(height_cm), 
                  median_height_cm = median(height_cm), 
                  mean_weight_kg = mean(weight_kg),
                  median_weight_kg = median(weight_kg))
      print(df)
      write_parquet(df, "women.parquet")
      write_csv_arrow(df, "women.csv")

About this blueprint

R Software Engineering

This flow runs R script in a working directory. It loads data, analyzes it using the dplyr package. Finally, it stores the result as both CSV and Parquet files, which both can be downloaded from the Execution Outputs tab. The R script is executed in a Docker container, providing isolated environment for the task and avoiding any dependency conflicts. All dependencies for the task are baked into a publicly available Docker image, maintained by Kestra: ghcr.io/kestra-io/rdata:latest. You can replace that image with your own, or install custom dependencies at runtime using the beforeCommands property, for example:

beforeCommands:

    - Rscript -e "install.packages(c('httr', 'RSQLite'))" > /dev/null 2>&1

Script

Docker

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra