Source
yaml
id: r-script
namespace: company.team
tasks:
- id: r_script
type: io.kestra.plugin.scripts.r.Script
warningOnStdErr: false
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
containerImage: ghcr.io/kestra-io/rdata:latest
outputFiles:
- women.parquet
- women.csv
script: |
library(dplyr)
library(arrow)
data(women)
women <- women %>%
mutate(height_cm = height * 2.54,
weight_kg = weight * 0.453592)
print(head(women, 2))
women_clean <- na.omit(women)
df <- women_clean %>%
summarise(mean_height_cm = mean(height_cm),
median_height_cm = median(height_cm),
mean_weight_kg = mean(weight_kg),
median_weight_kg = median(weight_kg))
print(df)
write_parquet(df, "women.parquet")
write_csv_arrow(df, "women.csv")
About this blueprint
R Software Engineering
This flow runs R script in a working directory. It loads data, analyzes it using the dplyr
package. Finally, it stores the result as both CSV and Parquet files, which both can be downloaded from the Execution Outputs tab.
The R script is executed in a Docker container, providing isolated environment for the task and avoiding any dependency conflicts. All dependencies for the task are baked into a publicly available Docker image, maintained by Kestra: ghcr.io/kestra-io/rdata:latest
. You can replace that image with your own, or install custom dependencies at runtime using the beforeCommands
property, for example:
beforeCommands:
- Rscript -e "install.packages(c('httr', 'RSQLite'))" > /dev/null 2>&1