Schedule a CloudQuery data ingestion sync with Kestra

Source

yaml

id: cloudquery-sync
namespace: company.team

tasks:
  - id: hn_to_duckdb
    type: io.kestra.plugin.cloudquery.Sync
    env:
      CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
    incremental: false
    configs:
      - kind: source
        spec:
          name: hackernews
          path: cloudquery/hackernews
          version: v3.0.13
          tables:
            - "*"
          destinations:
            - duckdb
          spec:
            item_concurrency: 100
            start_time: "{{ trigger.date ?? execution.startDate | dateAdd(-1, 'DAYS') }}"
      - kind: destination
        spec:
          name: duckdb
          path: cloudquery/duckdb
          version: v4.2.10
          write_mode: overwrite-delete-stale
          spec:
            connection_string: hn.db

triggers:
  - id: schedule
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "@daily"
    timezone: US/Eastern

About this blueprint

CLI Ingest

This flow starts a batch job that syncs data from HackerNews to DuckDB with CloudQuery. The sync relies on CloudQuery source and destination plugins: the source plugin extracts data from HackerNews, and the destination plugin loads it into DuckDB.

The flow uses Kestra templating to set the start time of the sync to one day before the scheduled date, which also lets you backfill data for past intervals in small batches. Alternatively, set the incremental flag to true to enable incremental sync. During incremental syncs, Kestra stores the CloudQuery cursor in the Kestra backend and uses it to start each sync from the last cursor position.

To get started configuring CloudQuery sources and destinations, go to the CloudQuery Integrations page. There you can select your desired source and destination and see the YAML configuration, which you can copy and paste into your own file or into a Kestra flow. The page also provides more detailed instructions and references for the tables available in each plugin.

Note that you can generate an API key to use premium plugins. You can add the API key as an environment variable:

yaml

  - id: hn_to_duckdb
    type: io.kestra.plugin.cloudquery.Sync
    env:
      CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
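
The incremental option mentioned above could look roughly like the fragment below. It is a minimal sketch that reuses the task from this blueprint and changes only the incremental flag. Whether the templated start_time and the overwrite-delete-stale write mode should be kept alongside incremental syncs is not covered here, so treat the unchanged configs as an assumption and check the plugin documentation before relying on it.

yaml

  - id: hn_to_duckdb
    type: io.kestra.plugin.cloudquery.Sync
    env:
      CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
    # Changed from false: Kestra keeps the CloudQuery cursor in its backend
    # and resumes each sync from the last cursor position.
    incremental: true
    # configs: assumed to be the same source and destination specs
    # as in the full flow above (omitted here for brevity)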
