LoadFromGcs

```yaml
type: "io.kestra.plugin.gcp.bigquery.LoadFromGcs"
```

Load data from Google Cloud Storage (GCS) into BigQuery.

Examples

Load an Avro file from a GCS bucket

```yaml
id: gcp_bq_load_from_gcs
namespace: company.team

tasks:
  - id: http_download
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv

  - id: csv_to_ion
    type: io.kestra.plugin.serdes.csv.CsvToIon
    from: "{{ outputs.http_download.uri }}"
    header: true

  - id: ion_to_avro
    type: io.kestra.plugin.serdes.avro.IonToAvro
    from: "{{ outputs.csv_to_ion.uri }}"
    schema: |
      {
        "type": "record",
        "name": "Order",
        "namespace": "com.example.order",
        "fields": [
          {"name": "order_id", "type": "int"},
          {"name": "customer_name", "type": "string"},
          {"name": "customer_email", "type": "string"},
          {"name": "product_id", "type": "int"},
          {"name": "price", "type": "double"},
          {"name": "quantity", "type": "int"},
          {"name": "total", "type": "double"}
        ]
      }

  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "{{ outputs.ion_to_avro.uri }}"
    destinationTable: "my_project.my_dataset.my_table"
    format: AVRO
    avroOptions:
      useAvroLogicalTypes: true
```

Load a CSV file with a defined schema

```yaml
id: gcp_bq_load_files_test
namespace: company.team

tasks:
  - id: load_files_test
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    destinationTable: "myDataset.myTable"
    ignoreUnknownValues: true
    schema:
      fields:
        - name: colA
          type: STRING
        - name: colB
          type: NUMERIC
        - name: colC
          type: STRING
    format: CSV
    csvOptions:
      allowJaggedRows: true
      encoding: UTF-8
      fieldDelimiter: ","
    from:
      - gs://myBucket/myFile.csv
```

Properties

`autodetect`

Type: boolean
Dynamic: ❌
Required: ❌

Experimental: automatic inference of the options and schema for CSV and JSON sources.

`avroOptions`

Type: AbstractLoad-AvroOptions
Dynamic: ❌
Required: ❌

Avro parsing options.

`clusteringFields`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌

The clustering specification for the destination table.

`createDisposition`

Type: string
Dynamic: ❌
Required: ❌
Possible Values:
- CREATE_IF_NEEDED
- CREATE_NEVER

Whether the job is allowed to create tables.

`csvOptions`

Type: AbstractLoad-CsvOptions
Dynamic: ❌
Required: ❌

CSV parsing options.

`destinationTable`

Type: string
Dynamic: ✔️
Required: ❌

The table into which the data is loaded.

If not provided, a new table is created.

`format`

Type: string
Dynamic: ❌
Required: ❌
Possible Values:
- CSV
- JSON
- AVRO
- PARQUET
- ORC

The source format, and possibly some parsing options, of the external data.

`from`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌

Google Cloud Storage source data

The fully-qualified URIs that point to source data in Google Cloud Storage (e.g. gs://bucket/path). Each URI can contain one '*' wildcard character and it must come after the 'bucket' name.
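As a sketch of the wildcard form described above, the flow below loads every object matching a single pattern. The bucket, path, and table names are placeholders, not values taken from this page.

```yaml
id: gcp_bq_load_wildcard
namespace: company.team

tasks:
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      # One '*' wildcard, placed after the bucket name (hypothetical path)
      - "gs://my-bucket/exports/orders-*.avro"
    destinationTable: "my_project.my_dataset.orders"
    format: AVRO
```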

`ignoreUnknownValues`

Type: boolean
Dynamic: ❌
Required: ❌

Whether BigQuery should allow extra values that are not represented in the table schema.

If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. By default unknown values are not allowed.

`impersonatedServiceAccount`

Type: string
Dynamic: ✔️
Required: ❌

The GCP service account to impersonate.

`location`

Type: string
Dynamic: ✔️
Required: ❌

The geographic location where the dataset should reside.

This property is experimental and may be changed or removed.
See Dataset Location

`maxBadRecords`

Type: integer
Dynamic: ❌
Required: ❌

The maximum number of bad records that BigQuery can ignore when running the job.

If the number of bad records exceeds this value, an invalid error is returned in the job result. By default, no bad record is ignored.
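The following sketch combines `ignoreUnknownValues` and `maxBadRecords` to make a CSV load tolerant of a few malformed rows. The threshold of 10, the bucket path, and the table name are illustrative assumptions; the destination table is assumed to exist already, so no schema is given.

```yaml
id: gcp_bq_load_tolerant
namespace: company.team

tasks:
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "gs://my-bucket/incoming/events.csv"  # hypothetical source file
    destinationTable: "my_dataset.events"     # assumed to exist already
    format: CSV
    # Extra values not present in the table schema are dropped, not rejected
    ignoreUnknownValues: true
    # Up to 10 bad records are skipped; more than that fails the job
    maxBadRecords: 10
```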

`projectId`

Type: string
Dynamic: ✔️
Required: ❌

The GCP project ID.

`retryAuto`

Type: Constant, Exponential, or Random (see Definitions)
Dynamic: ❌
Required: ❌

`retryMessages`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: [ "due to concurrent update", "Retrying the job may solve the problem" ]

The messages which would trigger an automatic retry.

Each message is matched as a case-insensitive substring of the full error message.

`retryReasons`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: [ "rateLimitExceeded", "jobBackendError", "internalError", "jobInternalError" ]

The reasons which would trigger an automatic retry.
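A minimal sketch of how the three retry properties might be combined, using the `Constant` retry type listed under Definitions. The interval and attempt count are arbitrary illustrative values, and the source and table names are placeholders.

```yaml
id: gcp_bq_load_with_retry
namespace: company.team

tasks:
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "gs://my-bucket/exports/orders.avro"  # hypothetical source file
    destinationTable: "my_project.my_dataset.orders"
    format: AVRO
    retryAuto:
      type: constant
      interval: PT30S   # ISO 8601 duration: wait 30 seconds between attempts
      maxAttempt: 3
    # Narrow the defaults to a subset of reasons and messages
    retryReasons:
      - rateLimitExceeded
      - jobBackendError
    retryMessages:
      - "due to concurrent update"
```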

`schema`

Type: object
Dynamic: ❌
Required: ❌

The schema for the destination table.

The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup (i.e. DATASTORE_BACKUP format option).

`schemaUpdateOptions`

Type: array
SubType: string
Dynamic: ❌
Required: ❌

Experimental: options allowing the schema of the destination table to be updated as a side effect of the load job.

Schema update options are supported in two cases: when writeDisposition is WRITE_APPEND; when writeDisposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators. For normal tables, WRITE_TRUNCATE will always overwrite the schema.
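For the WRITE_APPEND case, a configuration might look like the sketch below. This page does not list the accepted values; `ALLOW_FIELD_ADDITION` is the name used by the BigQuery load job API, and it is an assumption here that the plugin accepts it unchanged. Source and table names are placeholders.

```yaml
id: gcp_bq_load_schema_update
namespace: company.team

tasks:
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "gs://my-bucket/exports/orders.avro"  # hypothetical source file
    destinationTable: "my_project.my_dataset.orders"
    format: AVRO
    writeDisposition: WRITE_APPEND
    schemaUpdateOptions:
      # Assumed value name, borrowed from the BigQuery API
      - ALLOW_FIELD_ADDITION
```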

`scopes`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: [ "https://www.googleapis.com/auth/cloud-platform" ]

The GCP scopes to be used.

`serviceAccount`

Type: string
Dynamic: ✔️
Required: ❌

The GCP service account.
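The sketch below shows one way the authentication properties could be set together. It assumes `serviceAccount` takes the service account's JSON key content and that a Kestra secret named `GCP_SERVICE_ACCOUNT_JSON` has been defined; both are assumptions, as are the project, bucket, and table names.

```yaml
id: gcp_bq_load_with_credentials
namespace: company.team

tasks:
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    projectId: my-project                                      # hypothetical project ID
    serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}" # hypothetical secret name
    from:
      - "gs://my-bucket/exports/orders.avro"
    destinationTable: "my_dataset.orders"
    format: AVRO
```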

`timePartitioningField`

Type: string
Dynamic: ✔️
Required: ❌

The time partitioning field for the destination table.

`timePartitioningType`

Type: string
Dynamic: ✔️
Required: ❌
Default: DAY
Possible Values:
- DAY
- HOUR
- MONTH
- YEAR

The time partitioning type specification for the destination table.
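As an illustration of the partitioning and clustering properties used together, the sketch below loads into a table partitioned by day and clustered on one column. The column names `order_date` and `customer_email`, the bucket path, and the table name are hypothetical.

```yaml
id: gcp_bq_load_partitioned
namespace: company.team

tasks:
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "gs://my-bucket/exports/orders.parquet"  # hypothetical source file
    destinationTable: "my_project.my_dataset.orders"
    format: PARQUET
    timePartitioningField: order_date  # hypothetical date/timestamp column
    timePartitioningType: DAY
    clusteringFields:
      - customer_email                 # hypothetical column
```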

`writeDisposition`

Type: string
Dynamic: ❌
Required: ❌
Possible Values:
- WRITE_TRUNCATE
- WRITE_APPEND
- WRITE_EMPTY

The action that should occur if the destination table already exists.
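A short sketch pairing `createDisposition` with `writeDisposition`. In BigQuery's terms, WRITE_TRUNCATE replaces existing table data, WRITE_APPEND adds to it, and WRITE_EMPTY fails if the table already contains data. The source and table names below are placeholders.

```yaml
id: gcp_bq_load_overwrite
namespace: company.team

tasks:
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "gs://my-bucket/exports/orders.avro"  # hypothetical source file
    destinationTable: "my_project.my_dataset.orders"
    format: AVRO
    createDisposition: CREATE_IF_NEEDED  # create the table when it is missing
    writeDisposition: WRITE_TRUNCATE     # replace whatever the table holds
```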

Outputs

`destinationTable`

Type: string
Required: ❌

`jobId`

Type: string
Required: ❌

`rows`

Type: integer
Required: ❌
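The outputs can be read by later tasks through the usual `outputs.<task id>` expressions. The sketch below logs them after a load; it assumes `rows` reports the number of rows loaded, which this page does not state explicitly, and the source and table names are placeholders.

```yaml
id: gcp_bq_load_and_log
namespace: company.team

tasks:
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "gs://my-bucket/exports/orders.avro"  # hypothetical source file
    destinationTable: "my_project.my_dataset.orders"
    format: AVRO

  - id: log_result
    type: io.kestra.plugin.core.log.Log
    message: "Job {{ outputs.load_from_gcs.jobId }} wrote {{ outputs.load_from_gcs.rows }} rows to {{ outputs.load_from_gcs.destinationTable }}"
```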

Definitions

`io.kestra.core.models.tasks.retrys.Constant`

interval
- Type: string
- Dynamic: ❌
- Required: ✔️
- Format: duration
type
- Type: string
- Dynamic: ❌
- Required: ✔️
- Default: constant
behavior
- Type: string
- Dynamic: ❌
- Required: ❌
- Default: RETRY_FAILED_TASK
- Possible Values:
  - RETRY_FAILED_TASK
  - CREATE_NEW_EXECUTION
maxAttempt
- Type: integer
- Dynamic: ❌
- Required: ❌
- Minimum: >= 1
maxDuration
- Type: string
- Dynamic: ❌
- Required: ❌
- Format: duration
warningOnRetry
- Type: boolean
- Dynamic: ❌
- Required: ❌
- Default: false

`io.kestra.core.models.tasks.retrys.Random`

maxInterval
- Type: string
- Dynamic: ❌
- Required: ✔️
- Format: duration
minInterval
- Type: string
- Dynamic: ❌
- Required: ✔️
- Format: duration
type
- Type: string
- Dynamic: ❌
- Required: ✔️
- Default: random
behavior
- Type: string
- Dynamic: ❌
- Required: ❌
- Default: RETRY_FAILED_TASK
- Possible Values:
  - RETRY_FAILED_TASK
  - CREATE_NEW_EXECUTION
maxAttempt
- Type: integer
- Dynamic: ❌
- Required: ❌
- Minimum: >= 1
maxDuration
- Type: string
- Dynamic: ❌
- Required: ❌
- Format: duration
warningOnRetry
- Type: boolean
- Dynamic: ❌
- Required: ❌
- Default: false

`io.kestra.plugin.gcp.bigquery.AbstractLoad-CsvOptions`

allowJaggedRows
- Type: boolean
- Dynamic: ❌
- Required: ❌
allowQuotedNewLines
- Type: boolean
- Dynamic: ✔️
- Required: ❌
encoding
- Type: string
- Dynamic: ✔️
- Required: ❌
fieldDelimiter
- Type: string
- Dynamic: ✔️
- Required: ❌
quote
- Type: string
- Dynamic: ✔️
- Required: ❌
skipLeadingRows
- Type: integer
- Dynamic: ❌
- Required: ❌
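To show the CSV options in context, here is a sketch for a semicolon-delimited file with one header row. All values are illustrative; the destination table is assumed to exist already, so the schema is omitted.

```yaml
id: gcp_bq_load_csv_options
namespace: company.team

tasks:
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "gs://my-bucket/incoming/orders.csv"  # hypothetical source file
    destinationTable: "my_dataset.orders"     # assumed to exist already
    format: CSV
    csvOptions:
      skipLeadingRows: 1         # skip the header row
      fieldDelimiter: ";"
      quote: "\""
      allowQuotedNewLines: true  # permit line breaks inside quoted fields
      encoding: UTF-8
```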

`io.kestra.core.models.tasks.retrys.Exponential`

interval
- Type: string
- Dynamic: ❌
- Required: ✔️
- Format: duration
maxInterval
- Type: string
- Dynamic: ❌
- Required: ✔️
- Format: duration
type
- Type: string
- Dynamic: ❌
- Required: ✔️
- Default: exponential
behavior
- Type: string
- Dynamic: ❌
- Required: ❌
- Default: RETRY_FAILED_TASK
- Possible Values:
  - RETRY_FAILED_TASK
  - CREATE_NEW_EXECUTION
delayFactor
- Type: number
- Dynamic: ❌
- Required: ❌
maxAttempt
- Type: integer
- Dynamic: ❌
- Required: ❌
- Minimum: >= 1
maxDuration
- Type: string
- Dynamic: ❌
- Required: ❌
- Format: duration
warningOnRetry
- Type: boolean
- Dynamic: ❌
- Required: ❌
- Default: false

`io.kestra.plugin.gcp.bigquery.AbstractLoad-AvroOptions`

useAvroLogicalTypes
- Type: boolean
- Dynamic: ❌
- Required: ❌

