LoadFromGcs
type: "io.kestra.plugin.gcp.bigquery.LoadFromGcs"
Load data from GCS (Google Cloud Storage) to BigQuery
Examples
Load an Avro file from a GCS bucket
id: gcp_bq_load_from_gcs
namespace: company.team
tasks:
  - id: http_download
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv
  - id: csv_to_ion
    type: io.kestra.plugin.serdes.csv.CsvToIon
    from: "{{ outputs.http_download.uri }}"
    header: true
  - id: ion_to_avro
    type: io.kestra.plugin.serdes.avro.IonToAvro
    from: "{{ outputs.csv_to_ion.uri }}"
    schema: |
      {
        "type": "record",
        "name": "Order",
        "namespace": "com.example.order",
        "fields": [
          {"name": "order_id", "type": "int"},
          {"name": "customer_name", "type": "string"},
          {"name": "customer_email", "type": "string"},
          {"name": "product_id", "type": "int"},
          {"name": "price", "type": "double"},
          {"name": "quantity", "type": "int"},
          {"name": "total", "type": "double"}
        ]
      }
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "{{ outputs.ion_to_avro.uri }}"
    destinationTable: "my_project.my_dataset.my_table"
    format: AVRO
    avroOptions:
      useAvroLogicalTypes: true
Load a CSV file with a defined schema
id: gcp_bq_load_files_test
namespace: company.team
tasks:
  - id: load_files_test
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    destinationTable: "myDataset.myTable"
    ignoreUnknownValues: true
    schema:
      fields:
        - name: colA
          type: STRING
        - name: colB
          type: NUMERIC
        - name: colC
          type: STRING
    format: CSV
    csvOptions:
      allowJaggedRows: true
      encoding: UTF-8
      fieldDelimiter: ","
    from:
      - gs://myBucket/myFile.csv
Properties
autodetect
- Type: boolean
- Dynamic: ❌
- Required: ❌
Experimental: automatic inference of the options and schema for CSV and JSON sources.
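As a rough illustration of schema inference, the flow below enables autodetect instead of declaring a schema. The flow id, bucket, object, and table names are hypothetical placeholders; only the property names come from this page.

```yaml
id: bq_load_autodetect            # hypothetical flow id
namespace: company.team
tasks:
  - id: load_csv_autodetect
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://my-bucket/orders.csv # hypothetical object
    destinationTable: "my_project.my_dataset.orders"
    format: CSV
    # Ask BigQuery to infer the schema and parsing options,
    # so no explicit `schema` block is given here.
    autodetect: true
```

Because the property is marked experimental, an explicit schema is likely the safer choice for tables whose column types must stay stable.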
avroOptions
- Type: AbstractLoad-AvroOptions
- Dynamic: ❌
- Required: ❌
Avro parsing options.
clusteringFields
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
The clustering specification for the destination table.
createDisposition
- Type: string
- Dynamic: ❌
- Required: ❌
- Possible Values:
CREATE_IF_NEEDED
CREATE_NEVER
Whether the job is allowed to create tables.
csvOptions
- Type: AbstractLoad-CsvOptions
- Dynamic: ❌
- Required: ❌
CSV parsing options.
destinationTable
- Type: string
- Dynamic: ✔️
- Required: ❌
The table to load the data into.
If not provided, a new table is created.
format
- Type: string
- Dynamic: ❌
- Required: ❌
- Possible Values:
CSV
JSON
AVRO
PARQUET
ORC
The source format, and possibly some parsing options, of the external data.
from
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
Google Cloud Storage source data
The fully-qualified URIs that point to source data in Google Cloud Storage (e.g. gs://bucket/path). Each URI can contain one '*' wildcard character and it must come after the 'bucket' name.
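A minimal sketch of the wildcard rule described above: one '*' per URI, placed in the object path rather than in the bucket name. The bucket, path, and table names are hypothetical.

```yaml
id: bq_load_wildcard              # hypothetical flow id
namespace: company.team
tasks:
  - id: load_many_files
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      # One wildcard per URI, after the bucket name.
      - gs://my-bucket/exports/2024/orders-*.avro
      # Several URIs can be listed; each may carry its own wildcard.
      - gs://my-bucket/exports/2025/orders-*.avro
    destinationTable: "my_project.my_dataset.orders"
    format: AVRO
```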
ignoreUnknownValues
- Type: boolean
- Dynamic: ❌
- Required: ❌
Whether BigQuery should allow extra values that are not represented in the table schema.
If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. By default, unknown values are not allowed.
impersonatedServiceAccount
- Type: string
- Dynamic: ✔️
- Required: ❌
The GCP service account to impersonate.
location
- Type: string
- Dynamic: ✔️
- Required: ❌
The geographic location where the dataset should reside.
This property is experimental and may change or be removed.
See Dataset Location
maxBadRecords
- Type: integer
- Dynamic: ❌
- Required: ❌
The maximum number of bad records that BigQuery can ignore when running the job.
If the number of bad records exceeds this value, an invalid error is returned in the job result. By default, no bad record is ignored.
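The two tolerance settings, ignoreUnknownValues and maxBadRecords, can be combined. The sketch below keeps strict column checking but tolerates a few malformed rows; the threshold of 10 and all resource names are arbitrary, hypothetical choices.

```yaml
id: bq_load_tolerant              # hypothetical flow id
namespace: company.team
tasks:
  - id: load_with_tolerance
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://my-bucket/events.json
    destinationTable: "my_project.my_dataset.events"
    format: JSON
    # Rows with extra columns count as bad records (the documented default).
    ignoreUnknownValues: false
    # Up to 10 bad records are skipped; one more and the job
    # reports an invalid error.
    maxBadRecords: 10
```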
projectId
- Type: string
- Dynamic: ✔️
- Required: ❌
The GCP project ID.
retryAuto
- Type: Constant, Exponential, or Random (see Definitions)
- Dynamic: ❌
- Required: ❌
retryMessages
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
- Default:
[ "due to concurrent update", "Retrying the job may solve the problem" ]
The messages which would trigger an automatic retry.
Each message is tested as a case-insensitive substring of the full error message.
retryReasons
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
- Default:
[ "rateLimitExceeded", "jobBackendError", "internalError", "jobInternalError" ]
The reasons which would trigger an automatic retry.
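The three retry properties might be combined as in the sketch below. It assumes retryAuto accepts the Constant policy listed under Definitions and that durations use ISO 8601 notation such as PT30S; the flow id and resource names are hypothetical, and the reason and message values are taken from the documented defaults.

```yaml
id: bq_load_with_retry            # hypothetical flow id
namespace: company.team
tasks:
  - id: load_with_retry
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://my-bucket/orders.avro
    destinationTable: "my_project.my_dataset.orders"
    format: AVRO
    retryAuto:
      type: constant              # fields from the Constant definition
      interval: PT30S             # assumed ISO 8601 duration
      maxAttempt: 3
    # Narrow the retry triggers to a subset of the defaults.
    retryReasons:
      - rateLimitExceeded
      - jobBackendError
    retryMessages:
      - "due to concurrent update"
```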
schema
- Type: object
- Dynamic: ❌
- Required: ❌
The schema for the destination table.
The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup (i.e. DATASTORE_BACKUP format option).
schemaUpdateOptions
- Type: array
- SubType: string
- Dynamic: ❌
- Required: ❌
Experimental: options allowing the schema of the destination table to be updated as a side effect of the load job.
Schema update options are supported in two cases: when writeDisposition is WRITE_APPEND; when writeDisposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators. For normal tables, WRITE_TRUNCATE will always overwrite the schema.
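To illustrate the first supported case (appending), the sketch below pairs WRITE_APPEND with a schema update option. This page does not list the accepted values; ALLOW_FIELD_ADDITION is taken from the BigQuery API's schema update options and is assumed to be passed through unchanged. Resource names are hypothetical.

```yaml
id: bq_load_schema_update         # hypothetical flow id
namespace: company.team
tasks:
  - id: append_with_new_columns
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://my-bucket/orders_v2.avro
    destinationTable: "my_project.my_dataset.orders"
    format: AVRO
    # Schema updates are only honored when appending (or when
    # truncating a single partition).
    writeDisposition: WRITE_APPEND
    schemaUpdateOptions:
      - ALLOW_FIELD_ADDITION      # assumed value from the BigQuery API
```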
scopes
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
- Default:
[ "https://www.googleapis.com/auth/cloud-platform" ]
The GCP scopes to be used.
serviceAccount
- Type: string
- Dynamic: ✔️
- Required: ❌
The GCP service account.
timePartitioningField
- Type: string
- Dynamic: ✔️
- Required: ❌
The time partitioning field for the destination table.
timePartitioningType
- Type: string
- Dynamic: ✔️
- Required: ❌
- Default:
DAY
- Possible Values:
DAY
HOUR
MONTH
YEAR
The time partitioning type specification for the destination table.
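Partitioning and clustering are set on the same task. The sketch below partitions by a hypothetical order_date column and clusters by a hypothetical customer_id column; it assumes both columns exist in the loaded data.

```yaml
id: bq_load_partitioned           # hypothetical flow id
namespace: company.team
tasks:
  - id: load_partitioned
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://my-bucket/orders.parquet
    destinationTable: "my_project.my_dataset.orders"
    format: PARQUET
    timePartitioningField: order_date   # hypothetical column
    timePartitioningType: MONTH         # default is DAY
    clusteringFields:
      - customer_id                     # hypothetical column
```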
writeDisposition
- Type: string
- Dynamic: ❌
- Required: ❌
- Possible Values:
WRITE_TRUNCATE
WRITE_APPEND
WRITE_EMPTY
The action that should occur if the destination table already exists.
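The write and create dispositions are often set together. The sketch below replaces the contents of a table that must already exist. The comments describe the usual BigQuery meaning of each value and should be read as a summary, not a specification; resource names are hypothetical.

```yaml
id: bq_load_replace               # hypothetical flow id
namespace: company.team
tasks:
  - id: replace_table_contents
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://my-bucket/daily_snapshot.csv
    destinationTable: "my_project.my_dataset.snapshot"
    format: CSV
    # WRITE_TRUNCATE overwrites existing rows, WRITE_APPEND adds to them,
    # WRITE_EMPTY fails if the table already holds data.
    writeDisposition: WRITE_TRUNCATE
    # CREATE_NEVER requires the table to exist; CREATE_IF_NEEDED
    # lets the job create it.
    createDisposition: CREATE_NEVER
```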
Outputs
destinationTable
- Type: string
- Required: ❌
jobId
- Type: string
- Required: ❌
rows
- Type: integer
- Required: ❌
Definitions
io.kestra.core.models.tasks.retrys.Constant
interval
- Type: string
- Dynamic: ❌
- Required: ✔️
- Format:
duration
type
- Type: string
- Dynamic: ❌
- Required: ✔️
- Default:
constant
behavior
- Type: string
- Dynamic: ❌
- Required: ❌
- Default:
RETRY_FAILED_TASK
- Possible Values:
RETRY_FAILED_TASK
CREATE_NEW_EXECUTION
maxAttempt
- Type: integer
- Dynamic: ❌
- Required: ❌
- Minimum:
>= 1
maxDuration
- Type: string
- Dynamic: ❌
- Required: ❌
- Format:
duration
warningOnRetry
- Type: boolean
- Dynamic: ❌
- Required: ❌
- Default:
false
io.kestra.core.models.tasks.retrys.Random
maxInterval
- Type: string
- Dynamic: ❌
- Required: ✔️
- Format:
duration
minInterval
- Type: string
- Dynamic: ❌
- Required: ✔️
- Format:
duration
type
- Type: string
- Dynamic: ❌
- Required: ✔️
- Default:
random
behavior
- Type: string
- Dynamic: ❌
- Required: ❌
- Default:
RETRY_FAILED_TASK
- Possible Values:
RETRY_FAILED_TASK
CREATE_NEW_EXECUTION
maxAttempt
- Type: integer
- Dynamic: ❌
- Required: ❌
- Minimum:
>= 1
maxDuration
- Type: string
- Dynamic: ❌
- Required: ❌
- Format:
duration
warningOnRetry
- Type: boolean
- Dynamic: ❌
- Required: ❌
- Default:
false
io.kestra.plugin.gcp.bigquery.AbstractLoad-CsvOptions
allowJaggedRows
- Type: boolean
- Dynamic: ❌
- Required: ❌
allowQuotedNewLines
- Type: boolean
- Dynamic: ✔️
- Required: ❌
encoding
- Type: string
- Dynamic: ✔️
- Required: ❌
fieldDelimiter
- Type: string
- Dynamic: ✔️
- Required: ❌
quote
- Type: string
- Dynamic: ✔️
- Required: ❌
skipLeadingRows
- Type: integer
- Dynamic: ❌
- Required: ❌
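The CSV options listed here carry no descriptions on this page, so the sketch below relies on their names and the usual BigQuery meanings: skip a header row, use a semicolon delimiter, and allow quoted fields that contain newlines. Resource names are hypothetical.

```yaml
id: bq_load_csv_options           # hypothetical flow id
namespace: company.team
tasks:
  - id: load_semicolon_csv
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://my-bucket/export.csv
    destinationTable: "my_project.my_dataset.export"
    format: CSV
    autodetect: true
    csvOptions:
      skipLeadingRows: 1          # skip the header line
      fieldDelimiter: ";"
      quote: "\""
      allowQuotedNewLines: true   # quoted values may span lines
      encoding: UTF-8
```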
io.kestra.core.models.tasks.retrys.Exponential
interval
- Type: string
- Dynamic: ❌
- Required: ✔️
- Format:
duration
maxInterval
- Type: string
- Dynamic: ❌
- Required: ✔️
- Format:
duration
type
- Type: string
- Dynamic: ❌
- Required: ✔️
- Default:
exponential
behavior
- Type: string
- Dynamic: ❌
- Required: ❌
- Default:
RETRY_FAILED_TASK
- Possible Values:
RETRY_FAILED_TASK
CREATE_NEW_EXECUTION
delayFactor
- Type: number
- Dynamic: ❌
- Required: ❌
maxAttempt
- Type: integer
- Dynamic: ❌
- Required: ❌
- Minimum:
>= 1
maxDuration
- Type: string
- Dynamic: ❌
- Required: ❌
- Format:
duration
warningOnRetry
- Type: boolean
- Dynamic: ❌
- Required: ❌
- Default:
false
io.kestra.plugin.gcp.bigquery.AbstractLoad-AvroOptions
useAvroLogicalTypes
- Type: boolean
- Dynamic: ❌
- Required: ❌