IonToAvro
type: "io.kestra.plugin.serdes.avro.IonToAvro"
Read a provided file containing Ion-serialized data and convert it to Avro.
Examples
Convert a CSV file to the Avro format.
id: divvy_tripdata
namespace: company.team
variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"
  - id: convert
    type: io.kestra.plugin.serdes.csv.CsvToIon
    from: "{{ outputs.unzip.files[render(vars.file_id) ~ '-divvy-tripdata.csv'] }}"
  - id: to_avro
    type: io.kestra.plugin.serdes.avro.IonToAvro
    from: "{{ outputs.convert.uri }}"
    datetimeFormat: "yyyy-MM-dd' 'HH:mm:ss"
    schema: |
      {
        "type": "record",
        "name": "Ride",
        "namespace": "com.example.bikeshare",
        "fields": [
          {"name": "ride_id", "type": "string"},
          {"name": "rideable_type", "type": "string"},
          {"name": "started_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
          {"name": "ended_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
          {"name": "start_station_name", "type": "string"},
          {"name": "start_station_id", "type": "string"},
          {"name": "end_station_name", "type": "string"},
          {"name": "end_station_id", "type": "string"},
          {"name": "start_lat", "type": "double"},
          {"name": "start_lng", "type": "double"},
          {
            "name": "end_lat",
            "type": ["null", "double"],
            "default": null
          },
          {
            "name": "end_lng",
            "type": ["null", "double"],
            "default": null
          },
          {"name": "member_casual", "type": "string"}
        ]
      }
Properties
from
- Type: string
- Dynamic: ✔️
- Required: ✔️
Source file URI
schema
- Type: string
- Dynamic: ✔️
- Required: ✔️
The Avro schema associated with the data
dateFormat
- Type: string
- Dynamic: ✔️
- Required: ❌
- Default:
yyyy-MM-dd[XXX]
Format to use when parsing date
datetimeFormat
- Type: string
- Dynamic: ✔️
- Required: ❌
- Default:
yyyy-MM-dd'T'HH:mm[:ss][.SSSSSS][XXX]
Format to use when parsing datetime
Default value is yyyy-MM-dd'T'HH:mm[:ss][.SSSSSS][XXX]
decimalSeparator
- Type: string
- Dynamic: ✔️
- Required: ❌
- Default:
.
Character to recognize as decimal point (e.g. use ‘,’ for European data).
Default value is '.'
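As an illustration of decimalSeparator, the task fragment below is a minimal sketch for data that writes decimals with a comma. The task id, the upstream task name `convert`, and the schema (record `Price`, fields `sku` and `amount`) are hypothetical and only serve the example.

```yaml
- id: to_avro_eu
  type: io.kestra.plugin.serdes.avro.IonToAvro
  # Assumes an upstream task named `convert` that outputs an Ion file.
  from: "{{ outputs.convert.uri }}"
  # Treat "," as the decimal point, so a value such as "12,5" is read as 12.5.
  decimalSeparator: ","
  schema: |
    {
      "type": "record",
      "name": "Price",
      "namespace": "com.example.shop",
      "fields": [
        {"name": "sku", "type": "string"},
        {"name": "amount", "type": "double"}
      ]
    }
```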
falseValues
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
- Default:
[ "f", "false", "disabled", "0", "off", "no", "" ]
Values to consider as False
inferAllFields
- Type: boolean
- Dynamic: ❌
- Required: ❌
- Default:
false
Try to infer all fields
If true, we try to infer all fields with trueValues, falseValues & nullValues. If false, we infer bool & null only on fields declared in the schema as null and bool.
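The interplay between inferAllFields, trueValues, falseValues, and nullValues can be sketched as follows. This is an illustrative fragment, not a reference configuration: the task id, the upstream task name `convert`, the custom value lists, and the schema are all hypothetical.

```yaml
- id: to_avro_flags
  type: io.kestra.plugin.serdes.avro.IonToAvro
  # Assumes an upstream task named `convert` that outputs an Ion file.
  from: "{{ outputs.convert.uri }}"
  # Default behaviour: boolean and null inference only applies to fields
  # that the schema declares as boolean or nullable.
  inferAllFields: false
  # Custom markers replacing the default lists (hypothetical values).
  trueValues: ["Y", "yes"]
  falseValues: ["N", "no"]
  nullValues: ["", "NULL"]
  schema: |
    {
      "type": "record",
      "name": "User",
      "namespace": "com.example.users",
      "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "active", "type": "boolean"},
        {"name": "nickname", "type": ["null", "string"], "default": null}
      ]
    }
```

With this configuration, `active` is the field where "Y" and "N" would be interpreted, and `nickname` is the field where "NULL" would be read as null. `user_id` is declared as a plain string, so it is left out of that inference.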
nullValues
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
- Default:
[ "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "1.#IND", "1.#QNAN", "NA", "n/a", "nan", "null" ]
Values to consider as null
strictSchema
- Type: boolean
- Dynamic: ❌
- Required: ❌
- Default:
false
Whether to consider a field present in the data but not declared in the schema as an error
Default value is false
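A minimal sketch of enabling strictSchema, assuming an upstream task named `convert` and a hypothetical two-field schema. With this setting, a record carrying a field that the schema does not declare is treated as an error.

```yaml
- id: to_avro_strict
  type: io.kestra.plugin.serdes.avro.IonToAvro
  # Assumes an upstream task named `convert` that outputs an Ion file.
  from: "{{ outputs.convert.uri }}"
  # Treat any field present in the data but absent from the schema as an error.
  strictSchema: true
  schema: |
    {
      "type": "record",
      "name": "Event",
      "namespace": "com.example.events",
      "fields": [
        {"name": "event_id", "type": "string"},
        {"name": "payload", "type": "string"}
      ]
    }
```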
timeFormat
- Type: string
- Dynamic: ✔️
- Required: ❌
- Default:
HH:mm[:ss][.SSSSSS][XXX]
Format to use when parsing time
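The fragment below sketches how dateFormat and timeFormat might be overridden for a source that does not use the default layouts. It assumes the patterns follow Java `DateTimeFormatter` syntax, which the bracketed optional sections in the defaults suggest but this page does not state. The task id, upstream task name `convert`, and schema are hypothetical.

```yaml
- id: to_avro_dates
  type: io.kestra.plugin.serdes.avro.IonToAvro
  # Assumes an upstream task named `convert` that outputs an Ion file.
  from: "{{ outputs.convert.uri }}"
  # Source dates look like 31/12/2024 (assumed layout).
  dateFormat: "dd/MM/yyyy"
  # Source times look like 23:59:58 (assumed layout).
  timeFormat: "HH:mm:ss"
  schema: |
    {
      "type": "record",
      "name": "Booking",
      "namespace": "com.example.bookings",
      "fields": [
        {"name": "booking_id", "type": "string"},
        {"name": "day", "type": {"type": "int", "logicalType": "date"}},
        {"name": "slot", "type": {"type": "int", "logicalType": "time-millis"}}
      ]
    }
```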
timeZoneId
- Type: string
- Dynamic: ❌
- Required: ❌
- Default:
Etc/UTC
Timezone to use when no timezone can be parsed from the source.
If null, the timezone will be UTC.
Default value is the system timezone.
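As a sketch of timeZoneId, the fragment below handles datetimes that carry no offset and should be read as Paris local time. The zone, the datetime layout, the task id, the upstream task name `convert`, and the schema are assumptions made for the example.

```yaml
- id: to_avro_paris
  type: io.kestra.plugin.serdes.avro.IonToAvro
  # Assumes an upstream task named `convert` that outputs an Ion file.
  from: "{{ outputs.convert.uri }}"
  # Source values look like 2024-03-31 08:15:00, with no offset (assumed layout).
  datetimeFormat: "yyyy-MM-dd HH:mm:ss"
  # Zone applied when the source value carries no timezone information.
  timeZoneId: "Europe/Paris"
  schema: |
    {
      "type": "record",
      "name": "Reading",
      "namespace": "com.example.sensors",
      "fields": [
        {"name": "sensor_id", "type": "string"},
        {"name": "measured_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
      ]
    }
```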
trueValues
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
- Default:
[ "t", "true", "enabled", "1", "on", "yes" ]
Values to consider as True
Outputs
uri
- Type: string
- Required: ❌
- Format:
uri
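The uri output can be referenced from a later task in the same flow. The sketch below assumes the conversion task is named `to_avro`, as in the example above, and uses a Log task only to show the expression. The downstream task id and the shortened schema are hypothetical.

```yaml
tasks:
  - id: to_avro
    type: io.kestra.plugin.serdes.avro.IonToAvro
    # Assumes an upstream task named `convert` that outputs an Ion file.
    from: "{{ outputs.convert.uri }}"
    schema: |
      {
        "type": "record",
        "name": "Ride",
        "namespace": "com.example.bikeshare",
        "fields": [
          {"name": "ride_id", "type": "string"}
        ]
      }
  - id: log_avro_uri
    type: io.kestra.plugin.core.log.Log
    # Reference the URI of the generated Avro file.
    message: "Avro file written to {{ outputs.to_avro.uri }}"
```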