Orchestrate CloudQuery data ingestion syncs to integrate data about your AWS resources into Postgres and and analyze it in SQL
Source
id: cloudquery-aws
namespace: company.team
tasks:
- id: cloudquery
type: io.kestra.plugin.cloudquery.CloudQueryCLI
inputFiles:
config.yml: |
kind: source
spec:
name: aws
path: cloudquery/aws
version: "v22.4.0"
tables: ["aws_s3*"]
destinations: ["postgresql"]
spec:
---
kind: destination
spec:
name: "postgresql"
version: "v5.0.3"
path: "cloudquery/postgresql"
write_mode: "overwrite-delete-stale"
spec:
connection_string: ${PG_CONNECTION_STRING}
commands:
- cloudquery sync config.yml --log-console
env:
AWS_ACCESS_KEY_ID: "{{ secret('AWS_ACCESS_KEY_ID') }}"
AWS_SECRET_ACCESS_KEY: "{{ secret('AWS_SECRET_ACCESS_KEY') }}"
AWS_DEFAULT_REGION: "{{ secret('AWS_DEFAULT_REGION') }}"
CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
PG_CONNECTION_STRING: postgresql://postgres:{{ secret('DB_PASSWORD')
}}@host.docker.internal:5432/postgres?sslmode=disable
- id: query_s3_metadata_in_postgres
type: io.kestra.plugin.jdbc.postgresql.Query
url: jdbc:postgresql://host.docker.internal:5432/postgres
username: postgres
password: "{{ secret('DB_PASSWORD') }}"
fetchType: FETCH
sql: |
SELECT arn, region
FROM public.aws_s3_buckets
WHERE block_public_acls IS NOT TRUE
OR block_public_policy IS NOT TRUE
OR ignore_public_acls IS NOT TRUE
OR restrict_public_buckets IS NOT TRUE;
About this blueprint
S3 Ingest AWS SQL
CloudQuery is an open-source data integration platform built for developers. CloudQuery integrates seamlessly with Kestra.
The flow below syncs metadata about various AWS services and loads it to a Postgres database.
You can use the following command to start a Postgres instance using a Docker container:
docker run -d --name mypostgres -v mypostgresdb:/var/lib/postgresql/data -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=yourPassword1234 -e POSTGRES_DB=postgres postgres:latest
You can securely manage the Postgres password and AWS credentials in Kestra by using Secrets.
After you execute the sync, we can start querying information about AWS resources in Postgres, e.g. to find all S3 buckets that are permitted to be public.
Note that instead of only querying tables about S3 (on line 17), you can add many more tables such as: tables: ["aws_s3*", "aws_ec2*", "aws_ecs*", "aws_iam*", "aws_glue*", "aws_dynamodb*"]
. Check the Cloud Query documentation for a full list of supported tables.
More Related Blueprints