Orchestrate CloudQuery data ingestion syncs to integrate data about your AWS resources into Postgres and and analyze it in SQL

Source

yaml

id: cloudquery-aws
namespace: company.team

tasks:
  - id: cloudquery
    type: io.kestra.plugin.cloudquery.CloudQueryCLI
    inputFiles:
      config.yml: |
        kind: source
        spec:
          name: aws
          path: cloudquery/aws
          version: "v22.4.0"
          tables: ["aws_s3*"]
          destinations: ["postgresql"]
          spec:
        ---
        kind: destination
        spec:
          name: "postgresql"
          version: "v5.0.3"
          path: "cloudquery/postgresql"
          write_mode: "overwrite-delete-stale"
          spec:
            connection_string: ${PG_CONNECTION_STRING}
    commands:
      - cloudquery sync config.yml --log-console
    env:
      AWS_ACCESS_KEY_ID: "{{ secret('AWS_ACCESS_KEY_ID') }}"
      AWS_SECRET_ACCESS_KEY: "{{ secret('AWS_SECRET_ACCESS_KEY') }}"
      AWS_DEFAULT_REGION: "{{ secret('AWS_DEFAULT_REGION') }}"
      CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
      PG_CONNECTION_STRING: postgresql://postgres:{{ secret('DB_PASSWORD')
        }}@host.docker.internal:5432/postgres?sslmode=disable

  - id: query_s3_metadata_in_postgres
    type: io.kestra.plugin.jdbc.postgresql.Query
    url: jdbc:postgresql://host.docker.internal:5432/postgres
    username: postgres
    password: "{{ secret('DB_PASSWORD') }}"
    fetchType: FETCH
    sql: |
      SELECT arn, region
      FROM public.aws_s3_buckets
      WHERE block_public_acls IS NOT TRUE
          OR block_public_policy IS NOT TRUE
          OR ignore_public_acls IS NOT TRUE
          OR restrict_public_buckets IS NOT TRUE;

About this blueprint

S3 Ingest AWS SQL

CloudQuery is an open-source data integration platform built for developers. CloudQuery integrates seamlessly with Kestra.

The flow below syncs metadata about various AWS services and loads it to a Postgres database.

You can use the following command to start a Postgres instance using a Docker container:

bash

docker run -d --name mypostgres -v mypostgresdb:/var/lib/postgresql/data -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=yourPassword1234 -e POSTGRES_DB=postgres postgres:latest

You can securely manage the Postgres password and AWS credentials in Kestra by using Secrets.

After you execute the sync, we can start querying information about AWS resources in Postgres, e.g. to find all S3 buckets that are permitted to be public.

Note that instead of only querying tables about S3 (on line 17), you can add many more tables such as: tables: ["aws_s3*", "aws_ec2*", "aws_ecs*", "aws_iam*", "aws_glue*", "aws_dynamodb*"]. Check the Cloud Query documentation for a full list of supported tables.

Cloud Query CLI

Query

More Related Blueprints

S3 Ingest AWS SQL

Query data from Druid and load it to Redshift

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra