1 year ago

#361689

test-img

M4rk

From Postgres CDC to BigQuery with materialized views

I'm designing a Change Data Capture from our Postgres to BigQuery.

I came across something that seems a powerful combination of:

My intuition tells me that by combining these two features, I should be able to have always up to date, incremental synced tables, with the least amount of engineering.

However, the lack of documentation (or suggesting different approaches) from the Google documentation, makes me skeptical that the above approach would work: e.g. from google docs:

So yes, the idea would be having a materialized view such as

given a table composed by the columns (id, version, json_valeus, soft_delete)

CREATE Materialized view SELECT * FROM myTable WHERE concat(id, version) in (SELECT concat(id, max(version)) FROM myTable GROUP BY id)

Would this be extremely inefficient/not work?

System size:

  • about 1000 tables,
  • with 1M to 500M rows each.
  • 10-50M updates/day total across all tables

Expected cost:

  • 5-15k/month

postgresql

google-bigquery

gcloud

debezium

cdc

0 Answers

Your Answer

Accepted video resources