1 year ago

#388842


DarkLeafyGreen

How to specify input locations parameter in dataflow pipeline?

I have a Dataflow template that I originally created via Dataprep. Now I want to move away from Dataprep, use only Dataflow, and schedule jobs with Cloud Scheduler. In the Google Cloud console, under Dataflow -> Jobs, selecting a job offers an "Import as pipeline" feature, which I use to create a batch job pipeline.

In the multistep form I cannot get past specifying the input locations:

[Screenshot: the input locations field in the pipeline form]

It wants me to specify locations matching this regex:

[ \t\n\x0B\f\r]*\{[ \t\n\x0B\f\r]*((.|\r|\n)*".*"[ \t\n\x0B\f\r]*:[ \t\n\x0B\f\r]*".*"(.|\r|\n)*){0}[ \t\n\x0B\f\r]*\}[ \t\n\x0B\f\r]*

I tried with:

{
    "location1": "project:bq_dataset.bq_table1",
    "location10": "project:bq_dataset.bq_table10",
    "location17": "project:bq_dataset.bq_table17"
}

https://regex101.com/r/rTUfHH/1

In fact, as others pointed out in the comments, it seems that the only input the form accepts is {}.

[Screenshot: the form with {} as the input]
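The behaviour can be reproduced outside the console. The following is a minimal sketch, assuming the form requires the whole input to match the pattern (hence `re.fullmatch`) and that Python's `re` module interprets this pattern the same way as the console's regex engine. The `{0}` quantifier after the large group means "repeat this group exactly zero times", so the key/value part can never be matched. `RELAXED` is a hypothetical variant used only to show that the quantifier is what blocks the input; it is not something the console uses.

```python
import re

# Validation pattern copied verbatim from the form.
PATTERN = (
    r'[ \t\n\x0B\f\r]*\{[ \t\n\x0B\f\r]*'
    r'((.|\r|\n)*".*"[ \t\n\x0B\f\r]*:[ \t\n\x0B\f\r]*".*"(.|\r|\n)*){0}'
    r'[ \t\n\x0B\f\r]*\}[ \t\n\x0B\f\r]*'
)

# The input from the question.
ATTEMPT = """{
    "location1": "project:bq_dataset.bq_table1",
    "location10": "project:bq_dataset.bq_table10",
    "location17": "project:bq_dataset.bq_table17"
}"""

# Hypothetical variant (assumption): the key/value group may occur once
# instead of exactly zero times.
RELAXED = PATTERN.replace("){0}", ")?")


def accepted(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    print("original pattern, {}:      ", accepted(PATTERN, "{}"))
    print("original pattern, attempt: ", accepted(PATTERN, ATTEMPT))
    print("relaxed pattern, attempt:  ", accepted(RELAXED, ATTEMPT))
```

With the original pattern, only an empty pair of braces (optionally surrounded by whitespace) gets through, which matches what the form does.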

Any ideas why that is?

More details:

The flow is quite simple:

[Screenshot: the flow]

Data is loaded from GCS, transformed and put into BigQuery.

The loading of data is parameterized:

[Screenshot: the parameterized data loading]

Here is the dataflow template: https://gist.github.com/arturozz/68fc482dba53ee2ab45f08b768e2d7cb

regex

google-cloud-dataflow

0 Answers
