1 year ago

#122829


Murli Krishnan

Apache Beam Python - SQL Transform with named PCollection Issue

I am trying to run the code below, which uses a NamedTuple as the element type of a PCollection and SqlTransform to do a simple SELECT.

This follows the approach shown at 4:06 in this video: https://www.youtube.com/watch?v=zx4p-UNSmrA

Instead of referring to PCOLLECTION in the SqlTransform query, named PCollections can be passed in as a dict and referenced by key, as below.

Code Block

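import typing

import apache_beam as beam
from apache_beam.transforms.sql import SqlTransform


class EmployeeType(typing.NamedTuple):
    name: str
    age: int


# Register RowCoder so that EmployeeType elements carry a schema for SQL.
beam.coders.registry.register_coder(EmployeeType, beam.coders.RowCoder)

# Pipeline construction was not shown in the post; a default Pipeline is assumed.
p = beam.Pipeline()

pcol = p | "Create" >> beam.Create([EmployeeType(name="ABC", age=10)]).with_output_types(EmployeeType)

# Pass the PCollection under the name 'a' and query it as table a.
(
    {'a': pcol}
    | SqlTransform(""" SELECT age FROM a """)
    | "Map" >> beam.Map(lambda row: row.age)
    | "Print" >> beam.Map(print)
)

p.run()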

However, the code above fails with the following error:

Caused by: org.apache.beam.vendor.calcite.v1_28_0.org.apache.calcite.sql.validate.SqlValidatorException: Object 'a' not found

I am using Apache Beam SDK 2.35.0. Are there any known limitations on using named PCollections with SqlTransform?

google-cloud-dataflow

apache-beam

dataflow

apache-beam-internals
