1 year ago

#290970


Tim

Is there an option to directly delete rows in an ORC file in PySpark or Databricks?

Is there any option to delete rows directly from an ORC file, given that its structure is known?

I am using Azure Databricks.

With the query below I am reading the contents of the ORC file, and I want to delete the rows it returns:

%sql select * from orc.`/mnt/my-adls-storage/data/app/simple.orc` 
  where field='test'

Is there a way to remove these rows directly from the ORC file?

Alternatively,

    1. Read the ORC file as a DataFrame
    2. Filter out the records to be deleted
    3. Write the remaining rows to a new ORC file
    4. Remove the old file
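For reference, the alternative steps above could be sketched roughly like this in PySpark. This is only a minimal sketch, not a tested solution: the helper names `staging_path` and `delete_rows_from_orc` and the `_rewrite_tmp` suffix are made up for illustration, and it assumes the `spark` session that a Databricks notebook provides. As far as I know, Spark's plain ORC reader/writer has no in-place delete, so the file has to be rewritten.

```python
def staging_path(src_path: str) -> str:
    """Hypothetical helper: temporary location next to the source path."""
    return src_path.rstrip("/") + "_rewrite_tmp"


def delete_rows_from_orc(spark, src_path: str, delete_condition: str) -> str:
    """Write every row that does NOT match delete_condition to a staging path.

    `spark` is an existing SparkSession (predefined in Databricks notebooks).
    `delete_condition` is a Spark SQL boolean expression, e.g. "field = 'test'".
    Returns the staging path; the caller still has to swap it with the source.
    """
    tmp_path = staging_path(src_path)
    df = spark.read.orc(src_path)
    # coalesce(..., false) keeps rows where the condition evaluates to NULL,
    # so only rows that definitely match the condition are dropped.
    kept = df.filter(f"NOT coalesce(({delete_condition}), false)")
    kept.write.mode("overwrite").orc(tmp_path)
    return tmp_path


# Usage in a Databricks notebook (assumes `spark` and `dbutils` are available):
# src = "/mnt/my-adls-storage/data/app/simple.orc"
# tmp = delete_rows_from_orc(spark, src, "field = 'test'")
# dbutils.fs.rm(src, recurse=True)       # remove the old data
# dbutils.fs.mv(tmp, src, recurse=True)  # move the rewritten data into place
```

Writing to a separate staging path first matters because Spark reads lazily, so overwriting the same path that is being read can fail or lose data. Note that Spark writes a directory of part files at the target path rather than a single file.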

pyspark

databricks

orc

0 Answers
