1 year ago
#290970
Tim
Is there an option to directly delete rows in an ORC file in PySpark or Databricks?
Is there any option to delete rows directly from ORC files, given the file's structure?
I am using Azure Databricks.
With the query below I am reading the content of the ORC file, and I want to delete those rows:
%sql select * from orc.`/mnt/my-adls-storage/data/app/simple.orc`
where field='test'
Is there a way to remove these rows from the ORC file directly?
Alternatively, I could:
- Read the ORC file as a DataFrame
- Filter out the records to be deleted
- Write the result to a new ORC file
- Remove the old one
pyspark
databricks
orc