1 year ago


dragonachu

pyspark csv format - mergeschema

I have a large data dump spanning several terabytes. The files contain daily activity data: day 1 may have 2 columns and day 2 may have 3. The dump is in CSV format, and I now need to read all these files and load them into a single table. The problem is that, with CSV, I'm not sure how to merge the schemas without losing any columns. I know this can be achieved in Parquet through mergeSchema, but I can't convert the files one by one to Parquet because the data is huge. Is there any way to merge schemas when the format is CSV?
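The schema merge being asked for can be sketched without Spark: union the headers of all files in first-seen order, and fill missing columns with nulls. The sample data below (`day1`, `day2`, a hypothetical `merged_rows` helper) is illustrative, not from the question:

```python
import csv
import io

# Hypothetical sample data: day 1 has 2 columns, day 2 adds a third.
day1 = "user,clicks\nalice,3\nbob,5\n"
day2 = "user,clicks,country\nalice,2,US\ncarol,7,DE\n"

def merged_rows(csv_texts):
    """Merge rows from CSVs with differing headers into one schema.

    Columns are unioned in first-seen order; rows that lack a column
    get None, mirroring what Parquet's mergeSchema produces.
    """
    columns = []
    parsed = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames:
            if name not in columns:
                columns.append(name)
        parsed.extend(reader)
    return columns, [[row.get(c) for c in columns] for row in parsed]

cols, rows = merged_rows([day1, day2])
print(cols)     # ['user', 'clicks', 'country']
print(rows[0])  # ['alice', '3', None]
```

In Spark itself (3.1+), the same effect can be had without converting to Parquet by reading each day's files into its own DataFrame and combining them with `df1.unionByName(df2, allowMissingColumns=True)`, or by passing an explicit superset schema to `spark.read.csv`.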

apache-spark

pyspark

file-format

0 Answers
