1 year ago

#158965

Jared DuPont

How do I control file output size when using Beeline's INSERT OVERWRITE DIRECTORY?

I am running many commands like this to export data from Hive as CSV files:

INSERT OVERWRITE DIRECTORY '/output/database/table/' 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
SELECT * FROM database.table;

That command does exactly what I want, but the output file sizes vary widely, ranging from 100 MB to 500 MB. I would like to specify a maximum size for each output file, around 200 MB. Is this possible?

hadoop

hive

beeline

0 Answers
