Spark: when do we need to enable the KryoSerializer?
I have a Spark (version 2.4.7) job:
JavaRDD<Row> rows = javaSparkContext.newAPIHadoopFile(...)
    .map(d -> {
        // d._2 is the Hadoop value (e.g. a BytesWritable) holding the serialized protobuf
        Foo foo = Foo.parseFrom(d._2.copyBytes());
        String v1 = foo.getField1();
        int v2 = foo.getField2();
        double[] v3 = foo.getField3();
        // only the extracted String/int/array end up in the Row, not Foo itself
        return RowFactory.create(v1, v2, v3);
    });
spark.createDataFrame(rows, schema).createOrReplaceTempView("table1");
spark.sql("...sum/avg/group-by...").show();
The Foo class here is a complex class generated by Google Protobuf.
I have a couple of questions:
- Will changing 'spark.serializer' to Kryo have any impact on the Foo objects in this case? (See the configuration snippet below for what I mean by that.)
- If all the DataFrame columns are primitives, Strings, or arrays of primitives/Strings, as here, is it necessary from a performance perspective to change 'spark.serializer' to Kryo?
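For reference, by "changing 'spark.serializer' to Kryo" I mean the usual configuration, sketched roughly below. The app name is made up, and the Foo registration is only my guess at what registering the protobuf class would look like; I have not verified that it is actually needed here:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

SparkConf conf = new SparkConf()
    .setAppName("protobuf-aggregation") // hypothetical name
    // use Kryo instead of Java serialization for shuffle data and serialized RDD caching
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");

// Registering a class lets Kryo write a small numeric ID instead of the full
// class name; whether Foo objects ever pass through this serializer at all
// is exactly what my first question is about.
conf.registerKryoClasses(new Class<?>[] { Foo.class });

SparkSession spark = SparkSession.builder().config(conf).getOrCreate();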
Many thanks.
apache-spark
kryo