Spark: when do we need to enable the KryoSerializer?
I have a Spark (version 2.4.7) job:
JavaRDD<Row> rows = javaSparkContext.newAPIHadoopFile(...)
    .map(d -> {
        // d._2 is the Hadoop value (e.g. a BytesWritable) holding the serialized protobuf
        Foo foo = Foo.parseFrom(d._2.copyBytes());
        String v1 = foo.getField1();
        int v2 = foo.getField2();
        double[] v3 = foo.getField3();
        // only the extracted String/int/array end up in the Row, not Foo itself
        return RowFactory.create(v1, v2, v3);
    });
spark.createDataFrame(rows, schema).createOrReplaceTempView("table1");
spark.sql("...sum/avg/group-by...").show();
The Foo class here is a complex class generated by Google Protobuf.
I have a couple of questions:
- Will changing 'spark.serializer' to Kryo have any impact on the Foo objects in this case? (See the configuration snippet below for what I mean by that.)
- If all the DataFrame columns are primitives, Strings, or arrays of primitives/Strings, as here, is it necessary from a performance perspective to change 'spark.serializer' to Kryo?
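For reference, by "changing 'spark.serializer' to Kryo" I mean the usual configuration, sketched roughly below. The app name is made up, and the Foo registration is only my guess at what registering the protobuf class would look like; I have not verified that it is actually needed here:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

SparkConf conf = new SparkConf()
    .setAppName("protobuf-aggregation") // hypothetical name
    // use Kryo instead of Java serialization for shuffle data and serialized RDD caching
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");

// Registering a class lets Kryo write a small numeric ID instead of the full
// class name; whether Foo objects ever pass through this serializer at all
// is exactly what my first question is about.
conf.registerKryoClasses(new Class<?>[] { Foo.class });

SparkSession spark = SparkSession.builder().config(conf).getOrCreate();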
Many thanks.
apache-spark
kryo