Asked 1 year ago by mohit

Scala - java.lang.IllegalArgumentException: requirement failed: All input types must be the same except

Here I'm trying to redact dates in the data frame (line 232), but I'm getting an exception at line 234:

```scala
232    val df5Final = df5Phone.withColumn(colName, regexp_replace(col(colName), date_pattern, "REDACTED_DATE"))
233    logger.info("FETCH all data query ::{}", df5Final.printSchema())
234    df5Final.persist(StorageLevel.MEMORY_AND_DISK)
235    logger.info("FETCH all data query ::{}", df5Final.printSchema())
236    // LOADING WHOLE TABLE NOW
237    val colms = paramMap("output_column_names").concat(",").concat(MASK_COLUMN_NAME)
238    val allQuery = "select ".concat(colms).concat(" from ").concat(paramMap("table_name"))
```

Exception:

```
java.lang.IllegalArgumentException: requirement failed: All input types must be the same except nullable, containsNull, valueContainsNull flags. The input types found are
    ArrayType(StructType(StructField(annotatorType,StringType,true), StructField(begin,IntegerType,false), StructField(end,IntegerType,false), StructField(result,StringType,true), StructField(metadata,MapType(StringType,StringType,true),true), StructField(embeddings,ArrayType(FloatType,false),true)),false)
    ArrayType(StructType(StructField(result,StringType,true), StructField(metadata,MapType(StringType,StringType,true),true)),false)
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataTypeCheck(Expression.scala:1043)
    at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataTypeCheck$(Expression.scala:1037)
    at org.apache.spark.sql.catalyst.expressions.If.dataTypeCheck(conditionalExpressions.scala:35)
    at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataType(Expression.scala:1048)
    at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataType$(Expression.scala:1047)
    at org.apache.spark.sql.catalyst.expressions.If.dataType(conditionalExpressions.scala:35)
    at org.apache.spark.sql.catalyst.expressions.Alias.dataType(namedExpressions.scala:164)
    at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$.pruneSerializer(objects.scala:196)
    at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.$anonfun$applyOrElse$3(objects.scala:216)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.immutable.List.map(List.scala:298)
```

I'm using Scala 2.12 and Spark 3.0.0.

P.S.: Should I use `lit()` around the `regexp_replace` arguments?
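For context on the P.S., these are the two `regexp_replace` overloads as I understand the Spark 3.0 `functions` API (`colName` and `date_pattern` below are placeholders, not my real values):

```scala
import org.apache.spark.sql.functions.{col, lit, regexp_replace}

val colName = "text"                      // placeholder column name
val date_pattern = "\\d{2}/\\d{2}/\\d{4}" // placeholder regex

// String overload -- pattern and replacement are plain Strings,
// which is what line 232 above already uses:
val masked = regexp_replace(col(colName), date_pattern, "REDACTED_DATE")

// Column overload -- for when pattern/replacement come from columns:
val maskedCols = regexp_replace(col(colName), lit(date_pattern), lit("REDACTED_DATE"))
```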

Later on, I will be writing this data frame to HDFS as ORC files.

UPDATE:

On further analysis of the code, I found the actual line that is causing this issue:

```scala
val nerResultMdExp = nerResultMd.select(col("text"), col(MASK_COLUMN_NAME), explode_outer(col("nerArray")))
```
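For reference, here is a minimal, hypothetical sketch (simplified column names, not my actual job) of how I understand the mismatch can arise: an expression that has to merge two `array<struct<...>>` columns whose struct fields differ, which matches the two `ArrayType(StructType(...))` shapes in the error above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StructMismatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    import spark.implicits._

    // Two array<struct> columns whose element types differ:
    // "full" has fields (result, begin); "slim" has only (result).
    val df = Seq("a").toDF("text")
      .withColumn("full", array(struct(lit("x").as("result"), lit(0).as("begin"))))
      .withColumn("slim", array(struct(lit("x").as("result"))))

    // Any expression that must merge the two differing element types
    // (If / CaseWhen / coalesce) is rejected by Spark's type check.
    df.select(when($"text".isNotNull, $"full").otherwise($"slim").as("merged"))
  }
}
```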

Tags: scala, apache-spark, apache-spark-sql, spark-streaming, amazon-emr
