1 year ago

#341585

Fahadakbar

How to drop the original columns in a Spark ML transformer

When I run a Spark ML transformer, I provide input and output columns. The transformed dataset contains both sets of columns, i.e. the original columns and the transformed columns.

e.g.

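from pyspark.ml.feature import Imputer
from pyspark.sql import SparkSession

# Reuse the active session, or start one when running outside a Spark shell
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    (1.0, float("nan")),
    (2.0, float("nan")),
    (float("nan"), 3.0),
    (4.0, 4.0),
    (5.0, 5.0)
], ["a", "b"])

# Impute the missing values in "a" and "b" into new columns
imputer = Imputer(inputCols=["a", "b"], outputCols=["out_a", "out_b"])
model = imputer.fit(df)

print(model.transform(df).columns)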

This will print out

['a', 'b', 'out_a', 'out_b']

Is it possible to have the transformer output only the transformed columns?
I want this to happen inside the transformer itself, and I do not want to remove the original columns afterwards with the DataFrame drop method.

apache-spark

pyspark

apache-spark-ml

0 Answers
