1 year ago
#341585
Fahadakbar
How to drop original columns in a Spark ML transformer
When running a Spark ML transformer, we provide input columns and output columns. The transformed dataset then contains both kinds of columns, i.e. the original columns and the transformed columns.
For example:
from pyspark.sql import SparkSession
from pyspark.ml.feature import Imputer

spark = SparkSession.builder.getOrCreate()

# DataFrame with missing (NaN) values in columns "a" and "b"
df = spark.createDataFrame([
    (1.0, float("nan")),
    (2.0, float("nan")),
    (float("nan"), 3.0),
    (4.0, 4.0),
    (5.0, 5.0)
], ["a", "b"])

# Impute the NaNs, writing the results to new output columns
imputer = Imputer(inputCols=["a", "b"], outputCols=["out_a", "out_b"])
model = imputer.fit(df)
model.transform(df).columns
This will print out
['a','b','out_a','out_b']
Is it possible to ask the transformer to output only the transformed columns?
I want this to happen inside the transformer itself; I do not want to remove the original columns afterwards with the DataFrame drop method.
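For context, the closest workaround I have found is to chain a SQLTransformer after the Imputer in a Pipeline, so the column selection at least happens inside a pipeline stage rather than as a drop call on the result. This is just my own sketch (the selector stage and its SELECT statement are mine), not a built-in option of Imputer:

from pyspark.ml import Pipeline
from pyspark.ml.feature import Imputer, SQLTransformer

imputer = Imputer(inputCols=["a", "b"], outputCols=["out_a", "out_b"])

# SQLTransformer runs a SQL statement against the incoming DataFrame;
# __THIS__ is the placeholder for the input table, so this stage keeps
# only the imputed columns.
selector = SQLTransformer(statement="SELECT out_a, out_b FROM __THIS__")

pipeline = Pipeline(stages=[imputer, selector])
model = pipeline.fit(df)
model.transform(df).columns   # ['out_a', 'out_b']

That works, but it is still an extra stage I have to add myself; what I am asking is whether a transformer like Imputer can be told to emit only its output columns on its own.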
apache-spark
pyspark
apache-spark-ml