python (65.2k questions)
javascript (44.3k questions)
reactjs (22.7k questions)
java (20.8k questions)
c# (17.4k questions)
html (16.3k questions)
r (13.7k questions)
android (13k questions)
Rewrite a UDF as a pandas UDF in PySpark
I have a dataframe:
import pyspark.sql.functions as F
sdf1 = spark.createDataFrame(
[
(2022, 1, ["apple", "edible"]),
(2022, 1, ["edible", "frui...

Rory
Votes: 0
Answers: 1
Create a column using a Spark pandas_udf with a dynamic number of input columns
I have this df:
df = spark.createDataFrame(
[('row_a', 5.0, 0.0, 11.0),
('row_b', 3394.0, 0.0, 4543.0),
('row_c', 136111.0, 0.0, 219255.0),
('row_d', 0.0, 0.0, 0.0),
('row_e', ...

ZygD
Votes: 0
Answers: 4
Find the top n unique values of a column based on the ranking of another column within groups in PySpark
I have a DataFrame like the one below:
df = pd.DataFrame({ 'region': [1,1,1,1,1,1,2,2,2,3],
'store': ['A', 'A', 'C', 'C', 'D', 'B', 'F', 'F', 'E', 'G'],
'call_date': ['2022-03-...
zesla
Votes: 0
Answers: 4
pyspark.pandas.exceptions.PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented
I am trying to replace the pandas library with the pyspark.pandas library.
I tried this:
Note: df is a pyspark.pandas DataFrame.
import pyspark.pandas as pd
print(set(df["horizon"].unique()))
But g...
user19930511
Votes: 0
Answers: 0