Empty result after left-joining two streaming DataFrames and then aggregating them in PySpark
Data description
I have two CSV files with logs - one contains Span logs, and the other contains Error logs (Elastic APM logs, for those who are acquainted). The relevant span fields are timestamp, id...
Gloripaxis
Votes: 0
Answers: 0
Driver memory not getting cleaned up in Spark Structured Streaming
I am using a Spark Structured Streaming job to perform ETL on the AWS platform.
The driver memory is not being cleaned up. The job reads events from Kinesis and writes to S3.
Below are the my...
Ankur Shrivastava
Votes: 0
Answers: 1
Scala - java.lang.IllegalArgumentException: requirement failed: All input types must be the same except
Here I'm trying to add a date to the data frame (line 232), but I'm getting an exception on line 234:
232 - val df5Final = df5Phone.withColumn(colName, regexp_replace(col(colName), date_pattern, "R...
mohit
Votes: 0
Answers: 0
Spark RDD S3 saveAsTextFile taking long time
I have a Spark Streaming job on EMR that runs in 30-minute batches, processes the data, and finally writes the output to several different files in S3. Now the output step to S3 is taking too long (a...
ekjot
Votes: 0
Answers: 1