1 year ago

#386698


yalexx

Pandas - MemoryError: Unable to allocate 220. MiB

I have a DataFrame of orders, indexed by order date, which I set like this:

df = df.set_index('ORDER_ENTRY_DATE', drop=False)
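A time-based window such as '56D' needs a datetime index that is monotonic within each group, so it can help to check the index right after `set_index`. Sorting the whole frame by date is sufficient, because every per-customer subset of a date-sorted frame is also date-sorted. A minimal sketch, assuming `ORDER_ENTRY_DATE` may arrive as strings; the toy rows are made up:

```python
import pandas as pd

# Hypothetical toy orders, deliberately out of date order
df = pd.DataFrame({
    "ORDER_ENTRY_DATE": ["2021-03-01", "2021-01-15", "2021-02-01"],
    "CUST_NO": [7, 7, 8],
})
# A '56D' window needs real datetimes, not strings
df["ORDER_ENTRY_DATE"] = pd.to_datetime(df["ORDER_ENTRY_DATE"])
df = df.set_index("ORDER_ENTRY_DATE", drop=False)

# A stable sort keeps same-day orders in their original relative order
if not df.index.is_monotonic_increasing:
    df = df.sort_index(kind="stable")

print(df.index.is_monotonic_increasing)  # True
```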

In the code below, I create a new feature containing the total amount a customer successfully paid in the last 8 weeks (excluding the current order):

df["LAST_8_WEEKS_SUCCESSFUL"] = (df["PAYMENT_SUCCESSFUL"].mul(df["TOTAL_AMOUNT"])
                                 .groupby(df["CUST_NO"])
                                 .transform(lambda x: x.rolling(window='56D', min_periods=1).sum().shift())
                                 .fillna(0)
                                )
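A toy run makes the semantics of this expression concrete. The sketch below uses one hypothetical customer with four made-up orders; only the column names come from the question. The second result, `alt`, is one alternative reading of "excluding current order" and is an assumption, not something the question specifies.

```python
import pandas as pd

# Hypothetical toy data: one customer, four orders (values are made up)
df = pd.DataFrame({
    "ORDER_ENTRY_DATE": pd.to_datetime(
        ["2021-01-01", "2021-01-10", "2021-02-20", "2021-04-01"]),
    "CUST_NO": [1, 1, 1, 1],
    "PAYMENT_SUCCESSFUL": [1, 0, 1, 1],
    "TOTAL_AMOUNT": [100.0, 50.0, 30.0, 20.0],
})
df = df.set_index("ORDER_ENTRY_DATE", drop=False)

# Amount actually paid per order (0 when the payment failed)
paid = df["PAYMENT_SUCCESSFUL"].mul(df["TOTAL_AMOUNT"])

# The question's expression: 56-day rolling sum per customer, shifted one row
df["LAST_8_WEEKS_SUCCESSFUL"] = (
    paid.groupby(df["CUST_NO"])
    .transform(lambda x: x.rolling(window="56D", min_periods=1).sum().shift())
    .fillna(0)
)

# Alternative reading (an assumption): window ending at the current order,
# minus the current order's own amount
alt = paid.groupby(df["CUST_NO"]).transform(
    lambda x: x.rolling(window="56D", min_periods=1).sum()) - paid

print(df["LAST_8_WEEKS_SUCCESSFUL"].tolist())  # [0.0, 100.0, 100.0, 130.0]
print(alt.tolist())                            # [0.0, 100.0, 100.0, 30.0]
```

The fourth order (2021-04-01) gets 130.0 from the original expression: `shift()` moves whole window sums down one row, so that order receives the 56-day sum ending at the previous order (2021-02-20). The only successful payment in the 56 days before 2021-04-01 is the 30.0 from 2021-02-20, which is what `alt` returns. Note that `alt` subtracts only the current row, so other orders on the same day stay in the window.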

I have tested this code on a smaller version of my dataset and it works fine, but when I run it on the full 28-million-row dataset, I get a memory error:

MemoryError: Unable to allocate 220. MiB for an array with shape (28879273,) and data type int64
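The 220 MiB in the message is the size of the single array that could not be allocated, not the total the operation needs: 28,879,273 int64 values × 8 bytes = 231,034,184 bytes ≈ 220.3 MiB. The process was therefore presumably already close to its memory limit when pandas requested one more full-length column. The same arithmetic, compared with narrower dtypes:

```python
import numpy as np

N_ROWS = 28_879_273  # array length reported in the MemoryError

# Size of one column of N_ROWS values for a few dtypes, in MiB (2**20 bytes)
sizes = {dt: N_ROWS * np.dtype(dt).itemsize / 2**20 for dt in ("int64", "float32", "int8")}
for dt, mib in sizes.items():
    print(f"{dt:>7}: {mib:.1f} MiB")
```

Every extra full-length int64 or float64 temporary costs about 220 MiB again, so narrower dtypes and fewer intermediates both help.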

Is there another way to accomplish this without needing an additional 220 MiB of RAM, or is my code simply too inefficient?
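One possible restructuring, sketched under stated assumptions and not tested at this scale: replace the per-customer Python lambda with a single `groupby(...).rolling(...)` call, emulate `shift()` with NumPy, and store the amounts as float32. It assumes `ORDER_ENTRY_DATE` is a datetime column, the frame is already sorted by it (the original rolling call requires that too), and `CUST_NO` and the amounts have no missing values. The function name `last_8_weeks_successful` is hypothetical. It still builds several full-length temporaries, so whether it fits into the available RAM has to be measured.

```python
import numpy as np
import pandas as pd


def last_8_weeks_successful(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical lambda-free version of the question's feature (a sketch).

    Assumptions: ORDER_ENTRY_DATE is a datetime column, rows are already
    sorted by it, and CUST_NO / amounts contain no missing values.
    """
    cust = df["CUST_NO"].to_numpy()
    # Stable sort by customer; dates stay ascending inside each customer
    order = np.argsort(cust, kind="stable")
    cust_sorted = cust[order]

    paid = df["PAYMENT_SUCCESSFUL"].to_numpy() * df["TOTAL_AMOUNT"].to_numpy()
    work = pd.DataFrame(
        # float32 halves the stored amounts (assumption: precision is acceptable)
        {"CUST_NO": cust_sorted, "PAID": paid[order].astype(np.float32)},
        index=pd.DatetimeIndex(df["ORDER_ENTRY_DATE"].to_numpy()[order]),
    )
    del paid  # release the float64 temporary early

    # One vectorised call; groups come out in sorted CUST_NO order,
    # which is exactly the row order of `work`
    rolled = (work.groupby("CUST_NO")["PAID"]
              .rolling("56D", min_periods=1).sum().to_numpy())

    # Emulate .shift().fillna(0) per customer: previous row's window sum,
    # and 0 for each customer's first order
    shifted = np.empty_like(rolled)
    shifted[1:] = rolled[:-1]
    first = np.ones(len(cust_sorted), dtype=bool)
    first[1:] = cust_sorted[1:] != cust_sorted[:-1]
    shifted[first] = 0.0

    # Scatter results back to the caller's row order
    out = np.empty_like(shifted)
    out[order] = shifted
    return out
```

Usage: `df["LAST_8_WEEKS_SUCCESSFUL"] = last_8_weeks_successful(df)`. The rolling sum comes back as float64 (pandas converts window inputs to float64), so float32 only shrinks the stored amounts, at some cost in precision. Independently of this function, `df.memory_usage(deep=True)` shows which columns dominate, and `astype("category")` for repeated strings or `astype("int8")` for 0/1 flags can free far more than a single 220 MiB temporary.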

python

pandas

dataframe

pandas-groupby

bigdata

0 Answers
