1 year ago
#386698
yalexx
Pandas - MemoryError: Unable to allocate 220. MiB
I have a DataFrame of orders with the order date as the index, which I set like this:
df = df.set_index('ORDER_ENTRY_DATE', drop=False)
In the code below I create a new feature containing the total amount each customer has successfully paid in the last 8 weeks (excluding the current order):
df["LAST_8_WEEKS_SUCCESSFUL"] = (df["PAYMENT_SUCCESSFUL"].mul(df["TOTAL_AMOUNT"])
                                 .groupby(df["CUST_NO"])
                                 .transform(lambda x: x.rolling(window='56D', min_periods=1).sum().shift())
                                 .fillna(0)
                                 )
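To make the intended feature concrete, here is a minimal sketch on made-up toy data (the four rows below are invented for illustration, not taken from the real dataset). It runs the same expression as above and, for comparison, a variant using `closed="left"`, which as far as I understand excludes the current row from the window itself instead of shifting the result by one order. The names `paid`, `shifted`, and `left_closed` are only illustrative.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "ORDER_ENTRY_DATE": pd.to_datetime(
        ["2020-01-01", "2020-01-05", "2020-01-10", "2020-04-01"]
    ),
    "CUST_NO": [1, 2, 1, 1],
    "PAYMENT_SUCCESSFUL": [1, 1, 0, 1],
    "TOTAL_AMOUNT": [100.0, 20.0, 50.0, 30.0],
})
df = df.set_index("ORDER_ENTRY_DATE", drop=False)

# Amount actually paid per order (0 when the payment failed).
paid = df["PAYMENT_SUCCESSFUL"].mul(df["TOTAL_AMOUNT"])

# Same expression as in the question: 56-day rolling sum that includes
# the current order, then shifted down by one order per customer.
shifted = (paid.groupby(df["CUST_NO"])
           .transform(lambda x: x.rolling(window="56D", min_periods=1).sum().shift())
           .fillna(0))

# Variant: the window covers the 56 days strictly before each order,
# so no shift is needed to leave out the current order.
left_closed = (paid.groupby(df["CUST_NO"])
               .transform(lambda x: x.rolling(window="56D", closed="left").sum())
               .fillna(0))

print(pd.DataFrame({"shifted": shifted, "left_closed": left_closed}))
```

On this toy data the two versions differ only for the 2020-04-01 order: the shifted version reports 100 (the window that ended at the previous order on 2020-01-10), while the `closed="left"` version reports 0 because nothing was paid in the 56 days before 2020-04-01. Which of the two matches the intended feature is a judgment call about the definition of "last 8 weeks".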
I have tested this code on a smaller version of my dataset and it works fine, but when I run it on the full 28-million-row dataset, I get a memory error:
MemoryError: Unable to allocate 220. MiB for an array with shape (28879273,) and data type int64
Is there any other way to accomplish this without needing 220 MiB of RAM? Is my code too inefficient?
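For scale, the failing allocation corresponds to a single 8-byte column over all rows, which suggests the process was already close to its memory limit rather than that this one step is unusually large. The sketch below checks that arithmetic and shows one possible way to shrink the integer inputs by downcasting. The `demo` frame and its values are invented for illustration, and whether downcasting alone frees enough memory on the real data is an assumption that would need to be measured.

```python
import numpy as np
import pandas as pd

# Size of one int64 array with the shape from the error message.
n_rows = 28_879_273
bytes_needed = n_rows * np.dtype("int64").itemsize
mib = bytes_needed / 2**20
print(f"{mib:.1f} MiB")  # roughly 220 MiB, matching the error

# Invented stand-in for the real frame (column names taken from the question).
demo = pd.DataFrame({
    "CUST_NO": np.arange(1000, dtype="int64"),
    "PAYMENT_SUCCESSFUL": np.ones(1000, dtype="int64"),
    "TOTAL_AMOUNT": np.linspace(1.0, 500.0, 1000),
})
before = demo.memory_usage(deep=True).sum()

# Downcast integer columns to the smallest integer dtype that holds their values.
# Float columns are left alone here to avoid losing precision on amounts.
for col in ["CUST_NO", "PAYMENT_SUCCESSFUL"]:
    demo[col] = pd.to_numeric(demo[col], downcast="integer")

after = demo.memory_usage(deep=True).sum()
print(demo.dtypes)
print(before, "->", after, "bytes")
```

Comparing `df.memory_usage(deep=True)` before and after on the real frame would show whether this is worth pursuing before trying heavier options such as processing customers in batches.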
python
pandas
dataframe
pandas-groupby
bigdata
0 Answers