1 year ago
#224618
Lorenzo Siboni
How to synchronize and redistribute data WITHOUT interpolation?
Suppose I have a DataFrame like this, which is supposed to record the energy consumption every 15 minutes for a whole House and, separately, for the Oven of that house. Unfortunately, once I put them together, I can see that the measurements are not synchronized:
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['timestamp1'] = ['27/06/2021 11:50:00', '27/06/2021 12:05:00', '27/06/2021 12:20:00', '27/06/2021 12:35:00', '27/06/2021 12:55:00', '27/06/2021 13:05:00' ]
df['Energy House'] = [814,642,473,783,386,503]
df['timestamp2'] = ['27/06/2021 12:00:00', '27/06/2021 12:15:00', '27/06/2021 12:35:00', '27/06/2021 12:45:00', '27/06/2021 13:00:00', '27/06/2021 13:15:00']
df['Energy Oven'] = [160,40,175,337,50,19]
            timestamp1  Energy House           timestamp2  Energy Oven
0  27/06/2021 11:50:00           814  27/06/2021 12:00:00          160
1  27/06/2021 12:05:00           642  27/06/2021 12:15:00           40
2  27/06/2021 12:20:00           473  27/06/2021 12:35:00          175
3  27/06/2021 12:35:00           783  27/06/2021 12:45:00          337
4  27/06/2021 12:55:00           386  27/06/2021 13:00:00           50
5  27/06/2021 13:05:00           503  27/06/2021 13:15:00           19
What I want to achieve, overall, is to synchronize the energy consumption of the Oven with respect to the consumption of the whole House.
Note that the problem is twofold: on one hand, the Oven's data are 10 minutes ahead of the House's data (this is the basic part of the question); additionally, the data don't always respect the 15-minute rule (see row 2, where the Oven reading covers 20 minutes of consumption, and row 4, where the same happens for the House). Therefore the timestamps are not uniform in either column.
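The irregular spacing described above can be checked directly. This is only a minimal sketch: the variable names (`house_t`, `oven_t`, `house_gap`, `oven_gap`) are made up for illustration, and it assumes the timestamps are day-first strings exactly as in the DataFrame.

```python
import pandas as pd

fmt = "%d/%m/%Y %H:%M:%S"  # day-first format used in the question

house_t = pd.Series(pd.to_datetime(
    ['27/06/2021 11:50:00', '27/06/2021 12:05:00', '27/06/2021 12:20:00',
     '27/06/2021 12:35:00', '27/06/2021 12:55:00', '27/06/2021 13:05:00'],
    format=fmt))
oven_t = pd.Series(pd.to_datetime(
    ['27/06/2021 12:00:00', '27/06/2021 12:15:00', '27/06/2021 12:35:00',
     '27/06/2021 12:45:00', '27/06/2021 13:00:00', '27/06/2021 13:15:00'],
    format=fmt))

# Minutes elapsed since the previous reading (NaN for the first row)
house_gap = house_t.diff().dt.total_seconds() / 60
oven_gap = oven_t.diff().dt.total_seconds() / 60

print(pd.DataFrame({"house_gap_min": house_gap, "oven_gap_min": oven_gap}))
```

Rows where the gap is not 15 are the ones that break the 15-minute rule.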
Alternative 1: we synchronize the data by making the Oven's measurements follow the timing of the House's measurements, adjusting the Oven's energy consumption to match the changed timings, even where the House does not respect the 15-minute rule.
Alternative 2: first we re-distribute only the House's data evenly over 15 minutes sharp, then we do the same procedure separately for the Oven... So the final result will still be a perfect synchronization, but all of the original timings will be lost.
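Both alternatives could be sketched with one helper that splits each reading across the target intervals in proportion to time overlap. This is a rough sketch, not a confirmed solution. The function `redistribute`, the choice of grid (12:00 to 13:00), and the following assumptions are mine: each reading is the energy used in the interval ending at its timestamp, the first interval of each series lasts a nominal 15 minutes, and consumption is uniform within each original interval.

```python
import numpy as np
import pandas as pd

def redistribute(src_end, src_energy, dst_end, nominal=pd.Timedelta(minutes=15)):
    """Hypothetical helper: share each source reading among the destination
    intervals in proportion to how long the intervals overlap."""
    src_end = pd.DatetimeIndex(src_end)
    dst_end = pd.DatetimeIndex(dst_end)
    # Each interval starts where the previous one ends; the first start is assumed
    src_start = src_end[:-1].insert(0, src_end[0] - nominal)
    dst_start = dst_end[:-1].insert(0, dst_end[0] - nominal)
    out = np.zeros(len(dst_end))
    for j, (d0, d1) in enumerate(zip(dst_start, dst_end)):
        for s0, s1, energy in zip(src_start, src_end, src_energy):
            overlap = (min(s1, d1) - max(s0, d0)).total_seconds()
            if overlap > 0:
                out[j] += energy * overlap / (s1 - s0).total_seconds()
    return pd.Series(out, index=dst_end)

fmt = "%d/%m/%Y %H:%M:%S"
house_t = pd.to_datetime(
    ['27/06/2021 11:50:00', '27/06/2021 12:05:00', '27/06/2021 12:20:00',
     '27/06/2021 12:35:00', '27/06/2021 12:55:00', '27/06/2021 13:05:00'], format=fmt)
oven_t = pd.to_datetime(
    ['27/06/2021 12:00:00', '27/06/2021 12:15:00', '27/06/2021 12:35:00',
     '27/06/2021 12:45:00', '27/06/2021 13:00:00', '27/06/2021 13:15:00'], format=fmt)
house_e = [814, 642, 473, 783, 386, 503]
oven_e = [160, 40, 175, 337, 50, 19]

# Alternative 1: Oven energy re-expressed on the House's (irregular) timestamps
oven_on_house = redistribute(oven_t, oven_e, house_t)

# Alternative 2: both series moved onto the same regular 15-minute grid
grid = pd.date_range("2021-06-27 12:00", "2021-06-27 13:00", freq="15min")
house_on_grid = redistribute(house_t, house_e, grid)
oven_on_grid = redistribute(oven_t, oven_e, grid)

print(oven_on_house)
print(pd.DataFrame({"Energy House": house_on_grid, "Energy Oven": oven_on_grid}))
```

Energy that falls outside the target intervals (for example the Oven's last 10 minutes, after 13:05, in Alternative 1) is dropped, so the totals only match over the time span the two series share.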
If possible, it is important for me to do this redistribution of the energy data WITHOUT interpolation. My idea is that, since we are talking about energy (and not power) consumed over 15 minutes, interpolation is not necessary. For example, the Oven's data can be redistributed in this way (example for row 0, just a concept):
- "Oven_new (@ 12:05)" = last 10 minutes of "Oven_old (@12:00)" + first 5 minutes of "Oven_old (@12:15)"
Which mathematically means to multiply by some simple time fractions, written in minutes:
- "Oven_new (@ 12:05)" = 160*((60-50)/15) + 40*((05-00)/15)
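The arithmetic above can be reproduced from the timestamps themselves. This is a minimal sketch under two assumptions: the new interval runs from 11:50 to 12:05, and the Oven reading at 12:00 covers the 15 minutes before it (the data do not say when that first interval starts). The variable names are illustrative only.

```python
import pandas as pd

fmt = "%d/%m/%Y %H:%M:%S"

def ts(text):
    """Parse one day-first timestamp string."""
    return pd.to_datetime(text, format=fmt)

# Target interval: the House interval that ends at 12:05 (assumed to start at 11:50)
new_start, new_end = ts("27/06/2021 11:50:00"), ts("27/06/2021 12:05:00")

# Old Oven intervals as (start, end, energy); the 11:45 start is an assumption
old_intervals = [
    (ts("27/06/2021 11:45:00"), ts("27/06/2021 12:00:00"), 160),
    (ts("27/06/2021 12:00:00"), ts("27/06/2021 12:15:00"), 40),
]

oven_new = 0.0
for start, end, energy in old_intervals:
    overlap = min(end, new_end) - max(start, new_start)
    if overlap > pd.Timedelta(0):
        # Fraction of the old interval that falls inside the new one
        oven_new += energy * (overlap / (end - start))

print(oven_new)  # approximately 120, i.e. 160*(10/15) + 40*(5/15)
```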
Can you help me find a solution for both Alternative 1 and Alternative 2, without using interpolation? Do you think my redistribution idea is correct, or is it better to interpolate with the function "reindex"?
python
pandas
synchronization
interpolation
reindex
0 Answers