1 year ago

#224618


Lorenzo Siboni

How to synchronize and redistribute data WITHOUT interpolation?

Suppose I have a DataFrame like this, which is supposed to record energy consumption every 15 minutes for a whole House and, separately, for the Oven of that house. Unfortunately, once I put them together I can see that the measurements are not synchronized:

import pandas as pd
import numpy as np
df = pd.DataFrame()

df['timestamp1'] = ['27/06/2021  11:50:00', '27/06/2021  12:05:00', '27/06/2021  12:20:00', '27/06/2021  12:35:00', '27/06/2021  12:55:00', '27/06/2021  13:05:00' ]
df['Energy House'] = [814,642,473,783,386,503]

df['timestamp2'] = ['27/06/2021  12:00:00', '27/06/2021  12:15:00', '27/06/2021  12:35:00', '27/06/2021  12:45:00', '27/06/2021  13:00:00', '27/06/2021  13:15:00']
df['Energy Oven'] = [160,40,175,337,50,19]

             timestamp1  Energy House            timestamp2  Energy Oven
0  27/06/2021  11:50:00           814  27/06/2021  12:00:00          160
1  27/06/2021  12:05:00           642  27/06/2021  12:15:00           40
2  27/06/2021  12:20:00           473  27/06/2021  12:35:00          175
3  27/06/2021  12:35:00           783  27/06/2021  12:45:00          337
4  27/06/2021  12:55:00           386  27/06/2021  13:00:00           50
5  27/06/2021  13:05:00           503  27/06/2021  13:15:00           19

What I want to achieve, overall, is to synchronize the energy consumption of the Oven with respect to the consumption of the whole House.

Note that the problem here is twofold: on one hand the Oven's timestamps are offset by 10 minutes from the House's (12:00 vs. 11:50 in row 0; this is the basic part of the question), but additionally the data don't always respect the 15-minute rule (see row 2, where the Oven measures 20 minutes of consumption, and the same happens at row 4 for the House). So neither column has uniformly spaced timestamps.

Alternative 1: we synchronize the data by making the Oven's measurements follow the timing of the House's measurements, adjusting the Oven's energy consumption to match the new timings, even where the House itself does not respect the 15-minute rule.

Alternative 2: first we re-distribute only the House's data evenly over 15 minutes sharp, then we do the same procedure separately for the Oven... So the final result will still be a perfect synchronization, but all of the original timings will be lost.
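To make the two alternatives concrete, here is a minimal sketch of what I have in mind, not a definitive implementation. It rests on assumptions that are mine and not confirmed by the data: each timestamp marks the START of its interval, an interval ends at the next timestamp (the last one is assumed to last 15 minutes), and energy is spread evenly inside an interval. The helper names `to_minutes` and `redistribute` and the parameter `last_len_min` are hypothetical. Timestamps are the same as above, rewritten in ISO format to keep parsing unambiguous.

```python
import numpy as np
import pandas as pd

def to_minutes(idx, ref):
    """Minutes elapsed from `ref` for every timestamp in `idx`."""
    return ((pd.DatetimeIndex(idx) - ref) / pd.Timedelta(minutes=1)).to_numpy(dtype=float)

def redistribute(starts, energy, new_edges, last_len_min=15.0):
    """Split each reading across new bins in proportion to time overlap.

    ASSUMPTIONS: timestamps are interval starts, each interval ends at the
    next timestamp, the last interval lasts `last_len_min` minutes, and
    energy is uniform within an interval.
    """
    starts = pd.DatetimeIndex(starts)
    new_edges = pd.DatetimeIndex(new_edges)
    ref = starts[0]
    s = to_minutes(starts, ref)                    # old interval starts
    e = np.append(s[1:], s[-1] + last_len_min)     # old interval ends
    b = to_minutes(new_edges, ref)
    lo, hi = b[:-1], b[1:]                         # new bin starts / ends
    # overlap[i, j] = minutes shared by old interval i and new bin j
    overlap = np.clip(np.minimum(e[:, None], hi[None, :])
                      - np.maximum(s[:, None], lo[None, :]), 0, None)
    share = overlap / (e - s)[:, None]             # fraction of reading i in bin j
    values = np.asarray(energy, dtype=float) @ share
    return pd.Series(values, index=new_edges[:-1])

house_t = pd.to_datetime(["2021-06-27 11:50", "2021-06-27 12:05", "2021-06-27 12:20",
                          "2021-06-27 12:35", "2021-06-27 12:55", "2021-06-27 13:05"])
house_e = [814, 642, 473, 783, 386, 503]
oven_t = pd.to_datetime(["2021-06-27 12:00", "2021-06-27 12:15", "2021-06-27 12:35",
                         "2021-06-27 12:45", "2021-06-27 13:00", "2021-06-27 13:15"])
oven_e = [160, 40, 175, 337, 50, 19]

# Alternative 1: Oven energy re-binned onto the House's own (irregular) intervals
house_edges = house_t.append(pd.DatetimeIndex([house_t[-1] + pd.Timedelta(minutes=15)]))
oven_on_house = redistribute(oven_t, oven_e, house_edges)

# Alternative 2: both series re-binned onto one regular 15-minute grid
grid = pd.date_range("2021-06-27 11:45", "2021-06-27 13:30", freq="15min")
house_grid = redistribute(house_t, house_e, grid)
oven_grid = redistribute(oven_t, oven_e, grid)

print(oven_on_house)
print(pd.DataFrame({"Energy House": house_grid, "Energy Oven": oven_grid}))
```

With a grid that covers every reading (as in Alternative 2 here) the total energy of each column is preserved. In Alternative 1 the Oven energy that falls after the last House interval is dropped. Note that each reading is divided by the actual length of its interval, not by a fixed 15 minutes.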

If possible, it is important for me to do this re-distribution of the energy data WITHOUT interpolation. My idea is that, since we are talking about energy (and not power) consumed over 15 minutes, interpolation is not necessary. For example, the Oven's data can be re-distributed in this way (example for row number 0, just a concept):

  • "Oven_new (@ 12:05)" = last 10 minutes of "Oven_old (@12:00)" + first 5 minutes of "Oven_old (@12:15)"

Which mathematically means to multiply by some simple time fractions, written in minutes:

  • "Oven_new (@ 12:05)" = 160*((60-50)/15) + 40*((05-00)/15)
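The arithmetic of this one example can be checked with a few lines. This is only a sketch under my assumption that a timestamp marks the start of its interval; `minutes_between` is a hypothetical helper. It also shows that the result changes if the second reading is divided by its actual 20-minute length (12:15 to 12:35) instead of the nominal 15 minutes.

```python
import pandas as pd

def minutes_between(a, b):
    """Length of the span from a to b, in minutes."""
    return (pd.Timestamp(b) - pd.Timestamp(a)) / pd.Timedelta(minutes=1)

# New bin 12:05-12:20, built from the Oven readings at 12:00 and 12:15
tail = minutes_between("2021-06-27 12:05", "2021-06-27 12:15")  # last 10 min of the 12:00 reading
head = minutes_between("2021-06-27 12:15", "2021-06-27 12:20")  # first 5 min of the 12:15 reading

# Formula as written above: every reading is assumed to span 15 minutes
nominal = 160 * tail / 15 + 40 * head / 15

# Variant: divide the 12:15 reading by its real duration
actual_len = minutes_between("2021-06-27 12:15", "2021-06-27 12:35")  # 20 min
by_duration = 160 * tail / 15 + 40 * head / actual_len

print(round(nominal, 2), round(by_duration, 2))  # 120.0 116.67
```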

Can you help me find a solution for both Alternative 1 and Alternative 2, without using interpolation? Do you think my idea about re-distribution is correct, or is it better to use "reindex" and interpolate?

python

pandas

synchronization

interpolation

reindex

0 Answers
