Copying values iteratively
Dear Community,
I am doing some data preprocessing to fill missing values in my dataset. The problem is that I have two attributes sampled at different rates (one has hourly data, the other only daily). So I have this dataset:
date atr1 atr2
d1_09:00 5 ?
d1_10:00 6 ?
d1_11:00 5 20
d1_12:00 5 ?
...
d2_09:00 7 ?
d2_10:00 6 ?
d2_11:00 5 13
d2_12:00 6 ?
I would like to take each value of atr2 and use it to fill the missing values until the next value is found. So this example would end up looking something like:
date atr1 atr2
d1_09:00 5 ?
d1_10:00 6 ?
d1_11:00 5 20
d1_12:00 5 20
...
d2_09:00 7 20
d2_10:00 6 20
d2_11:00 5 13
d2_12:00 6 13
I guess I should be using some "Loop" operator, but so far I couldn't achieve what I am looking for.
Anyone dealing with similar issues?
Thanks in advance,
Iker
Best Answer
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
I think the Replace Missing Values (Series) operator from the Series extension should do the trick.
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
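For readers who want to see the fill logic spelled out, here is a minimal sketch in plain Python of the behaviour asked for in the question: carry the last seen atr2 value forward until a new one appears. It is only an illustration outside RapidMiner, not the internals of the Replace Missing Values (Series) operator. The function name `forward_fill` and the use of `None` for the "?" marker are assumptions made for this example.

```python
# Rows mirror the example in the question; None stands for the "?" marker.
rows = [
    ("d1_09:00", 5, None),
    ("d1_10:00", 6, None),
    ("d1_11:00", 5, 20),
    ("d1_12:00", 5, None),
    ("d2_09:00", 7, None),
    ("d2_10:00", 6, None),
    ("d2_11:00", 5, 13),
    ("d2_12:00", 6, None),
]


def forward_fill(rows):
    """Return a copy of rows where a missing atr2 takes the previous known value."""
    filled = []
    last_seen = None  # stays None until the first real value, so leading gaps remain
    for date, atr1, atr2 in rows:
        if atr2 is not None:
            last_seen = atr2  # a new reading replaces the carried value
        filled.append((date, atr1, last_seen))
    return filled


for row in forward_fill(rows):
    print(row)
```

The first two rows stay missing because there is no earlier value to copy, which matches the expected table in the question.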
Answers
Hi Iker,
If there is a single value for atr2 every day, you could split the date attribute into Day and Hour and then use the day as the key for a left/right join. I attach a sample process.
Regards,
Sebastian
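The join idea above can be sketched in plain Python as follows. This is not the attached process, only an illustration under a few assumptions: the date strings look like "d1_09:00", `None` stands for "?", and the helper name `fill_by_day` is made up for this example.

```python
# Rows mirror the example in the question; None stands for the "?" marker.
rows = [
    ("d1_09:00", 5, None),
    ("d1_10:00", 6, None),
    ("d1_11:00", 5, 20),
    ("d1_12:00", 5, None),
    ("d2_09:00", 7, None),
    ("d2_10:00", 6, None),
    ("d2_11:00", 5, 13),
    ("d2_12:00", 6, None),
]


def day_of(date):
    """Split 'd1_09:00' into day and hour and return the day part."""
    day, _hour = date.split("_", 1)
    return day


def fill_by_day(rows):
    """Look up the single daily atr2 value and join it back onto every hourly row."""
    # Daily table: one atr2 value per day (assumes at most one reading per day).
    daily = {day_of(date): atr2 for date, _, atr2 in rows if atr2 is not None}
    # Left join on the day key; days without a reading stay None.
    return [(date, atr1, daily.get(day_of(date))) for date, atr1, _ in rows]


for row in fill_by_day(rows):
    print(row)
```

Note that this fills every hour of a day with that day's value, including the hours before the reading, so d2_09:00 gets 13 rather than the 20 shown in the expected table in the question.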
Hi @SGolbert
Even though I have already installed the R Scripting extension (v7.0.0) on my RapidMiner Studio (v7.1), I cannot load your example. I get the idea of what you are saying, but I cannot see how to implement it...
BR
Hi,
Is it possible that you don't have the R path configured? (Settings -> Preferences -> R Scripting)
I attach a CSV file with the dataset just to be sure.
Best,
Sebastian
Hi,
Using R of course solves the problem. However, using R solves any problem, just with arbitrary overhead.
The single operator @mschmitz mentioned will do the trick (easier AND with less computational overhead).
If you additionally have the problem that the timestamps are not nicely distributed, e.g. some of them are missing or they are not equidistant, try the "Resample Multiple Series" operator from the Jackhammer extension on the Marketplace. It may also be worth a look if you have many attributes to fill, as the "Replace Missing Values (Series)" operator only works on a single attribute.
Greetings,
Sebastian
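To make the irregular-timestamp case concrete, here is a rough sketch in plain Python of one way to put readings on an equidistant grid and carry the last value forward. It does not claim to reproduce what the "Resample Multiple Series" operator does internally. The function name `resample_ffill`, the sample timestamps, and the hourly step are all invented for this illustration.

```python
from datetime import datetime, timedelta


def resample_ffill(samples, start, end, step):
    """Place (timestamp, value) samples on a regular grid from start to end.

    samples must be sorted by timestamp. Each grid point takes the most
    recent sample at or before it, or None if there is none yet.
    """
    out = []
    i = 0
    last = None
    t = start
    while t <= end:
        # Consume every sample that happened up to this grid point.
        while i < len(samples) and samples[i][0] <= t:
            last = samples[i][1]
            i += 1
        out.append((t, last))
        t += step
    return out


# Made-up readings: one is off the hourly grid, and 11:00 and 12:00 are absent.
samples = [
    (datetime(2016, 1, 1, 9, 0), 5),
    (datetime(2016, 1, 1, 10, 20), 6),
    (datetime(2016, 1, 1, 13, 0), 7),
]

grid = resample_ffill(
    samples,
    start=datetime(2016, 1, 1, 9, 0),
    end=datetime(2016, 1, 1, 13, 0),
    step=timedelta(hours=1),
)
for t, value in grid:
    print(t.strftime("%H:%M"), value)
```

Running the function once per attribute would cover the case of several attributes, at the cost of one pass over the data for each.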