Copying values iteratively

iesnaola · February 2018

Dear Community,

I am dealing with some data preprocessing in order to fill some Missing Values on my Dataset. The problem is that I have two attributes with different sampling (i.e. one of them has data hourly, and the other one just daily). So, I have this dataset:

date atr1 atr2

d1_09:00 5 ?

d1_10:00 6 ?

d1_11:00 5 20

d1_12:00 5 ?

...

d2_09:00 7 ?

d2_10:00 6 ?

d2_11:00 5 13

d2_12:00 6 ?

I would like to be able to take a value of attr2 and use it to fill MV until the next value is found. So, this example would be end somehow like:

date atr1 atr2

d1_09:00 5 ?

d1_10:00 6 ?

d1_11:00 5 20

d1_12:00 5 20

...

d2_09:00 7 20

d2_10:00 6 20

d2_11:00 5 13

d2_12:00 6 13

I guess I should be using some "Loop" operator, but so far I couldn't achieve what I am looking for.

Anyone dealing with similar issues?

Thanks in advance,

Iker

MartinLiebig · February 2018

Hi,

i think replace Missing Values (Series) of Series extension should do the trick.

Best,

Martin

SGolbert · February 2018

Hi Iker,

If there is a single value for att2 every day, you could separate the date attribute into Day and Hour and then use the day as key for a left/right join. I attach a sample process.

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="82" name="Execute R" width="90" x="112" y="34">
        <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10;    data = expand.grid(Day = 1:2, Hour = 0:23)&#10;&#10;    data$att1 = sample(1:100, 48)&#10;&#10;    data$att2 = NA&#10;&#10;    data[data$Hour == 1, &quot;att2&quot;] &lt;- c(10, 20)&#10;    &#10;&#10;    # connect 2 output ports to see the results&#10;    return(list(data))&#10;}&#10;"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="34">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="att2.is_not_missing."/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="514" y="136">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="att2"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="514" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="att1"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="8.1.000" expanded="true" height="82" name="Join" width="90" x="715" y="34">
        <parameter key="join_type" value="right"/>
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="Day" value="Day"/>
        </list>
      </operator>
      <connect from_op="Execute R" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="original" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Join" from_port="join" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Regards,

Sebastian

iesnaola · February 2018

Hi @SGolbert

Even though I have already installed R Scripting extension (v 7.0.0) on my Rapidminer Studio (v7.1) I cannot load your example. I get the idea of what you say, but I cannot see how to implement it...

BR

SGolbert · February 2018

Hi,

is it possible that you don't have the R path configured? (settings -> Preferences -> R Scripting)

I attach a CSV file with the dataset just to be sure.

Best,

Sebastian

land · February 2018

Hi,

using R of course solves the problem. However, using R solves any problem with an arbitrary overhead.

The single operator @mschmitz quoted will do the trick. (easier AND with less computational overhead)

If you additionally have the problem that the timestamps are not nicely distributed, eg some of them are missing or they are not equidistant, try the "Resample Multiple Series" operator from the Jackhammer extension of the marketplace. May also be worth a look if you have many of the attributes as the "Replace Missing Values (Series)" operator only works on a single attribute.

Greetings,

Sebastian

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Copying values iteratively

Best Answer

Answers