Moving average for each ID

Cristina_daimiel · April 2020

Hello all,

I have a dataset with the energy produced by several PV plants every 15 minutes over 1 year. I have a column with the datetime (around 18000 examples for each ID), another with the ID (each PV plant has a different ID; there are 4 IDs in total), and a third with the energy produced. For each example, I'm calculating the moving average of the previous 3 hours with the operator "Moving Average Filter". However, where the data for the first ID ends and the second ID begins, the moving average uses the last 3 hours of the previous ID instead of starting the calculation afresh. Is there a way to take the ID into account in this calculation? Or should I split the example set into 4 separate example sets (one per ID) and do the calculation separately?

Many thanks in advance

MartinLiebig · April 2020

Hi @Cristina_daimiel,

You can use Group Into Collection to split the example set by plant and then Loop Collection to apply the moving average to each plant separately. There are definitely other ways to do this, but this is how I would do it. An example process is attached.

May I ask what kind of project this is for? It sounds very cool.

Best,

Martin

<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
<context>
    <input/>
    <output/>
    <macros/>
</context>
<operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" breakpoints="after" class="subprocess" compatibility="9.6.000" expanded="true" height="82" name="Subprocess" width="90" x="179" y="34">
        <process expanded="true">
          <operator activated="true" class="concurrency:loop" compatibility="9.6.000" expanded="true" height="82" name="Loop" width="90" x="45" y="34">
            <parameter key="number_of_iterations" value="5"/>
            <parameter key="iteration_macro" value="iteration"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="utility:create_exampleset" compatibility="9.6.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="380" y="34">
                <parameter key="generator_type" value="attribute functions"/>
                <parameter key="number_of_examples" value="100"/>
                <parameter key="use_stepsize" value="false"/>
                <list key="function_descriptions">
                  <parameter key="Consumption" value="round(rand()*1000)"/>
                  <parameter key="Date" value="date_add(date_now(),id,DATE_UNIT_DAY)"/>
                  <parameter key="Plant Id" value="%{a}"/>
                </list>
                <parameter key="add_id_attribute" value="true"/>
                <list key="numeric_series_configuration"/>
                <list key="date_series_configuration"/>
                <list key="date_series_configuration (interval)"/>
                <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
                <parameter key="time_zone" value="SYSTEM"/>
                <parameter key="column_separator" value=","/>
                <parameter key="parse_all_as_nominal" value="false"/>
                <parameter key="decimal_point_character" value="."/>
                <parameter key="trim_attribute_names" value="true"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="9.6.000" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="34">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="id"/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="true"/>
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <connect from_op="Create ExampleSet" from_port="output" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="9.6.000" expanded="true" height="82" name="Append" width="90" x="179" y="34">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <connect from_op="Loop" from_port="output 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Generate Dummy Data</description>
      </operator>
      <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="2.4.000-SNAPSHOT" expanded="true" height="82" name="Group Into Collection" width="90" x="447" y="34">
        <parameter key="group_by_attribute" value="Plant Id"/>
        <parameter key="group_by_attribute (numerical)" value=""/>
        <parameter key="sorting_order" value="none"/>
        <description align="center" color="transparent" colored="false" width="126">Split into 5 example sets, one plant each</description>
      </operator>
      <operator activated="true" class="loop_collection" compatibility="9.6.000" expanded="true" height="82" name="Loop Collection" width="90" x="715" y="34">
        <parameter key="set_iteration_macro" value="false"/>
        <parameter key="macro_name" value="iteration"/>
        <parameter key="macro_start_value" value="1"/>
        <parameter key="unfold" value="false"/>
        <process expanded="true">
          <operator activated="true" class="time_series:moving_average_filter" compatibility="9.6.000" expanded="true" height="68" name="Moving Average Filter" width="90" x="112" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Consumption"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="overwrite_attributes" value="true"/>
            <parameter key="new_attributes_postfix" value="_filtered"/>
            <parameter key="filter_type" value="simple"/>
            <parameter key="filter_size_left" value="1"/>
            <parameter key="filter_size_right" value="1"/>
            <parameter key="filter_size" value="1"/>
          </operator>
          <connect from_port="single" to_op="Moving Average Filter" to_port="example set"/>
          <connect from_op="Moving Average Filter" from_port="example set" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Do moving average per plant</description>
      </operator>
      <operator activated="true" class="operator_toolbox:advanced_append" compatibility="2.4.000-SNAPSHOT" expanded="true" height="82" name="Append (Superset)" width="90" x="849" y="34"/>
      <connect from_op="Subprocess" from_port="out 1" to_op="Group Into Collection" to_port="exa"/>
      <connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Loop Collection" from_port="output 1" to_op="Append (Superset)" to_port="example set 1"/>
      <connect from_op="Append (Superset)" from_port="merged set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
</operator>
</process>
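
For readers who want to see the split-and-loop idea outside RapidMiner, here is a minimal Python sketch of a moving average that restarts for every plant. It is an illustration only, not what the Moving Average Filter operator does internally: it uses a trailing window (the current value plus the values before it), and the function name, column names, and sample values are all hypothetical. With 15-minute readings, a 3-hour window would correspond to window=12.

```python
from collections import defaultdict, deque


def moving_average_per_group(rows, key, value, window):
    """Trailing moving average of `value`, restarted for every distinct `key`.

    `rows` is a list of dicts, assumed to be sorted by time within each group.
    The first rows of each group average over fewer than `window` values.
    """
    # One bounded buffer per group, so values never leak across plants.
    buffers = defaultdict(lambda: deque(maxlen=window))
    result = []
    for row in rows:
        buf = buffers[row[key]]
        buf.append(row[value])
        result.append({**row, value + "_ma": sum(buf) / len(buf)})
    return result


# Hypothetical sample data: two plants, a window of 2 readings.
rows = [
    {"plant": "A", "energy": 10.0},
    {"plant": "A", "energy": 20.0},
    {"plant": "A", "energy": 30.0},
    {"plant": "B", "energy": 100.0},
    {"plant": "B", "energy": 200.0},
]
smoothed = moving_average_per_group(rows, key="plant", value="energy", window=2)
for row in smoothed:
    print(row["plant"], row["energy"], row["energy_ma"])
```

The first row of plant B gets 100.0 rather than an average that includes the last reading of plant A, which is the behaviour asked for in the question.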

Cristina_daimiel · April 2020

mschmitz! It's working now.

After the Loop Collection operator, the example set has been split into 4 different datasets (one per PV plant) within an IOObjectCollection. Do you happen to know how I can combine the data back into a single example set?

The goal of the project I'm working on is to predict failures in a photovoltaic plant. For this I have data for several variables, together with environmental conditions (irradiation, temperature, humidity, etc.), every 15 minutes for a full year.

MartinLiebig · April 2020

Hi @Cristina_daimiel,

The Append operator can combine the example sets in the collection back into one example set by stacking their examples. I recommend the operator Append (Superset), which has some more advanced features, but the normal Append should also do it.

Let me know if we can help you further, especially if this is a commercial opportunity.

Cheers,

Martin
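
As a rough illustration of the recombination step, the following Python sketch stacks several per-plant tables into one. It assumes that "superset" means the union of all columns, with a missing value wherever a table lacks a column; that reading, the function name, and the sample data are assumptions for this example and not a description of the RapidMiner operators.

```python
def append_superset(tables):
    """Stack several tables (lists of dicts) into one, using the union of columns."""
    # Collect column names in first-seen order across all tables.
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Copy every row, filling columns it does not have with None.
    merged = []
    for table in tables:
        for row in table:
            merged.append({name: row.get(name) for name in columns})
    return merged


# Hypothetical per-plant tables; plant B carries one extra column.
plant_a = [{"plant": "A", "energy": 10.0}, {"plant": "A", "energy": 20.0}]
plant_b = [{"plant": "B", "energy": 100.0, "irradiation": 0.8}]
combined = append_superset([plant_a, plant_b])
print(len(combined))
print(combined[0])
```

When every table has the same columns, as in the per-plant case discussed here, this reduces to plain row-wise concatenation.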
