
"Strange behaviour of impute missing values component [Solved]"

ammargh Member Posts: 27 Maven
edited June 2019 in Help
Using RapidMiner 5.3.015.

I am trying to process missing values.
After retrieving the data, I used a Multiply operator. One of its outputs feeds the Impute Missing Values operator, and a second output is connected directly to a result port of the process.

After running the process, the missing values had been replaced in both result sets: the imputed one and the one that should have been the untouched original.

This is strange, because the original data should not be changed.

(Edited: same results with RM Studio 6.0.3.)

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
       <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
     </operator>
     <operator activated="true" class="multiply" compatibility="5.3.015" expanded="true" height="94" name="Multiply" width="90" x="179" y="165"/>
     <operator activated="true" class="impute_missing_values" compatibility="5.3.015" expanded="true" height="60" name="Impute Missing Values" width="90" x="447" y="255">
       <parameter key="attribute" value="class"/>
       <process expanded="true">
         <operator activated="true" class="k_nn" compatibility="5.3.015" expanded="true" height="76" name="k-NN" width="90" x="601" y="30">
           <parameter key="k" value="5"/>
         </operator>
         <connect from_port="example set source" to_op="k-NN" to_port="training set"/>
         <connect from_op="k-NN" from_port="model" to_port="model sink"/>
         <portSpacing port="source_example set source" spacing="0"/>
         <portSpacing port="sink_model sink" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Retrieve" from_port="output" to_op="Multiply" to_port="input"/>
     <connect from_op="Multiply" from_port="output 1" to_op="Impute Missing Values" to_port="example set in"/>
     <connect from_op="Multiply" from_port="output 2" to_port="result 2"/>
     <connect from_op="Impute Missing Values" from_port="example set out" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • Marco_Boeck Administrator, Moderator, Employee-RapidMiner, Member, University Professor Posts: 1,996 RM Engineering
    Hi,

    While most operators work on a view of the data, i.e. they do not modify the underlying data, some do modify it. This is partly an internal restriction and partly a bug. You can work around it by adding a "Materialize Data" operator after "Multiply" on the connection that should return the original example set. See the following example process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.006">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.0.006" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="retrieve" compatibility="6.0.006" expanded="true" height="60" name="Retrieve Labor-Negotiations" width="90" x="45" y="75">
           <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
         </operator>
         <operator activated="true" class="multiply" compatibility="6.0.006" expanded="true" height="94" name="Multiply" width="90" x="246" y="75"/>
         <operator activated="true" class="materialize_data" compatibility="6.0.006" expanded="true" height="76" name="Materialize Data" width="90" x="380" y="30"/>
         <operator activated="true" class="impute_missing_values" compatibility="6.0.006" expanded="true" height="60" name="Impute Missing Values" width="90" x="379" y="120">
           <process expanded="true">
             <operator activated="true" class="k_nn" compatibility="6.0.006" expanded="true" height="76" name="k-NN" width="90" x="112" y="30"/>
             <connect from_port="example set source" to_op="k-NN" to_port="training set"/>
             <connect from_op="k-NN" from_port="model" to_port="model sink"/>
             <portSpacing port="source_example set source" spacing="0"/>
             <portSpacing port="sink_model sink" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Retrieve Labor-Negotiations" from_port="output" to_op="Multiply" to_port="input"/>
         <connect from_op="Multiply" from_port="output 1" to_op="Materialize Data" to_port="example set input"/>
         <connect from_op="Multiply" from_port="output 2" to_op="Impute Missing Values" to_port="example set in"/>
         <connect from_op="Materialize Data" from_port="example set output" to_port="result 1"/>
         <connect from_op="Impute Missing Values" from_port="example set out" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
    Regards,
    Marco
  • ammargh Member Posts: 27 Maven
    Thank you very much.
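
For readers who want an intuition for the view-versus-copy behaviour described in the answer, here is a minimal Python sketch. It is an analogy only and does not reflect RapidMiner's actual internals: the `ExampleSetView` class, the functions `multiply`, `materialize`, and `impute_in_place`, and the `value` column are all hypothetical names made up for illustration. The assumption is that both outputs of a "Multiply" step are views over the same underlying table, so an operator that writes into the table changes what every view sees, while a materialized copy taken beforehand is unaffected.

```python
import copy


class ExampleSetView:
    """Hypothetical stand-in for an example set: a thin view over a data table."""

    def __init__(self, table):
        self.table = table  # a reference to the underlying rows, not a copy

    def missing_count(self, column):
        return sum(1 for row in self.table if row[column] is None)


def multiply(view):
    """Return two views that share one underlying table (assumed behaviour)."""
    return ExampleSetView(view.table), ExampleSetView(view.table)


def materialize(view):
    """Return a view over a fresh deep copy of the underlying table."""
    return ExampleSetView(copy.deepcopy(view.table))


def impute_in_place(view, column, fill_value):
    """Write fill_value into the underlying table wherever column is missing."""
    for row in view.table:
        if row[column] is None:
            row[column] = fill_value


def run(with_materialize):
    """Count missing values left in the 'original' branch after imputation."""
    source = ExampleSetView([{"value": 1.0}, {"value": None}, {"value": None}])
    original, to_impute = multiply(source)
    if with_materialize:
        # The copy is taken before the imputation runs on the other branch.
        original = materialize(original)
    impute_in_place(to_impute, "value", 0.0)
    return original.missing_count("value")


if __name__ == "__main__":
    print("without materialize:", run(False))
    print("with materialize:", run(True))
```

In this sketch the copy only protects the original branch because it is made before the in-place write happens; a copy taken afterwards would already contain the imputed values.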