Outer join behaves differently if left and right inputs are swapped
awchisholm
RapidMiner Certified Expert, Member Posts: 458 Unicorn
Hello
I'm trying to join two example sets using the outer join option, but I've observed that the operator behaves differently depending on the order in which the left and right inputs are presented.
I've made an example that shows this; if the inputs are swapped the results change.
Am I right to think the order should not make a difference?
regards
Andrew
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
    <process expanded="true" height="686" width="858">
      <operator activated="true" class="generate_data" compatibility="5.1.001" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="number_examples" value="1"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="120">
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="5.1.001" expanded="true" height="76" name="Generate ID" width="90" x="45" y="210"/>
      <operator activated="true" class="generate_attributes" compatibility="5.1.001" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="210">
        <list key="function_descriptions">
          <parameter key="a1" value="10+id"/>
          <parameter key="a2" value="20+id"/>
        </list>
      </operator>
      <operator activated="true" class="declare_missing_value" compatibility="5.1.001" expanded="true" height="76" name="Declare Missing Value" width="90" x="179" y="75">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="a2"/>
        <parameter key="numeric_value" value="21.0"/>
      </operator>
      <operator activated="true" class="generate_data" compatibility="5.1.001" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="525">
        <parameter key="number_examples" value="2"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes (2)" width="90" x="45" y="435">
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="5.1.001" expanded="true" height="76" name="Generate ID (2)" width="90" x="45" y="345"/>
      <operator activated="true" class="generate_attributes" compatibility="5.1.001" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="179" y="345">
        <list key="function_descriptions">
          <parameter key="a2" value="20+id"/>
          <parameter key="a3" value="30+id"/>
        </list>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.1.001" expanded="true" height="94" name="Multiply" width="90" x="380" y="75"/>
      <operator activated="true" class="multiply" compatibility="5.1.001" expanded="true" height="94" name="Multiply (2)" width="90" x="380" y="435"/>
      <operator activated="true" class="join" compatibility="5.1.001" expanded="true" height="76" name="Join" width="90" x="514" y="255">
        <parameter key="join_type" value="outer"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Declare Missing Value" to_port="example set input"/>
      <connect from_op="Declare Missing Value" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Generate Data (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
      <connect from_op="Generate ID (2)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_port="result 2"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
      <connect from_op="Multiply (2)" from_port="output 1" to_op="Join" to_port="left"/>
      <connect from_op="Multiply (2)" from_port="output 2" to_port="result 3"/>
      <connect from_op="Join" from_port="join" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="198"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
Answers
They are equal in the sense that the same number of examples with the same set of attributes (but in a different order) is returned. They might differ in the values of the examples if the example sets are contradictory, that is, if both sets carry the same attribute with different values for the same id.
Greetings,
Sebastian
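To make the point above concrete, here is a toy model in plain Python. It is only a sketch, not RapidMiner's implementation: the function `outer_join` is hypothetical, and the rule "the left value wins on shared attributes, even when it is missing" is an assumption taken from the behaviour described in this thread. The two example sets mirror the posted process, with `None` standing in for the value declared missing.

```python
def outer_join(left, right, key="id"):
    """Toy full outer join (hypothetical, not RapidMiner code).

    Assumption: for attributes present on both sides, the left value
    is kept, even if that value is missing (None).
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # Keys in order of first appearance, left side first.
    keys = list(left_by_key) + [k for k in right_by_key if k not in left_by_key]

    # Attribute names in order of first appearance, left side first.
    attributes = []
    for row in left + right:
        for name in row:
            if name not in attributes:
                attributes.append(name)

    joined = []
    for k in keys:
        l_row = left_by_key.get(k, {})
        r_row = right_by_key.get(k, {})
        out = {}
        for name in attributes:
            # Left wins whenever it has the attribute at all.
            out[name] = l_row[name] if name in l_row else r_row.get(name)
        joined.append(out)
    return joined


# Example sets modelled on the posted process (a2 of id 1 declared missing).
set_a = [{"id": 1, "a1": 11, "a2": None}]
set_b = [{"id": 1, "a2": 21, "a3": 31}, {"id": 2, "a2": 22, "a3": 32}]

b_left = outer_join(set_b, set_a)  # B on the left port, as in the posted process
a_left = outer_join(set_a, set_b)  # inputs swapped

print(b_left[0]["a2"], a_left[0]["a2"])
```

Under these assumptions both results have two examples and the same four attributes, but `a2` for id 1 is 21 with B on the left and missing with A on the left.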
I get the logic, but I was hoping for a loophole. I set one of the values explicitly to missing, and I observe that the missing value takes precedence over an actual value if the missing value is encountered first. Logically, since it is missing, the second value should take precedence over it.
regards
Andrew
Well, I doubt that this behavior is desired in all situations. In a different situation you would blame us because the missing value was simply overwritten although all other values were kept.
You will have to deal with this problem explicitly. The only thing one could do is to include a parameter for that. If you want this, please go ahead and make a feature request for that.
Greetings,
Sebastian
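One way to picture "dealing with this explicitly" is to merge the overlapping attribute yourself, so that a missing value never overrides a real one. The following is a minimal sketch in plain Python, not a RapidMiner operator or parameter: `coalescing_outer_join` is a hypothetical helper, and `None` stands in for a missing value.

```python
def coalescing_outer_join(left, right, key="id"):
    """Toy full outer join that prefers any real value over a missing one.

    Hypothetical helper for illustration only; None models a missing value.
    """
    rows = {}
    for row in left + right:
        merged = rows.setdefault(row[key], {})
        for name, value in row.items():
            # Only fill the slot if nothing real has been stored yet.
            if merged.get(name) is None:
                merged[name] = value

    # Fixed attribute and key order so the result does not depend on input order.
    names = sorted({name for row in rows.values() for name in row})
    return [{name: rows[k].get(name) for name in names} for k in sorted(rows)]


set_a = [{"id": 1, "a1": 11, "a2": None}]
set_b = [{"id": 1, "a2": 21, "a3": 31}, {"id": 2, "a2": 22, "a3": 32}]

a_then_b = coalescing_outer_join(set_a, set_b)
b_then_a = coalescing_outer_join(set_b, set_a)
print(a_then_b == b_then_a)
```

For this data the result is the same in both input orders. Note that the sketch only resolves missing-versus-real conflicts: if both sets hold different real values for the same attribute, the first one seen still wins, so truly contradictory sets remain order dependent.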