Decision Tree (Parallel) randomness in numerical attribute's splits?
Hi, how are you?
I've been using Decision Tree (Parallel) and noticed that it behaves very differently from the non-parallel version of the node.
With exactly the same attributes, parameters, and sample, numerical attributes get different splits every time I run the node, while the non-parallel version always produces the same splits and exactly the same trees.
I assume this has something to do with the split computation being spread across multiple threads, but what exactly is going on?
Check the following process:
You can clearly see the difference when a non-parallel and a parallel Decision Tree are each run 3 times. You can also set the number of threads to 1 and see that the trees become identical.
Thanks for your insight, best regards.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.3.015" expanded="true" height="94" name="Multiply" width="90" x="179" y="30"/>
<operator activated="true" class="loop" compatibility="5.3.015" expanded="true" height="76" name="LOOP DT PAR" width="90" x="380" y="120">
<parameter key="iterations" value="3"/>
<process expanded="true">
<operator activated="true" class="parallel:decision_tree_parallel" compatibility="5.3.000" expanded="true" height="76" name="DT PAR" width="90" x="112" y="30">
<parameter key="number_of_threads" value="2"/>
</operator>
<connect from_port="input 1" to_op="DT PAR" to_port="training set"/>
<connect from_op="DT PAR" from_port="model" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="loop" compatibility="5.3.015" expanded="true" height="76" name="LOOP DT" width="90" x="380" y="30">
<parameter key="iterations" value="3"/>
<process expanded="true">
<operator activated="true" class="decision_tree" compatibility="5.3.015" expanded="true" height="76" name="DT" width="90" x="179" y="30"/>
<connect from_port="input 1" to_op="DT" to_port="training set"/>
<connect from_op="DT" from_port="model" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="LOOP DT" to_port="input 1"/>
<connect from_op="Multiply" from_port="output 2" to_op="LOOP DT PAR" to_port="input 1"/>
<connect from_op="LOOP DT PAR" from_port="output 1" to_port="result 2"/>
<connect from_op="LOOP DT" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
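For the single-thread comparison mentioned in the post, only the DT PAR operator inside the LOOP DT PAR loop needs to change. A minimal sketch of that fragment follows, reusing the operator class and the `number_of_threads` parameter key exactly as they appear in the process above. The only edit is the value, changed from 2 to 1. The indentation is added here for readability.

```xml
<!-- Same DT PAR operator as in the process above, restricted to one thread -->
<operator activated="true" class="parallel:decision_tree_parallel" compatibility="5.3.000" expanded="true" height="76" name="DT PAR" width="90" x="112" y="30">
  <parameter key="number_of_threads" value="1"/>
</operator>
```

With this setting, the three trees from LOOP DT PAR should come out identical, as described in the post.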
Answers
From my understanding, diversity with this operator comes from using a subset of attributes, so a subset ratio of 1 should give every tree all attributes and thus produce identical trees.
We coded a new decision tree in version 6.3, so I cannot reproduce this behavior with your process.
It could be that this was a known issue that was fixed in v6.X.
Cheers,
Martin
Dortmund, Germany
We are looking forward to upgrading to 6.X whenever we can afford it.
Cheers!