
"Average over Random Forest - Tree Importance Weights"

MuehliMan Member Posts: 85 Maven
edited May 2019 in Help
Hi all,

I need your help with another problem. Random Forest (Weighting) returns different attributes on every run (due to the random selection of attributes, I guess). I would like to run the Random Forest multiple times and average over all weights to see when the weights converge.

Here is my basic workflow, but I do not know how to average over all weights in the parameter loop.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
   <process expanded="true" height="388" width="1065">
     <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
       <parameter key="number_examples" value="200"/>
       <parameter key="number_of_attributes" value="100"/>
     </operator>
     <operator activated="true" class="discretize_by_user_specification" compatibility="5.0.8" expanded="true" height="94" name="Discretize" width="90" x="179" y="30">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="label"/>
       <parameter key="include_special_attributes" value="true"/>
       <list key="classes">
         <parameter key="0" value="0.5"/>
         <parameter key="1" value="1.0"/>
       </list>
     </operator>
     <operator activated="true" class="nominal_to_binominal" compatibility="5.0.8" expanded="true" height="94" name="Nominal to Binominal" width="90" x="313" y="30">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="label"/>
       <parameter key="include_special_attributes" value="true"/>
     </operator>
     <operator activated="true" class="generate_id" compatibility="5.0.8" expanded="true" height="76" name="Generate ID" width="90" x="447" y="30"/>
     <operator activated="true" class="loop_parameters" compatibility="5.0.10" expanded="true" height="76" name="Loop Parameters" width="90" x="581" y="30">
       <list key="parameters">
         <parameter key="Random Forest.local_random_seed" value="[333;33333333;1000;linear]"/>
       </list>
       <process expanded="true" height="388" width="979">
         <operator activated="true" class="random_forest" compatibility="5.0.10" expanded="true" height="76" name="Random Forest" width="90" x="45" y="30">
           <parameter key="criterion" value="gini_index"/>
           <parameter key="maximal_depth" value="5"/>
           <parameter key="use_local_random_seed" value="true"/>
           <parameter key="local_random_seed" value="1866981"/>
         </operator>
         <operator activated="true" class="weight_by_forest" compatibility="5.0.10" expanded="true" height="76" name="Weight by Tree Importance" width="90" x="180" y="30">
           <parameter key="criterion" value="gini_index"/>
         </operator>
         <operator activated="true" class="weights_to_data" compatibility="5.0.10" expanded="true" height="60" name="Weights to Data" width="90" x="313" y="30"/>
         <connect from_port="input 1" to_op="Random Forest" to_port="training set"/>
         <connect from_op="Random Forest" from_port="model" to_op="Weight by Tree Importance" to_port="random forest"/>
         <connect from_op="Weight by Tree Importance" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="source_input 2" spacing="0"/>
         <portSpacing port="sink_performance" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Generate Data" from_port="output" to_op="Discretize" to_port="example set input"/>
     <connect from_op="Discretize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
     <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
     <connect from_op="Generate ID" from_port="example set output" to_op="Loop Parameters" to_port="input 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • fischer Member Posts: 439 Maven
    Hi,

    I think you can use "Loop and Average" for your purposes.

    Best,
    Simon
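
Whichever operator does the looping, the averaging itself is a per-attribute mean over runs. The following is a minimal Python sketch of that idea, not RapidMiner code. `weights_for_seed` is a hypothetical stand-in that returns made-up noisy weights; in practice it would be replaced by the weights from one Random Forest + Weight by Tree Importance run. The seed range starting at 333 only mirrors the lower bound used in the process above.

```python
import random


def weights_for_seed(seed, n_attributes=5):
    """Hypothetical stand-in for one forest run: one noisy weight per attribute."""
    rng = random.Random(seed)
    # Assumed "true" importances, chosen only for illustration.
    signal = [1.0 / (i + 1) for i in range(n_attributes)]
    return [s + rng.gauss(0.0, 0.2) for s in signal]


def running_average(seeds, n_attributes=5):
    """Yield (run count, mean weights, largest change of any mean) after each run."""
    mean = [0.0] * n_attributes
    for k, seed in enumerate(seeds, start=1):
        w = weights_for_seed(seed, n_attributes)
        # Incremental mean update: m_k = m_{k-1} + (x_k - m_{k-1}) / k
        new_mean = [m + (x - m) / k for m, x in zip(mean, w)]
        max_change = max(abs(a - b) for a, b in zip(new_mean, mean))
        mean = new_mean
        yield k, mean, max_change


history = list(running_average(range(333, 333 + 200)))
final_k, final_mean, final_change = history[-1]
print(final_k, [round(m, 3) for m in final_mean], final_change)
```

Plotting `max_change` against the run count is one simple way to judge when the averaged weights have settled; the threshold for "converged" is a choice you have to make yourself.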
  • MuehliMan Member Posts: 85 Maven
    But I want to use different random seeds as well. Is that still possible? (Ingo gave me the tip to use Loop Parameters to obtain new random seeds.)

    UPDATE: I tested a random forest looped over different numbers of trees (10 to 1000), and for each number of trees I ran 50 loops with different local random seeds. What surprised me most was that only a few (almost discrete) performance values appeared in the trees vs. performance plot. (I can supply the plot if needed.) I am asking myself (and you) whether this occurs because of the random seeds or the number of attributes.
  • fischer Member Posts: 439 Maven
    Hi,

    If the hypothesis space is limited, it is not uncommon to observe only a limited number of distinct performance values. This certainly depends on the number of attributes you have (and whether the model actually uses them).

    Best,
    Simon
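
A toy illustration of the point about a limited hypothesis space, as a minimal Python sketch under strong assumptions. The data set and the hypothesis class (predict one binary attribute or its negation) are made up for the example and are far simpler than a random forest. The only point is that a small set of possible models can produce no more distinct accuracies than it has members, and often fewer.

```python
from itertools import product

# Hypothetical toy data: (three binary attributes, binary label).
examples = [
    ((1, 0, 1), 1),
    ((1, 1, 0), 1),
    ((0, 0, 1), 0),
    ((0, 1, 1), 0),
    ((1, 0, 0), 0),
    ((0, 1, 0), 1),
]


def accuracy(attr_index, negate):
    """Accuracy of the rule 'predict attribute attr_index', optionally negated."""
    correct = 0
    for attrs, label in examples:
        prediction = attrs[attr_index] ^ int(negate)
        correct += int(prediction == label)
    return correct / len(examples)


# The whole hypothesis space: 3 attributes x {plain, negated} = 6 rules.
hypotheses = list(product(range(3), (False, True)))
accuracies = sorted({accuracy(i, neg) for i, neg in hypotheses})
print(len(hypotheses), "hypotheses,", len(accuracies), "distinct accuracies:", accuracies)
```

With few usable attributes and a shallow `maximal_depth`, many different seeds may end up producing the same predictions, which would show up as horizontal bands in a trees vs. performance plot. Whether that is what happened in the experiment above cannot be told from this sketch.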