Random Forest - Attribute Importance
I have built a Random Forest model that shows very good accuracy over many test runs, so I think I have found a winner for my simple problem. I used the "Weight by Tree Importance" operator to see which attributes are most important. Customer Income turned out to be the most important.
But how do I know whether higher or lower income supports my prediction? With a simple decision tree I can just look at the split and see, but how do I do that in a Random Forest?
Apologies for noob question.
Thank you in advance!
Best Answers
yyhuang Administrator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
Hi @asav_yu, have you run the model simulator in Auto Model? If you apply the "Model Simulator" operator to your Random Forest model, it will show you interactively how the input "customer income" affects the prediction.
https://rapidminer.com/products/auto-model/
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="112" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="9.1.000" expanded="true" height="103" name="Split Data" width="90" x="246" y="238">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.6"/>
          <parameter key="ratio" value="0.4"/>
        </enumeration>
        <parameter key="sampling_type" value="automatic"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="concurrency:parallel_random_forest" compatibility="9.1.000" expanded="true" height="103" name="Random Forest" width="90" x="380" y="34">
        <parameter key="number_of_trees" value="100"/>
        <parameter key="criterion" value="gain_ratio"/>
        <parameter key="maximal_depth" value="10"/>
        <parameter key="apply_pruning" value="false"/>
        <parameter key="confidence" value="0.1"/>
        <parameter key="apply_prepruning" value="false"/>
        <parameter key="minimal_gain" value="0.01"/>
        <parameter key="minimal_leaf_size" value="2"/>
        <parameter key="minimal_size_for_split" value="4"/>
        <parameter key="number_of_prepruning_alternatives" value="3"/>
        <parameter key="random_splits" value="false"/>
        <parameter key="guess_subset_ratio" value="true"/>
        <parameter key="subset_ratio" value="0.2"/>
        <parameter key="voting_strategy" value="confidence vote"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="enable_parallel_execution" value="true"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="380" y="238"/>
      <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model" width="90" x="715" y="34">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="model_simulator:model_simulator" compatibility="9.1.000" expanded="true" height="103" name="Model Simulator" width="90" x="782" y="238"/>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Random Forest" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Multiply" to_port="input"/>
      <connect from_op="Random Forest" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Random Forest" from_port="exampleSet" to_op="Model Simulator" to_port="training data"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Model Simulator" to_port="test data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="model" to_op="Model Simulator" to_port="model"/>
      <connect from_op="Model Simulator" from_port="simulator output" to_port="result 2"/>
      <connect from_op="Model Simulator" from_port="model output" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
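For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of what "vary one input and watch the prediction" means. It is not how the Model Simulator is implemented; the stump thresholds, the `forest_confidence` stand-in model, and the sample rows are all made up for illustration. It forces the income attribute to each value on a grid and averages the model's confidence, which is a simple partial-dependence curve.

```python
from statistics import mean

# Hypothetical stand-in for a trained forest: each "tree" is a stump that
# votes for the positive class when income exceeds its threshold.
# These thresholds are invented for illustration only.
STUMP_THRESHOLDS = [30_000, 45_000, 50_000, 60_000, 80_000]


def forest_confidence(row):
    """Return the fraction of stumps voting for the positive class."""
    return mean(1.0 if row["income"] > t else 0.0 for t in STUMP_THRESHOLDS)


def partial_dependence(predict, rows, attribute, grid):
    """Average prediction when `attribute` is forced to each grid value."""
    curve = []
    for value in grid:
        # Copy every row, overwriting only the attribute under study.
        modified = [{**row, attribute: value} for row in rows]
        curve.append((value, mean(predict(r) for r in modified)))
    return curve


# Made-up example rows; "age" is ignored by the toy model.
rows = [{"income": 40_000, "age": 31}, {"income": 70_000, "age": 52}]
grid = [20_000, 40_000, 55_000, 70_000, 90_000]

curve = partial_dependence(forest_confidence, rows, "income", grid)
for value, avg in curve:
    print(f"income={value:>6}: mean confidence {avg:.2f}")
```

If the averaged confidence rises along the grid, higher income pushes the model toward the positive class; if it falls, lower income does. With a real model you would replace `forest_confidence` with the model's own prediction call and use rows from your data.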
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
You can also use the Explain Predictions operator to do the same thing even if you don't have access to Auto Model.
SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
There is the "Model Simulator" operator, which does exactly that; Auto Model is not necessary. In fact, Auto Model itself uses it, as you can see if you look at the underlying process (no black boxes indeed).
Regards,
Sebastian
Answers
Have you looked at some of the trees generated by the Random Forest by connecting the model output of Apply Model to the results? There you can see how the individual trees split on the attributes.
Thanks,
Varun