Optimizing Set Macro on 7.5
JEdward
RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
Is anyone else having problems optimizing Set Macro in version 7.5 of RapidMiner?
I was trying to optimize a Python model and found that the value parameter of Set Macro doesn't appear in Optimize Parameters (Evolutionary).
<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="optimize_parameters_evolutionary" compatibility="6.0.003" expanded="true" height="103" name="Optimize Parameters (Evolutionary)" width="90" x="313" y="34">
<list key="parameters">
<parameter key="nTree.value" value="[1.0;100.0]"/>
</list>
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="7.5.001" expanded="true" height="82" name="Hyperparameters" width="90" x="112" y="34">
<process expanded="true">
<operator activated="true" class="set_macro" compatibility="7.5.001" expanded="true" height="82" name="nTree" width="90" x="45" y="34">
<parameter key="macro" value="nTree"/>
<parameter key="value" value="200"/>
</operator>
<operator activated="true" class="set_macro" compatibility="7.5.001" expanded="true" height="82" name="minSizeSplit" width="90" x="246" y="34">
<parameter key="macro" value="minSizeSplit"/>
<parameter key="value" value="4"/>
</operator>
<operator activated="true" class="set_macro" compatibility="7.5.001" expanded="true" height="82" name="minLeafSize" width="90" x="45" y="289">
<parameter key="macro" value="minLeafSize"/>
<parameter key="value" value="2"/>
</operator>
<operator activated="true" class="set_macro" compatibility="7.5.001" expanded="true" height="82" name="maxDepth" width="90" x="45" y="391">
<parameter key="macro" value="maxDepth"/>
<parameter key="value" value="20"/>
</operator>
<connect from_port="in 1" to_op="nTree" to_port="through 1"/>
<connect from_op="nTree" from_port="through 1" to_op="minSizeSplit" to_port="through 1"/>
<connect from_op="minSizeSplit" from_port="through 1" to_op="minLeafSize" to_port="through 1"/>
<connect from_op="minLeafSize" from_port="through 1" to_op="maxDepth" to_port="through 1"/>
<connect from_op="maxDepth" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="id"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="7.5.001" expanded="true" height="145" name="Cross Validation 2" width="90" x="447" y="34">
<parameter key="use_local_random_seed" value="true"/>
<process expanded="true">
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="BDT (sklearn)" width="90" x="112" y="34">
<parameter key="script" value=" import pandas as pd from sklearn.ensemble import GradientBoostingClassifier from sklearn.ensemble import RandomForestClassifier #use RandomForestRegressor for regression problem # This script creates a RandomForestClassifier from SKLearn on RM data # It can be used as a generic template for other sklearn classifiers or regressors def rm_main(data): metadata = data.rm_metadata # Get the list of regular attributes and the label df = pd.DataFrame(metadata).T label = df[df[1]=="label"].index.values regular = df[df[1] != df[1]].index.values # === RandomForest === # # Assumed you have, X (predictor) and Y (target) for training data set and x_test(predictor) of test_dataset # Create Random Forest object model= RandomForestClassifier(n_estimators = %{nTree} , max_depth = %{maxDepth} , min_samples_split = %{minSizeSplit} # The minimum number of samples required to split an internal node , min_samples_leaf = %{minLeafSize} # The minimum number of samples required to be at a leaf node ) # Train the model using the training sets and check score # model.fit(X, y) model.fit(data[regular], data[label]) # Predict Output # predicted = model.predict(x_test) return (model,regular,label[0]), data"/>
</operator>
<connect from_port="training set" to_op="BDT (sklearn)" to_port="input 1"/>
<connect from_op="BDT (sklearn)" from_port="output 1" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="103" name="Apply Model (2)" width="90" x="112" y="34">
<parameter key="script" value="import pandas as pd # rm_main is a mandatory function, # the number of arguments has to be the number of input ports (can be none) def rm_main(rfinfo, data): rf = rfinfo[0] regular = rfinfo[1] label = rfinfo[2] meta = data.rm_metadata predictions = rf.predict(data[regular]) confidences = rf.predict_proba(data[regular]) predictions = pd.DataFrame(predictions, columns=["prediction("+label+")"]) confidences = pd.DataFrame(confidences, columns=["confidence(" + str(c) + ")" for c in rf.classes_]) data = data.join(predictions) data = data.join(confidences) data.rm_metadata = meta data.rm_metadata["prediction("+label+")"] = ("nominal","prediction") for c in rf.classes_: data.rm_metadata["confidence("+str(c)+")"] = ("numerical","confidence_"+str(c)) return data, rf"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="7.5.001" expanded="true" height="82" name="Python" width="90" x="246" y="34">
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model (2)" to_port="input 1"/>
<connect from_port="test set" to_op="Apply Model (2)" to_port="input 2"/>
<connect from_op="Apply Model (2)" from_port="output 1" to_op="Python" to_port="labelled data"/>
<connect from_op="Python" from_port="performance" to_port="performance 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Python</description>
</operator>
<operator activated="true" class="log" compatibility="7.5.001" expanded="true" height="82" name="Log" width="90" x="715" y="30">
<list key="log">
<parameter key="Count" value="operator.Apply Model (2).value.applycount"/>
<parameter key=" Testing Error" value="operator.Cross Validation 2.value.performance 1"/>
<parameter key="Training StdDev" value="operator.Cross Validation 2.value.std deviation 1"/>
<parameter key="nTree" value="operator.nTree.parameter.value"/>
<parameter key="maxDepth" value="operator.maxDepth.parameter.value"/>
<parameter key="minLeafSize" value="operator.minLeafSize.parameter.value"/>
<parameter key="minSizeSplit" value="operator.minSizeSplit.parameter.value"/>
</list>
</operator>
<connect from_port="input 1" to_op="Hyperparameters" to_port="in 1"/>
<connect from_op="Hyperparameters" from_port="out 1" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Cross Validation 2" to_port="example set"/>
<connect from_op="Cross Validation 2" from_port="performance 1" to_op="Log" to_port="through 1"/>
<connect from_op="Log" from_port="through 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Optimize Parameters (Evolutionary)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Evolutionary)" from_port="performance" to_port="result 1"/>
<connect from_op="Optimize Parameters (Evolutionary)" from_port="parameter" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
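For reference, RapidMiner substitutes the %{...} macro placeholders into the Execute Python script before it runs, so with the default Set Macro values above the model construction effectively expands to the following (a minimal sketch in plain scikit-learn; the literal values come from the Set Macro operators in the process):

from sklearn.ensemble import RandomForestClassifier

# With nTree=200, maxDepth=20, minSizeSplit=4 and minLeafSize=2 set by the
# Set Macro operators, the %{...} placeholders become literal values:
model = RandomForestClassifier(n_estimators=200,      # %{nTree}
                               max_depth=20,          # %{maxDepth}
                               min_samples_split=4,   # %{minSizeSplit}
                               min_samples_leaf=2)    # %{minLeafSize}

The point of the question is that the optimizer cannot see the Set Macro value parameters that feed these placeholders.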
Answers
I think this isn't a bug: the Evolutionary optimizer uses genetic parameters to assign values 'randomly', so you can't take a grid approach to this. Did you try it in the regular Grid optimizer?
If it's not a bug, then it's a missing feature. I've managed to create a workaround that works, but it is clearly not the most efficient. Let's move this thread into feature requests.
Edit: I realised my process didn't display properly.
As you can see, the workaround uses RapidMiner modelling operators to represent the values I want to change in the Python code. So the feature I'd like is an operator whose values Optimize Parameters (Evolutionary) can access, allowing them to be set and used as macros.
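In the meantime, a comparable search can be run entirely in Python for comparison. The sketch below uses scikit-learn's RandomizedSearchCV (a random search, not a true evolutionary optimizer) over the same four hyperparameters on Iris; only the [1;100] range for nTree comes from the process above, the other ranges and the fold count are illustrative assumptions:

import numpy as np
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Randomised search over the four hyperparameters the macros control.
param_distributions = {
    "n_estimators": randint(1, 100),      # nTree, range taken from the process XML
    "max_depth": randint(1, 20),          # maxDepth (assumed range)
    "min_samples_split": randint(2, 10),  # minSizeSplit (assumed range)
    "min_samples_leaf": randint(1, 5),    # minLeafSize (assumed range)
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=50,        # number of sampled parameter combinations
    cv=10,            # cross-validation, analogous to the Cross Validation operator
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)

This doesn't replace the requested operator, but it shows the kind of parameter search the feature would make possible directly on macro-driven Python models inside RapidMiner.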