Forward Selection error thrown
Hi, I'm using the Forward Selection operator for the first time and ran into the error below.
I'm working with time series data, GBTs, and the sliding window validation operator. I'm using the default parameters for the FS operator.
I'm using RapidMiner Studio 9.2.
Best Answer
varunm1
Hello @Noel,
I checked with Ingo about this. The issue is with the H2O operators: they remove attributes with constant values and then throw an error. For now, this is hard to resolve on the RapidMiner side because these operators are developed by H2O, but RapidMiner will try to bring it to their attention. For the complete discussion, please see the thread below.
https://community.rapidminer.com/discussion/55807/running-out-of-features-during-feature-selection#latest
Hope this helps.
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Answers
It looks like no attributes are reaching the GBT operator. Can you set a breakpoint before the GBT operator (right-click the GBT operator and choose "Breakpoint Before") and check whether the data going into this operator contains at least two attributes?
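In the process XML, a breakpoint appears as a `breakpoints` attribute on the operator element. The fragment below is a minimal sketch, not taken from the original poster's process: it reuses the "Gradient Boosted Trees (2)" operator from the sample process further down in this thread, and it assumes that `breakpoints="before"` is the counterpart of the `breakpoints="after"` value set on that sample's Windowing operator. Only one parameter is shown; the others are omitted for brevity.

```xml
<!-- Sketch only: the GBT operator from the sample process with a breakpoint set before it.
     Assumption: breakpoints="before" mirrors the breakpoints="after" used on the Windowing operator.
     Other parameters are omitted here; keep the ones from your own process. -->
<operator activated="true" breakpoints="before" class="h2o:gradient_boosted_trees" compatibility="9.3.001" expanded="true" height="103" name="Gradient Boosted Trees (2)" width="90" x="112" y="34">
<parameter key="number_of_trees" value="100"/>
</operator>
```

When the process pauses at this breakpoint, the Results view shows the ExampleSet about to enter the GBT operator, so you can count how many regular attributes remain besides the label and id.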
Could you also share your process XML so I can look at the process?
There was an earlier discussion about this in the thread below.
https://community.rapidminer.com/discussion/55807/running-out-of-features-during-feature-selection#latest
Varun
So, apparently three "attributes" are going in, but one is the label and one is the id.
Could it be that windowed data was not accounted for when this operator was designed?
Can you provide the XML code, if possible? I want to look at the process. I tried the sample process below, which is also a regression problem, and it didn't throw an error.
<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process" origin="GENERATED_SAMPLE">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve Prices of Gas Station" origin="GENERATED_SAMPLE" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/Time Series/data sets/Prices of Gas Station"/>
</operator>
<operator activated="true" class="filter_example_range" compatibility="9.3.001" expanded="true" height="82" name="Filter Example Range" origin="GENERATED_SAMPLE" width="90" x="179" y="34">
<parameter key="first_example" value="1"/>
<parameter key="last_example" value="16"/>
<parameter key="invert_filter" value="true"/>
</operator>
<operator activated="true" breakpoints="after" class="time_series:windowing" compatibility="9.3.001" expanded="true" height="82" name="Windowing" origin="GENERATED_SAMPLE" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="gas price / euro (times 1000)"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="has_indices" value="true"/>
<parameter key="indices_attribute" value="date"/>
<parameter key="window_size" value="48"/>
<parameter key="no_overlapping_windows" value="false"/>
<parameter key="step_size" value="24"/>
<parameter key="create_horizon_(labels)" value="true"/>
<parameter key="horizon_attribute" value="gas price / euro (times 1000)"/>
<parameter key="horizon_size" value="1"/>
<parameter key="horizon_offset" value="23"/>
</operator>
<operator activated="true" class="optimize_selection_forward" compatibility="9.3.001" expanded="true" height="103" name="Forward Selection" width="90" x="581" y="34">
<parameter key="maximal_number_of_attributes" value="10"/>
<parameter key="speculative_rounds" value="0"/>
<parameter key="stopping_behavior" value="without increase"/>
<parameter key="use_relative_increase" value="true"/>
<parameter key="alpha" value="0.05"/>
<process expanded="true">
<operator activated="true" class="concurrency:cross_validation" compatibility="9.3.001" expanded="true" height="145" name="Cross Validation (2)" width="90" x="313" y="85">
<parameter key="split_on_batch_attribute" value="false"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_folds" value="5"/>
<parameter key="sampling_type" value="automatic"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="h2o:gradient_boosted_trees" compatibility="9.3.001" expanded="true" height="103" name="Gradient Boosted Trees (2)" width="90" x="112" y="34">
<parameter key="number_of_trees" value="100"/>
<parameter key="reproducible" value="false"/>
<parameter key="maximum_number_of_threads" value="4"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="maximal_depth" value="10"/>
<parameter key="min_rows" value="10.0"/>
<parameter key="min_split_improvement" value="0.0"/>
<parameter key="number_of_bins" value="20"/>
<parameter key="learning_rate" value="0.01"/>
<parameter key="sample_rate" value="1.0"/>
<parameter key="distribution" value="AUTO"/>
<parameter key="early_stopping" value="false"/>
<parameter key="stopping_rounds" value="1"/>
<parameter key="stopping_metric" value="AUTO"/>
<parameter key="stopping_tolerance" value="0.001"/>
<parameter key="max_runtime_seconds" value="0"/>
<list key="expert_parameters"/>
</operator>
<connect from_port="training set" to_op="Gradient Boosted Trees (2)" to_port="training set"/>
<connect from_op="Gradient Boosted Trees (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.3.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="9.3.001" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="34">
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="false"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="false"/>
<parameter key="squared_correlation" value="false"/>
<parameter key="prediction_average" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_port="example set" to_op="Cross Validation (2)" to_port="example set"/>
<connect from_op="Cross Validation (2)" from_port="performance 1" to_port="performance"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
</process>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="9.3.001" expanded="true" height="145" name="Cross Validation" origin="GENERATED_SAMPLE" width="90" x="782" y="34">
<parameter key="split_on_batch_attribute" value="false"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_folds" value="10"/>
<parameter key="sampling_type" value="automatic"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="h2o:gradient_boosted_trees" compatibility="9.3.001" expanded="true" height="103" name="Gradient Boosted Trees" origin="GENERATED_SAMPLE" width="90" x="179" y="34">
<parameter key="number_of_trees" value="100"/>
<parameter key="reproducible" value="false"/>
<parameter key="maximum_number_of_threads" value="4"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="maximal_depth" value="5"/>
<parameter key="min_rows" value="10.0"/>
<parameter key="min_split_improvement" value="0.0"/>
<parameter key="number_of_bins" value="20"/>
<parameter key="learning_rate" value="0.1"/>
<parameter key="sample_rate" value="1.0"/>
<parameter key="distribution" value="AUTO"/>
<parameter key="early_stopping" value="false"/>
<parameter key="stopping_rounds" value="1"/>
<parameter key="stopping_metric" value="AUTO"/>
<parameter key="stopping_tolerance" value="0.001"/>
<parameter key="max_runtime_seconds" value="0"/>
<list key="expert_parameters"/>
</operator>
<connect from_port="training set" to_op="Gradient Boosted Trees" to_port="training set"/>
<connect from_op="Gradient Boosted Trees" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.3.001" expanded="true" height="82" name="Apply Model" origin="GENERATED_SAMPLE" width="90" x="45" y="34">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="9.3.001" expanded="true" height="82" name="Performance" origin="GENERATED_SAMPLE" width="90" x="246" y="34">
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="false"/>
<parameter key="squared_correlation" value="false"/>
<parameter key="prediction_average" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Prices of Gas Station" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
<connect from_op="Filter Example Range" from_port="example set output" to_op="Windowing" to_port="example set"/>
<connect from_op="Windowing" from_port="windowed example set" to_op="Forward Selection" to_port="example set"/>
<connect from_op="Forward Selection" from_port="example set" to_op="Cross Validation" to_port="example set"/>
<connect from_op="Cross Validation" from_port="model" to_port="result 1"/>
<connect from_op="Cross Validation" from_port="example set" to_port="result 2"/>
<connect from_op="Cross Validation" from_port="test result set" to_port="result 3"/>
<connect from_op="Cross Validation" from_port="performance 1" to_port="result 4"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<description align="center" color="blue" colored="true" height="166" resized="true" width="259" x="27" y="130">Retrieve the German gas prices data set from the Samples/Time Series folder.&lt;br&gt;&lt;br&gt;Remove the first 16 Examples, so that the remaining Examples start at 9:00 AM</description>
<description align="center" color="green" colored="true" height="427" resized="true" width="366" x="313" y="130">Perform a Windowing on the data set.&lt;br&gt;&lt;br&gt;The window size is set to 48, to include the prices of the previous 48 hours for each window.&lt;br&gt;&lt;br&gt;The step size is set to 24, so that we only look at windows which end at 8:00 AM.&lt;br&gt;&lt;br&gt;The horizon size is set to 1, because we want to forecast 1 price.&lt;br&gt;&lt;br&gt;The horizon offset is set to 23, so that the horizon is 23+1 hours after the window, hence the gas price of the next day at the same time.&lt;br&gt;&lt;br&gt;The resulting ExampleSet contains all we need to train any machine learning model on it: a label (the price of the next day, gas price / euro cents (times 1000) + 24 (horizon)); 48 Attributes containing the prices of the last 48 hours (gas price / euro cents (times 1000) - i); and a special attribute holding the last date in the window, which is not used in the training.</description>
<description align="center" color="yellow" colored="false" height="91" resized="true" width="230" x="703" y="198">Train a Gradient Boosted Tree on the ExampleSet created by the Windowing operator.</description>
</process>
</operator>
</process>
Varun
I totally spaced on the sample data thing, my bad! Let me see if I can dig some up.
No worries. I'm getting the same error in my own process when using Forward Selection with GBT, and with logistic regression as well. I think H2O should resolve this ASAP.
Varun