"(Normal ?) bug : log all criteria / Optimization of cluster model"
Hi,
I'd like to report a bug that occurs when the parameter log all criteria is checked while optimizing a cluster model (k-Means).
When the process is executed, RapidMiner raises the following error:
java.lang.ArrayIndexOutOfBoundsException
When RM creates the Optimize Parameters results, each row in theory has a different length, i.e. length(row(i+1)) = length(row(i)) + 1,
because for each row RM adds Avg. within centroid distance_cluster_i. So when RM tries to create the second row, it raises an error
because the dimensions of the table change.
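To make the mechanism concrete, here is a minimal sketch. It is not RapidMiner's code: it is a Python stand-in that assumes the log table fixes its column count from the first row. The class `FixedWidthLog`, the helper `criteria_for`, the zero-based cluster numbering, and the dummy values are all hypothetical. Python raises `IndexError` where Java would raise `ArrayIndexOutOfBoundsException`.

```python
def criteria_for(k):
    """Criterion names logged for k clusters (hypothetical layout)."""
    names = ["Avg. within centroid distance"]
    names += [f"Avg. within centroid distance_cluster_{i}" for i in range(k)]
    return names


class FixedWidthLog:
    """Toy log table whose width is frozen by the first row (an assumption)."""

    def __init__(self):
        self.columns = None
        self.rows = []

    def add_row(self, k, values):
        if self.columns is None:
            # The first row decides how many columns the table has.
            self.columns = ["k"] + criteria_for(k)
        row = [None] * len(self.columns)  # buffer sized for the first row
        row[0] = k
        for i, value in enumerate(values, start=1):
            row[i] = value  # fails as soon as a later row is longer
        self.rows.append(row)


log = FixedWidthLog()
log.add_row(2, [0.0] * len(criteria_for(2)))      # dummy values, 4 columns
try:
    log.add_row(3, [0.0] * len(criteria_for(3)))  # one value too many
except IndexError as error:
    print("second row failed:", error)
```

With k = 2 the table gets four columns (k plus three criteria). The row for k = 3 carries one more criterion, so writing it into a four-slot buffer fails.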
I hope this is understandable. Here is the process:
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="open_file" compatibility="8.0.001" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
<parameter key="resource_type" value="URL"/>
<parameter key="url" value="https://archive.ics.uci.edu/ml/machine-learning-databases/00292/Wholesale customers data.csv"/>
</operator>
<operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
<parameter key="csv_file" value="C:\Users\lueth\Desktop\Wholesale customers data.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Channel.true.binominal.attribute"/>
<parameter key="1" value="Region.true.polynominal.attribute"/>
<parameter key="2" value="Fresh.true.integer.attribute"/>
<parameter key="3" value="Milk.true.integer.attribute"/>
<parameter key="4" value="Grocery.true.integer.attribute"/>
<parameter key="5" value="Frozen.true.integer.attribute"/>
<parameter key="6" value="Detergents_Paper.true.integer.attribute"/>
<parameter key="7" value="Delicassen.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Channel|Region"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="145" name="Multiply" width="90" x="447" y="34"/>
<operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="715" y="391">
<list key="parameters">
<parameter key="Clustering.k" value="[2.0;10;10;linear]"/>
</list>
<parameter key="log_all_criteria" value="true"/>
<process expanded="true">
<operator activated="true" class="k_means" compatibility="8.0.001" expanded="true" height="82" name="Clustering" width="90" x="112" y="34">
<parameter key="k" value="3"/>
</operator>
<operator activated="true" class="cluster_distance_performance" compatibility="8.0.001" expanded="true" height="103" name="Performance" width="90" x="313" y="34"/>
<connect from_port="input 1" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_op="Performance" to_port="cluster model"/>
<connect from_op="Clustering" from_port="clustered set" to_op="Performance" to_port="example set"/>
<connect from_op="Performance" from_port="performance" to_port="performance"/>
<connect from_op="Performance" from_port="cluster model" to_port="model"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="x_means" compatibility="8.0.001" expanded="true" height="82" name="X-Means" width="90" x="715" y="136"/>
<operator activated="true" class="k_means" compatibility="8.0.001" expanded="true" height="82" name="k-Means" width="90" x="715" y="34">
<parameter key="measure_types" value="NumericalMeasures"/>
</operator>
<operator activated="true" class="agglomerative_clustering" compatibility="8.0.001" expanded="true" height="82" name="Agglomerative Clustering" width="90" x="715" y="238">
<parameter key="mode" value="AverageLink"/>
<parameter key="measure_types" value="NumericalMeasures"/>
</operator>
<connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
<connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="k-Means" to_port="example set"/>
<connect from_op="Multiply" from_port="output 2" to_op="X-Means" to_port="example set"/>
<connect from_op="Multiply" from_port="output 3" to_op="Agglomerative Clustering" to_port="example set"/>
<connect from_op="Multiply" from_port="output 4" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 6"/>
<connect from_op="Optimize Parameters (Grid)" from_port="model" to_port="result 7"/>
<connect from_op="Optimize Parameters (Grid)" from_port="parameter set" to_port="result 8"/>
<connect from_op="X-Means" from_port="cluster model" to_port="result 4"/>
<connect from_op="X-Means" from_port="clustered set" to_port="result 5"/>
<connect from_op="k-Means" from_port="cluster model" to_port="result 1"/>
<connect from_op="k-Means" from_port="clustered set" to_port="result 2"/>
<connect from_op="Agglomerative Clustering" from_port="cluster model" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<portSpacing port="sink_result 6" spacing="0"/>
<portSpacing port="sink_result 7" spacing="0"/>
<portSpacing port="sink_result 8" spacing="0"/>
<portSpacing port="sink_result 9" spacing="0"/>
</process>
</operator>
</process>
What is your opinion? Do you think it deserves a post in "Product Feedback"?
Regards,
Lionel
Comments
Hi again,
A small update: the problem lies with log all criteria, so the Loop Parameters operator is affected too.
Regards,
Lionel
I would report this as a bug in the Product Feedback board.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Moving this thread to Product Feedback.
Scott
Hi Lionel,
We are looking into this. We will keep you updated here!
Regards
Jan
Hi Lionel,
We temporarily fixed this for the Beta by throwing an appropriate user error, and we are working on a permanent solution.
Regards
Jan
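The thread does not show how the Beta implements this, so the following is only a sketch of what "throwing an appropriate user error" could look like in principle. It compares the criteria of each iteration with those of the first one and fails with a readable message instead of an index error. `UserError`, `check_same_criteria`, the shortened criterion names, and the message text are all hypothetical.

```python
class UserError(Exception):
    """Hypothetical stand-in for a user-facing error."""


def check_same_criteria(first_names, next_names, iteration):
    """Raise a readable error if an iteration logs different criteria."""
    if list(first_names) != list(next_names):
        raise UserError(
            f"Cannot log all criteria: iteration {iteration} produced "
            f"{len(next_names)} criteria, but the first iteration produced "
            f"{len(first_names)}."
        )


# Shortened, made-up criterion names for illustration only.
first = ["distance", "distance_cluster_0", "distance_cluster_1"]
second = first + ["distance_cluster_2"]

check_same_criteria(first, first, iteration=1)  # same criteria, no error
try:
    check_same_criteria(first, second, iteration=2)
except UserError as error:
    print(error)
```

A check like this does not make the differing rows loggable. It only replaces the low-level exception with a message that tells the user what went wrong.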
Hi Jan,
Thank you for your feedback.
Regards,
Lionel
@jczogalla - mark this resolved or still investigating?
@sgenzer - Leave it as investigating please.