Identify common attributes using clusters
Hi, I'm quite new to RapidMiner. I am working with the white wine quality dataset from http://www3.dsi.uminho.pt/pcortez/wine/. I have to use clustering (k-means and k-medoids) to identify the common attributes of wines with a quality greater than 6. This is the process:
I really don't know how to achieve this.
Thank you in advance.
Answers
Hi,
Could you please share the XML code of the RapidMiner process you built (the one in your screenshot)?
This would help me recreate the process with the exact operator parameters you set and work toward the results you are looking for.
Thanks,
Pavithra
Sorry, the process is quite simple:
Hi,
Thanks for sharing the XML code.
I have tried clustering with a slightly different approach, as seen in the process screenshot:
1. Set Role marks quality as the label.
2. Normalize normalizes the 11 input attributes.
3. X-Means clusters the normalized data. From the Plot tab of the X-Means output, we can observe that Cluster 1 is distinct from the other clusters with respect to chloride content: its centroid value for chlorides is higher than those of the other clusters.
4. Next, I used Filter Examples to keep only the examples with a quality value >= 6 and narrow down the analysis. We can use the charting options in the Filter Examples results window to explore this subset and observe the distributions of its attribute values and their cluster groupings.
5. Finally, I used weighting by relevance (MR-Weighting in the attached process) to see the importance of these attributes in the data.
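The question asks for k-means and k-medoids and for wines with quality strictly above 6, while the attached process uses X-Means and quality >= 6. The fragment below is a minimal, untested sketch of how those operators might be swapped into the attached process XML. It assumes the RapidMiner 7.6 operator keys `k_means` and `k_medoids` and the `gt` comparator in Filter Examples (by analogy with the `ge` used above); `k="3"` is a hypothetical value, and all other parameters are left at their defaults. The `<connect>` elements that currently reference `X-Means` would need to use the new operator names, and k-Medoids would need its own copy of the normalized data (for example, from another Multiply output).

```xml
<!-- Hypothetical sketch: k-Means in place of X-Means. k="3" is an assumed value; -->
<!-- try several values of k and compare the resulting centroids. -->
<operator activated="true" class="k_means" compatibility="7.6.000" expanded="true" height="82" name="k-Means" width="90" x="447" y="136">
  <parameter key="k" value="3"/>
</operator>
<!-- Hypothetical sketch: k-Medoids, the second algorithm named in the question. -->
<operator activated="true" class="k_medoids" compatibility="7.6.000" expanded="true" height="82" name="k-Medoids" width="90" x="447" y="289">
  <parameter key="k" value="3"/>
</operator>
<!-- Keeps only wines with quality strictly greater than 6 ("gt" instead of "ge"). -->
<operator activated="true" class="filter_examples" compatibility="7.6.000" expanded="true" height="103" name="Filter Examples" width="90" x="581" y="238">
  <list key="filters_list">
    <parameter key="filters_entry_key" value="quality.gt.6"/>
  </list>
</operator>
```

Comparing the centroid tables of the two cluster models (for example, with Extract Cluster Prototypes, as in the attached process) shows which attribute values each cluster is centered on.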
Hope this helps. Let me know if you have any further questions or concerns. Attached are the screenshots and the XML code.
Lastly, I have a question: the goal here is to identify common attributes of wines with a quality greater than 6. This is more of a supervised learning problem (the target is quality) than an unsupervised one (clustering). Is there a specific reason you chose a clustering approach?
Thanks,
Cheers,
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="retrieve" compatibility="7.6.000" expanded="true" height="68" name="Retrieve winequality-white" width="90" x="45" y="136">
<parameter key="repository_entry" value="//MyRepository/winequality/winequality-white"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="136">
<parameter key="attribute_name" value="quality"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="normalize" compatibility="7.6.000" expanded="true" height="103" name="Normalize" width="90" x="313" y="136">
<parameter key="attributes" value="|volatile acidity|total sulfur dioxide|sulphates|residual sugar|pH|free sulfur dioxide|fixed acidity|density|citric acid|chlorides|alcohol"/>
</operator>
<operator activated="true" class="x_means" compatibility="7.6.000" expanded="true" height="82" name="X-Means" width="90" x="447" y="136">
<parameter key="add_as_label" value="true"/>
<parameter key="measure_types" value="MixedMeasures"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.6.000" expanded="true" height="103" name="Filter Examples" width="90" x="581" y="238">
<list key="filters_list">
<parameter key="filters_entry_key" value="quality.ge.6"/>
</list>
</operator>
<operator activated="true" class="featselext:maximum_relevance_weighting" compatibility="1.1.004" expanded="true" height="82" name="MR-Weighting" width="90" x="782" y="238"/>
<operator activated="true" class="multiply" compatibility="7.6.000" expanded="true" height="103" name="Multiply" width="90" x="581" y="34"/>
<operator activated="true" class="extract_prototypes" compatibility="7.6.000" expanded="true" height="82" name="Extract Cluster Prototypes" width="90" x="782" y="34"/>
<connect from_op="Retrieve winequality-white" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="X-Means" to_port="example set"/>
<connect from_op="X-Means" from_port="cluster model" to_op="Multiply" to_port="input"/>
<connect from_op="X-Means" from_port="clustered set" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="MR-Weighting" to_port="example set"/>
<connect from_op="MR-Weighting" from_port="weights" to_port="result 4"/>
<connect from_op="MR-Weighting" from_port="example set" to_port="result 5"/>
<connect from_op="Multiply" from_port="output 1" to_op="Extract Cluster Prototypes" to_port="model"/>
<connect from_op="Multiply" from_port="output 2" to_port="result 1"/>
<connect from_op="Extract Cluster Prototypes" from_port="example set" to_port="result 2"/>
<connect from_op="Extract Cluster Prototypes" from_port="model" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<portSpacing port="sink_result 6" spacing="0"/>
</process>
</operator>
</process>
Attachments: cluster centroid plot, process, weightings