Clustering of the Text

I want to cluster survey comments into different categories, for example:
Comment | Category
Restrooms Stinks | FMG
Food was costly | Restaurant
Poor service in restaurant | Restaurant
I want to read the comments from Excel and write them back to Excel together with the category.
Can anyone please suggest how to do this?
If you already know which categories you are looking for, you should label your training data manually with these categories and then train a classification algorithm on it. A good choice for text processing could be the SVM.
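As a rough illustration of the labeled approach, here is a minimal Python sketch (not RapidMiner code). It assumes a tiny hand-made stop-word list and swaps the SVM for a much simpler nearest-centroid classifier over bag-of-words vectors, purely to show the "label, train, predict" flow. The function names `vectorize`, `train` and `classify` are hypothetical, and the training rows are the three example comments from the question.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"i", "is", "was", "in", "the", "a"}

def vectorize(text):
    """Lower-case a comment, split it into words, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def train(labeled_comments):
    """Build one summed word vector (centroid) per category."""
    centroids = defaultdict(Counter)
    for comment, category in labeled_comments:
        centroids[category].update(vectorize(comment))
    return dict(centroids)

def classify(comment, centroids):
    """Return the category whose centroid is most similar to the comment."""
    vec = vectorize(comment)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

# Manually labeled training data, taken from the question.
training = [
    ("Restrooms Stinks", "FMG"),
    ("Food was costly", "Restaurant"),
    ("Poor service in restaurant", "Restaurant"),
]
model = train(training)

print(classify("service is poor", model))   # Restaurant
print(classify("washroom stinks", model))   # FMG
```

With only three training rows this is a toy; a real model needs many labeled comments per category, and a proper learner such as an SVM would take the place of the centroid comparison.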
If you can't or don't want to label your data, just run a clustering algorithm like k-Means on your preprocessed documents, and have a look at the clusters afterwards to see if they make sense for you.
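To make "preprocessed documents" concrete, here is a minimal Python sketch (not RapidMiner code) of the unlabeled route: each comment is tokenized into a bag-of-words vector before a k-Means-style loop groups the vectors. The stop-word list, the use of cosine similarity, and seeding the centroids with the first k comments are all assumptions made to keep the example small and repeatable. Real k-Means starts from random centroids and may group such a tiny data set differently.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"i", "is", "was", "in", "the", "a"}

def vectorize(text):
    """Lower-case a comment, split it into words, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse word vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Average a list of sparse word vectors into one centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds the counts
    return {t: c / len(vectors) for t, c in total.items()}

def cluster(comments, k, max_iter=20):
    """k-Means-style clustering on word vectors, using cosine similarity."""
    docs = [vectorize(c) for c in comments]
    # Assumption: seed with the first k comments so the result is repeatable.
    centroids = [dict(d) for d in docs[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(k), key=lambda j: cosine(d, centroids[j])) for d in docs
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels

comments = [
    "I love food",
    "washroom stinks",
    "service is poor",
    "food is great",
    "not great service",
]
print(cluster(comments, 3))  # [0, 1, 2, 0, 2]
```

The point of the sketch is the preprocessing step: once comments are split into words, "not great service" shares "service" with "service is poor" and can be grouped with it. If each whole comment is instead treated as a single nominal value, every comment is simply different from every other one and the clusters carry no meaning.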
Best, Marius
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<process expanded="true" height="252" width="681">
<operator activated="true" class="read_excel" compatibility="5.2.006" expanded="true" height="60" name="Read Excel" width="90" x="45" y="75">
<parameter key="excel_file" value="C:\Users\guagg\Desktop\All\RapidMiner\read.xls"/>
<parameter key="imported_cell_range" value="A1:A6"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="k_means" compatibility="5.2.006" expanded="true" height="76" name="Clustering" width="90" x="313" y="75">
<parameter key="add_as_label" value="true"/>
<parameter key="remove_unlabeled" value="true"/>
<parameter key="k" value="3"/>
<parameter key="measure_types" value="NominalMeasures"/>
<parameter key="nominal_measure" value="RussellRaoSimilarity"/>
<parameter key="divergence" value="GeneralizedIDivergence"/>
</operator>
<operator activated="true" class="numerical_to_binominal" compatibility="5.2.006" expanded="true" height="76" name="Numerical to Binominal" width="90" x="514" y="120"/>
<connect from_op="Read Excel" from_port="output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
<connect from_op="Clustering" from_port="clustered set" to_op="Numerical to Binominal" to_port="example set input"/>
<connect from_op="Numerical to Binominal" from_port="example set output" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
But it's not giving me the correct results:
cluster_0 I love food
cluster_1 washroom stinks
cluster_2 service is poor
cluster_0 food is great
cluster_0 not great service
The last one should be cluster_2, not cluster_0.
Please suggest!!!
Best, Marius
I can't find the link. Could you please post it again?