"Text Clustering Example"
Folks,
All of my questions about text clustering come from the fact that this "text clustering example" outputs "no results produced". The newsgroup data is present in the directories noted.
Why does it not generate the output identified in the description?
<operator name="Root" class="Process" expanded="yes">
  <description text="#ylt#h3#ygt#Clustering text documents#ylt#/h3#ygt##ylt#p#ygt#In this experiment, texts from two newsgroups are read and clustered. To make the clusters better comprehensible, three keywords are extracted for each cluster and added to the cluster description.#ylt#/p#ygt#"/>
  <parameter key="logverbosity" value="status"/>
  <operator name="TextInput" class="TextInput" expanded="yes">
    <parameter key="default_content_language" value="english"/>
    <list key="namespaces">
    </list>
    <parameter key="prune_above" value="10"/>
    <parameter key="prune_below" value="5"/>
    <list key="texts">
      <parameter key="graphics" value="../data/newsgroup/graphics"/>
      <parameter key="hardware" value="../data/newsgroup/hardware"/>
    </list>
    <operator name="StringTokenizer" class="StringTokenizer">
    </operator>
    <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
    </operator>
    <operator name="TokenLengthFilter" class="TokenLengthFilter">
      <parameter key="min_chars" value="5"/>
    </operator>
    <operator name="PorterStemmer" class="PorterStemmer">
    </operator>
  </operator>
  <operator name="KMeans" class="KMeans">
  </operator>
  <operator name="AttributeSumClusterCharacterizer" class="AttributeSumClusterCharacterizer">
  </operator>
</operator>
Answers
This sample (from a plugin!) was never updated to reflect that the automatic cluster characterization was removed some time ago. I can hardly believe that this process ever worked (did you really run it on a fresh RM 4.4 installation?), since I would expect the operator "AttributeSumClusterCharacterizer" to be deprecated, if not removed entirely. But I could be mistaken.
Before you ask: the characterization took a long time even if you were not interested in it, and it did not work well enough. Much better characterizations can be found with the approaches I sketched in the other thread, which is why it was removed.
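A minimal sketch of what this could mean for the posted process, assuming the characterizer operator is the step that breaks it: the same process with "AttributeSumClusterCharacterizer" dropped, so that KMeans is the last operator. Every other operator and parameter is copied unchanged from the question. Whether this alone produces results on a given installation is not confirmed here.

```xml
<operator name="Root" class="Process" expanded="yes">
  <parameter key="logverbosity" value="status"/>
  <operator name="TextInput" class="TextInput" expanded="yes">
    <parameter key="default_content_language" value="english"/>
    <list key="namespaces">
    </list>
    <parameter key="prune_above" value="10"/>
    <parameter key="prune_below" value="5"/>
    <list key="texts">
      <parameter key="graphics" value="../data/newsgroup/graphics"/>
      <parameter key="hardware" value="../data/newsgroup/hardware"/>
    </list>
    <operator name="StringTokenizer" class="StringTokenizer">
    </operator>
    <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
    </operator>
    <operator name="TokenLengthFilter" class="TokenLengthFilter">
      <parameter key="min_chars" value="5"/>
    </operator>
    <operator name="PorterStemmer" class="PorterStemmer">
    </operator>
  </operator>
  <operator name="KMeans" class="KMeans">
  </operator>
  <!-- Assumption: the AttributeSumClusterCharacterizer operator that followed
       KMeans in the original sample is omitted here, so no keywords are
       attached to the cluster descriptions. -->
</operator>
```

With this variant the clusters would come back without the three keywords per cluster that the sample description promises; those would have to be derived separately.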
Cheers,
Ingo