"Dictionary based analysis"
Hi!
Is there a way to use RapidMiner to perform dictionary-based analysis of document collections?
In particular, I'm interested in term frequency and other statistics on term occurrences in the documents, where the terms of interest are supplied by the user and already classified into one or more user-defined category lists (dictionaries).
Thanks for your help!
Giulio
Answers
Yes, this is easily possible with the Text Processing Extension. You can simply use dictionary-based filtering to remove all uninteresting words, so that only the terms from your dictionaries remain in the word vector.
Another approach would be to count all words first and then post-process the resulting word list using the "WordList to Data" operator.
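To make the idea concrete outside of RapidMiner, here is a minimal Python sketch of the second approach: count all words per document first, then keep only the counts for the dictionary terms, grouped by category. This is not a RapidMiner process and not part of the Text Processing Extension; the function names `dictionary_counts` and `category_totals`, the sample documents, and the sample dictionaries are all made up for illustration. It also assumes single-word terms and a very simple tokenizer.

```python
import re
from collections import Counter


def dictionary_counts(documents, dictionaries):
    """For each document, count occurrences of every dictionary term, grouped by category.

    documents: list of strings.
    dictionaries: dict mapping a category name to a list of single-word terms.
    (Both structures are illustrative assumptions, not a RapidMiner format.)
    """
    results = []
    for text in documents:
        # Simple tokenizer: lowercase, keep runs of letters, digits and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        # Step 1: count all words in the document.
        counts = Counter(tokens)
        # Step 2: keep only the counts of the dictionary terms, per category.
        per_category = {
            category: {term: counts[term.lower()] for term in terms}
            for category, terms in dictionaries.items()
        }
        results.append(per_category)
    return results


def category_totals(per_document):
    """Sum the term counts of each category for every document."""
    return [
        {category: sum(term_counts.values()) for category, term_counts in doc.items()}
        for doc in per_document
    ]


if __name__ == "__main__":
    # Hypothetical sample data.
    docs = [
        "The service was great and the food was great.",
        "Terrible service, bad food.",
    ]
    dicts = {"positive": ["great", "good"], "negative": ["bad", "terrible"]}
    per_doc = dictionary_counts(docs, dicts)
    print(per_doc)
    print(category_totals(per_doc))
```

Terms that never occur in a document are reported with a count of 0, which keeps the output table rectangular (one column per dictionary term) if you later load it into an example set.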
Greetings,
Sebastian