text mining - linguistic preprocessing (thesaurus, synonyms, concepts, ...)
Hello,
I am trying to use RapidMiner for text mining. To improve my classification results, I would like to "improve" my raw data.
My raw data consists of short descriptions (two sentences at most) of machine failures, so it is free text with no fixed conventions at all.
My problem is therefore that RapidMiner can't tell that "part is overheated" and "part got too warm" mean the same thing. To solve this I have two ideas (there are probably many more, and better ones):
First: find and replace similar words
Use some preprocessing to recognize that "overheated" and "too warm" mean nearly the same thing. --> Is it possible to integrate a thesaurus to identify synonyms (for example, OpenThesaurus)?
Second: use categories
Replace words with categories, so that "apple" and "pear" are both replaced by "fruit". But again, I would need to integrate a tool or add-on to do this in RapidMiner.
--> Have you done anything similar with RapidMiner before? Or could you give me a tip on how to integrate this kind of linguistic preprocessing into RapidMiner?
And to make it a bit more complicated: my raw data is in German...
Thanks a lot for your help and your ideas
Aaron
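Both ideas in the question (synonym replacement and category replacement) come down to a dictionary lookup applied before the text is handed to the classifier. Below is a minimal sketch in Python, assuming small hand-built mappings; the names `SYNONYMS`, `CATEGORIES`, and `normalize` are hypothetical, the German entries are only illustrative, and nothing here is a RapidMiner API.

```python
import re

# Hypothetical hand-built mapping: variant phrase -> canonical term.
# Multi-word phrases such as "zu warm" ("too warm") are allowed.
SYNONYMS = {
    "zu warm": "überhitzt",
    "zu heiß": "überhitzt",
    "heißgelaufen": "überhitzt",
}

# Hypothetical mapping: specific word -> broader category.
CATEGORIES = {
    "apfel": "obst",   # apple -> fruit
    "birne": "obst",   # pear  -> fruit
}


def build_pattern(mapping):
    """Compile one regex matching any key, longest keys first."""
    keys = sorted(mapping, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in keys)
    return re.compile(r"\b(" + alternatives + r")\b")


def normalize(text, mapping):
    """Lowercase the text and replace every mapped phrase."""
    pattern = build_pattern(mapping)
    return pattern.sub(lambda m: mapping[m.group(1)], text.lower())


if __name__ == "__main__":
    print(normalize("Teil wurde zu warm", SYNONYMS))
    print(normalize("Apfel und Birne", CATEGORIES))
```

Sorting the keys longest-first keeps a long phrase from being shadowed by a shorter one that shares its beginning. The mappings themselves would have to come from domain knowledge or from a thesaurus file.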
Answers
I think the solution to your problem is:
- First, install the "Text Mining Extension".
- Second, install a linguistic dictionary tool such as the WordNet Extension.
To do this, I suggest the following steps:
In the RapidMiner menu bar, go to "Help -> Updates and Extensions", search for "Text Mining Extension" and install it.
Then search for the WordNet Extension and install it as well.
Thanks for your help. I have already installed the "Text Mining Extension" and produced a first clustering of my data.
My wish is now to improve the results.
The WordNet extension is really great for this purpose. But my data is German text.
--> Do you have an idea how to use WordNet with another language?
Or is it even possible to program a similar (but much less powerful) extension on my own?
Thanks
Aaron
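A "much less powerful" home-made substitute for the WordNet lookup could be a plain synonym table built from a thesaurus file. The sketch below assumes a text export in which each line holds one synonym group, with terms separated by semicolons and comment lines starting with "#"; that layout is an assumption and should be checked against the file actually used. The function names `load_synsets` and `normalize_tokens` are hypothetical, and the sample lines are made up for illustration.

```python
def load_synsets(lines):
    """Map every term to the first term of its synonym group.

    Assumed format: one group per line, terms separated by ';',
    lines starting with '#' are comments.
    """
    canonical = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        terms = [t.strip().lower() for t in line.split(";") if t.strip()]
        if not terms:
            continue
        head = terms[0]
        for term in terms:
            # Keep the first group seen for a term that occurs in several groups.
            canonical.setdefault(term, head)
    return canonical


def normalize_tokens(text, canonical):
    """Replace each whitespace-separated token by its canonical synonym."""
    tokens = text.lower().split()
    return " ".join(canonical.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    sample = [
        "# illustrative lines, not real thesaurus data",
        "überhitzt;heißgelaufen",
        "defekt;kaputt",
    ]
    table = load_synsets(sample)
    print(normalize_tokens("Motor kaputt und heißgelaufen", table))
```

This only handles single-word terms and ignores word sense, so ambiguous words will be mapped to whichever group appears first in the file.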
I don't have any idea how to use WordNet in another language.
My own problem with WordNet in RapidMiner is how to detect negation between two sentences.
If there is a solution, please help me.
Thank you in advance.
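One very rough way to approach the negation question, without WordNet, is to check whether exactly one of the two sentences contains a negation cue word. The sketch below is only that heuristic; the cue list `NEGATION_CUES` and the function names are hypothetical, and it will miss negation expressed through prefixes or antonyms.

```python
import re

# Hypothetical, deliberately small list of English and German negation cues.
NEGATION_CUES = {"not", "no", "never", "nicht", "kein", "keine", "nie"}


def has_negation(sentence):
    """Return True if the sentence contains any negation cue word."""
    tokens = re.findall(r"\w+", sentence.lower())
    return any(tok in NEGATION_CUES for tok in tokens)


def polarity_differs(first, second):
    """Return True if exactly one of the two sentences is negated."""
    return has_negation(first) != has_negation(second)


if __name__ == "__main__":
    print(polarity_differs("Die Pumpe läuft", "Die Pumpe läuft nicht"))
```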