"Medical Dictionary"
Hi!
I'm a newbie in RapidMiner and in the world of data mining in general. I'm working with medical and scientific texts, and my goal right now is to pre-process them in a way that is suitable for clustering.
Ideally I want to use a medical/scientific dictionary to help me in the stemming and pre-processing phase, but I don't really know where to look.
I really hope that someone is able to answer these questions:
- What is the content (and the format) of a dictionary to be used in RapidMiner?
- Are there any medical/scientific dictionaries available on the web? Where can I find them?
- If the answer to the previous question is no, where can I find general (non-scientific) dictionaries on the web?
Thanks for your attention!
Lorenzo
Answers
There is a simple dictionary feature available in RapidMiner that at least supports word replacements, including replacements based on regular expressions. The operator is called "DictionaryStemmer". A dictionary in RapidMiner is a file of matching rules: each line contains the matching rules for one entity. A rule is either a term or a regular expression, for example:
weekday:(.*)day
car_manufacturer: bmw chrysler ford toyota
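To make the rule format concrete, here is a minimal Python sketch of how such a dictionary could be applied to a list of tokens. It is not the DictionaryStemmer implementation. It assumes that rules on a line are separated by whitespace, that a rule must match the whole token, and that a matching token is replaced by the entity name. The function names parse_dictionary and apply_dictionary are made up for this illustration.

```python
import re

def parse_dictionary(lines):
    """Parse 'entity: rule rule ...' lines into (entity, compiled patterns) pairs.

    Assumption: rules are whitespace-separated and each one is treated as a
    regular expression (a plain term is simply a regex without metacharacters).
    """
    rules = []
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        entity, _, body = line.partition(":")
        patterns = [re.compile(rule) for rule in body.split()]
        rules.append((entity.strip(), patterns))
    return rules

def apply_dictionary(tokens, rules):
    """Replace each token by the first entity whose rules fully match it."""
    result = []
    for token in tokens:
        replacement = token
        for entity, patterns in rules:
            if any(p.fullmatch(token) for p in patterns):
                replacement = entity
                break
        result.append(replacement)
    return result

# The two example lines from this post.
DICTIONARY = [
    "weekday:(.*)day",
    "car_manufacturer: bmw chrysler ford toyota",
]

rules = parse_dictionary(DICTIONARY)
tokens = "on monday i drove a ford to the clinic".split()
print(apply_dictionary(tokens, rules))
# ['on', 'weekday', 'i', 'drove', 'a', 'car_manufacturer', 'to', 'the', 'clinic']
```

Note that a broad rule such as (.*)day also matches words like "today" or "holiday", so for medical texts the patterns would probably need to be much more specific.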
I don't know of any medical dictionaries, but that does not mean much. A general-purpose dictionary is WordNet, but we have not actually seen much benefit from using a dictionary for text classification or clustering, so we usually avoid the additional work.
Cheers,
Ingo
Your quick answer pointed me in the right direction and saved me a lot of time.
Thanks a lot!