"Categorize words by belonging to the dictionaries"
Is it possible with RapidMiner to categorize words by the dictionary they belong to?
For example: I have a document and 2 dictionaries. The first dictionary is a list of hospitals and the second is a list of diagnoses. I want to determine which words from the document are hospitals and which are diagnoses. Is this possible with RapidMiner?
Thank you very much
Answers
You can merge the dictionaries into one file with two columns. The first column holds the words from the dictionaries and the second holds the category (hospital/diagnosis). Then use the "Replace (Dictionary)" operator.
Cheers
Vaclav
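For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the merged two-column dictionary and the replacement step. It is not the "Replace (Dictionary)" operator itself; the term lists, the `replace_with_category` helper, and the bracketed labels are all made up for illustration.

```python
import re

# Hypothetical dictionary contents; in practice they would be read from the two files.
hospitals = ["General Hospital", "Mercy Clinic"]
diagnoses = ["influenza", "pneumonia"]

# Merge both lists into one two-column mapping: term -> category.
merged = {term.lower(): "hospital" for term in hospitals}
merged.update({term.lower(): "diagnosis" for term in diagnoses})


def replace_with_category(text, mapping):
    """Replace every dictionary term found in text with its category label."""
    # Try longer terms first so multi-word entries win over shorter overlaps.
    terms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "[" + mapping[m.group(0).lower()] + "]", text)


record = "Patient was admitted to Mercy Clinic with influenza."
replaced = replace_with_category(record, merged)
print(replaced)
```

The merged mapping plays the role of the two-column file: one lookup tells you both that a term matched and which dictionary it came from.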
You probably misunderstood what I need. I have a document, which is a medical record. I also have 2 lists of words. The first file is a list of diagnoses. The second one is a list of hospitals.
I want to determine which words from my medical record are hospitals and which words are diagnoses.
Thank you very much for the answer.
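To make the request concrete, here is a rough Python sketch of the lookup described above, outside RapidMiner. It assumes each dictionary entry is a single word (multi-word hospital names would need phrase matching instead); the term sets and the `categorize_words` function are invented for this example.

```python
import re

# Hypothetical single-word dictionaries standing in for the two files.
hospital_terms = {"lakeview", "northside"}
diagnosis_terms = {"influenza", "pneumonia"}


def categorize_words(text):
    """Return the words of text that appear in each dictionary, in document order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found = {"hospital": [], "diagnosis": []}
    for token in tokens:
        if token in hospital_terms:
            found["hospital"].append(token)
        elif token in diagnosis_terms:
            found["diagnosis"].append(token)
    return found


record = "Transferred from Lakeview to Northside; diagnosis: pneumonia, earlier influenza."
result = categorize_words(record)
print(result)
```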
OK, now I understand. But I don't know what the output should be. Should it be the document with these words replaced by their class, the number of occurrences of each category, or the dictionary words that occurred in the input document? I ask because I think there is no additional variable for storing information about each word in a document.
Cheers,
Vaclav
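Of the possible outputs listed above, the per-category count is probably the simplest to illustrate. A small Python sketch, assuming a hypothetical `term_category` mapping and single-word terms:

```python
import re
from collections import Counter

# Hypothetical merged dictionary: term -> category.
term_category = {
    "lakeview": "hospital",
    "influenza": "diagnosis",
    "pneumonia": "diagnosis",
}


def count_categories(text):
    """Count how many tokens of text fall into each dictionary category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(term_category[t] for t in tokens if t in term_category)


counts = count_categories("Influenza was confirmed at Lakeview; influenza vaccine advised.")
print(counts)
```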
The first time using the dictionary of hospitals and the second time using the dictionary of diagnoses?
Best,
JEdward.
My input is, for example:
"The previous Duma was widely viewed as little more than a rubber stamp for the Kremlin, says the BBC's Steve Rosenberg in Moscow - adding that this may explain why the campaign has failed to excite the Russian public. The election is being seen as a referendum on Mr Putin's personal popularity, three months before the Russian prime minister runs again for president. He served two terms in the post between 2000 and 2008.
Vaccinations against influenza are usually made available to people in developed countries. Farmed poultry is often vaccinated to avoid decimation of the flocks.[15] The most common human vaccine is the trivalent influenza vaccine (TIV) that contains purified and inactivated antigens against three viral strains. Typically, this vaccine includes material from two influenza A virus subtypes and one influenza B virus strain."
So we have one file with 2 paragraphs. The first paragraph is non-medical and the second contains medical information. I want to classify these paragraphs, that is, determine which paragraph is medical and which is non-medical. I think that could be done with the dictionary of medical words that I mentioned.
My output should be, for example:
"Non-medical information: The previous Duma was widely viewed as little more than a rubber stamp for the Kremlin, says the BBC's Steve Rosenberg in Moscow - adding that this may explain why the campaign has failed to excite the Russian public. The election is being seen as a referendum on Mr Putin's personal popularity, three months before the Russian prime minister runs again for president. He served two terms in the post between 2000 and 2008.
Medical information: Vaccinations against influenza are usually made available to people in developed countries. Farmed poultry is often vaccinated to avoid decimation of the flocks.[15] The most common human vaccine is the trivalent influenza vaccine (TIV) that contains purified and inactivated antigens against three viral strains. Typically, this vaccine includes material from two influenza A virus subtypes and one influenza B virus strain."
Thank you very much.
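A rough Python sketch of this paragraph labeling, outside RapidMiner, might look like the following. Everything here is an assumption made for illustration: the tiny `medical_terms` set, the `label_paragraphs` function, the one-paragraph-per-line splitting, and the `min_hits` threshold of one dictionary word. The sample text is shortened from the paragraphs above.

```python
import re

# Hypothetical medical dictionary; a real one would be loaded from a file.
medical_terms = {"influenza", "vaccine", "vaccinations", "virus", "antigens"}


def label_paragraphs(text, min_hits=1):
    """Label each non-empty line of text as medical or non-medical."""
    labeled = []
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        tokens = re.findall(r"[a-z]+", paragraph.lower())
        # Count the tokens that appear in the medical dictionary.
        hits = sum(1 for t in tokens if t in medical_terms)
        if hits >= min_hits:
            label = "Medical information"
        else:
            label = "Non-medical information"
        labeled.append((label, paragraph))
    return labeled


text = (
    "The election is being seen as a referendum on Mr Putin's personal popularity.\n"
    "The most common human vaccine is the trivalent influenza vaccine (TIV)."
)
labeled = label_paragraphs(text)
for label, paragraph in labeled:
    print(label + ": " + paragraph)
```

A fixed hit count is the crudest possible rule. With longer paragraphs, the share of dictionary words per paragraph, or a trained text classifier, would likely be more robust.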