set categories by finding words in a document
Hello everyone,
I am new to RapidMiner but enjoying the ride so far. I am stuck on a couple of issues.
First, I have a set of 3 categories, each defined by 5 words, meaning that if a document contains all 5 of those words in its text, I would like to assign that document to that particular category.
In other words, I would like to go through my dataset, search each document for the 5 words of each category, and assign the document to the category for which it finds all 5 words.
Is there a way to do that in RapidMiner?
Cheers,
D
Answers
You should use the Text Processing extension to tokenize your documents. Change the vector_creation parameter of your Process Documents operator to Binary Term Occurrences. You end up with an example set which contains the documents as rows and the tokens as columns. If the value of a column is greater than 0 in a row (with binary term occurrences, that means 1), the word appeared in the corresponding document. You can then use Generate Attributes to create a new attribute that checks whether all 5 words of a category are present and writes the result to the new attribute. Have a look at the attached process.
Best,
Marius
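For readers who want to see the logic outside RapidMiner, here is a minimal Python sketch of the same idea: build binary term occurrences for each document, then assign a category only when all of its words are present. This is not the attached process. The category names, keyword lists, and function names are hypothetical, since the thread does not give the actual words.

```python
import re

# Hypothetical categories and keywords; the thread does not list the real ones.
CATEGORIES = {
    "sports": {"game", "team", "score", "coach", "season"},
    "finance": {"bank", "loan", "market", "stock", "interest"},
    "cooking": {"recipe", "oven", "flour", "butter", "bake"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def binary_term_occurrences(text, vocabulary):
    """Map each vocabulary term to 1 if it occurs in the text, else 0."""
    tokens = set(tokenize(text))
    return {term: int(term in tokens) for term in vocabulary}


def assign_category(text, categories=CATEGORIES):
    """Return the first category whose words all occur in the text, else None."""
    vocabulary = set().union(*categories.values())
    row = binary_term_occurrences(text, vocabulary)
    for name, words in categories.items():
        # Equivalent of the generated attribute: every word column must be 1.
        if all(row[word] == 1 for word in words):
            return name
    return None


if __name__ == "__main__":
    doc = "The coach said the team will score more this season; every game counts."
    print(assign_category(doc))
```

Note that this sketch matches exact tokens only, so "games" would not count as "game". If that matters, stemming would have to be applied before the check.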