Text Processing Help! (Beginner at RapidMiner)
antioquia_jonas
Member Posts: 1 Learner II
I'm new to RapidMiner and want to generate n-grams from an Excel file that contains comments and replies from forum posts. My process currently contains the following operators: Data, Process Documents (with Tokenize, Filter Stopwords (English), Generate n-Grams, and Filter Tokens by Length), and Write Excel. I'm not sure why my results show every possible combination of words in the data instead of only the combinations that occur twice or more. Maybe I'm missing an important detail. I really need urgent help! TIA!
(Images below depicting my current problem)
[Image: what I want it to look like]
[Image: what it actually looks like]
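The filtering asked for here (keep only the n-grams that occur at least twice) can be sketched outside RapidMiner to make the expected result concrete. This is a minimal Python illustration, not the RapidMiner process itself: the stopword list, the sample comments, the underscore-joined n-gram format, and the names `tokenize`, `ngrams`, and `frequent_ngrams` are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not RapidMiner's English list).
STOPWORDS = {"the", "a", "is", "to", "and", "of", "in", "it"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return consecutive n-token sequences joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(docs, n=2, min_count=2):
    """Count n-grams over all documents and keep those seen at least min_count times."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(tokenize(doc), n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical sample comments standing in for the Excel rows.
comments = [
    "The battery life is great",
    "Great battery life and fast shipping",
    "Shipping was slow",
]

print(frequent_ngrams(comments))  # → {'battery_life': 2}
```

Every bigram that appears only once is dropped, so only `battery_life` survives. Without the `min_count` filter, the output would list all seven distinct bigrams, which matches the symptom described in the post.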
Answers
I want to extract the five words with the highest TF-IDF scores from the output TF-IDF matrix.
How should I do this?
Thanks
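As a rough illustration of what "the five words with the highest TF-IDF" means, here is a minimal Python sketch. It assumes the textbook formula (term frequency times log of inverse document frequency), which may differ from the normalization RapidMiner applies, and it assumes the top five are wanted per document. The sample documents and the names `tfidf` and `top_k` are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        rows.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return rows


def top_k(row, k=5):
    """Return the k highest-scoring (term, score) pairs; ties break alphabetically."""
    return sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical documents standing in for the rows of the TF-IDF matrix.
docs = [
    "apple banana apple",
    "banana cherry",
    "cherry date egg fig grape apple",
]

for i, row in enumerate(tfidf(docs)):
    print(i, [term for term, _ in top_k(row)])
```

If the goal is instead the top five words across the whole matrix, the same `top_k` idea applies after aggregating each column (for example, by taking the maximum or the sum over documents).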
Hi @antioquia_jonas,
Here is a process that extracts each token and its number of occurrences into an Excel file.
I don't know how to create the "string" attribute (where the token is repeated n times).
You will need to adapt this process to your own data:
I hope it helps,
Regards,
Lionel