extract information
Hello,
I have a txt file with more than 100 articles (each containing date, headline, text, and author).
I want the program to list all the terms ending with -ing, -ion, etc. Afterwards I want the program to sort the terms by frequency and alphabetically. Unfortunately I am a beginner and I don't know how to go on.
The following steps are working at the moment:
1. Read txt
2. Tokenize
3. Delete stopwords
After these steps RapidMiner gives me the text without stopwords. But how can I make the program give me
only the words with these endings? Is "extract information" the correct input?
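For readers who want to see the three working steps outside RapidMiner, here is a rough Python sketch of the same idea. It is only an illustration: the stopword set is a tiny made-up sample, lowercasing and splitting on non-letter characters are assumptions, and the function names are hypothetical, not RapidMiner operators.

```python
import re
from pathlib import Path

# Tiny sample stopword set (an assumption for this sketch only).
STOPWORDS = {"the", "a", "an", "and", "of", "is", "in", "to"}

def read_txt(path):
    """Step 1: read the whole text file into one string."""
    return Path(path).read_text(encoding="utf-8")

def tokenize(text):
    """Step 2: lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Step 3: drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]

def preprocess(text):
    """Run steps 2 and 3 on text that has already been read."""
    return remove_stopwords(tokenize(text))
```

For example, preprocess("The building of a station is pending.") keeps only building, station and pending.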
Answers
The Filter Tokens operator is what you are looking for. Set "condition" to "matches" and enter a regular expression like , this should give you the expected results.
Best
Marius
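To illustrate the filtering idea in plain code: the regular expression from the answer above did not survive in the post, so the pattern below, .*(ing|ion), is only an assumed example and not necessarily the one that was suggested. The sketch also assumes that "matches" means the whole token has to match the expression, which is why it uses fullmatch. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed example pattern: any token that ends with "ing" or "ion".
# This is NOT taken from the original answer, where the pattern is missing.
SUFFIX_PATTERN = re.compile(r".*(ing|ion)")

def filter_tokens(tokens, pattern=SUFFIX_PATTERN):
    """Keep only tokens that the pattern matches from start to end."""
    return [t for t in tokens if pattern.fullmatch(t)]

def sort_by_frequency(tokens):
    """Return (term, count) pairs, most frequent first, ties in alphabetical order."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

tokens = ["reading", "station", "house", "reading", "nation", "king", "run"]
kept = filter_tokens(tokens)
ranked = sort_by_frequency(kept)
```

Note that a pure suffix pattern also keeps words like "king", where "ing" is not really an ending, so the result may need a manual check.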
That helped me a lot!
Can somebody tell me if it is possible to remove the duplicates?
Best, Marius
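As a small illustration of what removing duplicates from a token list means, here is a minimal Python sketch. It is not a RapidMiner setting, and the function name is hypothetical; it only shows the idea of keeping each term once, in the order of first appearance.

```python
def unique_tokens(tokens):
    """Drop repeated tokens while keeping the order of first appearance."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(tokens))

deduplicated = unique_tokens(["reading", "station", "reading", "nation", "station"])
```

If the terms are counted by frequency anyway, each term already appears only once in the count table, so a separate deduplication step may not be needed.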