"Remove all lines with a text occurrence smaller than 10 from a certain column"
Hi,
I'm trying to refer to a certain column of the example set and remove all lines where its value is smaller than 10. What's the way to do that?
e.g.
Process Documents from Files >> Filter Stopwords >> Tokenize >> Transform Cases >> Stem >> ??? now remove all lines where the column "text occurrence" is lower than 10 ???
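To make the intended chain concrete, here is a rough Python sketch of the same preprocessing steps followed by counting how often each resulting token occurs. It is not RapidMiner code: the stopword list and the suffix-stripping `toy_stem` function are hypothetical stand-ins (a real setup would use a proper stemmer such as Porter or Snowball). In this sketch tokenizing runs first, since stopword filtering and stemming operate on individual tokens.

```python
import re
from collections import Counter

# Hypothetical tiny stopword list; real stopword operators use a much larger list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}


def toy_stem(token):
    """Crude suffix stripping, a stand-in for a real stemmer (NOT Porter/Snowball)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    tokens = re.findall(r"[A-Za-z]+", text)             # Tokenize on runs of letters
    tokens = [t.lower() for t in tokens]                # Transform Cases (lowercase)
    tokens = [t for t in tokens if t not in STOPWORDS]  # Filter Stopwords
    return [toy_stem(t) for t in tokens]                # Stem


def count_occurrences(docs):
    """Total number of times each processed token occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts


docs = ["The cats are chasing the mice.", "A cat chased a mouse in the garden."]
print(count_occurrences(docs))
```

The resulting counts are the "total occurrence" values the question wants to threshold at 10.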
Answers
Your question is a bit confusing. Do you want to get rid of tokens that occur fewer than 10 times, or of sentences (lines) that have fewer than 10 tokens? In either case, RapidMiner can do it. In the first case, just use the pruning options in Process Documents and set an absolute threshold of 10. In the second case, split each sentence into a separate document (you can use "Cut Documents" for this), then apply "Extract Token Number" and filter out any document (sentence) with fewer than 10 tokens.
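Outside RapidMiner, the two readings can be sketched in plain Python. `prune_tokens` drops tokens whose total count over all documents is below the threshold; `long_sentences` splits text into sentences with a simple regex (an assumption: it ignores abbreviations such as "e.g.") and keeps only sentences with at least 10 word tokens. Both function names are hypothetical. One caveat for the first case: this sketch counts total occurrences, whereas RapidMiner's absolute pruning may count the number of documents a token appears in, so check the Process Documents parameter description before relying on it.

```python
import re
from collections import Counter


def prune_tokens(token_lists, min_total=10):
    """Case 1: keep only tokens whose total count across all documents is >= min_total."""
    totals = Counter(t for tokens in token_lists for t in tokens)
    return [[t for t in tokens if totals[t] >= min_total] for tokens in token_lists]


def long_sentences(text, min_tokens=10):
    """Case 2: split text into sentences (naive split after . ! or ?), count the
    word tokens in each one, and keep sentences with at least min_tokens tokens."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(re.findall(r"\w+", s)) >= min_tokens]
```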
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks for the fast answer.
Let me try to rephrase a bit: the task is to remove all words from our document with a total occurrence smaller than 10. I already tried the pruning operator, but since there is no option to refer to the "total occurrence" column, I have no way to prune on it, i.e. to remove all words that occur fewer than 10 times.
Ah, got it. "Wordlist to Data" will let you take the wordlist and turn it into an exampleset and then you will be able to Filter on the "Total Occurrences" column.
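As a rough illustration of what such a word-list table holds, the sketch below builds one row per distinct token with its total count and the number of documents containing it. The function `wordlist_to_rows` and the column names `total_occurrences` and `document_occurrences` are hypothetical placeholders, not necessarily the exact attribute names RapidMiner produces.

```python
from collections import Counter


def wordlist_to_rows(token_lists):
    """One row per distinct token: total count over all documents and
    the number of documents the token appears in."""
    total = Counter()
    in_docs = Counter()
    for tokens in token_lists:
        total.update(tokens)         # every occurrence counts
        in_docs.update(set(tokens))  # each document counts at most once
    return [
        {"word": w, "total_occurrences": total[w], "document_occurrences": in_docs[w]}
        for w in sorted(total)
    ]
```

Keeping both counts side by side also shows why a document-based threshold and a total-occurrence threshold can remove different words.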
Okay. I did the first part, but I still can't filter on the column. Where do I apply the filter, and which filter should I use?
Use "Filter Examples" and set the condition so that only examples where the "Total Occurrences" column is greater than or equal to 10 are kept.
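The condition itself is just a row filter. In this minimal sketch, `filter_rows` and the sample rows are hypothetical; the comparison uses `>=` so that a word occurring exactly 10 times is kept, which matches removing everything smaller than 10.

```python
def filter_rows(rows, column="total_occurrences", min_value=10):
    """Keep only rows whose value in `column` is at least min_value."""
    return [row for row in rows if row[column] >= min_value]


# Hypothetical sample rows, for illustration only.
rows = [
    {"word": "data", "total_occurrences": 42},
    {"word": "mining", "total_occurrences": 10},
    {"word": "rare", "total_occurrences": 3},
]
kept = filter_rows(rows)
print([r["word"] for r in kept])  # ['data', 'mining']
```

With a strict "greater than 10" condition, "mining" would be dropped as well.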