
List of words that are filtered with Stopwords, Stemming and Tokenizing?

Jonas97 Member Posts: 2 Learner I
Hello,

Is there a function in RapidMiner that I can use to create a list (or a count) of the words that the process steps Filter Stopwords, Stemming, and Tokenizing have identified and excluded from the analysis of the text corpus?

Thank you in advance!

Jonas

Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I am not sure if there is a direct way to view this, but you could accomplish it in two passes. First, run your documents through with tokenization only. Then run them through a second time with tokenization plus the other text processing options you want (stopwords, stemming, etc.). Finally, take the two resulting wordlist datasets and use Set Minus (join type) to get the non-matches.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
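
The two-pass idea above can be illustrated outside RapidMiner with a small Python sketch. This is not RapidMiner code and does not use its operators. The stopword list, the `toy_stem` function, and the sample sentence are all hypothetical stand-ins chosen for illustration. It builds one wordlist from tokenizing only, a second from tokenizing plus stopword filtering and stemming, and takes the set difference.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "is", "a", "of", "and"}


def tokenize(text):
    # Lowercase the text and split it into alphabetic tokens.
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(token):
    # Crude suffix stripping; a stand-in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def wordlist_tokenize_only(text):
    # Pass 1: tokenize and nothing else.
    return set(tokenize(text))


def wordlist_full(text):
    # Pass 2: tokenize, drop stopwords, then stem what remains.
    return {toy_stem(t) for t in tokenize(text) if t not in STOPWORDS}


def removed_or_changed(text):
    # Set difference: words in pass 1 that no longer appear in pass 2.
    return wordlist_tokenize_only(text) - wordlist_full(text)


text = "The miner is mining the words of a text"
removed = removed_or_changed(text)
print(sorted(removed))
```

Note that the difference contains both the removed stopwords and the original forms of any words that stemming altered, since the stemmed form no longer matches the unstemmed token. To see only the stopwords, compare the tokenize-only list against a tokenize-plus-stopwords list, without stemming.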