
WordList -> Document Operator?

benjamin_peters Member Posts: 2 Contributor I
edited November 2018 in Help

I'm trying to batch process a large group of individual text files so that I can tokenize them. I'm using the Text Processing operator group. I'm processing the files into a single WordList, which I'm then trying to tokenize. Before I can tokenize, I need to convert the WordList into a document, but there doesn't appear to be a Generate Document operator, which is what Quick Fix recommends.

 

Any ideas?

 

Sorry for the beginner's question - I'm brand new to this.

 

Very respectfully,

Ben

Best Answer

  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Solution Accepted

    Hi Ben,

     

    No worries - we all started at some point :)

     

    The wordlist is actually the final result of the text processing operators, i.e. what you get after all the necessary text processing such as tokenization has been done. All those steps happen "inside" the text processing operator. (Do you see the little icon in the bottom right corner of the operator? It indicates that you can go "inside" this operator with a double click.)

     

    I think it is probably easier if you follow along with one of the following videos (there are tons more if you search on Google):

     

    https://rapidminer.com/resource/text-mining-rapidminer/

    https://www.youtube.com/watch?v=6EyQ2TWYsVw

    http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-loading.html

     

    So what is the point of the wordlist then?  It makes sure that you use exactly the same words (and only those) for scoring as for training.  This is something that is actually kind of annoying to do in R, for example, which is why I really prefer to do text analytics in RapidMiner...

     

    Cheers,

    Ingo
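
For readers who think more easily in code, here is a rough Python sketch of the idea described in the answer: tokenization happens as an inner step, the word list falls out as the result, and that same word list is then reused when scoring new text. This is only an illustration of the concept, not how RapidMiner implements it. The function names (`tokenize`, `build_wordlist`, `vectorize`) and the sample sentences are made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter.

    A deliberately simple stand-in for a real tokenizer.
    """
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def build_wordlist(documents):
    """Tokenize every training document and return the sorted vocabulary.

    Tokenization is an inner step here; the word list is the output.
    """
    vocab = set()
    for doc in documents:
        vocab.update(tokenize(doc))
    return sorted(vocab)


def vectorize(document, wordlist):
    """Count only the words in the word list; unseen words are ignored."""
    counts = Counter(tokenize(document))
    return [counts[w] for w in wordlist]


# "Training": the word list is derived from these documents only.
train_docs = ["The cat sat on the mat", "The dog sat"]
wordlist = build_wordlist(train_docs)

# "Scoring": the new document is mapped onto the same word list,
# so "bird" (never seen in training) contributes nothing.
new_vector = vectorize("The bird sat on the cat", wordlist)

print(wordlist)
print(new_vector)
```

Because the new document is projected onto the training word list, its vector always has the same length and column order as the training vectors, which is what a trained model needs at scoring time.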
