WordList -> Document Operator?
I'm trying to batch process a large group of individual text files and then tokenize them. I'm using the Text Processing operator group and reading the files into a single WordList, which I'm then trying to tokenize. Before I can tokenize, I need to convert the WordList into a document, but there doesn't appear to be a Generate Document operator, which is what Quick Fix recommends.
Any ideas?
Sorry for the beginner's question - I'm brand new to this.
Very respectfully,
Ben
Best Answer
IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Hi Ben,
No worries - we all started at some point.
The wordlist is actually the final result of the text processing operators, i.e. it is produced after all the necessary text processing steps such as tokenization. All those steps happen "inside" the text processing operator. (Do you see the little icon in the bottom right corner of the operator? It indicates that you can go "inside" the operator with a double click.)
I think it is probably easiest if you follow along with one of the following tutorials (there are tons more if you search on Google):
https://rapidminer.com/resource/text-mining-rapidminer/
https://www.youtube.com/watch?v=6EyQ2TWYsVw
http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-loading.html
So what is the point of the wordlist then? It makes sure that you use exactly the same words (and only those) for scoring as for training. This is something which is actually kind of annoying in R, for example, which is why I really prefer to do text analytics in RapidMiner...
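To make the role of the wordlist a bit more concrete, here is a rough sketch of the idea in plain Python. This is only an illustration, not how RapidMiner implements it: the function names (`tokenize`, `build_word_list`, `vectorize`) and the sample texts are made up for this example, and the tokenization is assumed to be a simple lowercase split on letters.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed, simplistic rule)."""
    return re.findall(r"[a-z]+", text.lower())


def build_word_list(documents):
    """Collect the sorted vocabulary found in the training documents."""
    vocabulary = set()
    for doc in documents:
        vocabulary.update(tokenize(doc))
    return sorted(vocabulary)


def vectorize(text, word_list):
    """Count each word-list entry in the text; words outside the list are ignored."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in word_list]


# Hypothetical training texts: the word list is built once, from these only.
training_docs = ["the cat sat", "the dog barked"]
word_list = build_word_list(training_docs)
training_vectors = [vectorize(doc, word_list) for doc in training_docs]

# A new text to score: "meowed" and "loudly" are not in the word list,
# so they are dropped and the vector has the same columns as in training.
new_vector = vectorize("the cat meowed loudly", word_list)
```

Because the new text is vectorized against the stored word list rather than its own vocabulary, the training and scoring data end up with identical columns.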
Cheers,
Ingo