Text Processing

LucasCu · July 2015

Hi all,

I'm new to RapidMiner, and have downloaded it for the sole purpose of text processing.

I have followed the introductory tutorial to text processing, and have successfully worked on a text document to look at word occurrences etc.

What I would like to do is use a codebook I have created (approximately 30 words/phrases) to see whether these particular groups of words occur within a text document (e.g. forced labor, compliance, human rights). I am hoping to do this on a number of documents in order to measure the frequency of phrases.

Would someone be able to point me in the direction of a tutorial or video that explains how I apply a specific list of words/phrases in text processing?

Any feedback would be greatly appreciated.

Thank you,

Lucas

aruberutou · July 2015

Hi, Lucas,

I'm new myself, but I think what you want to do is the following.

#1: Put your word list in a CSV or Excel file.
#2: Run that through "Process Documents from Data", leaving all of the options at their defaults.
#3: Inside that operator's subprocess, simply connect the input port to the output port.
#4: Go back up to your main process.

If you look closely, you will see that the "Process Documents from Data" operator has two output ports. The second one is "word list". Now:

#1: Set up your text mining process as you usually would.
#2: Connect the word list port (from the steps above) to the incoming word list port in your original process.

This should solve your problem. As you gain experience, you will find this kind of setup increasingly essential: step 1, generate a word list and/or model; step 2, apply it to another data set.
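
If it helps to see the same two-step idea outside RapidMiner, here is a minimal Python sketch. It is not RapidMiner's API and not what the operators do internally; it only illustrates "fix the word list first, then count it in each document". The codebook entries come from the question above, while the function names, the sample document, and the normalization rule (lowercase, punctuation treated as a space) are my own assumptions.

```python
import re
from collections import Counter

# Hypothetical codebook; replace with your own list of words/phrases.
CODEBOOK = ["forced labor", "compliance", "human rights"]


def normalize(text):
    """Lowercase the text and join its alphanumeric tokens with single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def count_codebook(text, codebook):
    """Count whole-word occurrences of each codebook entry in one document."""
    normalized = normalize(text)
    counts = Counter()
    for phrase in codebook:
        # \b keeps "compliance" from matching inside "noncompliance".
        pattern = r"\b" + re.escape(normalize(phrase)) + r"\b"
        counts[phrase] = len(re.findall(pattern, normalized))
    return counts


def codebook_frequencies(documents, codebook):
    """Apply the same fixed codebook to every document (name -> counts)."""
    return {name: dict(count_codebook(text, codebook))
            for name, text in documents.items()}


if __name__ == "__main__":
    # Made-up sample text, only to show the shape of the result.
    docs = {
        "report_a.txt": "Forced labor and human rights. "
                        "Compliance with human-rights law; forced  labor.",
    }
    for name, counts in codebook_frequencies(docs, CODEBOOK).items():
        print(name, counts)
```

Because punctuation and hyphens are normalized to spaces, "human-rights" counts as "human rights" here; if that is not what you want, adjust `normalize`.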

I hope this helps!
