Text Processing

LucasCu Member Posts: 1 Learner III
edited November 2018 in Help
Hi all,

I'm new to RapidMiner, and have downloaded it for the sole purpose of text processing.

I have followed the introductory tutorial to text processing, and have successfully worked on a text document to look at word occurrences etc.

What I would like to do is use a codebook I have created (approximately 30 words/phrases) to see whether these particular groups of words occur within a text document (e.g. forced labor, compliance, human rights). I am hoping to do this on a number of documents in order to measure the frequency of phrases.

Would someone be able to point me in the direction of a tutorial or video that explains how I apply a specific list of words/phrases in text processing?

Any feedback would be greatly appreciated  :)

Thank you,

Lucas

Answers

  • aruberutou Member Posts: 23 Contributor II
    Hi, Lucas,

    I'm new myself, but I think what you want to do is the following.


    #1: Put your word list into a CSV or Excel file.
    #2: Run that file through the "Process Documents from Data" operator, leaving all of the options at their defaults.
    #3: Inside that operator's subprocess, simply connect the input port to the output port.
    #4: Go back up to your main process.

    Note that the "Process Documents from Data" operator has two output ports. The second one is the word list. Now:

    #1: Set up your text mining process as you usually would.
    #2: Connect the word list output port (from the steps above) to the incoming word list port in your original process.

    This should solve your problem. As you gain experience, you will find this kind of setup increasingly essential to your work: step 1, generate a word list and/or model; step 2, apply it to another data set.
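
    If it helps to see the idea outside RapidMiner, here is a minimal Python sketch of the same two-step pattern: fix the word list first, then count only those entries in each document. It is not what the RapidMiner operators do internally. The function name count_codebook_phrases, the sample documents, and the matching rules (case-insensitive, whole words, whitespace collapsed) are all my own assumptions for illustration.

```python
import re
from collections import Counter


def count_codebook_phrases(text, codebook):
    """Count how often each codebook phrase occurs in one text.

    Hypothetical helper: matching is case-insensitive and on whole words,
    and runs of whitespace (including line breaks) are treated as one space.
    """
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for phrase in codebook:
        # \b keeps "compliance" from matching inside "noncompliance".
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        counts[phrase] = len(re.findall(pattern, normalized))
    return counts


# Step 1: the fixed word list (phrases taken from the question).
codebook = ["forced labor", "compliance", "human rights"]

# Step 2: apply that list to each document (made-up sample texts).
documents = {
    "doc1": "Our compliance team reviews forced labor risks. Compliance matters.",
    "doc2": "Human rights and\nhuman rights due diligence.",
}

frequencies = {
    name: count_codebook_phrases(text, codebook)
    for name, text in documents.items()
}

for name, counts in frequencies.items():
    print(name, dict(counts))
```

    Each document ends up with one count per codebook entry, including zeros, so the results line up across documents and are easy to compare or export.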

    I hope this helps!