Search for Keywords

Tim91 Member Posts: 5 Contributor II

Hello community,

I am currently doing my master's degree, and in one of our courses my group and I have to work on a project with RapidMiner. We have no background in programming and this is the first time we are working with RapidMiner. Our task is to create a text mining tool that crawls a list of Excel files and, as a first step, lets us search for a list of keywords. We then need to know whether the texts contain those keywords or not. We would also like to know how often each keyword appears in those texts.

We tried using the following operators:

1. Select Attributes

2. Filter Documents (by Content) (we created a loop that goes through the Excel file and writes every text into a separate document)

3. Filter Examples

However, we don't really know how to use those operators, because nothing we have tried (playing with the different options of the operators) has worked.

Another thing we thought about is computing the intersection of the texts and the keyword list to see which elements the two files have in common (but again, we don't know how to implement this).

Are we heading in the right direction, or do you have any tips on how we should start?

I hope you can help us.

 

Cheers

Tim
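
The intersection idea from the question can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the logic, not a RapidMiner process: the sample texts, the keyword list, and the function names `tokenize` and `common_keywords` are made up for the example, and it assumes that a keyword is a single word and that matching should ignore case.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def common_keywords(text, keywords):
    """Return the set of keywords that also occur as tokens in the text."""
    return set(tokenize(text)) & {k.lower() for k in keywords}

# Hypothetical sample data standing in for the two Excel files.
texts = [
    "The engine failed after the oil leak.",
    "No issues were reported.",
]
keywords = ["engine", "leak", "brake"]

for row, text in enumerate(texts):
    # An empty list means the text contains none of the keywords.
    print(row, sorted(common_keywords(text, keywords)))
```

An empty result for a row answers the "does the text contain any keyword" question, and the returned set tells you which ones matched.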

Answers

  • Tim91 Member Posts: 5 Contributor II
    Hi Martin,
    Thank you for your answer, we'll try that.
  • Tim91 Member Posts: 5 Contributor II
    One follow-up question:
    Right now we have two streams: the first one reads the Excel list with the texts and the second one reads the one with the keywords. We then used Process Documents with Tokenize for both paths (1. Read Excel 2. Nominal to Text 3. Process Documents from Data 4. Data to Documents - Process Documents). We see all the words and the rows in which they occur, but is it possible to filter the results or change the settings so that we only see the keywords? This would be a lot more convenient because we wouldn't have to look through all the words.
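
To make the goal of the follow-up concrete, here is a minimal Python sketch of "count every token, then keep only the keywords". It is not a RapidMiner operator chain; the function name `keyword_counts` and the sample data are hypothetical, and it assumes single-word keywords and case-insensitive matching.

```python
import re
from collections import Counter

def keyword_counts(text, keywords):
    """Count how often each keyword occurs in the text.

    All tokens are counted first, then the result is restricted to the
    keyword list, so keywords that never occur are reported with 0.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return {k: counts[k.lower()] for k in keywords}

# Hypothetical sample data standing in for the two Excel files.
texts = [
    "Oil leak found. The leak was near the engine.",
    "No issues were reported.",
]
keywords = ["engine", "leak", "brake"]

for row, text in enumerate(texts):
    print(row, keyword_counts(text, keywords))
```

The result has one entry per keyword and text, which is the reduced view asked for above: the other words are still tokenized but never shown.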