Extraction of sentences based on a wordlist (to create a new doc)
Hello,
For my thesis, I have to analyze multiple corporate reports. I need to extract from these reports the sentences that contain specific words (from a wordlist) and create a document with all the selected sentences, which will be used later for further analysis.
To do this, I first used a "Read Document" operator. Then I used a "Process Documents" operator that contains a "Tokenize" operator (set to linguistic sentences). After that, still inside the "Process Documents" operator, I used "Filter Tokens (by Content)" and entered in the string parameter the specific words that I want the retained sentences to contain.
My problem is that I can't get the selected sentences into a list where each one can be read easily and separately. Instead, each selected sentence becomes an attribute. I think my problem is not complicated, but I can't find an answer on the forum that solves it.
I don't know much about data or how to use RapidMiner for text mining (this is my first time). I apologize if the answer is already on the forum and I simply searched for it the wrong way.
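To make the goal concrete, here is a rough sketch in plain Python (not RapidMiner) of the result I am after: one selected sentence per list entry, rather than one attribute per sentence. The function names `split_sentences` and `select_sentences`, the sample text, and the wordlist are all made up for illustration, and the sentence splitting is a naive regular expression, not the linguistic sentence tokenizer that RapidMiner uses.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def select_sentences(text, wordlist):
    """Keep sentences containing at least one wordlist term as a whole word."""
    if not wordlist:
        return []
    # Build one case-insensitive pattern that matches any term in the wordlist.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in wordlist) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in split_sentences(text) if pattern.search(s)]


# Hypothetical sample data, standing in for a corporate report and a wordlist.
report = (
    "Revenue grew this year. We face significant climate risk. "
    "The board met twice. Emissions fell by ten percent!"
)
wordlist = ["risk", "emissions"]

selected = select_sentences(report, wordlist)
for sentence in selected:
    print(sentence)
```

In this sketch, `selected` is an ordinary list with one sentence per entry, so it could be written to a text file one sentence per line. That is the shape of output I would like to get from my RapidMiner process.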
Thank you!