How can I calculate the frequency of specific words for each row in Excel data?
Hi,
I'm working with data in which each sentence is in a separate row. I want to count how often the words from a word list I have created occur in each row. Then I would like to add these counts to my dataframe as a new variable.
For example:
Let's say I have a list of words that contains apple and banana (this is my dictionary), and I have independent sentences in rows like this:
1. X x x apple x x banana x apple.
2. X apple x x x x.
3. X x banana x apple x.
...
Now I want to count, for each row separately, how many times the words in my list occur. As a result, the new column I want to create is:
1. = 3
2. = 1
3. = 2
...
Thanks in advance.
Best Answer
Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert)
If I understand your question, this is pretty straightforward in RapidMiner. Process your text data using the "Process Documents from Data" operator, which accepts both a defined wordlist and your data source as inputs. Inside it, use Tokenize to split your text into words, and set the word vector creation option of "Process Documents from Data" to "Term Occurrences". The output will be a new attribute (column) for each word in your wordlist, holding the number of occurrences of that word in each text you process (each text is its own row, or example). If you need a single total per row, as in your example, you can sum those attributes afterwards.
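For anyone who wants to check the expected numbers outside RapidMiner, the counting logic can be sketched in plain Python. This is only an illustration of the tokenize-then-count idea, not what the RapidMiner operators do internally. The `sentences` list, the `term_occurrences` helper, and the lowercase, letters-only tokenization are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical data mirroring the example sentences in the question.
sentences = [
    "X x x apple x x banana x apple.",
    "X apple x x x x.",
    "X x banana x apple x.",
]

# The user-defined dictionary (word list).
wordlist = ["apple", "banana"]


def term_occurrences(text, words):
    """Count how often each word in `words` occurs in `text`.

    Assumption: tokens are runs of ASCII letters, compared case-insensitively.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {word: counts[word] for word in words}


rows = []
for sentence in sentences:
    row = term_occurrences(sentence, wordlist)  # one count per dictionary word
    row["total"] = sum(row.values())            # single total per sentence
    rows.append(row)

for number, row in enumerate(rows, start=1):
    print(number, row)
```

For the three example sentences, the totals come out as 3, 1 and 2, matching the column described in the question. The per-word counts correspond to the per-word attributes described in the answer.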