"Sentiment Analysis - Numerical Labels, and the search for the right Process"
I have a question again which might be easy to answer for those of you who have already played around with the sentiment analysis capabilities of RapidMiner. On the one hand, I have a collection of thousands of documents from which I extracted the information I need and compiled a matrix with the corresponding TF-IDF scores of the expressions appearing in the documents. On the other hand, I have a matrix of words in which each word is assigned a sentiment score between 0 and 1. The question now is how to bring these two strands together to measure the sentiment reflected in the documents over time. The idea is to match the TF-IDF matrix with the word/sentiment score matrix. Or, more precisely, I want to see which expressions of the sentiment matrix also appear in the respective documents and weight them with the corresponding TF-IDF values. Is there a process which does this? I tried to follow the example described here http://rapid-i.com/rapidforum/index.php/topic,2993.0.html and the classification approach presented in the Vancouver Data Blog Video Tutorial 5, but it seems that the problem hinges on the fact that the learning operators don't accept numerical labels. Could somebody give me a hint? I would really appreciate that!
Best regards,
André
Answers
This is a very unusual approach. Normally you want to avoid building this sentiment/word matrix yourself and let the program do it! You normally assign a sentiment to each of your documents and then apply a learning scheme to derive the effect of each word.
If you have manually assigned these factors, you have done data mining manually and derived some sort of a linear model. What you have to do is to put them into a model so that you are able to apply them. There's no suggested way for this, because, well, as I said: Nobody normally wants to do this.
The only thing I can imagine is exporting a linear regression model as XML, manually editing that file, and then reimporting it...
Greetings,
Sebastian
Guys, I hope I'm not getting on your nerves too much. As far as possible, I will also contribute on the helping side in this forum!
best regards, andre
I think you explained what you are doing in an understandable way, but I don't see WHY you would do this. What would be the meaning of the result?
Greetings,
Sebastian
best regards, andre
Sentiment Wordlist (created from a CSV file)(Tab1)
Best regards,
André
are the words in Tab2 unique (I guess they are at least in Tab1)? If yes, a simple "Join" would be sufficient with the word columns as IDs if you are interested in "full match" (distance 0) vs. "no match" (distance 1) only.
Otherwise a more complex process has to be created which would definitely also be possible.
Cheers,
Ingo
merci!
andré
actually, even if the words in Tab2 are not unique, the join approach should work pretty well. You will end up (depending on using a left or a right join) with a data set Tab2 with an additional column containing the corresponding sentiment scores from Tab1. A simple aggregation (average or sum) will then deliver the final, aggregated score for the document encoded in Tab2.
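Outside of RapidMiner, the join-then-aggregate idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the word list, the document rows, and the function name `document_scores` are made up for the example, and the aggregation chosen here is a TF-IDF-weighted average (one possible reading of "average or sum"), not a fixed recipe.

```python
from collections import defaultdict

# Tab1 (hypothetical): sentiment word list, word -> score between 0 and 1
sentiment = {"good": 0.9, "bad": 0.1, "crisis": 0.05}

# Tab2 (hypothetical): one row per (document, word, TF-IDF weight)
tfidf_rows = [
    ("doc1", "good", 0.5),
    ("doc1", "crisis", 0.5),
    ("doc1", "market", 0.3),   # not in the sentiment list, dropped by the join
    ("doc2", "bad", 0.4),
    ("doc2", "good", 0.2),
]

def document_scores(rows, scores):
    """Inner-join rows with the sentiment list on the word column,
    then aggregate a TF-IDF-weighted average score per document."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for doc, word, weight in rows:
        if word not in scores:
            continue  # exact match only: words without a score are ignored
        weighted_sum[doc] += weight * scores[word]
        weight_total[doc] += weight
    return {
        doc: weighted_sum[doc] / weight_total[doc]
        for doc in weighted_sum
        if weight_total[doc] > 0
    }

if __name__ == "__main__":
    for doc, score in sorted(document_scores(tfidf_rows, sentiment).items()):
        print(doc, round(score, 3))
```

If each document carries a date, the same per-document scores can afterwards be averaged per month or year to follow the sentiment over time.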
Well, if you want to calculate text-based similarities, I would have a look at the Text Extension of RapidMiner and use the preprocessing operators it provides. You could, for example, transform the words into their stems, or use character n-grams and other approaches for calculating the distances between the terms in both tables. Of course it would also be possible to loop through both tables and apply any type of distance measure you can build with operators inside the loop. Finally, you could of course write your own distance measure and use it within RapidMiner. There are probably hundreds of options. Have fun trying them!
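To give an idea of what a character n-gram distance between terms could look like, here is a minimal sketch in plain Python. It is not a RapidMiner operator: the function names, the Jaccard-based distance, and the threshold of 0.6 are all assumptions chosen for illustration.

```python
def char_ngrams(word, n=3):
    """Set of character n-grams of a word, padded with '_' at both ends."""
    padded = f"_{word.lower()}_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_distance(a, b, n=3):
    """Jaccard distance on character n-grams: 0.0 = same n-grams, 1.0 = none shared."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Both words are too short to yield any n-gram; fall back to equality.
        return 0.0 if a.lower() == b.lower() else 1.0
    return 1.0 - len(grams_a & grams_b) / len(union)

def closest_match(word, vocabulary, max_distance=0.6):
    """Closest vocabulary term, or None if nothing is within max_distance.
    The threshold is an arbitrary illustrative value."""
    best = min(vocabulary, key=lambda term: ngram_distance(word, term), default=None)
    if best is None or ngram_distance(word, best) > max_distance:
        return None
    return best

if __name__ == "__main__":
    print(closest_match("risks", ["risk", "growth"]))
```

With a fuzzy match like this, a document term such as "risks" can borrow the sentiment score of the list entry "risk" even though an exact join would miss it; whether that is desirable depends on the word list.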
Cheers,
Ingo
Have a nice weekend!