
Sentiment Analysis as a supervised learning problem

bhupendra_patil Employee-RapidMiner, Member Posts: 168 RM Data Scientist
edited November 2018 in Knowledge Base

 

 

RapidMiner provides multiple ways to do sentiment analysis. A commonly used and powerful approach is to treat it as a supervised learning problem: train a predictive model on historical data (a training set) and then use that model to score new text. Historical data may already be available if content was manually coded with sentiment values in the past. If not, you will need a preparation step in which a good sample is manually classified as positive or negative. This is a one-time effort, and a good training set leads to better models and better predictions.

 

Please follow this example along with the provided sample process (attached to this article as a zip file).

 

The training set we will use today looks like this (it is also included in the zip file attached to this article):

textmining training set.png

 

Building a model from this training set involves at least the following operators:

  • Read Excel (to read the sample data)
  • Nominal to Text (to specify which column is a text column, since RapidMiner's "Process Documents..." operators work only on text data)
  • Process Documents from Data (the meta operator for most text processing capabilities)
  • Tokenize (to split the content into words, n-grams, etc. as needed)
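To make the effect of these operators concrete, here is a minimal Python sketch of what tokenizing text and turning it into a word list plus numeric vectors amounts to. It is an illustration, not RapidMiner's implementation: splitting on non-letter characters, lowercasing, and using raw term counts are all assumptions, and the sample sentences are made up.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens on any non-letter character."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]

def build_wordlist(documents):
    """Collect every distinct token seen in the documents, sorted."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})

def vectorize(text, wordlist):
    """Count how often each word of the word list occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in wordlist]

# Hypothetical sample documents (not taken from the attached training set).
docs = ["Great product, works great", "Terrible support"]
wordlist = build_wordlist(docs)
vectors = [vectorize(doc, wordlist) for doc in docs]

print(wordlist)
print(vectors)
```

Each document becomes one row and each word in the word list becomes one column, which is the shape a learner needs.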

For processing the training text, the actual process will look like this:

supervised sentiment analysis.png

 

Inside the "Process Documents from Data" operator we will have a single step for the basic process, i.e. Tokenize.

 

supervised sentimenet basic.png

We will improve this subprocess later if needed.

The output of "Process Documents from Data" is the tokenized example set as well as a word list.

 

Now we can build a cross validation step using our "Tokenized example set". We will also need to add the "Set Role" operator to specify our Label (i.e target) variable.

The process should look something like this.

basic supervised add validation.png
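Conceptually, setting the label role tells the learner which column is the target and which columns are inputs. The Python sketch below illustrates that separation under stated assumptions: the label column is assumed to be named "Sentiment", the token columns and values are hypothetical, and `split_label` is an illustrative helper, not a RapidMiner function.

```python
def split_label(rows, label_name):
    """Separate the label column from the remaining (feature) columns."""
    features, labels = [], []
    for row in rows:
        if label_name not in row:
            raise KeyError(f"label column {label_name!r} is missing from a row")
        labels.append(row[label_name])
        features.append({k: v for k, v in row.items() if k != label_name})
    return features, labels

# Hypothetical tokenized rows: one column per token plus the label column.
rows = [
    {"great": 1, "terrible": 0, "Sentiment": "positive"},
    {"great": 0, "terrible": 1, "Sentiment": "negative"},
]
features, labels = split_label(rows, "Sentiment")

print(features)
print(labels)
```

If the label column is not present in the data at this point, there is nothing to assign the role to, so it is worth checking that the column survives the text processing step.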

 

To know more about validation, please look at these links

Inside the validation operator we can use any of the learners. For text mining use cases, Naive Bayes is often both accurate and fast. You can also try SVM or neural nets, but they increase the computational cost of the solution.
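For readers who want to see what training and validating a Naive Bayes text classifier involves, here is a small self-contained Python sketch. It uses a multinomial Naive Bayes with add-one smoothing and simple round-robin folds; both are assumptions chosen for brevity and are not necessarily what the Naive Bayes and validation operators compute internally. The texts and labels are made up.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Whitespace tokenizer, kept deliberately simple for the sketch."""
    return text.lower().split()

def train_nb(texts, labels):
    """Collect per-class document counts and per-class token counts."""
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_docs": Counter(labels), "word_counts": word_counts,
            "vocab": vocab, "n_docs": len(labels)}

def predict_nb(model, text):
    """Return the class with the highest log-probability for the text."""
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_class in model["class_docs"].items():
        total = sum(model["word_counts"][label].values())
        score = math.log(n_class / model["n_docs"])  # class prior
        for tok in tokenize(text):
            if tok in model["vocab"]:  # tokens unseen in training are ignored
                count = model["word_counts"][label][tok]
                score += math.log((count + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(texts, labels, k=2):
    """k-fold cross validation with round-robin fold assignment."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        test = [i for i in range(len(texts)) if i % k == fold]
        model = train_nb([texts[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_nb(model, texts[i]) == labels[i] for i in test)
    return correct / len(texts)

# Hypothetical labeled examples.
texts = ["good great", "great good", "bad awful", "awful bad"]
labels = ["positive", "positive", "negative", "negative"]
cv_accuracy = cross_validate(texts, labels, k=2)
print(cv_accuracy)
```

The key point of cross validation is visible in `cross_validate`: every example is scored by a model that never saw it during training.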

 

The validation step provides both the model and information about its performance: the "mod" port delivers the model and the "ave" port delivers the performance.

In our basic example, using Naive Bayes, the accuracy and confusion matrix look like this:

basic performance.png
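As a reminder of how to read such a result, the following Python sketch shows how a confusion matrix and accuracy are derived from actual and predicted labels. The label values here are invented for illustration and are not the numbers from the screenshots in this article.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Fraction of examples whose prediction matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical labels, not the results shown in the article.
actual = ["positive", "positive", "negative", "negative", "negative"]
predicted = ["positive", "negative", "negative", "negative", "positive"]

matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)

for (a, p), count in sorted(matrix.items()):
    print(f"actual={a:<9} predicted={p:<9} count={count}")
print(f"accuracy={acc:.2f}")
```

The diagonal entries (actual equals predicted) are the correct classifications; everything else is an error of one kind or the other.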

 

When using SVM, our confusion matrix looks like this:

svm performance.png

 

We will explore how to improve the text processing in a later article. For now, let's assume this is a good model.

 

To use this predictive model, we run a similar process on the actual data set and then apply the model to the tokenized data.

One additional step is required: pass the word list from the training "Process Documents from Data" operator to the scoring "Process Documents from Data" operator.

Your process will look something like this.

apply basic model.png
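The reason for passing the word list along is that the model expects exactly the columns it was trained on. The Python sketch below illustrates the idea under simple assumptions (term counts, splitting on non-letters, made-up sentences): new text is vectorized against the training word list, so words that never appeared in training are simply dropped.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens on any non-letter character."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]

def vectorize(text, wordlist):
    """Count occurrences of each training word; other words are ignored."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in wordlist]

# Word list built from (hypothetical) training texts only.
training_texts = ["great product", "slow support"]
wordlist = sorted({tok for text in training_texts for tok in tokenize(text)})

# New text to score; "value" and "delivery" were never seen in training.
new_text = "great value, slow delivery"
vector = vectorize(new_text, wordlist)

print(wordlist)
print(vector)
```

Because the scoring vector has the same length and column order as the training vectors, the trained model can be applied to it directly.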

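The output of the Apply Model operator will have three special columns, as seen in the screenshot below:

Prediction(Sentiment): the predicted class

confidence(negative): the model's confidence that the text is negative

confidence(positive): the model's confidence that the text is positive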

final output.png
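The relationship between these columns can be sketched in a few lines of Python. This assumes that the predicted class is simply the class with the highest confidence, which is the usual behaviour but is stated here as an assumption; the confidence values and the review threshold are hypothetical.

```python
def predict_from_confidences(confidences):
    """Pick the class whose confidence is highest."""
    return max(confidences, key=confidences.get)

# Hypothetical confidence values for two scored rows.
rows = [
    {"negative": 0.18, "positive": 0.82},
    {"negative": 0.67, "positive": 0.33},
]

predictions = [predict_from_confidences(row) for row in rows]

# Optional: flag rows whose top confidence is below a chosen threshold,
# for example to send them to manual review (threshold is an assumption).
REVIEW_THRESHOLD = 0.7
needs_review = [max(row.values()) < REVIEW_THRESHOLD for row in rows]

print(predictions)
print(needs_review)
```

Looking at the confidences rather than only the predicted class makes it easier to spot borderline cases.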

You can then add additional text processing operators as needed for your use case to improve your model.

 

A sample, more detailed "Process Documents from Data" subprocess with additional preprocessing will look something like the screenshot below.

Please ensure that you apply the same steps on the scoring side to get correct results. Using Building Blocks is helpful here.

 

 

detailed process mining.png
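The advice to use identical steps for training and scoring can be illustrated in Python by defining the preprocessing once and calling it from both sides. The specific steps used here (lowercasing, a tiny stopword list, a minimum token length) are assumptions for the sketch and are not necessarily the operators shown in the screenshot.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = frozenset({"the", "a", "an", "is", "it", "and", "to", "of"})

def preprocess(text, min_length=3):
    """Lowercase, tokenize on non-letters, drop stopwords and short tokens."""
    tokens = re.split(r"[^a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_length and t not in STOPWORDS]

# The same function is used for both training and scoring text.
training_tokens = preprocess("The delivery is FAST and reliable")
scoring_tokens = preprocess("It is a fast delivery")

print(training_tokens)
print(scoring_tokens)
```

Keeping a single definition plays the same role as a shared Building Block: a change to the preprocessing automatically applies to both training and scoring.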

 

 

Comments

  • transilica Member Posts: 1 Learner II

    Thanks for the tutorial. Very helpful (y)

  • Anna_May1 Member Posts: 14 Learner I
    Hi there, 
     
    thank you so much for this post!

    Sadly, some of the links you mention in the text are not included. I am fairly new to RapidMiner Studio and am trying to follow the process as you describe it, with your test data. I reached the point "We will also need to add the "Set Role" operator to specify our Label (i.e target) variable." but RapidMiner Studio tells me that I need to choose an attribute name and a label. That is not possible, because my data (after putting it through Process Documents from Data) looks like the picture below.

    Have you got any solution to this issue? And is there a version of your article, where the Links are included?

    I wanna thank you so much for your input! Your post is awesome!

    Kind Regards

    Anna May
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Even if the attribute does not appear in the drop-down of Set Role, you can still type it in and it will work. Sometimes it is impossible for us to guess which attributes are available at which operator, so you sometimes need to enter the name manually.

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Anna_May1 Member Posts: 14 Learner I
    Hi @mschmitz

    thanks for your response! The issue is that the attributes available are the ones you see in my picture above. When I do "Set Role" as described in the tutorial, the data I apply it to is the data shown in the picture. From my understanding, it doesn't make sense to choose any of those attributes as the label.
    That's why I don't get how to "Set Role", as it doesn't make sense with the data output.

    So I'm asking myself where my mistake lies. I have worked with the data as described in the tutorial, and I have no idea what the issue is.

    Cheers,

    Anna May
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Any chance you can post the process so we can have a look?

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Hi @Anna_May1

    Welcome to the RapidMiner Community! 

    The purpose of using Set Role for an attribute as a "label" is so that the algorithm later knows how to classify the data.  Usually your Label attribute might be named "Sentiment" and have two values "Positive" and "Negative".  
    And, usually you have this attribute in your dataset BEFORE you pass the data into Process Documents from Data to extract all the new attributes from the text.  

    Maybe the trick here for you is to put your Set Role operator just before your Process Documents from Data operator and see if you can select an attribute for your label this time.  

    Hope that helps! 
  • Ekitsune Member Posts: 2 Learner I
    Hi! @bhupendra_patil you mentioned that there will be another article focused on how to improve the text processing. Where could I find it? :) Thanks a lot!