All of my prediction row are the same...

Mustafa_AVDAN · December 2017

Hi,Im new on rapid miner;

at like tittle , all of my predictions are the same,but ı dont know?

all my predictions are negative when ı used naive bayes .

all my predictions are neutral when ı used decision tree.

ı have attached some screen capture about my train set or my result table.Please someone help me...

I just wanna do sentiment analysis on twitter data but ı coulnt do it...And my train set include 92 examples(ı know that isnt enough for the train set) But my train set was just 2 or 3 negative sentences but like I said;

when ı used naive bayes,all predictions were negative,but WHY?

PLEASE HELP ME...

Regards

Thomas_Ott · December 2017

Here is a very simple process that you can build off. This is how I would start.

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
        <parameter key="connection" value="Twitter - Studio Connection"/>
        <parameter key="query" value="#tesla"/>
        <parameter key="locale" value="en"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.6.003" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
        <list key="function_descriptions">
          <parameter key="Sentiment" value="if([Retweet-Count]&gt;20,&quot;Positive&quot;,&quot;Negative&quot;)"/>
        </list>
        <description align="center" color="transparent" colored="false" width="126">Create Fake Sentiment (add your sentiment labels)</description>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.6.003" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
        <parameter key="attribute_name" value="Sentiment"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.003" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Text|Sentiment"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="7.6.003" expanded="true" height="82" name="Nominal to Text" width="90" x="581" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="715" y="34">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="7.6.003" expanded="true" height="145" name="Validation" width="90" x="849" y="34">
        <parameter key="sampling_type" value="shuffled sampling"/>
        <process expanded="true">
          <operator activated="true" class="naive_bayes" compatibility="7.6.003" expanded="true" height="82" name="Naive Bayes" width="90" x="250" y="34"/>
          <connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <description align="left" color="green" colored="true" height="113" resized="true" width="284" x="104" y="200">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure that you only put numerical attributes into a linear regression!</description>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.6.003" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="7.6.003" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
          <description align="left" color="blue" colored="true" height="107" resized="true" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
        </process>
        <description align="center" color="transparent" colored="false" width="126">A cross validation including a linear regression.</description>
      </operator>
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter (2)" width="90" x="45" y="289">
        <parameter key="connection" value="Twitter - Studio Connection"/>
        <parameter key="query" value="#tesla"/>
        <parameter key="locale" value="en"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.003" expanded="true" height="82" name="Select Attributes (2)" width="90" x="246" y="289">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Text|Sentiment"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="7.6.003" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="380" y="289">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="849" y="289">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="112" y="34"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.6.003" expanded="true" height="82" name="Apply Model (2)" width="90" x="1050" y="289">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Validation" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Validation" from_port="performance 1" to_port="result 1"/>
      <connect from_op="Search Twitter (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Text (2)" to_port="example set input"/>
      <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data (2)" to_port="example set"/>
      <connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Mustafa_AVDAN · December 2017

Thomas_Ott · December 2017

You should test the Naive Bayes learner inside a Cross Validation to see how well it can discern between the three classes, first. The initial predictions tell me that the model is not good at determining the 3 classes. You might need to use another learner or do some more text preprocessing.

Mustafa_AVDAN · December 2017

thanks for help;

you think which learning algorithm will be succes with three clasification?K-nn?

Thomas_Ott · December 2017

The beauty of RapidMiner Studio is that you can create multiple branches of your process and try different learners all at once. You can use a Multiply to split the training data and attach a Deep Learning, SVM, Knn, Naive Bayes, whatever after it and then review all the results

Mustafa_AVDAN · December 2017

hey sir , Im sorry have stolen your time , but ı just wanna learn sentiment analysis on rapidminer...

I changed my little project...First of all , I changed my train set , Im using positive and negative wordlist as train set anymore...And I removed the neutral clasification , ı have to 2 sentiment clasification anymore(Roughly 2000 positive words-4500 negative words)...

But all of my predictions are NEGATİVE again...What can I do?

Thomas_Ott · December 2017

I have no idea what your process looks likes. Can you share it using the </> code button above along with your data files?

Mustafa_AVDAN · December 2017

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
        <parameter key="excel_file" value="C:\Users\musta\Desktop\train_set.xlsx"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="false" class="subprocess" compatibility="8.0.001" expanded="true" height="82" name="tweets" width="90" x="112" y="391">
        <process expanded="true">
          <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
            <parameter key="connection" value="Twitterrr"/>
            <parameter key="query" value="#tesla"/>
            <parameter key="result_type" value="recent"/>
            <parameter key="limit" value="800"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
            <parameter key="attribute_name" value="Id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Text"/>
          </operator>
          <operator activated="true" class="remove_duplicates" compatibility="8.0.001" expanded="true" height="103" name="Remove Duplicates" width="90" x="514" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Text"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="648" y="34">
            <parameter key="attribute_filter_type" value="no_missing_values"/>
          </operator>
          <connect from_op="Search Twitter" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Remove Duplicates" to_port="example set input"/>
          <connect from_op="Remove Duplicates" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="false" class="write_excel" compatibility="8.0.001" expanded="true" height="82" name="Write Excel" width="90" x="246" y="391">
        <parameter key="excel_file" value="C:\Users\musta\Desktop\twiti.xlsx"/>
      </operator>
      <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="187">
        <parameter key="excel_file" value="C:\Users\musta\Desktop\twiti.xlsx"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents (2)" width="90" x="179" y="187">
        <list key="specify_weights"/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents (2)" width="90" x="313" y="187">
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (3)" width="90" x="112" y="34"/>
          <operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (3)" width="90" x="246" y="34"/>
          <operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (2)" width="90" x="380" y="34"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="514" y="34"/>
          <operator activated="true" class="text:stem_porter" compatibility="7.5.000" expanded="true" height="68" name="Stem (2)" width="90" x="648" y="34"/>
          <connect from_port="document" to_op="Tokenize (3)" to_port="document"/>
          <connect from_op="Tokenize (3)" from_port="document" to_op="Transform Cases (3)" to_port="document"/>
          <connect from_op="Transform Cases (3)" from_port="document" to_op="Filter Tokens (2)" to_port="document"/>
          <connect from_op="Filter Tokens (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
          <connect from_op="Filter Stopwords (2)" from_port="document" to_op="Stem (2)" to_port="document"/>
          <connect from_op="Stem (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="447" y="187">
        <parameter key="attribute_filter_type" value="no_missing_values"/>
      </operator>
      <operator activated="true" class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents" width="90" x="179" y="34">
        <list key="specify_weights"/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents" width="90" x="313" y="34">
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="112" y="34"/>
          <operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="246" y="34"/>
          <operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="380" y="34"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="514" y="34"/>
          <operator activated="true" class="text:stem_porter" compatibility="7.5.000" expanded="true" height="68" name="Stem (Porter)" width="90" x="648" y="34"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
          <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
          <connect from_op="Stem (Porter)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role (2)" width="90" x="447" y="34">
        <parameter key="attribute_name" value="sentiment"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles">
          <parameter key="Text" value="regular"/>
        </list>
      </operator>
      <operator activated="true" class="naive_bayes" compatibility="8.0.001" expanded="true" height="82" name="Naive Bayes" width="90" x="715" y="34"/>
      <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="782" y="187">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Data to Documents" to_port="example set"/>
      <connect from_op="tweets" from_port="out 1" to_op="Write Excel" to_port="input"/>
      <connect from_op="Read Excel (2)" from_port="output" to_op="Data to Documents (2)" to_port="example set"/>
      <connect from_op="Data to Documents (2)" from_port="documents" to_op="Process Documents (2)" to_port="documents 1"/>
      <connect from_op="Process Documents (2)" from_port="example set" to_op="Select Attributes (3)" to_port="example set input"/>
      <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Data to Documents" from_port="documents" to_op="Process Documents" to_port="documents 1"/>
      <connect from_op="Process Documents" from_port="example set" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
      <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Mustafa_AVDAN · December 2017

those are my train set(postive-negative word list) and unlabeled tweets list

lionelderkrikor · December 2017

Hi @Mustafa_AVDAN

Can you share your train dataset (train_set.xlsx) and test data set (twiti.xlsx) too ?

Regards,

Lionel

Mustafa_AVDAN · December 2017

@lionelderkrikor ı have shared but not .xlxs format.I converted to .txt

Thomas_Ott · December 2017

First things first, connect your WOR ports like so:

lionelderkrikor · December 2017

OK @Mustafa_AVDAN, I did'nt see your last post. (our posts have crossed)...

Lionel

Thomas_Ott · December 2017

The first problem I see is that you have in your training set a word and it's sentiment. That's like a word list and and really does nothing for you in this context. If you all you really want use that training set to lookup if the same word is in the test sentence and assign it a sentiment score, you could do that but I think that's not what you want to do.

What you really need is a training set to have something like this:

Sentiment | Tweet

Positive | Tesla is the best car

Negative | Teslsa is overhyped and they stink

Then when you feed the new Test Tweets with the following sentences you would have something like this

I own Tesla Stock and I think it will go to the moon

You have to make sure that the trained WordList is used to feed into the Test Process Documents flow.

I would also wrap the Naive Bayes into a Cross Validation to measure how well you can discern between negative and positive.

Mustafa_AVDAN · December 2017

thanks for everything Sir;

But ı dont know where ı can found the train set like said...So that ı have used wordlist as train set...Please help me Rapid Miner Family ...I dont know what to do

Mustafa_AVDAN · December 2017

thank you Sir,for everything...

I guess the thing ı need is a good train set that include tweets...But where can ı find such a train set?

thanks for everything again...

Thomas_Ott · December 2017

You can start out with 100 hand labeled and reviewed tweets. Get 50 negative and 50 positive. Then run the process and label the test set. Review that test set and see if the predictions are good, then add that to the training set, etc. This way over time you can build a very good training set.

If you want to go the fast way, try using the SentiWordnet operator, it will assign a -1 to +1 range to each tweet as sentiment. Or you can use the Aylien extension and get up to 1000 free sentiment scores a day.

lionelderkrikor · December 2017

Hi @Mustafa_AVDAN again,

if the keyword #tesla is not mandatory, here a labeled dataset of sentiment analysis to download :

https://www.kaggle.com/crowdflower/twitter-airline-sentiment/data

In this dataset there are tweets (attribute "text") with the keyword @VirginAmerica which are labeled (attribute "airline sentiment") as

positive/negative/neutral.

Maybe it can help you.

Regards,

Lionel

lionelderkrikor · December 2017

Hi @Mustafa_AVDAN,

You can find a (little) labeled training set of tweets with the keyword #tesla obtained

with the librarie "TextBlob" which "score" the sentiment of a tweet (and more generally of a sentence).

Here the process, if you want experiment by yourself :

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
        <parameter key="connection" value="dkk"/>
        <parameter key="query" value="#tesla"/>
        <parameter key="limit" value="1000"/>
        <parameter key="locale" value="en"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
        <parameter key="attributes" value="Text|Sentiment"/>
      </operator>
      <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Sentiment analysis" width="90" x="313" y="34">
        <parameter key="script" value="import pandas as pd&#10;from textblob import TextBlob&#10;import sys&#10;&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;reload(sys)&#10;sys.setdefaultencoding('utf8')&#10;&#10;&#10;def rm_main(data):&#10;&#10;  for i in range(0,len(data)):&#10;  &#10;    sentiment = TextBlob(str(data.iloc[i,0]).encode('utf-8').decode('utf-8'))&#10;    data.loc[i,'sentiment_scoring'] = sentiment.sentiment.polarity&#10;    &#10;  return data"/>
      </operator>
      <operator activated="true" class="order_attributes" compatibility="8.0.001" expanded="true" height="82" name="Reorder Attributes" width="90" x="447" y="34">
        <parameter key="attribute_ordering" value="Text|sentiment_scoring|sentiment_scoring_2"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="136">
        <list key="function_descriptions">
          <parameter key="sentiment" value="if(sentiment_scoring&lt;=0,&quot;negative&quot;,if(sentiment_scoring&gt;0,&quot;positive&quot;,&quot;?&quot;))"/>
        </list>
      </operator>
      <operator activated="true" class="write_excel" compatibility="8.0.001" expanded="true" height="82" name="Write Excel" width="90" x="447" y="238">
        <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Twitter_sentiment_training_set.xlsx"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Text" width="90" x="581" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="581" y="136">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="sentiment_scoring"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="581" y="238">
        <parameter key="attribute_name" value="sentiment"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="782" y="136">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Validation" width="90" x="983" y="136">
        <parameter key="sampling_type" value="shuffled sampling"/>
        <process expanded="true">
          <operator activated="true" class="naive_bayes" compatibility="8.0.001" expanded="true" height="82" name="Naive Bayes" width="90" x="250" y="34"/>
          <operator activated="false" class="weka:W-Logistic" compatibility="7.3.000" expanded="true" height="82" name="W-Logistic" width="90" x="246" y="136"/>
          <operator activated="false" class="h2o:deep_learning" compatibility="7.6.001" expanded="true" height="82" name="Deep Learning" width="90" x="246" y="238">
            <enumeration key="hidden_layer_sizes">
              <parameter key="hidden_layer_sizes" value="50"/>
              <parameter key="hidden_layer_sizes" value="50"/>
            </enumeration>
            <enumeration key="hidden_dropout_ratios"/>
            <list key="expert_parameters"/>
            <list key="expert_parameters_"/>
          </operator>
          <connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <description align="left" color="green" colored="true" height="113" resized="true" width="284" x="176" y="637">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure that you only put numerical attributes into a linear regression!</description>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="8.0.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <list key="class_weights"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
          <description align="left" color="blue" colored="true" height="107" resized="true" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
        </process>
      </operator>
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter (2)" width="90" x="112" y="442">
        <parameter key="connection" value="dkk"/>
        <parameter key="query" value="#tesla"/>
        <parameter key="locale" value="en"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="246" y="442">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Text|Sentiment"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="380" y="442">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="849" y="442">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="112" y="34"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="1050" y="442">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Sentiment analysis" to_port="input 1"/>
      <connect from_op="Sentiment analysis" from_port="output 1" to_op="Reorder Attributes" to_port="example set input"/>
      <connect from_op="Reorder Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Write Excel" to_port="input"/>
      <connect from_op="Write Excel" from_port="through" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
      <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Validation" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Validation" from_port="example set" to_port="result 4"/>
      <connect from_op="Validation" from_port="performance 1" to_port="result 1"/>
      <connect from_op="Search Twitter (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Text (2)" to_port="example set input"/>
      <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data (2)" to_port="example set"/>
      <connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
      <connect from_op="Apply Model (2)" from_port="model" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>

I hope that it will be helpful to your project.

Regards,

Lionel

Mustafa_AVDAN · December 2017

thank for everything @lionelderkrikor and @Thomas_Ott;

I almost finished my little project thanks to you two...I have done a train set with negative words and positive words.And it usualy work succesfull . Now , ı have only one problem . ı have to classification like possitive-negative and neutral(3 class) but the neutral words are absent in my train set . So my procces is classifiying only positive or negative . But my procces also have to do neutral clasification . So I guess I need neutral words in my trainset but I haven't found neutral wordlist yet.

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter (2)" width="90" x="45" y="340">
        <parameter key="connection" value="Twitterrr"/>
        <parameter key="query" value="#tesla"/>
        <parameter key="language" value="en"/>
      </operator>
      <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
        <parameter key="excel_file" value="C:\Users\musta\Desktop\append\main_train_set.xlsx"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
        <parameter key="attribute_name" value="Sentiment"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="447" y="34">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="45" y="34"/>
          <operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases" width="90" x="179" y="34"/>
          <operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="313" y="34">
            <parameter key="min_chars" value="3"/>
          </operator>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="447" y="34"/>
          <operator activated="true" class="text:stem_snowball" compatibility="7.5.000" expanded="true" height="68" name="Stem (Snowball)" width="90" x="581" y="34"/>
          <operator activated="true" class="text:stem_porter" compatibility="7.5.000" expanded="true" height="68" name="Stem (Porter)" width="90" x="715" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
          <connect from_op="Stem (Snowball)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
          <connect from_op="Stem (Porter)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Validation" width="90" x="648" y="34">
        <parameter key="sampling_type" value="shuffled sampling"/>
        <process expanded="true">
          <operator activated="true" class="naive_bayes" compatibility="8.0.001" expanded="true" height="82" name="Naive Bayes" width="90" x="250" y="34"/>
          <connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="8.0.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="179" y="340">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
        <parameter key="attributes" value="Text"/>
      </operator>
      <operator activated="true" class="remove_duplicates" compatibility="8.0.001" expanded="true" height="103" name="Remove Duplicates" width="90" x="313" y="340">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="447" y="340"/>
      <operator activated="true" class="nominal_to_text" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="581" y="289">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="715" y="238">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="45" y="34"/>
          <operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="179" y="34"/>
          <operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (2)" width="90" x="313" y="34">
            <parameter key="min_chars" value="3"/>
          </operator>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="447" y="34"/>
          <operator activated="true" class="text:stem_snowball" compatibility="7.5.000" expanded="true" height="68" name="Stem (2)" width="90" x="581" y="34"/>
          <operator activated="true" class="text:stem_porter" compatibility="7.5.000" expanded="true" height="68" name="Stem (3)" width="90" x="715" y="34"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
          <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Tokens (2)" to_port="document"/>
          <connect from_op="Filter Tokens (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
          <connect from_op="Filter Stopwords (2)" from_port="document" to_op="Stem (2)" to_port="document"/>
          <connect from_op="Stem (2)" from_port="document" to_op="Stem (3)" to_port="document"/>
          <connect from_op="Stem (3)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="849" y="34">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="983" y="238">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="prediction(Sentiment)"/>
      </operator>
      <operator activated="true" class="join" compatibility="8.0.001" expanded="true" height="82" name="Join" width="90" x="1117" y="340">
        <list key="key_attributes"/>
      </operator>
      <connect from_op="Search Twitter (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Validation" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Remove Duplicates" to_port="example set input"/>
      <connect from_op="Remove Duplicates" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Nominal to Text (2)" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
      <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data (2)" to_port="example set"/>
      <connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Join" from_port="join" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Thomas_Ott · December 2017

Ideally, yes you'll need a set of neutral tweets to train on. However, you could look at the confidence levels of the positive and negative predicted sentiment and set any Positive predicted tweet between 50 and 70% confidence is a neutral and any Negative predicted tweet between 50 and 70% confidence is also neutral.

lionelderkrikor · December 2017

Hi @Mustafa_AVDAN,

Here a (little) train set of tweet (with the keyword #tesla) labeled as positive/neutral/negative.

I used like previously the TextBlob library which score the sentiment of a tweet in range [-1,+1]

I use a generate attribute operaror to create the label considering that :

-1 <= score < 0 --> negative

score = 0 --> neutral

0<score <= 1 --> positive

Regards,

Lionel

Mustafa_AVDAN · December 2017

hi @Thomas_Ott ;

I think your way is so logical but ı have a problem my confidence values are either 1 or 0... Ekran Görüntüsü (28).png

Thomas_Ott · January 2018

Yeah something doesn't look right here.

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

All of my prediction row are the same...

Best Answer

Answers