All of my prediction rows are the same...
Hi, I'm new to RapidMiner.
As the title says, all of my predictions are the same, but I don't know why.
All my predictions are negative when I use Naive Bayes.
All my predictions are neutral when I use a Decision Tree.
I have attached some screenshots of my training set and my result table. Please, someone help me...
I just want to do sentiment analysis on Twitter data, but I couldn't get it to work. My training set includes 92 examples (I know that isn't enough for a training set), and only 2 or 3 of them are negative sentences. But, like I said,
when I used Naive Bayes, all predictions were negative. WHY?
PLEASE HELP ME...
Regards
Best Answer
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert
Here is a very simple process that you can build on. This is how I would start.
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
<parameter key="connection" value="Twitter - Studio Connection"/>
<parameter key="query" value="#tesla"/>
<parameter key="locale" value="en"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.6.003" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
<list key="function_descriptions">
<parameter key="Sentiment" value="if([Retweet-Count]&gt;20,&quot;Positive&quot;,&quot;Negative&quot;)"/>
</list>
<description align="center" color="transparent" colored="false" width="126">Create Fake Sentiment (add your sentiment labels)</description>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.003" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
<parameter key="attribute_name" value="Sentiment"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.6.003" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Text|Sentiment"/>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="7.6.003" expanded="true" height="82" name="Nominal to Text" width="90" x="581" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Text"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="715" y="34">
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="7.6.003" expanded="true" height="145" name="Validation" width="90" x="849" y="34">
<parameter key="sampling_type" value="shuffled sampling"/>
<process expanded="true">
<operator activated="true" class="naive_bayes" compatibility="7.6.003" expanded="true" height="82" name="Naive Bayes" width="90" x="250" y="34"/>
<connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
<connect from_op="Naive Bayes" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
<description align="left" color="green" colored="true" height="113" resized="true" width="284" x="104" y="200">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Naive Bayes needs a nominal label; here Set Role marks Sentiment as the label.</description>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="7.6.003" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="7.6.003" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
<description align="left" color="blue" colored="true" height="107" resized="true" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).<br/>The Performance operator calculates performance indicators and sends them to the operator result.</description>
</process>
<description align="center" color="transparent" colored="false" width="126">A cross validation including a Naive Bayes model.</description>
</operator>
<operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter (2)" width="90" x="45" y="289">
<parameter key="connection" value="Twitter - Studio Connection"/>
<parameter key="query" value="#tesla"/>
<parameter key="locale" value="en"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.6.003" expanded="true" height="82" name="Select Attributes (2)" width="90" x="246" y="289">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Text|Sentiment"/>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="7.6.003" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="380" y="289">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Text"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="849" y="289">
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="112" y="34"/>
<connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
<connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="7.6.003" expanded="true" height="82" name="Apply Model (2)" width="90" x="1050" y="289">
<list key="application_parameters"/>
</operator>
<connect from_op="Search Twitter" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Validation" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
<connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Validation" from_port="performance 1" to_port="result 1"/>
<connect from_op="Search Twitter (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Text (2)" to_port="example set input"/>
<connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data (2)" to_port="example set"/>
<connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Answers
First, you should test the Naive Bayes learner inside a Cross Validation to see how well it can discern between the three classes. The initial predictions tell me that the model is not good at distinguishing the 3 classes. You might need to use another learner or do more text preprocessing.
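A minimal Python sketch (not RapidMiner code) of what Cross Validation measures: k-fold splitting with shuffled sampling, scored against a majority-class baseline that always predicts the most frequent label. If a learner's cross-validated accuracy is no better than this baseline, it has effectively learned to predict one class, which is exactly the "all predictions are the same" symptom. The function names and toy data are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds ("shuffled sampling")."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_majority(examples):
    """Baseline learner: always predicts the most frequent label in its training data."""
    label = Counter(lbl for _, lbl in examples).most_common(1)[0][0]
    return lambda text: label


def cross_validate(examples, train, k=10, seed=0):
    """Accuracy over k folds; examples is a list of (text, label) pairs."""
    correct = 0
    for test_idx in k_fold_indices(len(examples), k, seed):
        held_out = set(test_idx)
        model = train([e for i, e in enumerate(examples) if i not in held_out])
        correct += sum(model(examples[i][0]) == examples[i][1] for i in test_idx)
    return correct / len(examples)


# Toy data: 8 negative and 2 positive tweets.
data = [(f"negative tweet {i}", "Negative") for i in range(8)] + \
       [(f"positive tweet {i}", "Positive") for i in range(2)]
print(cross_validate(data, train_majority, k=5))
```

On this toy data the baseline scores 0.8 without looking at the text at all, so a real learner has to beat 0.8 before its predictions mean anything.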
Thanks for the help.
Which learning algorithm do you think would work well for classifying three classes? k-NN?
The beauty of RapidMiner Studio is that you can create multiple branches of your process and try different learners all at once. You can use a Multiply operator to copy the training data, attach Deep Learning, SVM, k-NN, Naive Bayes, or whatever else after it, and then review all the results.
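The same idea in plain Python, as a sketch: every learner is evaluated on identical shuffled folds, which is what copying the data with Multiply gives you in Studio. The k-NN here compares word sets with Jaccard similarity, an illustrative choice; RapidMiner's k-NN operator works on the numeric word vectors instead. All names and the six toy tweets are hypothetical, and the data is far too small for the scores to mean much.

```python
import random
from collections import Counter


def words(text):
    return set(text.lower().split())


def train_majority(examples):
    """Baseline: always predict the most frequent training label."""
    label = Counter(lbl for _, lbl in examples).most_common(1)[0][0]
    return lambda text: label


def make_knn(k=3):
    """k-NN over word sets, using Jaccard similarity (an illustrative choice)."""
    def train(examples):
        data = [(words(t), lbl) for t, lbl in examples]

        def predict(text):
            q = words(text)

            def jaccard(item):
                union = q | item[0]
                return len(q & item[0]) / len(union) if union else 0.0

            nearest = sorted(data, key=jaccard, reverse=True)[:k]
            return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        return predict
    return train


def compare_learners(examples, learners, folds=5, seed=0):
    """Evaluate every learner on the same shuffled folds (one 'branch' per learner)."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    splits = [idx[i::folds] for i in range(folds)]
    scores = {}
    for name, train in learners.items():
        correct = 0
        for test_idx in splits:
            held_out = set(test_idx)
            model = train([e for i, e in enumerate(examples) if i not in held_out])
            correct += sum(model(examples[i][0]) == examples[i][1] for i in test_idx)
        scores[name] = correct / len(examples)
    return scores


tweets = [
    ("tesla is the best car", "Positive"),
    ("love my tesla best car ever", "Positive"),
    ("great car great company", "Positive"),
    ("tesla is overhyped", "Negative"),
    ("overhyped and they stink", "Negative"),
    ("awful service they stink", "Negative"),
]
scores = compare_learners(tweets, {"majority": train_majority, "knn_k1": make_knn(1)}, folds=3)
print(scores)
```

Because the folds are shared, any difference between the scores comes from the learners, not from a lucky split.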
Hey sir, I'm sorry to take up your time, but I just want to learn sentiment analysis in RapidMiner...
I changed my little project. First of all, I changed my training set: I'm now using positive and negative word lists as the training set. I also removed the neutral class, so I now have only 2 sentiment classes (roughly 2000 positive words and 4500 negative words).
But all of my predictions are NEGATIVE again... What can I do?
I have no idea what your process looks like. Can you share it using the </> code button above, along with your data files?
Those are my training set (positive/negative word lists) and the list of unlabeled tweets.
Hi @Mustafa_AVDAN
Can you share your training dataset (train_set.xlsx) and test dataset (twiti.xlsx) too?
Regards,
Lionel
@lionelderkrikor I have shared them, but not in .xlsx format. I converted them to .txt.
First things first, connect your wor (word list) ports like so:
OK @Mustafa_AVDAN, I didn't see your last post (our posts crossed)...
Lionel
The first problem I see is that each row of your training set is a word and its sentiment. That's just a word list, and it really does nothing for you in this context. If all you really want is to use that training set to look up whether the same word appears in the test sentence and assign it a sentiment score, you could do that, but I don't think that's what you want to do.
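For reference, the word-lookup approach mentioned above could be sketched like this in Python. The function name and the six-word lexicon are hypothetical; real positive and negative word lists would replace them. A tweet that contains no listed word, or equally many positive and negative words, scores 0.

```python
import re


def lexicon_score(text, positive_words, negative_words):
    """Score a text in [-1, 1] by counting hits in two word lists.

    Returns (pos - neg) / (pos + neg), or 0.0 if no word from either list occurs.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in positive_words for t in tokens)
    neg = sum(t in negative_words for t in tokens)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


positive = {"best", "love", "great"}       # hypothetical tiny lexicon
negative = {"overhyped", "stink", "awful"}
for tweet in ["Tesla is the best car",
              "Tesla is overhyped and they stink",
              "I own Tesla stock"]:
    print(tweet, "->", lexicon_score(tweet, positive, negative))
```

This is a lookup, not a trained model: nothing is learned from the word lists, which is why it cannot generalize beyond the listed words.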
What you really need is a training set that looks something like this:
Sentiment | Tweet
Positive | Tesla is the best car
Negative | Tesla is overhyped and they stink
Then, when you feed in new test tweets such as the following, you would have something like this:
Tweet
I own Tesla Stock and I think it will go to the moon
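To make the difference concrete, here is a small Python sketch of a naive Bayes text classifier trained on whole labeled tweets in this Sentiment | Tweet shape. It is the multinomial variant with add-one smoothing, chosen for illustration; RapidMiner's Naive Bayes operator works on the numeric attributes produced by Process Documents and may compute its probabilities differently. Two training tweets are of course far too few, and all names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train_nb(examples):
    """Multinomial naive Bayes with add-one smoothing; examples are (sentiment, tweet) pairs."""
    doc_counts = Counter(label for label, _ in examples)
    word_counts = defaultdict(Counter)
    for label, tweet in examples:
        word_counts[label].update(tokenize(tweet))
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab


def predict_nb(model, tweet):
    """Return the label with the highest log-posterior score."""
    doc_counts, word_counts, vocab = model
    n_docs = sum(doc_counts.values())
    scores = {}
    for label, n in doc_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # class prior
        for w in tokenize(tweet):
            if w in vocab:  # words never seen in training carry no evidence here
                score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_nb([
    ("Positive", "Tesla is the best car"),
    ("Negative", "Tesla is overhyped and they stink"),
])
print(predict_nb(model, "best car"))    # "best" and "car" were seen only in the Positive tweet
print(predict_nb(model, "they stink"))  # "they" and "stink" were seen only in the Negative tweet
```

The model learns which words co-occur with each sentiment across whole tweets, which a list of isolated words cannot teach it.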
You have to make sure that the word list from training is fed into the Process Documents operator of the test flow.
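Why the word list matters, as a Python sketch with hypothetical names: the training word list fixes which columns exist and in what order, and each test tweet is turned into a vector over exactly those columns, with unseen words dropped. If the test flow built its own word list instead, its columns would not line up with the attributes the model was trained on. Raw term counts are used here for brevity; Process Documents can also produce TF-IDF and other weightings.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def build_word_list(train_texts):
    """The training 'word list': a fixed, sorted vocabulary that defines the columns."""
    return sorted({w for text in train_texts for w in tokenize(text)})


def to_vector(text, word_list):
    """Term-occurrence vector over the training word list; unseen words are dropped."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in word_list]


word_list = build_word_list(["Tesla is the best car",
                             "Tesla is overhyped and they stink"])
test_vec = to_vector("I own Tesla Stock and I think it will go to the moon", word_list)
print(word_list)
print(test_vec)
```

Of the 13 tokens in the test tweet, only "and", "tesla" and "the" are in the training word list, so the rest contribute nothing to the prediction.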
I would also wrap the Naive Bayes learner in a Cross Validation to measure how well it can discern between negative and positive.
Thanks for everything, sir.
But I don't know where I can find a training set like the one you described, so I used the word lists as the training set. Please help me, RapidMiner family... I don't know what to do.
Thank you, sir, for everything...
I guess what I need is a good training set made of tweets. But where can I find such a training set?
Thanks again for everything...
You can start out with 100 hand-labeled and reviewed tweets: 50 negative and 50 positive. Then run the process and label the test set. Review that test set to see whether the predictions are good, then add it to the training set, and so on. This way, over time, you can build a very good training set.
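The labeling loop described above, sketched in Python. `predict` stands in for the current model and `review` for the human check; in Studio this step is manual (export the labeled test set, correct it, append it to the training data). All names and data are hypothetical; `label_counts` helps keep the classes balanced, as with the 50/50 starting set.

```python
from collections import Counter


def label_counts(train):
    """How many examples of each label the training set holds (aim for balance)."""
    return Counter(label for _, label in train)


def grow_training_set(train, unlabeled, predict, review):
    """One labeling round.

    predict(text) -> label guessed by the current model
    review(text, guess) -> the label a human confirms or corrects, or None to skip
    Reviewed tweets are appended to `train` in place; returns how many were added.
    """
    added = 0
    for text in unlabeled:
        label = review(text, predict(text))
        if label is not None:
            train.append((text, label))
            added += 1
    return added


def guess(text):
    return "Positive"  # stand-in for the current model


def review(text, g):
    # Stand-in for a human reviewer: skips an off-topic tweet, corrects one guess.
    if "news" in text:
        return None
    return "Negative" if "stink" in text else g


train = [("tesla is the best car", "Positive"), ("tesla is overhyped", "Negative")]
new = ["love my tesla", "they stink", "tesla stock news"]
print(grow_training_set(train, new, guess, review), label_counts(train))
```

Each round adds only reviewed labels, so mistakes made by the model do not leak into the training set.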
If you want to go the fast way, try using the SentiWordnet operator; it will assign each tweet a sentiment score in the range -1 to +1. Or you can use the Aylien extension and get up to 1000 free sentiment scores a day.
Hi @Mustafa_AVDAN again,
If the keyword #tesla is not mandatory, here is a labeled sentiment analysis dataset you can download:
https://www.kaggle.com/crowdflower/twitter-airline-sentiment/data
In this dataset there are tweets (attribute "text") with the keyword @VirginAmerica, labeled (attribute "airline sentiment") as positive/negative/neutral.
Maybe it can help you.
Regards,
Lionel
Hi @Mustafa_AVDAN,
You can find a (small) labeled training set of tweets with the keyword #tesla, obtained
with the library "TextBlob", which scores the sentiment of a tweet (and, more generally, of a sentence).
Here is the process, if you want to experiment yourself:
I hope it will be helpful for your project.
Regards,
Lionel
Thanks for everything, @lionelderkrikor and @Thomas_Ott.
I have almost finished my little project thanks to you two. I built a training set with negative and positive words, and it usually works well. Now I have only one problem: I have to classify into three classes (positive, negative, and neutral), but there are no neutral words in my training set. So my process classifies only as positive or negative, yet it also has to do neutral classification. I guess I need neutral words in my training set, but I haven't found a neutral word list yet.
Ideally, yes, you'll need a set of neutral tweets to train on. However, you could look at the confidence levels of the positive and negative predictions and treat any tweet predicted Positive with 50 to 70% confidence as neutral, and likewise any tweet predicted Negative with 50 to 70% confidence.
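A Python sketch of that rule, assuming the model reports a confidence for Positive (RapidMiner's Apply Model typically adds columns such as confidence(Positive) and confidence(Negative)) and treating 70% as inclusive. With two classes the winning confidence is always at least 50%, so the rule reduces to "winner at or below 70% means Neutral". The function name is hypothetical.

```python
def label_with_neutral(conf_positive, neutral_max=0.7):
    """Turn a two-class confidence into Positive / Negative / Neutral.

    conf_positive: the model's confidence for "Positive"; the confidence for
    "Negative" is taken as 1 - conf_positive. If the winning class has at most
    `neutral_max` confidence (i.e. 50-70%), the tweet is called Neutral.
    """
    conf_negative = 1.0 - conf_positive
    label = "Positive" if conf_positive >= conf_negative else "Negative"
    if max(conf_positive, conf_negative) <= neutral_max:
        return "Neutral"
    return label


for c in (0.95, 0.65, 0.35, 0.1):
    print(c, label_with_neutral(c))
```

The 70% cut-off is a tuning knob: raising it labels more tweets Neutral.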
Hi @Mustafa_AVDAN,
Here is a (small) training set of tweets (with the keyword #tesla) labeled as positive/neutral/negative.
As before, I used the TextBlob library, which scores the sentiment of a tweet in the range [-1, +1].
I used a Generate Attributes operator to create the label, considering that:
-1 <= score < 0 --> negative
score = 0 --> neutral
0 < score <= 1 --> positive
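The same three rules as a Python function, for anyone scoring outside RapidMiner (for example with TextBlob, where `TextBlob(text).sentiment.polarity` returns a float in [-1, 1]). The function name is hypothetical.

```python
def score_to_label(score):
    """Map a polarity score in [-1, 1] to a label using the three rules above."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must be in [-1, 1], got {score}")
    if score < 0:
        return "negative"
    if score == 0:
        return "neutral"
    return "positive"


print([score_to_label(s) for s in (-1.0, -0.2, 0.0, 0.5, 1.0)])
```

Only an exact 0 becomes neutral, so a tweet scoring 0.01 is labeled positive; a small band around 0 is a possible variation.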
Regards,
Lionel
Hi @Thomas_Ott,
I think your approach is very logical, but I have a problem: my confidence values are either 1 or 0...
Yeah something doesn't look right here.
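One well-known cause, offered as a possibility rather than a diagnosis of this process: naive Bayes treats every attribute as independent and adds up the per-attribute log-likelihoods, so with hundreds of word attributes even weak per-word evidence compounds and the confidence saturates near 0 or 1. The Python sketch below, with a made-up per-word ratio of 1.5 and equal priors, shows how quickly that happens. Function names are hypothetical.

```python
import math


def posterior(log_scores):
    """Turn per-class log scores into probabilities (numerically stable softmax)."""
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def negative_confidence(n_words, ratio=1.5):
    """Naive Bayes-style confidence when each of n_words is `ratio` times
    likelier under Negative than under Positive (equal priors assumed)."""
    return posterior({"Negative": n_words * math.log(ratio), "Positive": 0.0})["Negative"]


for n in (1, 5, 20, 60):
    print(n, round(negative_confidence(n), 4))
```

If that is what is happening, the 50 to 70% band will almost never trigger on this model's confidences.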