Low Recall High Accuracy
Below are example results for the same dataset. The dataset has no missing values.
For Naive Bayes:
RapidMiner Recall: 26.35% +/- 5.17% (micro average: 26.37%)
Weka Recall: 0.768
RapidMiner Precision: 43.41%
Weka Precision: 0.735
RapidMiner Accuracy: 77.14%
Weka Accuracy: 76.7639%
For Random Forest:
RapidMiner Recall: 16.60% +/- 6.01% (micro average: 16.59%)
Weka Recall: 0.843
RapidMiner Accuracy: 81.75%
Weka Accuracy: 84.2897%
For KNN:
RapidMiner Recall: 12.89% +/- 3.82% (micro average: 12.89%)
Weka Recall: 0.824
RapidMiner Precision: 55.82% +/- 12.05% (micro average: 55.77%)
Weka Precision: 0.810
RapidMiner Accuracy: 79.40%
Weka Accuracy: 82.4396%
For Decision Tree:
Weka Accuracy: 81.4989%
RapidMiner Accuracy: 83.07%
Weka Recall: 0.815
RapidMiner Recall: 30.67%
Why are the RapidMiner recall and precision values so low even though the accuracy is high? The recall value is especially low.
My process is attached. I use the same process for the other algorithms.
I also tried other settings for each algorithm to improve recall in RapidMiner. For example:
KNN: changing k, measure types, mixed measure, weighted vote.
Decision Tree: changing criterion, maximal depth, pruning, confidence, prepruning, minimal gain, minimal leaf size, minimal size for split, number of prepruning alternatives.
Random Forest: changing number of trees, criterion, pruning, confidence, prepruning, random splits, guess subset ratio, voting strategy, etc.
But the recall value is still low.
Best Answer
varunm1 Member Posts: 1,207 Unicorn
Hello @ozcan
I checked your decision tree process and data. Recall and precision are calculated with respect to a positive and a negative class. In your case, RapidMiner takes label "1" as the positive class by default, and the recall for that class is low, as mentioned in your post. If you set the positive class manually to "0" using "Performance (Binominal Classification)", your recall is 90.25%.
I think the positive class in Weka might be "0"; you need to check that and confirm. Try checking recall for both classes in RapidMiner and Weka. There might be other issues as well. I also added a better way to build your process.
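To make the point about the positive class concrete, here is a minimal sketch in plain Python. The labels and predictions are made up for illustration (they are not the poster's data), and `recall` is a hypothetical helper, not a RapidMiner or Weka function. It shows that the same predictions give a low recall when "1" is the positive class and a high recall when "0" is.

```python
def recall(y_true, y_pred, positive):
    """Recall = TP / (TP + FN), computed for the class named `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)

# Hypothetical labels: 80 examples of class "0", 20 examples of class "1".
y_true = ["0"] * 80 + ["1"] * 20
# Hypothetical predictions: 72 of the 80 "0" rows and 6 of the 20 "1" rows are correct.
y_pred = ["0"] * 72 + ["1"] * 8 + ["1"] * 6 + ["0"] * 14

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_1 = recall(y_true, y_pred, "1")  # positive class = "1"
recall_0 = recall(y_true, y_pred, "0")  # positive class = "0"

# Recall averaged over both classes, weighted by class size.
weighted = (80 * recall_0 + 20 * recall_1) / 100

print(f"accuracy        = {accuracy:.2f}")   # 0.78
print(f"recall (pos=1)  = {recall_1:.2f}")   # 0.30
print(f"recall (pos=0)  = {recall_0:.2f}")   # 0.90
print(f"weighted recall = {weighted:.2f}")   # 0.78
```

Note that the class-weighted recall always equals the accuracy. The Weka recall values in the question (0.768, 0.843, 0.824, 0.815) are almost identical to the Weka accuracies, so it may be worth checking whether Weka is reporting a weighted average over both classes rather than the recall of a single class. That is only a guess from the numbers, not something confirmed in this thread.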
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Answers
These results can be explained by a highly imbalanced dataset.
In this case, the algorithm has difficulty capturing the relationship between your regular attribute(s) and the minority class of your label, and so it rarely predicts the minority class correctly. That is why the recall is low although the accuracy is relatively good.
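As a rough illustration of how an imbalanced label produces this pattern, the sketch below computes accuracy, precision, and recall from confusion-matrix counts. The counts are invented (20 minority rows, 80 majority rows) and `metrics_from_counts` is a hypothetical helper written for this example.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for the positive class.

    Precision or recall is None when its denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return accuracy, precision, recall

# Hypothetical model that always predicts the majority (negative) class.
always_majority = metrics_from_counts(tp=0, fp=0, fn=20, tn=80)

# Hypothetical model that finds only 3 of the 20 minority rows.
weak_model = metrics_from_counts(tp=3, fp=2, fn=17, tn=78)

print("always majority:", always_majority)
print("weak model:     ", weak_model)
```

With these made-up counts, the model that never predicts the minority class still reaches 80% accuracy with 0% recall, and the weak model reaches 81% accuracy with only 15% recall on the minority class.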
However, I don't know why there is such a significant difference between Weka and RapidMiner.
Could you share your dataset?
Regards,
Lionel
This is a tricky question. How are you getting these results? Are you using cross validation or split validation? If so, are the test data sets the same in both RapidMiner and Weka?
What about the hyperparameters of these algorithms? Are they exactly the same?
Varun
Hello
It depends on your data and on the algorithm. When different software packages run classification or clustering on the same data, you may see some differences, and this is not a problem. Data science is built on statistics and probability, so differences in accuracy are normal.
I hope this helps
mbs
In RapidMiner, I changed the bug label to nominal; the other attributes are real.
In Weka, I changed the bug label from numeric to nominal; the other attributes are numeric.
For all algorithms, I used 10-fold cross validation in both Weka and RapidMiner.
I set the role of bug to label and selected all attributes, for all algorithms.
Cross validation uses 10 folds; the other options are default.
I did not change any algorithm options. All of them use the default settings.
There may be minor differences between Weka and RapidMiner, such as the confidence interval, but this should not affect recall like this.
This is not a tricky question. I need these results and the comparison for my thesis. @lionelderkrikor @varunm1 @mbs
We understand that, but what we are trying to say is that the performance varies depending on how the 10 cross-validation folds are divided and on the settings of each algorithm. The defaults in RapidMiner and Weka might not be the same, and the base algorithm might not behave the same way if the default parameters differ.
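The effect of the fold division can be sketched in plain Python. This is a simplification under stated assumptions: the labels and predictions are invented, the predictions are held fixed (in real cross validation the model is retrained for each fold, so they would change too), and `fold_recalls` is a hypothetical helper. Even so, the per-fold mean and spread of recall move with the random seed used to assign rows to folds, while the micro average over all rows stays the same.

```python
import random
import statistics

def fold_recalls(y_true, y_pred, n_folds, seed):
    """Shuffle row indices with `seed`, cut them into n_folds folds,
    and return the recall of class 1 for each fold that contains positives."""
    idx = list(range(len(y_true)))
    random.Random(seed).shuffle(idx)
    recalls = []
    for k in range(n_folds):
        fold = idx[k::n_folds]  # every n_folds-th shuffled index
        tp = sum(1 for i in fold if y_true[i] == 1 and y_pred[i] == 1)
        fn = sum(1 for i in fold if y_true[i] == 1 and y_pred[i] == 0)
        if tp + fn > 0:  # skip folds without any positive rows
            recalls.append(tp / (tp + fn))
    return recalls

# Hypothetical data: 200 rows, 40 positives, 12 of them predicted correctly.
y_true = [1] * 40 + [0] * 160
y_pred = [1] * 12 + [0] * 28 + [0] * 150 + [1] * 10

micro = 12 / 40  # recall pooled over all rows, independent of the folds

for seed in (1, 2, 3):
    r = fold_recalls(y_true, y_pred, n_folds=10, seed=seed)
    print(f"seed={seed}: mean={statistics.mean(r):.3f} "
          f"+/- {statistics.stdev(r):.3f}, micro={micro:.3f}")
```

This mirrors the "mean +/- deviation (micro average)" form of the RapidMiner figures in the question. Using the same folds in both tools, where that is possible, removes one source of difference.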
I am not sure it's a good idea to compare two software packages based on performance. I guess @IngoRM might be able to help you with the pitfalls of doing this.
Varun
I have also seen different accuracy values between tools, but this is not a problem. You can accept the answers from both packages: they do not implement clustering and classification in the same way, and in terms of statistics and probability both results are correct. Maybe others can help you more.
One more thing:
Please take a look at your data. It contains a wide range of different values, which is important and can affect your process.
All the best
mbs
Moreover, for Decision Tree:
Weka Accuracy: 81.4989%
RapidMiner Accuracy: 83.07%
Weka Recall: 0.815
RapidMiner Recall: 30.67%
Yes, this setting solved my problem. Thank you very much. One more thing: my bug label is nominal, and because of this I get a potential problem warning. Does it affect my results? Do I have to change the bug label to binominal? I added a screenshot as an attachment.