Help with correct understanding results of classification

Serek91 · October 2019

Hi, I have such table with results of classifications:

I have 4 algorithms. Classification was made for 16 different training sets:

- all => all 15 predictors were used

- 1-15 => each set contains 14 predictors and in each set one different type of predictor was removed

Example of set is in attachment.

Type of excluded predictor | column name in csv

1 - characters_number

2 - sentences_number

3 - words_number

4 - average_sentence_length

5 - average_sentence_words_number

6 - ratio_unique_words

7 - average_word_length

8 - ratio_word_length_[1-16]

9 - ratio_special_characters

10 - ratio_numbers

11 - ratio_punctuation_characters

12 - most_used_word_[1-4]

13 - ratio_letter_[a-z]

14 - ratio_questions;

15 - ratio_exclamations;

I have to samehow conclude why results for 1-15 for each algorithm and each set are better/worse than results in column "ALL".

But I don't have any idea why. I know that in most cases, when difference between column ALL and column [1-15] is very small (like < 1%) it is just a luck and randomness. But in cases when difference is higher, probably it is caused by something.

The most important thing - I don't know why for k-NN algorithm results are the same for columns 9-15...

And good will be to know, why Naive Bayes is the best (54%) and k-NN is a bad algorithm for this task (20%).

Can someone help me with that?

BalazsBarany · October 2019

Hi!

Some partial answers:

k-NN might maily learn from a single or few attributes only if you don't normalize the data. (It's comparing values of different attributes directly, so an attribute with a high scale (like 1000) will dominate attributes on small scales (like nominal attributes encoded as 0 or 1).) If this attribute is still in your data, the result will stay more or less the same.

Naive Bayes is frequently a good algorithm without tuning. On the other hand, there's not a lot to tune, so it's seldom the best.

If you try different pruning settings in the decision tree, you might even get a better result. You can use a building block to do it:
https://community.rapidminer.com/discussion/33910/optimize-decision-tree-and-optimize-svm

Regards,
Balázs

varunm1 · October 2019

Hello @Serek91

There is a concept in machine learning known as the interaction effect. When you analyze your predictors/features, it is not just the independent features that have an impact on algorithm learning but also due to the interaction effect. For example, let's think there are two features A and B, Now if you run your machine learning model on only A and only B then you might get average performance. If you run the algorithm on A & B in combination, you might get a good result or a really worse result, this means that A and B acted independently in a different way compared to the both A and B when combined given to an algorithm.

This is one reason to check your features using feature selection methods like forward selection or backward elimination. You can also use automatic feature engineering to check this. In your method you tried adding one feature after the other, but what if feature 3 and feature 6 works better in combination than having 1,2,3,4,5,6. This is one important reason we use feature selection. The interaction effect plays major role in traditional algorithms.

Also, did you tune the hyperparameters of these algorithms? For example, in KNN, how did you choose K value? There is an elbow technique that can be used to determine good K-Value. As @BalazsBarany mentioned, it is also important to check the hyperparameters of decision trees, like criterion, pruning (pre and post).

KNN is a lazy algorithm and depends on the K-value. If your labels or data cannot be separated in feature space, KNN misclassifies a lot. Also, you need to check what is the best value for K.

Hope this helps.

Serek91 · October 2019

Ok, thanks. I added normalization to the k-NN and now I have better results (~46%).

Normalization is not needed in rest of algorithms (Naive Bayes, Decision Tree)? I don't see any difference with and without it.

varunm1 · October 2019

I don't say it's not needed, but for KNN you will definitely find a difference with normalization. The reason is the distance calculation methods used in KNN. KNN mainly relies on surrounding data samples for prediction. There is a beautiful visual example in the StackOverflow post below.

https://stats.stackexchange.com/questions/287425/why-do-you-need-to-scale-data-in-knn

From my experience, there won't be much difference (normalization) in the decision tree as they calculate the impurity index for each attribute and branch down.

Serek91 · October 2019

Ok, thanks.

Results for k-NN are now a way better. Results for Decision Tree are a bit better, but difference is not significant. I will try a bit more to improve it.

Image: https://us.v-cdn.net/6030995/uploads/editor/fe/7wpm3cpbkleg.png

Serek91 · October 2019

Hi, I have next question:

Decision Tree - result in columns ALL and 12 are the same. Column 12 has only string values (words), not numerical. Can Decision Tree use predictors with text values? It seems that it can't.

varunm1 · October 2019

From my understanding, the text data is treated as categorical (nominal) in this case.

Serek91 · October 2019

According to docs:

This Operator can process ExampleSets containing both nominal and numerical Attributes.

So it should have some impact on final result. But result is still the same. No matter if predictor is included or not.

varunm1 · October 2019

You should see the two models and see if it has that feature/attribute in the tree. May be that attribute got pruned

Serek91 · October 2019

I made prediction only using this one parameter, and I got:

Image: https://us.v-cdn.net/6030995/uploads/editor/cp/7vzlb4klr9dr.png

Image: https://us.v-cdn.net/6030995/uploads/editor/su/ra39ik3djvej.png

Image: https://us.v-cdn.net/6030995/uploads/editor/d6/r1d9jrrm78is.png

varunm1 · October 2019

Makes sense, its zero accuracy cause it cannot predict with that one, it just randomly labeled predictions. If you want to predict from text, you should use some techniques like tokenization

varunm1 · October 2019

What is 792246? Is it a column name? I think some issue in the process structure. Not sure unless I see data and process. Based on posted picture I am bit confused. Only reasons I can think is everything go pruned due to no added value in tree or some issue in process input

Serek91 · October 2019

Ehhh... so it will be hard to do it now... I don't have time for it...

Thanks anyway.

EDIT:

What is 792246? Is it a column name? I think some issue in the process structure. Not sure unless I see data and process. Based on posted picture I am bit confused. Only reasons I can think is everything go pruned due to no added value in tree or some issue in process input

I added wrong image^^

It should be this one:

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Help with correct understanding results of classification

Best Answer

Answers

Be Safe. Follow precautions and Maintain Social Distancing

Be Safe. Follow precautions and Maintain Social Distancing

Be Safe. Follow precautions and Maintain Social Distancing

Be Safe. Follow precautions and Maintain Social Distancing

Be Safe. Follow precautions and Maintain Social Distancing

Be Safe. Follow precautions and Maintain Social Distancing