Classification model problem

happy_neid · January 2017

Hello everyone.
I made a classification model using a decision tree. But when I apply it, it gives me the same prediction with the same confidence level for every example, as in the picture I posted. Can anyone tell me which mistake could cause this to happen? Thank you.

MartinLiebig · January 2017

Hi,

Can you have a look at your tree? Could it be that it simply does not split? What happens if you deactivate pruning and prepruning in the tree?

~Martin
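A tree that never splits is a single leaf, so every example lands in the same node and receives that node's majority class, with the class proportion as its confidence. That would explain identical predictions and confidences for every example. A minimal Python sketch, assuming hypothetical training labels with a 70/30 class split (not the poster's data); `leaf_prediction` is a hypothetical helper, not a RapidMiner function:

```python
from collections import Counter

def leaf_prediction(labels):
    """Return (majority label, confidence) for a single leaf.

    A tree that never splits is just this one leaf, so every
    example gets the same answer regardless of its attributes.
    """
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)

# Hypothetical training labels: 70 documents of class "1", 30 of class "0".
train_labels = ["1"] * 70 + ["0"] * 30
prediction, confidence = leaf_prediction(train_labels)

# Every new document ends up in the same leaf.
for doc in ["doc A", "doc B", "doc C"]:
    print(doc, prediction, confidence)  # e.g. doc A 1 0.7
```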

Thomas_Ott · January 2017

I can't tell from your snapshot, but does your label column consist entirely of missing values?

happy_neid · January 2017

Thank you for your answer. There are no missing values.

happy_neid · January 2017

Thank you for your answer. I will try that.

Here you can see what my process looks like, and the decision tree as well. I'm trying to do text mining, but I am a beginner, so I don't know much about it.
First I tried to run the process without connecting the wordlist from the first Process Documents from Data operator to the second one, and I got an error message saying that the attributes don't match. Then I connected the two, and now I have the problem that brought me here.

[Screenshots: Process without connection; Process with connection; Tree description; Tree graph]
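The "attributes don't match" error is consistent with the test documents being turned into a different set of word attributes than the training documents; passing the training wordlist to the second operator makes both sides use the same columns. A rough Python sketch of that idea, assuming plain whitespace tokenization and raw term counts (the operator's real tokenizer and vector creation settings may differ); `build_wordlist` and `vectorize` are hypothetical helpers:

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def build_wordlist(docs):
    """Collect the sorted vocabulary of the training documents."""
    return sorted({token for doc in docs for token in tokenize(doc)})

def vectorize(docs, wordlist):
    """Turn documents into term-count rows over a fixed wordlist.

    Words that are not in the wordlist are ignored, so every row
    has exactly one column per wordlist entry.
    """
    index = {word: i for i, word in enumerate(wordlist)}
    rows = []
    for doc in docs:
        row = [0] * len(wordlist)
        for token in tokenize(doc):
            if token in index:
                row[index[token]] += 1
        rows.append(row)
    return rows

train_docs = ["good service", "bad service", "bad food"]
test_docs = ["good food and good service"]

wordlist = build_wordlist(train_docs)  # ['bad', 'food', 'good', 'service']
print(vectorize(test_docs, wordlist))  # [[0, 1, 2, 1]]

# A wordlist built from the test data alone gives different columns:
print(build_wordlist(test_docs))  # ['and', 'food', 'good', 'service']
```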

Thomas_Ott · January 2017

According to what you show, your Decision Tree doesn't split. Wrap that DT in a Cross Validation operator and measure the performance. My guess is that it will classify most of your "0" class incorrectly.
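What cross-validation would reveal for a tree that does not split can be sketched outside RapidMiner: a model that always predicts the training majority class gets every example of that class right and every example of the minority class wrong. A minimal Python sketch, assuming interleaved folds and hypothetical 70/30 labels; it reports per-class recall rather than reproducing the Performance operator's output, and `cross_validate_majority` is a hypothetical helper:

```python
from collections import Counter

def majority_class(labels):
    """The only prediction a single-leaf tree can make."""
    return Counter(labels).most_common(1)[0][0]

def cross_validate_majority(labels, k=5):
    """k-fold CV of a majority-class model; returns recall per class."""
    n = len(labels)
    correct = Counter()
    total = Counter()
    for fold in range(k):
        test_idx = list(range(fold, n, k))  # interleaved folds
        test_set = set(test_idx)
        train = [labels[i] for i in range(n) if i not in test_set]
        prediction = majority_class(train)
        for i in test_idx:
            total[labels[i]] += 1
            if labels[i] == prediction:
                correct[labels[i]] += 1
    return {label: correct[label] / total[label] for label in total}

# Hypothetical labels: 70 documents of class "1", 30 of class "0".
labels = ["1"] * 70 + ["0"] * 30
print(cross_validate_majority(labels))  # {'1': 1.0, '0': 0.0}
```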

happy_neid · January 2017

That's true. But can you tell me why that happens? Is there a way to fix it?
Thank you very much.

MartinLiebig · January 2017

Hi,

Just google decision trees and pruning. Your tree was simply pruned too much. Most likely you need to reduce min_gain to 0.001.

~Martin
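Pre-pruning with a minimal gain rejects any split whose gain falls below the threshold, so weak word attributes can leave the tree with no split at all. A worked Python sketch, assuming plain information gain as the split criterion (the operator's configured criterion may compute a different value) and a hypothetical word that appears in 10 of 100 documents; `passes_min_gain` is a hypothetical helper:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def passes_min_gain(labels, attribute, min_gain):
    """Pre-pruning check: only split if the gain reaches min_gain."""
    return information_gain(labels, attribute) >= min_gain

# Hypothetical data: 50 documents per class; the word appears in
# 6 documents of class "1" and 4 documents of class "0".
labels = ["1"] * 50 + ["0"] * 50
word_present = [1] * 6 + [0] * 44 + [1] * 4 + [0] * 46

print(round(information_gain(labels, word_present), 4))  # 0.0032
print(passes_min_gain(labels, word_present, 0.01))   # False
print(passes_min_gain(labels, word_present, 0.001))  # True
```

With a threshold of 0.01 this split is pruned away; at 0.001 it survives, which is why lowering the minimal gain can let a tree grow.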

happy_neid · January 2017

Thank you very much!

Regards,
Nada

happy_neid · January 2017

Hi, me again.

I tried setting min. gain to 0.01, but it still won't split. I also tried turning off pruning and prepruning, but it still doesn't work. I have no idea what I can do about that.

Regards,
Nada

happy_neid · January 2017

Now I see that I misunderstood you. Should my unlabeled dataset contain a label column with empty cells, or not?

Telcontar120 · January 2017

It sounds like your tree is failing to find any attributes that provide a meaningful split to separate the labels. Did you try any of the other criteria (information gain, Gini index), or adjusting the confidence parameter?
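Information gain and the Gini index measure impurity differently, but both are zero for an attribute that carries no information about the label, so switching criteria alone cannot help if no attribute separates the classes. A small Python sketch comparing the two on hypothetical toy data; these are the textbook formulas, not RapidMiner's internal code:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(labels, attribute, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    children = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - children

labels = ["a", "a", "b", "b"]
separating = [1, 1, 0, 0]  # perfectly predicts the label
unrelated = [1, 0, 1, 0]   # same class mix in both branches

for name, attr in [("separating", separating), ("unrelated", unrelated)]:
    print(name,
          round(split_gain(labels, attr, entropy), 3),
          round(split_gain(labels, attr, gini), 3))
# separating 1.0 0.5
# unrelated 0.0 0.0
```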

You might want to check whether your attributes have any predictive relationship with your label at all. Try a simpler approach first, such as one of the "Weight by" operators, e.g. Weight by Information Gain or Weight by Gini Index. That will show you whether you have any attributes that can separate your classes. You can also run a simple Naive Bayes model and look at its output, which shows the class distributions. If they are not distinct, your decision tree is not going to find anything to use for a split.
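The idea behind weighting attributes by information gain can be sketched in a few lines: score each attribute by how much it reduces label entropy on its own, then sort. If every weight is near zero, a decision tree has nothing to split on. A minimal Python sketch on a hypothetical four-document, three-word example; it mirrors the idea of Weight by Information Gain, not that operator's exact implementation or parameters, and `weight_by_information_gain` is a hypothetical helper:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, column):
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def weight_by_information_gain(rows, labels, names):
    """Score each attribute (column) on its own and sort, highest weight first."""
    columns = zip(*rows)
    weights = {name: information_gain(labels, col) for name, col in zip(names, columns)}
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)

# Hypothetical word-presence matrix: one row per document, one column per word.
names = ["refund", "hello", "the"]
rows = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]
labels = ["0", "0", "1", "1"]

for name, weight in weight_by_information_gain(rows, labels, names):
    print(f"{name}: {weight:.3f}")
# refund: 1.000
# hello: 0.000
# the: 0.000
```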

Altair RapidMiner Community
