
Classification model problem

happy_neid Member Posts: 10 Contributor I
edited August 2019 in Help

Hello everyone.
I made a classification model using a decision tree. But when I apply it, it gives me the same prediction with the same confidence for every example, as you can see in the screenshot I posted. Can anyone tell me which mistake could cause this? Thank you.
[Screenshot: why.JPG]

Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist

    Hi,

    can you have a look at your tree? Could it be that it simply does not split? What happens if you deactivate pruning and prepruning in the tree?

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
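To see why a tree that does not split produces exactly this symptom, here is a minimal Python sketch (not RapidMiner's implementation): a tree with no splits is just a root leaf holding the training class distribution, so every example receives the majority class with the same confidence. The `Leaf` class and the toy labels are hypothetical.

```python
from collections import Counter


class Leaf:
    """A decision-tree node with no split: it only stores class counts."""

    def __init__(self, labels):
        self.counts = Counter(labels)
        self.total = len(labels)

    def predict(self, example):
        # The example's attributes are never consulted, because there is no split.
        label, count = self.counts.most_common(1)[0]
        return label, count / self.total


# Hypothetical training labels: 7 examples of class "1", 3 of class "0".
root = Leaf(["1"] * 7 + ["0"] * 3)

for example in [{"word_a": 0.2}, {"word_b": 0.9}, {}]:
    print(root.predict(example))  # ('1', 0.7) every time
```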
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I can't tell from your snapshot, but does your label column contain only missing values?
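Thomas's question points at a quick sanity check: a tree can only learn a split if the training label has at least two distinct, non-missing classes. A minimal Python sketch, assuming the data is available as a list of dicts with a hypothetical `label` key and that empty strings count as missing:

```python
def label_summary(rows, label="label"):
    """Count missing labels and collect the distinct non-missing classes."""
    missing = 0
    classes = set()
    for row in rows:
        value = row.get(label)
        # Treat None and empty or whitespace-only strings as missing.
        if value is None or (isinstance(value, str) and value.strip() == ""):
            missing += 1
        else:
            classes.add(value)
    return missing, sorted(classes)


rows = [
    {"text": "great product", "label": "1"},
    {"text": "bad service", "label": "0"},
    {"text": "no opinion", "label": ""},
]
missing, classes = label_summary(rows)
print(missing, classes)  # 1 ['0', '1']
```

If `classes` has fewer than two entries, no tree can split, whatever its pruning settings.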

  • happy_neid Member Posts: 10 Contributor I

    Thank you for your answer. There are no missing values.

  • happy_neid Member Posts: 10 Contributor I

    Thank you for your answer. I will try that.

    Here you can see what my process looks like, and the decision tree as well. I'm trying to do text mining, but I am a beginner, so I don't know much about it.
    First I tried to run the process without connecting the word list from the first Process Documents from Data operator to the second one, and I got an error message saying that the attributes don't match. Then I connected those two, and now I have the problem that brought me here.
    [Screenshots: process without connection, process with connection, tree description, tree graph]
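The "attributes don't match" error arises because a model can only score examples that have the same attributes it was trained on, and in text mining those attributes are the words of the vocabulary. Connecting the word list makes the second Process Documents from Data operator reuse the training vocabulary. A minimal Python sketch of that idea, assuming a whitespace tokenizer and plain term counts (not RapidMiner's tokenization or weighting); the function names are hypothetical:

```python
def build_vocabulary(documents):
    """Collect the sorted set of words seen in the training documents."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def vectorize(document, vocabulary):
    """Count each vocabulary word in the document; unseen words are dropped."""
    words = document.lower().split()
    return [words.count(term) for term in vocabulary]


train_docs = ["good phone good battery", "bad battery"]
vocabulary = build_vocabulary(train_docs)  # ['bad', 'battery', 'good', 'phone']

# Scoring documents reuse the training vocabulary, so every vector has the same attributes.
new_docs = ["good camera", "bad phone bad screen"]
train_vectors = [vectorize(d, vocabulary) for d in train_docs]
new_vectors = [vectorize(d, vocabulary) for d in new_docs]
print(new_vectors)  # [[0, 0, 1, 0], [2, 0, 0, 1]]

# Building a separate vocabulary from the new documents gives different attributes:
print(build_vocabulary(new_docs))  # ['bad', 'camera', 'good', 'phone', 'screen']
```

Words that only occur in the new documents ("camera", "screen") are simply ignored, which is the price of keeping the attribute set fixed.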

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    According to what you show, your decision tree doesn't split. Wrap the decision tree in a Cross Validation operator and measure the performance. My guess is that it will misclassify most of your "0" class.
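A tree that never splits behaves like a majority-class predictor, which is why cross-validation exposes it: recall is perfect for the majority class and zero for the other. A minimal Python sketch of k-fold cross-validation on such a predictor (not RapidMiner's Cross Validation operator); the interleaved fold assignment and the 40/10 class split are assumptions for illustration:

```python
from collections import Counter


def majority_model(train_labels):
    """Train the no-split 'tree': it always predicts the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def cross_validate(labels, k=5):
    """k-fold cross-validation of the majority model; returns per-class recall."""
    hits = Counter()
    totals = Counter(labels)
    for fold in range(k):
        # Every k-th example (offset by the fold number) is held out for testing.
        test_idx = set(range(fold, len(labels), k))
        train = [y for i, y in enumerate(labels) if i not in test_idx]
        prediction = majority_model(train)
        for i in test_idx:
            if labels[i] == prediction:
                hits[labels[i]] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Hypothetical data: 40 examples of class "1" and 10 of class "0".
labels = ["1"] * 40 + ["0"] * 10
print(cross_validate(labels))  # {'1': 1.0, '0': 0.0}
```

Overall accuracy here would still be 80%, which is why per-class figures (or a confusion matrix) are more telling than accuracy alone.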

  • happy_neid Member Posts: 10 Contributor I

    That's true. But can you tell me why that happens, and is there a way to fix it?
    Thank you very much.


  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist

    Hi,

    just search for decision trees and pruning. Your tree simply got pruned too much. Most likely you need to reduce min_gain to 0.001.

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
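The min_gain suggestion can be illustrated with a minimal Python sketch of prepruning: a node is split only if the gain of the candidate split reaches the threshold. Using plain information gain (entropy reduction) as the gain measure and `>=` as the comparison are assumptions here; RapidMiner's Decision Tree offers several criteria. The toy word attribute is hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy reduction from splitting the labels by one nominal attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def should_split(labels, attribute_values, min_gain):
    """Prepruning check: split only if the gain reaches min_gain."""
    return information_gain(labels, attribute_values) >= min_gain


# Hypothetical data: a word that is only weakly related to the label (gain of about 0.005).
labels = ["1"] * 12 + ["0"] * 8
word = ["yes"] * 7 + ["no"] * 5 + ["yes"] * 4 + ["no"] * 4

print(should_split(labels, word, min_gain=0.01))   # False
print(should_split(labels, word, min_gain=0.001))  # True
```

Lowering min_gain only helps when some attribute has a small but non-zero gain; if every gain is essentially zero, no threshold will make the tree split.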
  • happy_neid Member Posts: 10 Contributor I

    Thank you very much!  :)

    Regards,
    Nada

  • happy_neid Member Posts: 10 Contributor I

    Hi, me again.

    I tried setting the minimal gain to 0.01, but it still won't split. I also tried turning off pruning and prepruning, but it still doesn't work. I have no idea what I can do about it.

    Regards,
    Nada

  • happy_neid Member Posts: 10 Contributor I

    Now I see that I misunderstood you. Should my unlabeled dataset contain a label column with empty cells or not?

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    It sounds like your tree is failing to find any attributes that provide a meaningful split between the labels. Did you try the other criteria (information gain, gini index), and also the confidence parameter?

    You might want to check whether your attributes have any predictive relationship with your label at all. Try a simpler approach first, such as the "weight" operators, e.g. Weight by Information Gain or Weight by Gini Index. That will show you whether you have attributes that can separate your classes at all. You can also run a simple Naive Bayes model and look at its output, which shows the class distributions for each attribute. If they are not distinct, your decision tree is not going to find anything to use for a split.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
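The idea behind weighting attributes before building a tree can be sketched in plain Python. This is not the implementation of RapidMiner's Weight by Information Gain or Weight by Gini Index operators, just the underlying idea under that assumption: score each attribute by how much splitting on it reduces label impurity, measured either as entropy or as Gini impurity. The attribute names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())


def weight(labels, values, impurity):
    """Impurity reduction from splitting the labels by one nominal attribute."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * impurity(g) for g in groups.values())
    return impurity(labels) - remainder


labels = ["1", "1", "1", "0", "0", "0"]
attributes = {
    "word_refund": ["no", "no", "no", "yes", "yes", "yes"],  # separates the classes perfectly
    "word_the": ["yes", "yes", "no", "yes", "yes", "no"],    # same 50/50 class mix in both groups
}
for name, values in attributes.items():
    ig = weight(labels, values, entropy)
    gi = weight(labels, values, gini)
    print(f"{name}: information gain {ig:.3f}, gini reduction {gi:.3f}")
```

Here `word_refund` gets the maximum weight under both measures and `word_the` gets zero. If every attribute in a real dataset scores near zero, a decision tree has nothing useful to split on, whichever criterion it uses.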