Performance Vector of Decision Trees
Hello,
I think I have some trouble understanding the performance vector of a decision tree.
I have a training data set with 16 records, each labeled positive or negative.
I created a process, and RapidMiner built a decision tree that classifies every record correctly. (I even checked each record manually.)
Now I'd like the system to measure the performance, so I added a "nominal cross validation".
The system reproduces the same tree, but the performance vector says that both recall and precision are below 100%.
What is the reason for this?
I checked the data set manually, and the decision tree seems to be correct for that specific data set. But when I use this validation, it says it isn't?
I don't understand this at the moment.
Could you explain it to me?
Regards
auxilium
Answers
Given the limited information you have provided, the result is not too surprising to me. Depending on how many examples are used for training and testing, statistical fluctuations can have a significant influence on the decision tree, both during training and testing. Maybe you could post the confusion matrix, and then run a Split-Validation and post its confusion matrix as well. Perhaps we can gain some more insight from that.
Timbo
You said that the very same tree is created. That's true for the "model" output of the X-Validation, since that model is built on all of the data. But as stated above, in each iteration the X-Validation builds a tree on the current training subset, which usually differs from the model built on all the data, so some held-out records can be misclassified. Try setting a breakpoint inside the X-Validation to inspect the per-fold models.
Best regards, Marius
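To see why perfect training accuracy and sub-100% cross-validation results are consistent, here is a minimal plain-Python sketch (no RapidMiner involved). The data set and the "memorizing" classifier are invented for illustration: the classifier memorizes every training record and falls back to the majority label for unseen values, loosely mimicking a fully grown decision tree that fits its training data exactly.

```python
def fit(records):
    """'Train' by memorizing every record; unseen values fall back to the
    majority label (a hypothetical stand-in for an overfitted tree)."""
    table = dict(records)
    labels = [label for _, label in records]
    default = max(set(labels), key=labels.count)
    return table, default

def predict(model, x):
    table, default = model
    return table.get(x, default)

def accuracy(model, records):
    return sum(predict(model, x) == y for x, y in records) / len(records)

# 16 records, as in the question (these labels are made up here).
data = [(x, "positive" if x % 3 == 0 else "negative") for x in range(16)]

# Trained on ALL 16 records, the model classifies every record correctly.
full_model = fit(data)
print(accuracy(full_model, data))  # 1.0

# Leave-one-out cross-validation trains 16 separate models, each on only
# 15 records, and tests each on the one record that model has never seen.
correct = 0
for i in range(len(data)):
    train = data[:i] + data[i + 1:]
    test_x, test_y = data[i]
    correct += predict(fit(train), test_x) == test_y
print(correct / len(data))  # 0.625
```

The model evaluated on its own training data is perfect, yet cross-validation reports much lower accuracy, because each fold's model never saw its test records. This is exactly the gap between the tree you inspected (built on all 16 records) and the per-fold trees the X-Validation actually scores.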