"seemingly inconsistent result in prediction with Decision Tree MetaCost"
I've built a process intended to predict which customers are likely to churn (i.e. leave the service provided by a company). I used two decision tree (DT) implementations based on C4.5: RM's own, and Weka's J48. DTs are particularly useful here for profiling the potential churners, since you learn about their characteristics. I added meta-learning via the MetaCost operator to encourage both algorithms to detect more possible churners. While playing around with parameter tuning, I discovered an anomaly:
RM's C4.5 implementation generated a tree consisting of the root only, with decision churn = No (expected, because most customers do not churn). That is not a problem in itself and can easily be changed by retuning parameters, in particular the minimal gain. What is a problem is that, although this tree should predict No for all instances, a few instances are predicted Yes...
Since the tree has just one node, the confidence for No is the same for every instance, namely 0.726. I do not see how, under these circumstances, a few instances with the same 0.726 confidence for No as all the others can be predicted Yes.
Another inconsistency: for a given instance, the confidences for the classes No and Yes do not add up to 1.
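For context on how such a flip could occur at all: MetaCost-style cost-sensitive learners predict the class that minimizes *expected cost*, not the class with the highest confidence, so with an asymmetric cost matrix a 0.726 confidence for No can still produce a Yes prediction. The sketch below illustrates the decision rule only; the cost values are hypothetical and not taken from the original process.

```python
# Cost-sensitive decision rule as used by MetaCost-style learners:
# predict the class with the lowest *expected cost*, not the highest
# confidence. The cost matrix is illustrative only.

def expected_cost(confidences, costs, predicted):
    # Sum over possible true classes: P(true) * cost(true, predicted)
    return sum(confidences[true] * costs[(true, predicted)]
               for true in confidences)

confidences = {"No": 0.726, "Yes": 0.274}

# Hypothetical costs: missing a churner (true=Yes, predicted=No) costs 5,
# a false alarm (true=No, predicted=Yes) costs 1, correct predictions cost 0.
costs = {("No", "No"): 0, ("No", "Yes"): 1,
         ("Yes", "No"): 5, ("Yes", "Yes"): 0}

cost_no = expected_cost(confidences, costs, "No")    # 0.274 * 5 = 1.37
cost_yes = expected_cost(confidences, costs, "Yes")  # 0.726 * 1 = 0.726

prediction = "No" if cost_no <= cost_yes else "Yes"
print(prediction)  # Yes: lower expected cost despite 0.726 confidence for No
```

Whether RM's MetaCost operator actually applies such a rule after the single-node tree is built is an assumption here; if it does, the Yes predictions would be expected rather than a bug, though the non-normalized confidences would still need explaining.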
The dataset is not publicly available, so the process cannot be tested directly, but if anyone wants to check this likely inconsistency, I could make available image files of the instances' scores, the confusion matrix, and the tree, which sufficiently illustrate the above (though I'm not sure the insert-image button works when posting).
Cheers,
Dan
Answers
It's a little difficult for me to check whether there is a bug in the code, or what other reason this behavior might have, without the process and the data.
Did you try to build a process that reproduces this behavior using only data generators?
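In RM this would be done with the data generator operators; outside RM, a comparable synthetic check (a sketch only, with made-up parameters, not the original process) might generate an imbalanced churn dataset with a weak feature, so that a tree collapses to a single majority-class node:

```python
import random

random.seed(42)

# Generate an imbalanced synthetic churn dataset: ~27% churners, with one
# feature that is only weakly informative, so a gain-thresholded tree may
# collapse to a single majority-class leaf (as reported in the thread).
def generate_dataset(n=1000, churn_rate=0.274):
    data = []
    for _ in range(n):
        churn = random.random() < churn_rate
        usage = random.gauss(30 if churn else 35, 10)  # weak signal
        data.append((usage, "Yes" if churn else "No"))
    return data

data = generate_dataset()
labels = [y for _, y in data]

# Majority-class "stump" baseline: what a root-only tree would predict,
# and the single confidence it would assign to every instance.
majority = max(set(labels), key=labels.count)
confidence = labels.count(majority) / len(labels)
print(majority, round(confidence, 3))
```

If the Yes-flips also appear when such generated data is fed through the same MetaCost setup, that would localize the issue to the operators rather than the dataset.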
PS: Have you thought about becoming an enterprise customer? We could sign an NDA or set up a WebEx session to make a reliable diagnosis and solve the problem.
Greetings,
Sebastian
Thanks, but I am not yet at the stage of becoming an RM enterprise customer. However, like anybody here, I am happy to make my small contribution to improving RM in the meantime by reporting whatever inconsistencies or bugs I find to the wonderful RM team. I could email the image files, and also the process, if that would help, but unfortunately not the dataset, which, as said, is not publicly available. Please let me know if you want me to do that. For my own work this is fine, as I have alternative DM software at my disposal; for now I am just exploring RM and other DM suites with a view to possible future critical use.
Best wishes,
Dan
I'll try it once, and if by chance (which, probabilistically speaking, is very small anyway) the same problem occurs with generated/artificial data, I will let you know.
Please email me the process and the pictures. I might find some time to look at it this (comparatively relaxed) week.
I will PM you my email address.
Greetings,
Sebastian