Different cost of misclassification for every single example – how to implement?
Hello all!
Please advise on the following problem.
[Dataset description]
The dataset contains about 10 thousand examples with about 15-20 numeric attributes (I am not sure all of them are useful; many are correlated, so the set could probably be reduced to 7-10 attributes). The target (label) is binominal, let's say 'One' and 'Two'.
By nature the data is quite noisy, which means it makes sense to talk about the probability that case X belongs to class 'One' or 'Two'. For 95% of cases the expected probability is not higher than 0.75 (or lower than 0.25).
The above is pretty typical. Now here is what I believe is special: every case in the dataset has its own confidence threshold.
Let's say we have case X1; its actual class is 'One' and its threshold is T1. If we learn a model that predicts case X1 to be class 'One' with confidence P1, and P1 > T1, then Cost = 0. But if P1 < T1, then Cost = 1/T1.
For case X2 the actual class is 'Two' and the threshold is T2. If a model predicts case X2 to be class 'One' with confidence P2, and P2 > T2, then Cost = 1/(1-T2). But if P2 < T2, then Cost = 0.
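For clarity, the per-case cost rule above can be sketched in Python (the function name and argument names are mine, not part of any RapidMiner API; `p_one` is the model's confidence that the case is class 'One'):

```python
def example_cost(actual, p_one, threshold):
    """Per-case cost as described above.

    actual:    the true class, 'One' or 'Two'
    p_one:     the model's confidence that the case is class 'One'
    threshold: this case's own confidence threshold T
    """
    if actual == 'One':
        # Penalise only if the confidence for 'One' falls below T.
        return 0.0 if p_one > threshold else 1.0 / threshold
    else:  # actual == 'Two'
        # Penalise only if the confidence for 'One' exceeds T.
        return 1.0 / (1.0 - threshold) if p_one > threshold else 0.0
```

So a correctly-confident prediction costs nothing, and a miss is penalised more heavily the more extreme the case's threshold is.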
[Problem]
Normally I use a standard classifier like k-NN, which gives me confidence outputs. Then I compare those confidences with my per-case thresholds (I do this in Excel). This gives me a final estimate for the training set and for the test set. Then I compare performances and decide whether the model is good or bad.
What I want is a model that minimizes the Cost function, which is the sum of Costs over every case in the set.
Does anyone know how this can be done with RapidMiner? I would appreciate any feedback.
Answers
Maybe a custom cost function for the labels would be a good option for you? The Performance (Costs) operator does that (you can specify the cost of each misclassification in a matrix), and there is also a meta-modelling operator called MetaCost.
Hope this helps, gabor
Performance (Costs) doesn't really help, as I want to learn using these costs, not just estimate the result.
Or maybe I have to think about how to reformulate my task to make it solvable by existing algorithms.