Which performance value from these operators is the valid one?
Hi,
In my process I have an Optimize Parameters operator, and inside it an X-Validation with MetaCost, AdaBoost and WREP Tree (picture):
I vary the parameters M between 2 and 5 and V between 0.001 and 0.1 (3 and 5 steps, respectively).
In the Results perspective, the Log operator (which comes right after the X-Validation operator) shows different values for performance:
The thing is, I don't know which performance I should use, or which one is representative. The kappa and performance columns come from the Performance (Classification) operator, which is inside the X-Validation. (Besides, what does "main criterion" inside the Performance (Classification) operator mean?)
The val_perf column is logged from the X-Validation value "performance", and val_perf3 from the X-Validation value "performance3". I asked this question before, but I'm not sure I understood the answer correctly: what do "performance", "performance1", "performance2" and "performance3" in the X-Validation mean (see screenshot)?
And finally, I got the performance from the "Optimize Parameters (Grid)" operator:
So which of the three performances is the most "representative" for my dataset now: the one from Performance (Classification), from X-Validation, or from Optimize Parameters? And should I use "performance", accuracy, or kappa? Or what is the best way to decide whether my model is a good one for classifying the data?
Screenshot from X-Validation:
Best Answer
IngoRM (RM Founder, Employee-RapidMiner, Community Manager)
Hi,
Use the performance of the Optimize Parameters operator. It is the result of the parameter settings you have been optimizing, so there is a direct relationship between the chosen parameters and the performance for this parameter set.
The different performance values of the cross-validation are the main criterion ("performance") plus up to three other performance measurements you might have defined in the Performance operator you used. Typically you should only care about the main criterion, so going with "performance" for logging is fine.
But in order to make a statement like "my model will be x% accurate", you should just go with the performance delivered by Optimize Parameters.
Cheers,
Ingo
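To make the answer above more concrete, here is a minimal Python sketch (not RapidMiner code) of the idea: every parameter combination gets a performance vector, one entry of that vector is the main criterion, and the value worth reporting is the one that belongs to the selected combination. The labels and predictions are hypothetical toy data, and the function names are made up for illustration. In a real process the predictions would come from the test folds of the X-Validation, not from hard-coded lists.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        # Only one class occurs anywhere; kappa is undefined, return 0 by convention.
        return 0.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def performance_vector(y_true, y_pred):
    """All criteria computed for one parameter combination."""
    return {
        "accuracy": accuracy(y_true, y_pred),
        "kappa": cohen_kappa(y_true, y_pred),
    }


def best_by_main_criterion(candidates, y_true, main_criterion):
    """Return (params, performance vector) of the combination that
    maximizes the chosen main criterion."""
    scored = [(params, performance_vector(y_true, y_pred))
              for params, y_pred in candidates]
    return max(scored, key=lambda item: item[1][main_criterion])


# Hypothetical, imbalanced toy labels: 8 negatives, 2 positives.
Y_TRUE = ["neg"] * 8 + ["pos"] * 2

# Hypothetical predictions for two (M, V) combinations.
CANDIDATES = [
    # Always predicts the majority class.
    ({"M": 5, "V": 0.1}, ["neg"] * 10),
    # Finds both positives but raises three false alarms.
    ({"M": 2, "V": 0.001}, ["neg"] * 5 + ["pos"] * 5),
]

if __name__ == "__main__":
    for criterion in ("accuracy", "kappa"):
        params, perf = best_by_main_criterion(CANDIDATES, Y_TRUE, criterion)
        print(criterion, "selects", params, "with", perf)
```

On this toy data, accuracy as the main criterion selects the combination that always predicts the majority class, while kappa selects the other one. So the choice of main criterion decides which parameters win, and the number to quote is the main criterion of the winning combination.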
Answers
Can somebody explain the different performance values to me? Has anybody got an idea?