ROC curves Threshold
Ingo and team,
How do you get RapidMiner to output the threshold from ROC curves?
I'm trying to bootstrap a dataset and output both the ROC area and the threshold.
At the moment, the threshold data writer gives the threshold. But if we want to repeat this 100 times and calculate the confidence interval of the bootstrapped threshold, is there an easy way to output the threshold into the performance evaluator?
Thanks,
Leon
Answers
I'm not sure whether you want a result like the one in the attached picture. It shows the ROC curve together with the confidence threshold curve for a repeated run. The transparent regions show the standard deviation around the mean values (plotted as solid lines).
If yes, this is possible with the latest CVS version now.
Cheers,
Ingo
[attachment deleted by admin]
I've tried adding:
<parameter key="calculate_confidences" value="true"/>
to Performance & ROC, but I am still not seeing the confidence band.
What operator is it added to?
Thanks!
I'm not sure what you want to achieve, or what Ingo meant by "this is possible with the latest CVS version now".
Is this helpful?
Regards,
Steffen
Thank you for your help! This isn't quite what I'm looking for, but in the right direction.
The image Ingo posted earlier in this thread shows a lightly colored band around the ROC lines. I believe this represents a measure of confidence / precision at each threshold along the curve. I can't figure out how to turn this band on. I'd also like to know exactly what it represents; how, for instance, does it relate to the precision vs. threshold plot you provided?
I assume it is related to the confidence in the performance log. I'd like to know where this confidence comes from and how it is represented in the plot. For instance, is it the height of the shaded region?
<com.rapidminer.tools.math.ROCPoint id="246">
<falsePositives>139.0</falsePositives>
<truePositives>68.0</truePositives>
<confidence>0.19407850064775464</confidence>
</com.rapidminer.tools.math.ROCPoint>
Regarding "what is plotted"
[quote author=Ingo Mierswa]
The transparent regions show the standard deviation regions around the mean values (plotted with a solid line).
[/quote]
The mean and standard deviation are those of either the ROC curve (red line) or the threshold (blue line). This relates to precision in that, for each given threshold, you can count the true positives and false positives and hence calculate the precision.
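To make the link between a threshold and precision concrete, here is a minimal sketch in plain Python (not RapidMiner code). The function name, the toy data, and the choice of `>=` for "predicted positive" are assumptions made for illustration; precision is computed as TP / (TP + FP).

```python
def precision_at_threshold(confidences, labels, threshold):
    """Return TP / (TP + FP) for examples whose confidence is >= threshold.

    Assumption: an example counts as predicted positive when its
    confidence is greater than or equal to the threshold.
    """
    tp = fp = 0
    for conf, is_positive in zip(confidences, labels):
        if conf >= threshold:
            if is_positive:
                tp += 1
            else:
                fp += 1
    predicted_positive = tp + fp
    # Precision is undefined when nothing is predicted positive.
    return tp / predicted_positive if predicted_positive else float("nan")


# Toy data (made up): confidence for the positive class and the true label.
confidences = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False]

for t in (0.5, 0.3):
    print(t, precision_at_threshold(confidences, labels, t))
```

Lowering the threshold changes which examples are counted, so each threshold yields its own true positive and false positive counts, and therefore its own precision.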
I assume that you know how ROC curves are calculated; otherwise I recommend this excellent paper (here). For further clarification, please note the difference between a process using XValidation (which calculates a reliable estimate of the performance) and one with no validation at all (as in my demo process above), which causes severe overfitting and was meant for demonstration purposes only.
Regarding "what is saved"
Your example shows the false positives and true positives at the given threshold (stored as "confidence" in the XML file). If you have created multiple plots, you will get multiple entries like this, which allows you to calculate the mean and deviation. This leads us to the last question:
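If you want to read such entries outside RapidMiner, something like the following sketch may help. It uses Python's standard `xml.etree.ElementTree`; the `<points>` wrapper element and the function name are assumptions for illustration, since the layout of the full file is not shown in this thread. Only the `ROCPoint` element itself is taken from the post above.

```python
import xml.etree.ElementTree as ET

# The <points> wrapper is hypothetical; the ROCPoint entry is copied from the thread.
SAMPLE = """
<points>
  <com.rapidminer.tools.math.ROCPoint id="246">
    <falsePositives>139.0</falsePositives>
    <truePositives>68.0</truePositives>
    <confidence>0.19407850064775464</confidence>
  </com.rapidminer.tools.math.ROCPoint>
</points>
"""


def read_roc_points(xml_text):
    """Return a list of (false_positives, true_positives, threshold) tuples."""
    root = ET.fromstring(xml_text)
    points = []
    for node in root.iter("com.rapidminer.tools.math.ROCPoint"):
        fp = float(node.find("falsePositives").text)
        tp = float(node.find("truePositives").text)
        threshold = float(node.find("confidence").text)
        points.append((fp, tp, threshold))
    return points


points = read_roc_points(SAMPLE)
print(points)
```

With one such list per run, the per-point mean and deviation can then be computed across runs.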
Regarding "how to turn on the band"
The band is "turned on" by calculating more than one ROC curve, so that an average can be calculated. Compare these examples:
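As a rough illustration of why a single curve gives no band, here is a sketch in plain Python. It assumes every curve is sampled at the same false positive rates and uses the sample standard deviation; how RapidMiner actually averages the curves is not stated in this thread, and the function name and numbers are made up.

```python
from statistics import mean, stdev


def roc_band(curves):
    """Return (mean - std, mean, mean + std) per point across several curves.

    Assumption: each curve is a list of true positive rates taken at the
    same false positive rates, so points can be averaged index by index.
    """
    if not curves:
        raise ValueError("need at least one curve")
    n_points = len(curves[0])
    if any(len(curve) != n_points for curve in curves):
        raise ValueError("all curves must have the same number of points")
    band = []
    for values in zip(*curves):
        m = mean(values)
        # With a single curve there is no spread, so the band has zero width.
        s = stdev(values) if len(values) > 1 else 0.0
        band.append((m - s, m, m + s))
    return band


# Three made-up curves, e.g. from three validation runs.
curves = [
    [0.0, 0.5, 0.8, 1.0],
    [0.0, 0.6, 0.9, 1.0],
    [0.0, 0.4, 0.7, 1.0],
]
for lower, centre, upper in roc_band(curves):
    print(round(lower, 3), round(centre, 3), round(upper, 3))
```

Passing only one curve returns a band whose lower and upper edges coincide with the curve itself, which matches the plot showing no shaded region in that case.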
One curve:
More than one curve:
Hope this was helpful,
Steffen
PS: To get the individual values needed to calculate the band as a table, you have to run a process like this:
Thank you very much! This clears up all my questions. I was confused about Ingo's standard deviation comment when I first found this thread. Now that I see this relates to multiple runs everything is clear.
Thank you again for your effort and response!
brian