The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Precision-Recall Curves
I am following up on a previous discussion on Precision-Recall Curves. I cannot seem to edit or reply to it, so am creating a new thread with the same Subject.
I have attached the test process as XML which uses the Ripley dataset for example purpose. In a nutshell, I have optimized the threshold based on f-measure and then output the model performance for that "best threshold". Also, I have used the "ROC Curve to Example" operator from the Converters extension to generate 500 datapoints as an ExampleSet. I notice some discrepancy in the results which is intriguing. If someone could help reconcile it, it would be much appreciated.
I have attached the test process as XML which uses the Ripley dataset for example purpose. In a nutshell, I have optimized the threshold based on f-measure and then output the model performance for that "best threshold". Also, I have used the "ROC Curve to Example" operator from the Converters extension to generate 500 datapoints as an ExampleSet. I notice some discrepancy in the results which is intriguing. If someone could help reconcile it, it would be much appreciated.
- The best threshold is found out to be 0.642, which corresponds to a confusion matrix with TP=112, TN=104, FN=13, FP=21 with recall=TPR=89.6%, specificity=83.2%, FPR=1-specificity=16.8%, and F-measure=86.82%
- Now, the "ROC Curve to ExampleSet" output lists the different FPRs, TPRs, and confidence thresholds. I find that for TPR=89.6% and FPR=16.8%, the confidence (threshold) is listed as 0.356 which is puzzling. Also, for the confidence (threshold) of 0.643, TPR=66.4% and FPR=4.2%.
- Why do the values from steps 1 and 2 not match up?
0
Best Answer
-
amitd Member, University Professor Posts: 49 MavenI think I figured it out. I had swapped the "first class" and "second class" parameters in the Create Threshold operator nested in the Optimize Parameters (Grid) operator. It is counterintuitive, but apparently "first class" has to be set to be the negative class (in this example 0) and "second class" has to be set to the positive class (in this example 1).2