Unexpected predictions
dataminer99
Member Posts: 3 Contributor I
Hello,
Despite seeing good predictions (~70% accuracy) on my training and validation sets, I am having trouble scoring my records for use. I have 250K records to score, and 99% of them get the same prediction (Y) with identical confidence scores. The Yes confidence in the scored data set is always 0.818 and the No confidence is always 0.182.
I expected the predictions and their confidences to vary from record to record, not to be identical as they are when I score my data. To investigate, I replaced my real data with the "Generate Direct Mailing Data" operator in every process. Unfortunately, the generated data behaves consistently across training, validation, and scoring, i.e. no problems. My real training data set has 44,000 records, 2 special attributes (1 ID, 1 nominal label), and 66 regular attributes (26 integer, 12 nominal, 28 real). I would have added the code from my 4 processes, but it caused this message to exceed the 20K character limit. Any suggestions are much appreciated! Mike
Answers
If you use accuracy as a performance measure, you have to compare it with the default accuracy, that is, the accuracy you would get by always predicting the most frequent label. If your "yes" examples make up 70% of your examples, an accuracy of 70% does not sound too good.
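To make the comparison concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. The label lists are hypothetical toy data, not the poster's records, and the function names are made up for illustration. It computes the default (majority-class) accuracy and compares it with the accuracy of a model that always answers "Y".

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical toy data: 7 of 10 examples carry the label "Y".
labels = ["Y"] * 7 + ["N"] * 3
# A degenerate model that predicts "Y" for every record.
predictions = ["Y"] * 10

baseline = majority_baseline_accuracy(labels)
model_acc = accuracy(labels, predictions)

print(f"default accuracy: {baseline:.2f}")
print(f"model accuracy:   {model_acc:.2f}")
print(f"gain over default: {model_acc - baseline:.2f}")
```

If the gain over the default is close to zero, the model has probably learned little beyond the class distribution, which would be consistent with nearly every scored record receiving the same prediction and confidence.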
Greetings,
Sebastian