Applying New Dataset on the Model
JaspreetKaur
I have built a decision tree model in RapidMiner and get an accuracy of 96.06%. I now have a new dataset and want to apply this decision tree model to it. How can I confirm that the accuracy is still at least 95%, with a confidence of at least 90%?
Please advise ASAP!
Answers
You need to store the trained model in your repository using the Store operator.
You can then retrieve the stored model by dragging it into the process window, and connect the model and the new dataset to the Apply Model and Performance operators.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Could you help me understand how I should do this now?
You cannot get performance metrics without true labels. With the trained model you can only make predictions on the new dataset, using the Apply Model operator.
Simply connect the new dataset to Apply Model and the trained model to the mod port of Apply Model to make predictions on the new dataset.
Varun
The new Alarm file contains 3464 records, 453 of which are true alarms. Can you help me with how to proceed?
If you have labeled data, you can validate the model predictions.
If you have unlabeled data, there is no machine learning process to validate the predictions. They are often validated in the real world later.
In validation, you compare the model prediction to the actual label. If you don't have a label, you can't compare.
As @varunm1 mentioned, you already perform a validation during model building. Experience shows that this validation result carries over to future predictions with the same model as long as the data doesn't change too much (e.g. there is no concept shift). If the data-generating process changes (e.g. new machines are introduced or the weather becomes warmer, depending on your scenario), the model starts to get worse. In that case you would retrain the model on recent data once you have the labels.
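For the labeled case, the original question (accuracy of at least 95% with at least 90% confidence) can be checked with a one-sided confidence bound on the accuracy. The following is only a minimal sketch in plain Python, outside RapidMiner: the function names are hypothetical, the Wilson score bound is one common choice among several, and the count of 3330 correct predictions is made up for illustration.

```python
import math
from statistics import NormalDist


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def wilson_lower_bound(correct, n, confidence=0.90):
    """One-sided Wilson score lower bound for the true accuracy."""
    z = NormalDist().inv_cdf(confidence)  # normal quantile for a one-sided bound
    p = correct / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


# Hypothetical outcome (an assumption, not from the thread):
# 3330 of 3464 labeled records are predicted correctly.
n, correct = 3464, 3330
lower = wilson_lower_bound(correct, n, confidence=0.90)
print(f"observed accuracy: {correct / n:.4f}")
print(f"90% lower bound:   {lower:.4f}")
print("claim 'accuracy >= 95%' supported:", lower >= 0.95)
```

If the lower bound stays at or above 0.95, the data support the claim at the chosen confidence level. This only works when the true labels are known for each record, so that the number of correct predictions can be counted.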
Best regards,
Balázs
Given that I know the new dataset contains 453 true values, how can I use this information to find out which records are those 453 true values?
As mentioned before by @BalazsBarany and @varunm1, the usual methodology in a data science project is:
1/ Train and validate a model on a LABELLED dataset, which allows you to calculate the accuracy of the model.
2/ Then apply the validated model to the new UNLABELLED dataset to make predictions. BUT you cannot determine the exact accuracy of the model on this UNLABELLED dataset.
Anyway, I think there is a misunderstanding about the word "True". By "True", you mean the examples that have the value "True" for your predicted label ("Alarm"), right?
I applied this methodology: I trained a model (Decision Tree) on your LABELLED dataset (called "Alarm file") and then applied this model to your UNLABELLED dataset (called "New Alarm unscored file") to obtain the predictions for your label "Alarm". There are 410 values equal to "True" (maybe these are the values you are talking about) and 3054 values equal to "False". These results were obtained with a Decision Tree model; with another model you might obtain 453 values equal to "True".
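Knowing only the totals (453 actual alarms among 3464 records, 410 predicted as "True") does not tell you which records are the true alarms, but it does bound the accuracy from both sides. A minimal sketch in plain Python, with a hypothetical function name, assuming the 453 figure is correct:

```python
def accuracy_bounds(n_records, actual_true, predicted_true):
    """Worst- and best-case accuracy implied by class counts alone."""
    actual_false = n_records - actual_true
    # Best case: predicted and actual "True" sets overlap as much as possible.
    max_tp = min(actual_true, predicted_true)
    # Worst case: they overlap as little as the counts allow.
    min_tp = max(0, predicted_true - actual_false)
    # errors = false positives + false negatives
    #        = (predicted_true - tp) + (actual_true - tp)
    min_errors = actual_true + predicted_true - 2 * max_tp
    max_errors = actual_true + predicted_true - 2 * min_tp
    worst = (n_records - max_errors) / n_records
    best = (n_records - min_errors) / n_records
    return worst, best


worst, best = accuracy_bounds(3464, actual_true=453, predicted_true=410)
print(f"accuracy lies between {worst:.4f} and {best:.4f}")
```

With these counts the model makes at least 43 and at most 863 errors, so the 95% threshold falls inside the possible range and the counts alone cannot confirm it. Per-record labels are still needed for a real validation.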
The attached file contains the process that I think you need.
Hope it is clear for you now,
Regards,
Lionel
Thank you @BalazsBarany and @varunm1 !