How can I apply my model with optimize parameters on a test set?
Samira_123
Member Posts: 9 Learner II
in Help
Hello,
I have a question regarding my classification assignment. I have to predict whether or not donors will donate (class 0 and class 1).
I built a model using the 'Optimize Parameters' operator (as advised here) with a Random Forest. I got a reasonable kappa, a good coefficient matrix and a cost matrix.
The performance of the model is satisfying but I have an issue.
I want to use 'Apply Model' on a test set (loaded with Read CSV) with the model I built with Optimize Parameters. However, when I try to apply the model to get the predictions for this test set, I run into an issue in RapidMiner.
I need to apply the model on this test set to get the class predictions of the donors but unfortunately I can't.
I tried to find information online but didn't find anything relevant. The way I proceed is perhaps not correct.
Once I get the class predictions for this test set, I have to export them with Write CSV.
Thank you,
Wish you all a good weekend!
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
@Samira_123,
The attributes in your training set and in your test set must be strictly the same.
In other words, all the preprocessing steps you applied to your training set before modelling have to be performed in your test set too.
In particular, I see that in your training set you use:
- 2 Generate Attributes operators: you have to generate the same attributes in your test set before scoring.
- a Nominal to Numerical operator: this operator performs "one-hot encoding" on your training set and generates new attributes.
You have to apply this operator to the same attributes in your test set too, so that both sets end up with the same encoded attributes.
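As an illustration outside RapidMiner, here is a minimal Python sketch of the same idea: the encoding is learned on the training set only and then reused unchanged on the test set. The function names, attribute names and data are all hypothetical, and this is not how the Nominal to Numerical operator is implemented.

```python
def fit_one_hot(rows, column):
    """Learn the list of categories for `column` from the training rows only."""
    return sorted({row[column] for row in rows})


def apply_one_hot(rows, column, categories):
    """Replace `column` with one 0/1 attribute per category learned in training."""
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column} = {cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical data: attribute names and values are made up for illustration.
train = [
    {"age": 34, "region": "north", "donated": 1},
    {"age": 51, "region": "south", "donated": 0},
    {"age": 27, "region": "west", "donated": 0},
]
test = [
    {"age": 45, "region": "south"},
    {"age": 30, "region": "east"},  # category never seen in training
]

categories = fit_one_hot(train, "region")                # learned on train only
train_enc = apply_one_hot(train, "region", categories)
test_enc = apply_one_hot(test, "region", categories)     # same encoding reused

# Apart from the label, both sets now expose exactly the same attributes.
train_attrs = set(train_enc[0]) - {"donated"}
test_attrs = set(test_enc[0])
print(train_attrs == test_attrs)
```

Encoding the test set on its own would instead create a "region = east" attribute that the model has never seen and drop the ones it expects, which is the kind of mismatch that makes scoring fail.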
Regards,
Lionel
Answers
In your training process, you can use the Store operator to save your trained model in the RapidMiner repository.
Then open a new process, retrieve the model from the RapidMiner repository and use it to score your test set with the Apply Model operator.
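For readers who think in code, the same two-step workflow (train and store, then retrieve and score) can be sketched in plain Python. This is only an analogy under stated assumptions: the toy threshold "model", the file names and the attribute names are hypothetical, and JSON on disk stands in for the RapidMiner repository.

```python
import csv
import json
import tempfile
from pathlib import Path


def train_stump(rows, attribute, label):
    """Toy stand-in for a real learner: a threshold midway between class means."""
    pos = [r[attribute] for r in rows if r[label] == 1]
    neg = [r[attribute] for r in rows if r[label] == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return {
        "attribute": attribute,
        "threshold": (mean_pos + mean_neg) / 2,
        "positive_above": mean_pos > mean_neg,
    }


def apply_model(model, rows):
    """Score rows with a stored model; rows must contain the model's attribute."""
    preds = []
    for r in rows:
        above = r[model["attribute"]] >= model["threshold"]
        preds.append(1 if above == model["positive_above"] else 0)
    return preds


workdir = Path(tempfile.mkdtemp())
model_path = workdir / "model.json"          # hypothetical file names
output_path = workdir / "predictions.csv"

# Process 1 (training): fit the model and store it.
train = [
    {"gifts_last_year": 3, "donated": 1},
    {"gifts_last_year": 5, "donated": 1},
    {"gifts_last_year": 0, "donated": 0},
    {"gifts_last_year": 1, "donated": 0},
]
model_path.write_text(json.dumps(train_stump(train, "gifts_last_year", "donated")))

# Process 2 (scoring): retrieve the model, apply it, write the predictions.
model = json.loads(model_path.read_text())
test = [
    {"donor_id": 101, "gifts_last_year": 4},
    {"donor_id": 102, "gifts_last_year": 0},
    {"donor_id": 103, "gifts_last_year": 2},
]
predictions = apply_model(model, test)

with open(output_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["donor_id", "prediction"])
    for row, pred in zip(test, predictions):
        writer.writerow([row["donor_id"], pred])
```

Keeping training and scoring in separate steps means the test set only ever meets an already fitted model, and the final CSV step corresponds to the Write CSV export mentioned in the question.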
Attached are the 2 processes (training and testing), built on the Titanic datasets.
If you still encounter an error after trying the proposed solution, please describe your issue and share your process and your data so that we can reproduce, understand and fix it.
Regards,
Lionel
Thank you for your answer
I followed these steps. Here you can find the screenshots and the datasets of my model and my database. I had to join the first 3 tables to build my model, then I needed to use 'donors to predict' as my test set (there is only one column in this dataset, 'potentional donors').
I did this initially, but there is still an issue.
As there is only one column in 'donors to predict', I should have joined all 4 tables at the beginning instead of joining only 3 of them.
I was just afraid of biasing my model by including it from the beginning, but I use Split Data in the optimizer process.
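A rough Python sketch of that join, assuming the single column holds a donor identifier that also appears in the other tables. The key name `donor_id`, the table contents and the helper `left_join` are hypothetical and only illustrate the idea.

```python
def left_join(left_rows, right_rows, key):
    """Keep every row of left_rows and add the matching columns of right_rows.

    Rows without a match get None for the added columns.
    """
    index = {row[key]: row for row in right_rows}
    right_cols = [c for c in right_rows[0] if c != key]
    joined = []
    for row in left_rows:
        match = index.get(row[key])
        extra = {c: (match[c] if match else None) for c in right_cols}
        joined.append({**row, **extra})
    return joined


# Hypothetical tables: the list of donors to predict has only an ID column.
to_predict = [{"donor_id": 101}, {"donor_id": 102}, {"donor_id": 999}]
demographics = [{"donor_id": 101, "age": 34}, {"donor_id": 102, "age": 51}]
gifts = [
    {"donor_id": 101, "gifts_last_year": 3},
    {"donor_id": 102, "gifts_last_year": 0},
]

# Build the scoring set with the same attributes that the training set has.
scoring = left_join(left_join(to_predict, demographics, "donor_id"), gifts, "donor_id")

for row in scoring:
    print(row)
```

Because the rows to predict carry no label, joining their attributes should not by itself bias the model, provided they are kept out of the data that is split for training and validation. Donors with no match (999 here) end up with missing values, which would need handling before scoring.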
Thank you for your answer