How to "tell" RM what to use for training/testing
Dear Miners,
Please help me to get the hang of this.
I have a decision tree model and a data set with about 60,000 rows that have the label attribute and 15,000 rows without it. I assumed/wanted the rows with the label attribute to be the training data and the rows with a missing label attribute to be the test data (which I want to export at the end for external validation).
Now my export has only 5900 rows, and it seems not to use the "empty" rows for testing. Instead, by default, it replaces missing values with the mean value and splits the whole data set into a test set and a training set.
I am wondering how to fix this issue without having to disassemble the entire design (which would be painful, since I have already incorporated the model outcome in my thesis draft).
Could you please help me?
Kind regards
A data science newbie
Answers
Did you filter the data using the "Filter Examples" operator with the condition class "no_missing_labels"? This separates the data with missing labels from the data without missing labels. You can use the labeled data for training and the unlabeled data for testing. Below is the XML code, and I have also attached a dataset with missing labels so you can test it. To do this, download the attached dataset. Then copy the code below, paste it into the XML window of your RapidMiner process, and click the green tick mark. You will now see a Read CSV operator in the process; in its parameters, point the CSV file parameter to the dataset you downloaded. If you cannot find the XML window, go to View --> Show Panel --> XML.
The Filter Examples operator has two relevant output ports: "exa" delivers the examples that match the filter (in our case, the rows with no missing labels), and "unm" delivers the unmatched examples (in our case, the rows with missing labels).
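For readers who think more easily in code, the matched/unmatched split can be sketched outside RapidMiner in plain Python. This is only an illustration of the idea, not RapidMiner's implementation: the function name `split_by_label`, the column name `label`, and the sample rows are all made up for the example, and a label counts as missing here when it is `None` or an empty string.

```python
def split_by_label(rows, label="label"):
    """Split rows into (matched, unmatched), loosely mirroring the
    "exa" and "unm" ports: matched rows have a label, unmatched do not.
    Hypothetical helper; a label is treated as missing if None or ""."""
    matched, unmatched = [], []
    for row in rows:
        if row.get(label) in (None, ""):
            unmatched.append(row)   # missing label -> later scored by the model
        else:
            matched.append(row)     # label present -> used for training
    return matched, unmatched


# Made-up sample data: two labeled rows, two unlabeled rows.
rows = [
    {"id": 1, "x": 1.0, "label": "no"},
    {"id": 2, "x": 9.0, "label": "yes"},
    {"id": 3, "x": 2.0, "label": None},
    {"id": 4, "x": 8.0, "label": ""},
]
training_rows, unlabeled_rows = split_by_label(rows)
```

Every input row ends up in exactly one of the two lists, so no unlabeled rows are silently dropped or imputed.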
Hope this helps. Please let me know if you need more help.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Thanks for taking the time. Unfortunately, your XML example does not help me with my problem.
I already apply the "Filter Examples" operator to create my model. However, I cannot find out how to feed the "remaining"/filtered-out data back into the model to test it and get the test set results in a separate file/report.
Kind regards
Did you try the Apply Model operator? It applies the trained model to the test dataset and returns the predictions. Can you provide your XML code so I can check it?
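As a rough illustration of the overall flow (train on the labeled rows, apply the model to the unlabeled rows, export the predictions), here is a small plain-Python sketch. It is not RapidMiner code and not the poster's process: a one-split "stump" stands in for the decision tree, and the names `train_stump`, `apply_model`, `to_csv`, the columns `x` and `label`, and the sample data are all assumptions made for the example.

```python
import csv
import io
from collections import Counter


def train_stump(rows, feature, label):
    """Fit a one-split stand-in for a decision tree: choose the threshold
    on `feature` that gives the fewest errors on the training rows."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = [r[label] for r in rows if r[feature] <= t]
        right = [r[label] for r in rows if r[feature] > t]
        left_cls = Counter(left).most_common(1)[0][0]
        right_cls = Counter(right).most_common(1)[0][0] if right else left_cls
        errors = sum(v != left_cls for v in left) + sum(v != right_cls for v in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_cls, right_cls)
    _, t, left_cls, right_cls = best
    return {"feature": feature, "threshold": t, "left": left_cls, "right": right_cls}


def apply_model(model, rows):
    """Return copies of `rows` with an added 'prediction' column."""
    scored = []
    for r in rows:
        side = "left" if r[model["feature"]] <= model["threshold"] else "right"
        scored.append({**r, "prediction": model[side]})
    return scored


def to_csv(rows):
    """Serialize scored rows to CSV text (write this string to a file to export)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up data: four labeled rows for training, two unlabeled rows to score.
data = [
    {"id": 1, "x": 1.0, "label": "no"},
    {"id": 2, "x": 2.0, "label": "no"},
    {"id": 3, "x": 8.0, "label": "yes"},
    {"id": 4, "x": 9.0, "label": "yes"},
    {"id": 5, "x": 1.5, "label": None},
    {"id": 6, "x": 8.5, "label": None},
]
labeled = [r for r in data if r["label"] is not None]
unlabeled = [r for r in data if r["label"] is None]

model = train_stump(labeled, "x", "label")   # training sees labeled rows only
scored = apply_model(model, unlabeled)       # predictions for unlabeled rows only
export_text = to_csv(scored)
```

The point of the sketch is the data flow: the unlabeled rows never enter training, and the exported file contains exactly one prediction per unlabeled row. Because those rows have no true label, no accuracy can be computed on them inside the process; they can only be scored and exported for external validation.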
Varun