Hi, I tried to implement a test case in RapidMiner.
1. Loaded the training data. Since it's a regression problem, I tried linear regression.
2. After preprocessing the data, removing unnecessary columns, and applying the model and the Performance operator, it produced a result of decent accuracy.
3. Now I want to apply this model to the testing data and check the performance and related attributes.
Kindly refer to the attached doc containing the flow of operators used for both the training and testing datasets to reach the target values.
I retrieved the train and test data again, then used cross validation and applied the model.
Can you please tell me if there is any way to save the trained model somewhere and then invoke it later with only the test data as input, without involving the training data? I have applied all the operators used on the training data to the testing data as well, which I feel is redundant.
Kindly help me clarify this.
Thanks in advance.
Best Answer
CKönig (Employee-RapidMiner, RM Team Member):
As a general rule, you should apply the same preprocessing steps to both the training dataset and the testing dataset. This can make a huge difference, e.g. if you normalize the training dataset so that the model expects values around 0, and then you feed it huge unnormalized numbers. It usually makes sense to put the preprocessing steps in a separate process that you can drag and drop into both the training and the scoring process. This also makes maintaining them much easier, since you only have to make changes in one place.
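The idea of reusing one preprocessing step and scoring without the training data can be sketched outside RapidMiner. The following is only an illustration in plain Python, not how RapidMiner works internally: the data, the function names (`fit_scaler`, `scale`, `fit_line`, `predict`) and the file name `bundle.json` are all made up for the example. It learns the normalization parameters and a one-feature linear model from the training data, saves both together, and later scores new rows using only the saved file.

```python
import json
import os
import tempfile
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn normalization parameters from the training column only."""
    sd = pstdev(values)
    return {"mean": mean(values), "std": sd if sd > 0 else 1.0}


def scale(values, scaler):
    """Apply previously learned parameters; nothing is refitted here."""
    return [(v - scaler["mean"]) / scaler["std"] for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return {"slope": slope, "intercept": my - slope * mx}


def predict(xs, model):
    return [model["slope"] * x + model["intercept"] for x in xs]


# Training step (made-up data that follows y = 2x + 1).
train_x = [100.0, 200.0, 300.0, 400.0]
train_y = [201.0, 401.0, 601.0, 801.0]
scaler = fit_scaler(train_x)
model = fit_line(scale(train_x, scaler), train_y)

# Save the preprocessing parameters together with the model.
path = os.path.join(tempfile.mkdtemp(), "bundle.json")
with open(path, "w") as fh:
    json.dump({"scaler": scaler, "model": model}, fh)

# Scoring step: only the saved bundle and the new rows are needed.
with open(path) as fh:
    bundle = json.load(fh)
test_x = [250.0, 500.0]
predictions = predict(scale(test_x, bundle["scaler"]), bundle["model"])

# Skipping the preprocessing feeds the model values it was never trained on.
unscaled = predict(test_x, bundle["model"])
print(predictions, unscaled)
```

The scaled predictions come out close to the true values of the made-up relation, while the unscaled ones are far off, which is the effect described above.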
Answers
After validating your model with the Cross Validation operator, you can use the Apply Model operator.
The Apply Model operator has two input ports, mod and unl. Connect the mod output of the Cross Validation operator to the mod input of the Apply Model operator, and connect the validation dataset to the unl input port of the Apply Model operator.
best,
Cesar
Thanks and regards,
You're welcome. Yes, you are right: the Deals dataset is for training and testing, and the Deals(2) dataset is for validation. Another option is to split your dataset, using 90% for training and testing and then the other 10% for validation.
Best,
Cesar
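As a rough sketch of the 90/10 split mentioned above, written in plain Python rather than as a RapidMiner process: the function name `holdout_split` and the stand-in rows are hypothetical. The rows are shuffled with a fixed seed so the split is reproducible, and the 10% holdout is never used for training or cross validation.

```python
import random


def holdout_split(rows, holdout_fraction=0.1, seed=42):
    """Shuffle a copy of the rows and set aside a fraction for final validation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    # First part: training and testing (e.g. cross validation); second: holdout.
    return shuffled[n_holdout:], shuffled[:n_holdout]


rows = list(range(100))  # stand-in for 100 examples
work, holdout = holdout_split(rows)
print(len(work), len(holdout))
```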
Thank you once again for the update. In a nutshell, in RapidMiner we have to load both the training and the testing dataset in the same process to validate performance on the testing data. There is no option to save a model trained on the training data and later pull out the model alone to get results for the testing data (without placing the training data in the same process). Kindly confirm whether my understanding is correct, or whether I am missing an operator that would do what I need.
Thanks and regards.
Thank you for clarifying the doubt. Will try this out !
Thanks and regards.
As mentioned, I have saved the model for the training set as one process and the one for the testing set as another process. Then, in a new process, I dragged in these two processes and combined them with Apply Model. But the result is very different from the one we got when everything was built as a single process. Is that possible, or am I missing something here too? Kindly find the latest doc attached to this post; the original doc is already uploaded. For your reference, I am uploading both files again: the result doc contains the latest changes, and the rapidminer crossvalidation doc contains the original process.
Thanks and regards.
If you are using the same datasets in both cases, the results must be similar. Can you share your process and dataset?
Best,
Cesar
I want to use the Optimize Parameters (Grid) operator for the models I have built. Can you please let me know whether we should optimize all the parameters of the model in one go, or optimize each parameter one by one? I ask because it is taking a very long time to optimize just one parameter.
Thanks and regards,
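The cost trade-off behind this question can be estimated with simple arithmetic. In a grid search the number of model fits is the product of the number of values per parameter (times the number of validation folds), while tuning one parameter at a time costs only the sum, at the price of possibly missing interactions between parameters. The sketch below uses hypothetical parameter names and value lists; it only counts combinations and does not call RapidMiner.

```python
from itertools import product
from math import prod


def grid_size(grid, folds=1):
    """Number of model fits for a full grid: product of value counts times folds."""
    return prod(len(values) for values in grid.values()) * folds


def one_at_a_time_size(grid, folds=1):
    """Number of fits when each parameter is varied alone, the others held fixed."""
    return sum(len(values) for values in grid.values()) * folds


def combinations(grid):
    """Yield every parameter combination in the grid as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical parameters, for illustration only.
grid = {
    "param_a": [0.1, 1.0, 10.0],
    "param_b": [1, 2, 3, 4],
    "param_c": [True, False],
}

full = grid_size(grid, folds=10)
separate = one_at_a_time_size(grid, folds=10)
print(full, separate)
```

Fewer values per parameter, fewer folds, or a sample of the data during tuning all reduce the run time by the same multiplication.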
I have a scenario where there are 4 datasets, each with a different number of columns. I need to pick 2 columns from each of these datasets and create a new dataset from them. Can you please let me know if there is an option to achieve this?
Thanks and regards.
You can use the Select Attributes operator to select the columns (attributes) from each dataset, and after that use the Superset operator to join them into a new dataset.
Best,
Cesar
Thanks, and regards,
nn_here
I tried using the check outlier option in the Auto Model tab of RapidMiner. As the CSV has more than 2.5 lakh (250,000) rows, I decided to go with Auto Model. But it has been running for more than 1.5 hours and is still going. Can you please let me know whether this is the right option, or whether there is another operator that serves the same purpose?
Thanks in advance.
I tried the operators you suggested: 'you can use the Select Attributes operator to select the columns (attributes) from each dataset and after that use the Superset Operator, to joint them into a new dataset.' Can you tell me whether my doubt is a valid one? I have 264960 rows in each of the datasets, and some of the values are missing. When I apply Superset to 2 datasets, the result still shows 264960 rows. Shouldn't it display 264960*2 rows? Kindly correct me if my understanding is wrong. Also, please find the attached process.
Thanks and regards,
nn_here
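The row-count question can be illustrated with three different ways of combining tables. The sketch below is plain Python on made-up data and rests on an assumption that should be checked against the operator documentation: that Superset only gives both datasets the same set of columns (filling gaps with missing values) and returns them as two separate datasets, so each keeps its own row count. Under that assumption, stacking rows (append) is what doubles the count, and matching rows on a key (join) is what places the columns side by side. The helper names `superset`, `append` and `join_on` are hypothetical.

```python
def all_columns(*tables):
    """Collect column names across tables, preserving first-seen order."""
    seen = {}
    for table in tables:
        for row in table:
            for name in row:
                seen.setdefault(name, None)
    return list(seen)


def superset(a, b):
    """Give both tables the same columns; missing cells become None.
    Row counts are unchanged (assumed behaviour, see the note above)."""
    cols = all_columns(a, b)
    pad = lambda table: [{c: row.get(c) for c in cols} for row in table]
    return pad(a), pad(b)


def append(a, b):
    """Stack the rows of two tables: len(a) + len(b) rows."""
    return a + b


def join_on(a, b, key):
    """Inner join on a key column: columns side by side, one row per match."""
    index = {row[key]: row for row in b}
    return [{**row, **index[row[key]]} for row in a if row[key] in index]


left = [{"id": 1, "x": 10}, {"id": 2, "x": 20}, {"id": 3, "x": 30}]
right = [{"id": 1, "y": 0.1}, {"id": 2, "y": 0.2}, {"id": 3, "y": 0.3}]

sup_left, sup_right = superset(left, right)
stacked = append(sup_left, sup_right)
joined = join_on(left, right, "id")
print(len(sup_left), len(stacked), len(joined))
```

For the goal of picking 2 columns from each dataset into one new dataset, the join-style result is probably the intended one, which would keep the row count at that of a single dataset.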