"Impute missing values using a saved model"
jmrichardson
Hello,
I am trying to impute missing values using a k-NN learner. I am working with a large dataset and have saved the model. Now I want to use the saved model in the impute operator on new (unseen) data, because the new data is a much smaller sample. Unfortunately, I cannot get the saved model to impute the dataset. Can someone please help me? Here is what I am trying to do, but it does not work:
Thanks in advance!
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" breakpoints="after" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Labor-Negotiations" width="90" x="313" y="30">
        <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
      </operator>
      <operator activated="true" breakpoints="after" class="impute_missing_values" compatibility="5.3.008" expanded="true" height="60" name="Impute Missing Values" width="90" x="514" y="30">
        <process expanded="true">
          <operator activated="true" class="read_model" compatibility="5.3.008" expanded="true" height="60" name="Read Model" width="90" x="246" y="30">
            <parameter key="model_file" value="C:\Users\John Richardson\Desktop\test"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model" width="90" x="380" y="165">
            <list key="application_parameters"/>
          </operator>
          <connect from_port="example set source" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Read Model" from_port="output" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="model" to_port="model sink"/>
          <portSpacing port="source_example set source" spacing="0"/>
          <portSpacing port="sink_model sink" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Labor-Negotiations" from_port="output" to_op="Impute Missing Values" to_port="example set in"/>
      <connect from_op="Impute Missing Values" from_port="example set out" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
John
Answers
I got the attached process to work, though I am not sure what your error was.
regards
Andrew
Thank you for your quick reply. I checked your code and it does work. However, I am not sure it does what I am trying to accomplish. I would like to save the model from within the impute operator so that it can be used later. Here is the code (based on the tutorial), which appears to save the model correctly.
Now, here is the code I am using, which calls the saved model and tries to impute the same data set. This appears to work on all fields except the two binominal attributes (education-allowance and longterm-disability-assistance): every other field was imputed. It almost seems to skip over these two for some reason. Thanks again for your help,
John
As it happens the Impute Missing Values operator is a complex beast and my example is not likely to be of much use.
The operator iterates over all attributes that contain missing values and, for each one, builds a prediction model using that attribute as the label. In this case, it will iterate 16 times. One of the parameters is "learn on complete cases", which means that only examples with no missing values are used to train the model. For the data in this case, only one example meets this criterion.
The net result is that each iteration creates a model from a single row of training data, and only the last one remains stored in the repository. When that one model is then reused in every iteration of the later imputation, its behaviour is hard to predict, because the attributes used to build it will generally differ from those of the current iteration. I suspect that binominal predictions are difficult when only a single row of training data is used. The lack of training data will also cause overfitting, but that is a separate problem.
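To make the overwriting concrete, here is a minimal sketch in plain Python. It is not RapidMiner code: the toy data, the placeholder "model", and the name `imputation_model` are all made up for illustration, and the complete-case filter simply follows the description above.

```python
# Conceptual sketch in plain Python; this is not RapidMiner code.
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]

# Made-up toy data; None marks a missing value.
rows: List[Row] = [
    {"wage": 4.5, "hours": 40.0, "pension": 1.0},
    {"wage": None, "hours": 38.0, "pension": 0.0},
    {"wage": 3.0, "hours": None, "pension": None},
    {"wage": 2.5, "hours": 35.0, "pension": None},
]


def attributes_with_missing(data: List[Row]) -> List[str]:
    """Attributes that have at least one missing value, in column order."""
    return [name for name in data[0] if any(r[name] is None for r in data)]


def complete_cases(data: List[Row]) -> List[Row]:
    """Rows in which no attribute is missing."""
    return [r for r in data if all(v is not None for v in r.values())]


def train_placeholder(label: str, training: List[Row]) -> dict:
    """Stand-in for the inner learner: records the label and training size."""
    return {"label": label, "n_training_rows": len(training)}


targets = attributes_with_missing(rows)  # one inner model per entry
training = complete_cases(rows)          # the "learn on complete cases" filter

saved = {}                               # stands in for the repository
for label in targets:
    # Same name on every iteration, so each write replaces the previous one.
    saved["imputation_model"] = train_placeholder(label, training)

print(targets)
print(len(training))
print(saved)
```

In this toy data three attributes have gaps but only one row is complete, so three models are trained on one row each and only the model for the last attribute survives under the shared name.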
There are two things to do. First, create a macro that is incremented on each iteration to keep track of the attribute currently being used as the label, and use it in the name of the model to be saved. Then, in the second imputation loop, increment another macro so that the correct model is recalled.
Second, uncheck the "learn on complete cases" parameter. This brings in more training data, but care is needed if the model handles missing values poorly. I believe k-NN is neither particularly good nor particularly bad at handling missing values. As usual with data mining, it depends on what you are trying to achieve when working out how to get the best from your data.
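The counter idea can be sketched outside RapidMiner as follows. This is plain Python, not a RapidMiner process: the file naming scheme `impute_model_<n>.pkl`, the placeholder models, and the helper names are hypothetical, and the two counters only stand in for the two macros.

```python
# Conceptual sketch in plain Python; this is not RapidMiner code.
import pickle
import tempfile
from pathlib import Path
from typing import List


def model_file(directory: Path, counter: int) -> Path:
    """Hypothetical naming scheme: the counter plays the role of the macro."""
    return directory / f"impute_model_{counter}.pkl"


def write_models(directory: Path, labels: List[str]) -> None:
    """First imputation loop: save one placeholder model per label attribute."""
    write_counter = 0                     # like the first macro
    for label in labels:
        write_counter += 1
        model = {"label": label}          # stand-in for a trained model
        with open(model_file(directory, write_counter), "wb") as handle:
            pickle.dump(model, handle)


def read_models(directory: Path, labels: List[str]) -> List[dict]:
    """Second imputation loop: recall the model that matches each iteration."""
    read_counter = 0                      # like the second macro
    recalled = []
    for _ in labels:
        read_counter += 1
        with open(model_file(directory, read_counter), "rb") as handle:
            recalled.append(pickle.load(handle))
    return recalled


# Attribute names taken from the thread; the order is illustrative.
labels = ["education-allowance", "longterm-disability-assistance"]

with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    write_models(directory, labels)
    recalled = read_models(directory, labels)
    n_files = len(list(directory.glob("impute_model_*.pkl")))

print(n_files, [m["label"] for m in recalled])
```

In this sketch the counters line up only because both loops visit the same attributes in the same order. If the new data had missing values in a different set of attributes, the numbering would drift, so a name based on the attribute itself might be the safer choice.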
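A rough way to see how much training data the parameter controls is to count rows under both settings. The sketch below is plain Python on made-up data, not RapidMiner code. It assumes that with the option unchecked, every row whose label value is known can be used for training; that is an assumption for illustration, not confirmed operator behaviour.

```python
# Conceptual sketch in plain Python; this is not RapidMiner code.
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]

# Made-up toy data; None marks a missing value.
rows: List[Row] = [
    {"wage": 4.5, "hours": 40.0, "pension": 1.0},
    {"wage": None, "hours": 38.0, "pension": 0.0},
    {"wage": 3.0, "hours": None, "pension": None},
    {"wage": 2.5, "hours": 35.0, "pension": None},
]


def training_rows(data: List[Row], label: str, complete_only: bool) -> List[Row]:
    """Rows available for training a model that predicts `label`."""
    if complete_only:
        # Mirrors "learn on complete cases": no missing value anywhere.
        return [r for r in data if all(v is not None for v in r.values())]
    # Assumed alternative: keep every row whose label value is known.
    return [r for r in data if r[label] is not None]


sizes = {
    label: (
        len(training_rows(rows, label, complete_only=True)),
        len(training_rows(rows, label, complete_only=False)),
    )
    for label in ["wage", "hours", "pension"]
}
print(sizes)
```

The extra rows gained with the option off still contain gaps in other attributes, which is why the learner's tolerance for missing values matters.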
Hope that helps...
regards
Andrew
OK, I understand your solution. However, I am having a problem building the process in RapidMiner. I have attached the "extract macro" operator to the base learner of the impute operator to try to extract the label for use in the write model operator (for each iteration). However, I cannot figure out how to extract the label into a macro for each iteration. Can you please help me with this? You have been so helpful so far; I am hoping you can get me past this last hurdle.
Thanks again,
John
OK - here's a simple example that uses macros to cause differently named models to be written to c:\temp for later reading.
regards
Andrew
THANK YOU, THANK YOU!
You are awesome!
John