[SOLVED] error imputing missing values using linear regression
Hi. I was assuming that this would be a straightforward thing to do. I have a dataset with surprisingly few missing values, in just a few of the cases, and I want to compute the missing values. There is an ID field in the data but no label. I set up the following process.
It appears to run, and when I run it in debug mode it shows me the regression results for each of the 26 variables, but then it gets to the end and throws the error shown in the log after the process XML.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
<process expanded="true" height="550" width="748">
<operator activated="true" breakpoints="after" class="retrieve" compatibility="5.1.014" expanded="true" height="60" name="Retrieve" width="90" x="76" y="158">
<parameter key="repository_entry" value="c14 lcq for imputation short b"/>
</operator>
<operator activated="true" breakpoints="after" class="impute_missing_values" compatibility="5.1.014" expanded="true" height="60" name="Impute Missing Values" width="90" x="313" y="255">
<parameter key="value_type" value="numeric"/>
<process expanded="true" height="617" width="950">
<operator activated="true" breakpoints="after" class="linear_regression" compatibility="5.1.014" expanded="true" height="94" name="Linear Regression" width="90" x="444" y="270">
<parameter key="feature_selection" value="none"/>
</operator>
<connect from_port="example set source" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_port="model sink"/>
<portSpacing port="source_example set source" spacing="0"/>
<portSpacing port="sink_model sink" spacing="0"/>
</process>
</operator>
<operator activated="true" breakpoints="after" class="write_excel" compatibility="5.1.014" expanded="true" height="60" name="Write Excel" width="90" x="514" y="255">
<parameter key="excel_file" value="C:\Documents and Settings\ckolar\My Documents\data model\lcq\c14 missing values mputed.xls"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Impute Missing Values" to_port="example set in"/>
<connect from_op="Impute Missing Values" from_port="example set out" to_op="Write Excel" to_port="input"/>
<connect from_op="Write Excel" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
The log below is all I get in verbose mode. Any suggestions would be appreciated; this is my first time trying to impute missing values, so much of this is a learning exercise for me. Thanks, CK
Dec 6, 2011 6:05:32 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...
Dec 6, 2011 6:05:32 PM SEVERE: Here: Process[1] (Process)
subprocess 'Main Process'
+- Retrieve[1] (Retrieve)
+- Impute Missing Values[1] (Impute Missing Values)
subprocess 'Replacement Learning'
==> | +- Linear Regression[26] (Linear Regression)
+- Write Excel[0] (Write Excel)
Dec 6, 2011 6:05:32 PM FINER: Parameter 'send_mail' is not set. Using default ('never').
Dec 6, 2011 6:05:32 PM SEVERE: java.lang.NullPointerException
Answers
In your posted XML code the last lines are missing. Can you please post your complete process setup?
Kind regards,
Marius
Are you using a current version of RapidMiner? If yes, the problem probably only occurs with your data, and a minimal set of data with which the error occurs would be helpful. Another helpful thing is the "Show Details" button in the error dialog you should get in debug mode. Please hit it and paste the stack trace here.
Cheers, Marius
Still not seeing an obvious mistake. Here is one moment of brokenness from the log window:
The missing XML ending is:
</process>
</operator>
</process>
C
Best regards,
Marius
EDIT: just saw your PM, trying with your data right now.
PS MissingValueImpution should read MissingValueImputation
Generally you are right, of course: a regression needs a label. The Impute Missing Values operator, however, iterates over the attributes with missing values. It temporarily defines the current attribute as the label, splits the dataset into examples with and without missing values for that attribute, learns a model on the complete examples, and applies it to the examples with missing values.
When all attributes with missing values have been treated, the original label (if present) is restored.
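To make the iteration described above concrete, here is a minimal sketch in plain Python. It is not RapidMiner's implementation; the function names (`impute_missing`, `fit_linear_regression`, and so on) and the toy data are made up for illustration. It also makes one simplifying assumption that the operator may not share: only columns without any missing values are used as predictors, and an ID column is assumed to have been dropped beforehand.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_regression(xs, ys):
    """Ordinary least squares via the normal equations.

    Returns [intercept, w1, w2, ...].
    """
    rows = [[1.0] + list(x) for x in xs]  # prepend a constant term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, x):
    """Apply a fitted model to one example."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


def impute_missing(rows):
    """Return a copy of rows in which None is replaced by a regression estimate."""
    n_cols = len(rows[0])
    missing_cols = [c for c in range(n_cols) if any(r[c] is None for r in rows)]
    # Simplifying assumption: predictors are the columns with no missing values.
    predictor_cols = [c for c in range(n_cols) if c not in missing_cols]
    result = [list(r) for r in rows]
    for target in missing_cols:  # this column acts as the temporary label
        complete = [r for r in rows if r[target] is not None]
        incomplete = [i for i, r in enumerate(rows) if r[target] is None]
        # Learn on the complete examples ...
        weights = fit_linear_regression(
            [[r[c] for c in predictor_cols] for r in complete],
            [r[target] for r in complete],
        )
        # ... and apply the model to the examples with a missing value.
        for i in incomplete:
            result[i][target] = predict(weights, [rows[i][c] for c in predictor_cols])
    return result


# Toy data: the third column equals 1 + 2*a + b wherever it is present.
data = [
    [1.0, 2.0, 5.0],
    [2.0, 1.0, 6.0],
    [3.0, 4.0, 11.0],
    [4.0, 3.0, None],
    [5.0, 5.0, 16.0],
]
imputed = impute_missing(data)
print(imputed[3])  # the missing entry is filled with a value close to 12
```

Note that nothing in this loop needs a label on the incoming data: each attribute with missing values plays that role only while its own model is being learned.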
The problem was indeed that cgkolar's dataset did not contain a label: there was a bug in Impute Missing Values for that case. I just fixed that bug, and the fix will be included in the next release. Until then, the process below can be used as a workaround.
Cheers,
Marius