What's wrong with this model application?
michaelhecht
Member Posts: 89 Maven
Hello,
I just wanted to find the optimum parameter set on labeled data and afterwards apply it to new, unlabeled data.
The data has two columns, x and y (called OCM here). Reading only one column for the application of the model
failed, i.e. RM told me that two columns are needed (I'm sure this is a beginner's error). Therefore I applied
a two-column file where I set all y-values to zero. As a result I got no predictions for the x-values, but all values zero.
Hmmm ... I don't really understand how RM "thinks", so what's wrong?
Here is the code:
The really simple data follows after the process XML.
<operator name="Root" class="Process" expanded="yes">
  <operator name="MemoryCleanUp" class="MemoryCleanUp">
  </operator>
  <operator name="SimpleExampleSource" class="SimpleExampleSource">
    <parameter key="filename" value="X:\HE\ModelleUntersuchungen\DataMining\PolyNomApproximation\ozm_svm.txt"/>
    <parameter key="read_attribute_names" value="true"/>
    <parameter key="label_name" value="OCM"/>
    <parameter key="label_column" value="2"/>
  </operator>
  <operator name="OperatorChain" class="OperatorChain" expanded="yes">
    <operator name="GridParameterOptimization" class="GridParameterOptimization" expanded="yes">
      <list key="parameters">
        <parameter key="Learner.N" value="true,false"/>
        <parameter key="Learner.U" value="true,false"/>
        <parameter key="Learner.R" value="true,false"/>
        <parameter key="Learner.M" value="[4.0;8.0;4;linear]"/>
        <parameter key="Learner.L" value="true,false"/>
      </list>
      <operator name="XValidation" class="XValidation" expanded="yes">
        <parameter key="keep_example_set" value="true"/>
        <operator name="Learner" class="W-M5P">
          <parameter key="keep_example_set" value="true"/>
          <parameter key="M" value="8.0"/>
        </operator>
        <operator name="OperatorChain (3)" class="OperatorChain" expanded="yes">
          <operator name="ModelApplier" class="ModelApplier">
            <list key="application_parameters">
            </list>
          </operator>
          <operator name="Performance" class="Performance">
          </operator>
        </operator>
      </operator>
    </operator>
  </operator>
  <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
    <operator name="SimpleExampleSource (2)" class="SimpleExampleSource">
      <parameter key="filename" value="X:\HE\ModelleUntersuchungen\DataMining\PolyNomApproximation\ozm_svmTest.txt"/>
      <parameter key="read_attribute_names" value="true"/>
      <parameter key="label_name" value="OCM"/>
      <parameter key="label_column" value="2"/>
    </operator>
    <operator name="ParameterSetter" class="ParameterSetter">
      <list key="name_map">
        <parameter key="Learner" value="Applier"/>
      </list>
    </operator>
    <operator name="Applier" class="W-M5P">
      <parameter key="keep_example_set" value="true"/>
      <parameter key="N" value="true"/>
      <parameter key="U" value="true"/>
      <parameter key="R" value="true"/>
      <parameter key="L" value="true"/>
    </operator>
    <operator name="ModelApplier (2)" class="ModelApplier">
      <list key="application_parameters">
      </list>
      <parameter key="create_view" value="true"/>
    </operator>
  </operator>
</operator>
CAE OCM
0 0.482
0.02 0.460
0.03 0.414
0.04 0.365
0.05 0.323
0.06 0.352
0.07 0.479
0.08 0.470
0.09 0.550
0.1 0.563
0.11 0.545
0.12 0.669
0.13 0.608
0.14 0.599
0.15 0.546
0.16 0.508
0.17 0.455
0.18 0.424
0.19 0.443
0.2 0.459
0.21 0.412
0.22 0.427
0.23 0.429
0.25 0.478
0.26 0.477
0.27 0.475
0.46 0.397
0.47 0.371
0.48 0.320
0.49 0.287
Answers
1) Find the parameters
2) Build the model with the optimal parameters
3) Apply the model to new data
Right now you have the optimal parameters from step 1). But you don't yet have the model in step 2) that enables you to generate predictions in step 3).
Something like this might be more what you're looking for. (Your original dataset was put into file RM_test_data.txt. Your new data for prediction (without OCM) was created as RM_test_data2.txt.)
So my mistake was that I applied the right operators in the wrong order, wasn't it?
But what still doesn't work is applying the model to a file with only one column, i.e.
without a y-column. Is there a hint, or do I always have to provide a dummy y-column?
I was happy too early. I forgot to change back the original file, which I had modified to make my
workflow work.
After I removed the OCM column, I again get the error:
Could not read file 'c:\temp\RM_test_data2.txt': Number of columns in line 1 was unexpected, was: 1, expected: 2
So nothing changed?!
If I apply the workflow with the right number of columns I get the error:
Applier: Missing input: ExampleSet
in the Applier.
I use RM 4.4. So where is my problem?
Your first Operator Chain returns the optimal parameters for your model, but not the model itself. Even though you have a W-M5P learner buried in the XValidation inside the GridParameterOptimization, the model doesn't get passed back out of the XValidation node. That learner is "used up" just coming up with the parameters that you want to use with your eventual model. This is step 1.
Once you have the parameters, you need to have another W-M5P learner node downstream in the process for the ParameterSetter to work on. That's where the "real" model object gets created. You need the full dataset, including the label to train this model. This is step 2.
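To picture what the name_map entry (Learner -> Applier) is for, here is a small conceptual sketch in Python. It is not RapidMiner code and not how ParameterSetter is implemented; the function name `remap_parameters` and the parameter values are hypothetical, chosen only to illustrate the idea that settings found for one operator are re-addressed to another operator of the same type.

```python
def remap_parameters(parameter_set, name_map):
    """Rename the operator part of each 'Operator.parameter' key via name_map."""
    remapped = {}
    for key, value in parameter_set.items():
        operator, _, parameter = key.partition(".")
        target = name_map.get(operator, operator)  # unmapped operators keep their name
        remapped[f"{target}.{parameter}"] = value
    return remapped


# Hypothetical optimum reported for the operator named "Learner";
# these values are illustrative, not results from the thread.
best = {"Learner.M": "4.0", "Learner.N": "true"}

# The same settings, now addressed to the downstream operator named "Applier".
applied = remap_parameters(best, {"Learner": "Applier"})
```

The point is only that the optimization step hands over settings, and a second learner has to exist downstream to receive them.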
Once you have the model (with the optimized parameters) trained (with the training data), you're ready to predict new values. Unlike the example set used for parameter optimization or model training, the example set of new data you want to generate predictions on doesn't need a column for the label. It just needs the columns that are the inputs to the model. Applying the model to the new data (ModelApplier) generates the prediction (label) column. This is step 3.
In your original description, it seemed like you thought that step 1 (parameter optimization) also generated a model that could be used for prediction, which it doesn't. You still needed to train the model using the original data set, and setting the parameters that you found in step 1.
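The three steps can also be sketched outside RapidMiner. The Python below is only an illustration under stated assumptions: a tiny k-nearest-neighbour regressor stands in for W-M5P, the grid over `k` stands in for the M5P parameter grid, and the labeled rows are the first ten rows of the data posted above. All function names are made up for this sketch.

```python
from statistics import mean


def knn_predict(train, x, k):
    """Predict y for x as the mean label of the k nearest training rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return mean(y for _, y in nearest)


def cross_val_error(data, k, folds=5):
    """Mean squared error of the k-NN stand-in under interleaved k-fold CV."""
    errors = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        errors += [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
    return mean(errors)


def find_best_k(data, grid):
    """Step 1: search the grid. This returns a parameter value, not a model."""
    return min(grid, key=lambda k: cross_val_error(data, k))


def build_model(data, k):
    """Step 2: train on the full labeled data with the chosen parameter."""
    return lambda x: knn_predict(data, x, k)


# Labeled data (CAE, OCM): first ten rows of the table posted in this thread.
labeled = [(0.00, 0.482), (0.02, 0.460), (0.03, 0.414), (0.04, 0.365),
           (0.05, 0.323), (0.06, 0.352), (0.07, 0.479), (0.08, 0.470),
           (0.09, 0.550), (0.10, 0.563)]
grid = (1, 3, 5)

best_k = find_best_k(labeled, grid)   # step 1
model = build_model(labeled, best_k)  # step 2

# Step 3: the new data has inputs only; the predictions become the label column.
new_x = [0.015, 0.055, 0.095]
predictions = [model(x) for x in new_x]
```

Note that `find_best_k` throws away every model it trains during cross-validation, which mirrors why the learner inside XValidation never reaches the final ModelApplier.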
Does that help clear things up?
Keith
Keith
and application chain. Well, now it's obvious.
What was missing in your example (in my opinion) was the example set in
"Build model with optimal parameters"
After implementing this, it works with the dummy column (see code below).
What you can also see in the code is that I did not forget to remove the label column
in the SimpleExampleSource (again, see code below).
Nevertheless I get:
[tt]
Error in: Read in new data for prediction (SimpleExampleSource)
Could not read file 'c:\temp\RM_test_data2.txt': Number of columns in line 1 was unexpected, was: 1, expected: 2.
The given file could not be read. Please make sure that the file exists and that the RapidMiner process has sufficient privileges.
[/tt]
I have attached the data file again below: data without a y column.
I have now found the error, which is more related to a strange behaviour of RM!!
I had a blank after the column name of the x-column. This forced the
SimpleExampleSource to "think" that there is more than one column!
After removing all trailing blanks it works.
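For anyone hitting the same message, here is a rough Python sketch of why a trailing blank can look like an extra column, and how to clean a file before loading it. It assumes a reader that treats every single blank, tab, comma, or semicolon as a column separator; I don't know the exact rules SimpleExampleSource uses, so treat this as an illustration only. The helper names are made up.

```python
import re

# Assumption: each single blank, tab, comma, or semicolon separates two columns.
# The real SimpleExampleSource rules may differ.
SEPARATORS = re.compile(r"[ \t,;]")


def count_columns(line):
    """Count fields when every separator character starts a new column."""
    return len(SEPARATORS.split(line))


def strip_trailing_blanks(text):
    """Remove trailing whitespace from every line of a data file's text."""
    return "\n".join(line.rstrip() for line in text.splitlines())


raw = "CAE \n0.02\n0.03\n"  # the header line ends with a blank
clean = strip_trailing_blanks(raw)

header_before = count_columns(raw.splitlines()[0])   # "CAE" plus an empty field
header_after = count_columns(clean.splitlines()[0])  # just "CAE"
```

Under this assumption the header announces two columns while the first data line has only one, which would fit the "was: 1, expected: 2" message.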
So thank you again for the training on RM.
It's true that RM's "Validate" shows the Applier as missing an example set, but Validate has been known to be wrong, as it was in this case. "Validate" is helpful, but it also gets confused easily. It should be taken as a suggestion, not as the absolute truth. You should be able to run the process I posted, even with that apparent error message.
Keith