Interpretation of X-Validation
christian1983
Member Posts: 11 Contributor II
Hi everybody,
I'm currently evaluating the quality of a classification model consisting of a neural net by applying the standard 10-fold cross-validation.
I know how the performance vector, as the measure of quality, is calculated after the ten rounds of training and testing (by averaging the 10 error estimates), but how are the final weights of each node determined?
Here is the process being applied on the Iris data table:
I hope someone can help.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="395" width="620">
<operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="38" y="104">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="multiply" expanded="true" height="76" name="Multiply" width="90" x="175" y="134"/>
<operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="165">
<process expanded="true" height="391" width="294">
<operator activated="true" class="neural_net" expanded="true" height="76" name="Neural Net" width="90" x="94" y="37">
<list key="hidden_layers"/>
</operator>
<connect from_port="training" to_op="Neural Net" to_port="training set"/>
<connect from_op="Neural Net" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="404" width="346">
<operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="24" y="33">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="112" y="120"/>
<operator activated="true" class="log" expanded="true" height="76" name="Log" width="90" x="112" y="255">
<parameter key="filename" value="C:\Dokumente und Einstellungen\ich\Desktop\Test.log"/>
<list key="log">
<parameter key="Performance" value="operator.Performance.value.performance"/>
<parameter key="Round" value="operator.Apply Model.value.applycount"/>
<parameter key="Average Performance" value="operator.Validation.value.performance"/>
</list>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_op="Log" to_port="through 1"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_port="result 1"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Thank you.
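For readers who want to see what the averaged number means concretely, here is a minimal sketch of the same idea in Python with scikit-learn; MLPClassifier, KFold and cross_val_score are stand-ins I am assuming for the Neural Net and X-Validation operators, not part of the posted process:

from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)

# ten rounds: train on nine folds, score the held-out fold
fold_scores = cross_val_score(net, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))

print(fold_scores)         # one estimate per fold
print(fold_scores.mean())  # the averaged value reported as the performance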
Answers
As I see it, your example will deliver the model created in the last pass through the NN learner. The help tab says I should be able to build a model on the whole data set, in which case that would be where the weights get established, but I cannot see that option in the parameters tab.
If you retrieve the model from the outgoing port of the X-Validation, then a model is trained on the complete data set. You will notice this in the status bar: after the model has been learned and applied n times, it is learned an (n+1)-th time.
This behavior is no longer controlled by a parameter. The additional model is produced whenever the outgoing port is connected (and hence the model will be used later on).
Greetings,
Sebastian
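To make the (n+1)-th training concrete, here is a hedged sketch of the same idea outside RapidMiner, again with scikit-learn as a stand-in; the only point it illustrates is that the delivered model comes from a separate fit on all rows, while the reported number is the average over the fold models:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)

# n = 10 trainings on 9/10 of the data each; these fits exist only for the estimate
estimate = cross_val_score(net, X, y, cv=10).mean()

# the (n+1)-th training on the complete data set: this is the model handed out at the port
final_model = net.fit(X, y)

print("estimated accuracy attributed to final_model:", estimate)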
Given that the performance vectors are generated over data subsets within the validation, and given that the model is generated on the entire dataset, there is every chance that the delivered model will perform better than the average of the performances within the validation. It actually happens with the code posted above, and if I add some noise it becomes more obvious. It is only a matter of time before a wannabe bug hunter chews this over. But there is a real point in the title of this thread, namely how we should interpret the results of validation, what we think we get out of it, and so on. So a quick flip to Wikipedia for a consensus view...
http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
and a bit I find relevant is this... I can see that the performance reported fits the bill, seen against unseen and so on, but what about the model? Surely it would be better calculated in the same way, as some sort of average perhaps, or the optimum, or... Either way, the data scope of the performance and the model should be matched by default, or am I missing quite a lot (on balance the much more entertaining and likely possibility ;D)?
I completely agree on "Hence, the main goal is the estimation of the performance and not the creation of the model." Let's just assume RapidMiner would not provide an output port for the complete model. What would we analysts do then? Is there a natural model which we would prefer over the others?
As you have said: we have several options. I am just starting a discussion about those:
- selecting the worst one: no idea why I should do this - this model is very likely to underperform and the performance is likely to be overestimated.
- selecting the best one: very risky! What if all models are not really good but at least not predicting randomly (let's assume 55% for a stratified binominal classification) and one model is predicting at random and achieves 56% just by chance. Is this one really more suitable than the others? And additionally the performance could be underestimated (which is at least probably better than in scenario 1)
- selecting an average model: ok, but how to do this for all model classes and types? And how to ensure that we don't introduce a bias by choosing the aggregation function?
- selecting a model randomly from one of the folds: seems weird, but I would rank this directly after using the model built on the complete data set, since I would expect that, repeated often enough, this on average yields the model whose performance is closest to the estimated one
- learning the model on the complete data set (the RapidMiner way): using as much information as possible to increase the likelihood of obtaining an optimal model. The performance is more likely to be under- than overestimated (which is better in most application areas) and, more importantly, in the limit the estimated performance and the performance of the model become the same (consider leave-one-out, where the difference between the training data sets used is minimized). A sketch contrasting the "best model" option with this one follows below.
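As a rough illustration of why picking the "best" fold model is risky compared to refitting on the complete data set, here is a sketch using scikit-learn's cross_validate with return_estimator=True; the network and the data set are my own stand-ins, not anything defined in this thread:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)

res = cross_validate(net, X, y, cv=10, return_estimator=True)
fold_scores = res["test_score"]
fold_models = res["estimator"]          # one fitted network per fold

best = int(np.argmax(fold_scores))      # "best model" option: chosen after the fact,
print(fold_scores[best])                # so its score tends to be optimistic

print(fold_scores.mean())               # the honest estimate is the plain average
full_model = net.fit(X, y)              # "complete data set" option: refit on all rows,
                                        # report the average as its estimated performance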
So the question can be broken down to: which model should be used? Each analyst is free to decide for one of those options or for a completely different way - which is possible. I believe in the last option stated above, and that is the reason why we have implemented the convenient output behaviour the way it is.
Cheers,
Ingo
Point taken, but should it not be made clear that the performance delivered is not that of the model delivered? Folks could easily get confused...
To conclude, I would suggest adding a statement to the description of the operator making clearer that a) the performance is only an estimate for the model, which is usually built on the complete data set, and b) this model is delivered at the output port for convenience.
What do you think?
Cheers,
Ingo
That's cool; as long as people understand how the numbers are made, and even better why, they can rest easy in their beds...
PS As food for thought: if you run this you'll see that a difference remains between the all-data performance and the average validation performance even down to leave-one-out. That difference interests me; is it actually a form of "margin of error" definition?
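If anyone wants to reproduce that gap outside RapidMiner, the following sketch (my own stand-ins again: scikit-learn, an MLPClassifier, accuracy as the measure) compares the leave-one-out average with the all-data model scored on the data it was trained on; I am reading "all-data performance" that way, which may not match the process that was actually run:

from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)

# 150 fits, each leaving a single example out
loo_estimate = cross_val_score(net, X, y, cv=LeaveOneOut()).mean()

# the delivered model, scored on its own training data (resubstitution)
resub = net.fit(X, y).score(X, y)

print("LOO estimate:", loo_estimate, "resubstitution:", resub, "gap:", resub - loo_estimate)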
Cheers,
Ingo