X-Validation runs training X+1 times
Hi,
I have used the X-Validation operator quite often in the last few days and chose 10 as the number of validations.
But as I can see from the status bar (see image below), the operator in the training section of the X-Validation operator is not executed 10 times, as I would expect, but one time more: 11 times.
The 11th run of the training operator needs roughly the same time as the other training runs.
Even worse: if I use the "X-Validation (Parallel)" operator and allow 32 threads (I have 32 cores), the first 10 runs are executed in parallel, but the 11th run waits for the other 10 runs to finish and only starts after that. This doubles the execution time.
My question now is: what is this 11th run for? Is this a bug or a feature? And is there any way I could speed up the process, e.g. run the 11th run in parallel with the other 10 runs?
Regards
Answers
No, it is not a bug, it is a feature. :-)
After running the X-Validation k times, it is run a (k+1)-th time to create a model on the complete example set provided. This model is delivered at the Validation.model port.
As this last training is done on the complete data set, it can in fact take quite a long time.
Unfortunately, it is currently not possible to skip this last modeling phase. But I've created an internal ticket to start the last training only if the model port is connected.
Best,
Nils
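The behavior Nils describes (k training runs for performance estimation, plus a (k+1)-th run on the full data to produce the delivered model) can be sketched in plain Python. This is only an illustration of the scheme, not RapidMiner code; the toy mean-predictor "learner" and the simple fold construction are assumptions made for the sketch:

```python
# Sketch of k-fold cross-validation plus the extra (k+1)-th training
# run on the complete data set. Plain Python, not RapidMiner; the
# "model" here is a toy mean predictor chosen only for illustration.

def make_folds(n, k):
    """Split indices 0..n-1 into k roughly equal folds."""
    return [list(range(i, n, k)) for i in range(k)]

def train(data):
    """Toy learner: predict the mean of the training values."""
    return sum(data) / len(data)

def cross_validate(data, k):
    folds = make_folds(len(data), k)
    errors = []
    for test_idx in folds:                     # the k training runs
        held_out = set(test_idx)
        model = train([x for j, x in enumerate(data) if j not in held_out])
        errors.append(sum(abs(data[j] - model) for j in test_idx) / len(test_idx))
    final_model = train(data)                  # the (k+1)-th run: full data
    return sum(errors) / k, final_model

perf, model = cross_validate([1.0, 2.0, 3.0, 4.0, 5.0], k=5)
```

Note that `final_model` is trained on all the data, which is why the (k+1)-th run takes roughly as long as (or longer than) any single fold's training run.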
Thanks for your reply.
In addition to the ticket you created, it would be great if the (k+1)-th training could be started in parallel with the other k trainings (if there are enough threads).
Or perhaps it is somehow possible to take the k-th model after testing, continue training it on the test data that was held out, and use the resulting model as the final result.
For fun, I created the following process that "rolls its own" X-Validation, which you may be able to use to get the parallel execution you need (I haven't confirmed this last point since I don't have a powerful enough machine to try it on). The first part stores the training and test example sets from inside a normal X-Validation, which uses a very simple model so there is no hold-up while the example sets are partitioned. In addition, an (N+1)-th example set is created from the full data.
The second part uses a Loop operator to retrieve the training example sets, build a model from each, and then use the corresponding test example sets to obtain a performance. It also builds a model on the entire data set from the (N+1)-th example set and tests it on that same data (so it will overfit).
For a 10-fold X-Validation, there will be 11 entries in each returned collection. The average of the first 10 performances will be the same as the estimated performance from a normal X-Validation, and the 11th model is the one a normal X-Validation would output. The other 10 models are all different and could also be used, but it is generally better to use the model built from the most data - in this case, the 11th.
You'll notice that I have to use the Materialize Data operator a lot. This is generally needed because, without it, the display of example sets can go wrong for reasons I can't explain.
It should be possible to run the second Loop operator in parallel and of course you can modify the process to do what you want.
regards
Andrew
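The parallelization idea from this thread - submitting the final full-data training alongside the k fold trainings instead of after them - can be sketched as follows. Again this is plain Python with a toy mean-predictor learner, not RapidMiner; a thread pool stands in for the operator's worker threads:

```python
# Sketch of running all k fold trainings AND the final (k+1)-th
# full-data training concurrently, so the last run does not have to
# wait for the folds to finish. Toy learner for illustration only.
from concurrent.futures import ThreadPoolExecutor

def train(data):
    """Toy learner: predict the mean of the training values."""
    return sum(data) / len(data)

def fold_error(data, test_idx):
    """Train on everything outside test_idx; return mean absolute error."""
    held_out = set(test_idx)
    model = train([x for j, x in enumerate(data) if j not in held_out])
    return sum(abs(data[j] - model) for j in test_idx) / len(test_idx)

def parallel_cross_validate(data, k, max_workers=None):
    folds = [list(range(i, len(data), k)) for i in range(k)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit the k fold evaluations and the full-data training
        # together; with enough workers they all run concurrently.
        error_futures = [pool.submit(fold_error, data, f) for f in folds]
        final_future = pool.submit(train, data)
        errors = [f.result() for f in error_futures]
        return sum(errors) / k, final_future.result()

perf, model = parallel_cross_validate([1.0, 2.0, 3.0, 4.0, 5.0], k=5)
```

With k+1 or more workers available, the wall-clock time is roughly one training run instead of two, which is exactly the speedup requested in the original question.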
Some misinformation here: the (k+1)-th run is executed only if the model output port is connected. Otherwise there will be only k runs, so at least the first ticket has meanwhile been closed. :-) Parallelizing the execution of the (k+1)-th run is still a valid feature request, though.
Best,
Simon