Epochs vs RMSE on validation and the learning procedure

Lefteris Member Posts: 6 Learner I
Good evening
I hope you are all well.

I would like to ask you a question. I want to make a diagram of the learning and validation procedure of a Deep Learning algorithm, where the X axis shows the epochs and the Y axis the RMSE.

I would like to run it for epochs from 1 to 3000 with a step of 2. Do you know which operator I can use to do this iteration for me?

I also tried it with Optimize Parameters (Grid), but it seems unable to vary the epochs parameter.

Thanks in advance

Best Answer

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    edited December 2020 Solution Accepted
    I am not sure which deep learning model you are trying to use. However, let's assume that you are exploring the use of the Deep Learning extension. 

    First of all, I'd suggest making a single run of up to 30,000 epochs and using an "early stopping" criterion, i.e. continue execution as long as there is some "score" improvement within the "patience" period, which is a fixed number of epochs. If there is no improvement within that period, the process stops, which could be as early as after the first 100 epochs.
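
    To make the idea concrete, here is a rough Python sketch of early stopping with a patience window. It uses a toy linear model and synthetic data in place of a deep net; nothing here is RapidMiner-specific, it only illustrates the stopping logic.

    ```python
    import numpy as np

    # Toy setup: fit y = X @ w by gradient descent and stop early once the
    # validation RMSE has not improved for `patience` consecutive epochs.
    rng = np.random.default_rng(0)
    X_train, X_val = rng.normal(size=(200, 5)), rng.normal(size=(50, 5))
    w_true = rng.normal(size=5)
    y_train = X_train @ w_true + rng.normal(scale=0.1, size=200)
    y_val = X_val @ w_true + rng.normal(scale=0.1, size=50)

    def rmse(X, y, w):
        return float(np.sqrt(np.mean((X @ w - y) ** 2)))

    w = np.zeros(5)
    best_rmse, best_w, patience, stalled = float("inf"), w.copy(), 100, 0
    for epoch in range(1, 30001):                          # upper bound, rarely reached
        grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= 0.01 * grad                                   # one "epoch" of training
        score = rmse(X_val, y_val, w)                      # validation score
        if score < best_rmse - 1e-6:                       # improvement: reset patience
            best_rmse, best_w, stalled = score, w.copy(), 0   # best_w kept for later use
        else:
            stalled += 1                                   # no improvement this epoch
        if stalled >= patience:                            # patience exhausted: stop
            print(f"stopped at epoch {epoch}, best validation RMSE {best_rmse:.4f}")
            break
    ```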

    Normally, the model is trained one mini-batch at a time, i.e. a predefined number of examples is loaded for model training (e.g. onto the GPU); after each batch the training performance is calculated using the loss function, and at the end of each epoch the model is validated with the same loss function, provided that test data was supplied on input. When training stops, you can see the entire history of what happened, one epoch at a time, with two measurements: the average batch performance vs the end-of-epoch validation performance. As you can see, it makes no sense to skip epochs, as you need them all; if you wish, you can filter some epochs out or use some aggregate statistics of your performance history.

    This is the simplest approach, but it will not give you training vs validation performance in various metrics; of course, you can get these at the end by applying the optimised model to the training and validation data once and calculating whatever performance measures you need. You may need to run the whole process twice: once to figure out the best number of epochs, and once to build the final model, save it and check its performance using a variety of measures.
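
    As a small sketch of what you can do with such a per-epoch history (the numbers below are synthetic, just to show the mechanics), you can pick the best number of epochs from the validation column and thin the rows, e.g. keep every 2nd epoch, for the epochs-vs-RMSE diagram:

    ```python
    import numpy as np

    # Synthetic per-epoch history: one training RMSE and one validation RMSE
    # per epoch, mimicking what a single long training run would log.
    rng = np.random.default_rng(1)
    epochs = np.arange(1, 3001)
    train_rmse = 1.0 / np.sqrt(epochs) + rng.normal(scale=0.005, size=epochs.size)
    val_rmse = train_rmse + 0.02 + 1e-5 * epochs           # slowly starts to overfit

    best_epoch = int(epochs[np.argmin(val_rmse)])          # best number of epochs
    print("best epoch:", best_epoch, "validation RMSE:", round(float(val_rmse.min()), 4))

    # Thinning for plotting, e.g. every 2nd epoch (the 1..3000 step 2 idea)
    plot_points = list(zip(epochs[::2].tolist(), val_rmse[::2].tolist()))
    ```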

    If you wanted to calculate several different metrics in addition to the loss function (which was possible with the now-obsolete Keras extension), you'd have to do it differently, by creating your own epoch/batch management. This can be done by first creating the deep model, e.g. training it on all of the training data once only, and then, in a simple Loop, running the required number of epochs: update the model, apply it to the training and validation data, get whatever performance measures you'd like, and collect them all on the output (you may need to convert them to data and append them). This method also allows you to save each of the models (or only those which improved) to disk, and at the end load the best one for deployment; this way you will not need to run the process twice.
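
    A rough Python sketch of that epoch-managed variant follows; a linear model again stands in for the deep net, and the metric names (RMSE, MAE) are only examples of the several measures you might collect per epoch:

    ```python
    import numpy as np

    # Own epoch management: update the model one epoch at a time, score it on
    # both training and validation data with several metrics, collect the rows,
    # and keep a copy of the best model instead of re-running the whole process.
    rng = np.random.default_rng(2)
    X_tr, X_va = rng.normal(size=(200, 4)), rng.normal(size=(60, 4))
    w_true = rng.normal(size=4)
    y_tr = X_tr @ w_true + rng.normal(scale=0.1, size=200)
    y_va = X_va @ w_true + rng.normal(scale=0.1, size=60)

    def metrics(X, y, w):
        err = X @ w - y
        return {"rmse": float(np.sqrt(np.mean(err ** 2))), "mae": float(np.mean(np.abs(err)))}

    w, history, best = np.zeros(4), [], {"val_rmse": float("inf"), "w": None}
    for epoch in range(1, 501):
        w -= 0.01 * (2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr))   # one epoch of updates
        row = {"epoch": epoch,
               **{"train_" + k: v for k, v in metrics(X_tr, y_tr, w).items()},
               **{"val_" + k: v for k, v in metrics(X_va, y_va, w).items()}}
        history.append(row)                                        # collect performance as data
        if row["val_rmse"] < best["val_rmse"]:                     # keep only the improved model
            best = {"val_rmse": row["val_rmse"], "w": w.copy()}

    print("best validation RMSE:", round(best["val_rmse"], 4))
    ```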

    If you also wanted to do your own batch management, then within your epoch loop split the data into batches using Generate Batch, loop over them using Loop Batches, and within the batch loop (rather than the epoch loop) update your model one batch at a time, then apply it and check performance as above. You may need to aggregate the batch performance statistics if required.
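
    And a sketch of the nested version with your own batch management (plain Python loops here, standing in for the Generate Batch / Loop Batches operators):

    ```python
    import numpy as np

    # Own batch management: inside each epoch, split the training data into
    # mini-batches, update the model one batch at a time, and aggregate the
    # per-batch statistics at the end of the epoch.
    rng = np.random.default_rng(3)
    X = rng.normal(size=(256, 3))
    w_true = rng.normal(size=3)
    y = X @ w_true + rng.normal(scale=0.1, size=256)

    w, batch_size = np.zeros(3), 32
    for epoch in range(1, 51):
        order = rng.permutation(len(y))                    # reshuffle once per epoch
        batch_rmse = []
        for start in range(0, len(y), batch_size):         # loop over mini-batches
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            w -= 0.05 * (2 * Xb.T @ (Xb @ w - yb) / len(yb))   # update on this batch only
            batch_rmse.append(float(np.sqrt(np.mean((Xb @ w - yb) ** 2))))
        epoch_rmse = float(np.mean(batch_rmse))            # aggregate the batch statistics

    print("mean batch RMSE in the last epoch:", round(epoch_rmse, 4))
    ```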

    Enjoy deep learning hacking -- Jacob