Normalization Issue
OK... hopefully this will make sense to you because I'm thoroughly confused...
Using Version 4.2, the first file I use, "ModelBuilder_v42.xml," builds the model to predict the change in price. The "ModelBuilder" file imports the raw data, normalizes the data, creates a simple linear regression model, writes the model to a file, reloads the model, then applies the model to the previous example set using the ModelApplier. After I run the file, the "Meta Data View" shows the following statistics for the label and prediction respectively: "avg = 0.390 +/- 7.132" and "avg = 0.390 +/- 0.261." In addition, the statistics of all the regular attributes are "avg = 0 +/- 1." So everything appears to look good thus far.
However, my second file, "ModelLoader_v42.xml," is used to import new raw data, load the model, apply the model, and save the results to a comma-separated file. When I run this file using the same raw data file as before, the "Meta Data View" shows the following statistics for the label and prediction respectively: "avg = 0.390 +/- 7.132" and "avg = 8.846 +/- 1.677." In addition, the regular attributes do not appear to be normalized, e.g. "avg = 65.074 +/- 16.351, avg = 0.337 +/- 2.242, etc." So even though I selected "return_preprocessing_model" in the "Normalization" operator in the model builder file, neither the regular attributes nor the predictions appear to remain normalized.
Now this is where it really gets confusing. Using Version 4.1, when I build the model using the same operators and the same raw data as before, the statistics for the label and prediction respectively are "avg = 0.390 +/- 7.132" and "avg = -1.238 +/- 2.720." The statistics for the regular attributes appear really off, e.g. "avg = -4.223 +/- 0.004, avg = -0.217 +/- 0.199, etc." for the same attributes as above. Moreover, when I load and run the model, the statistics for the label and prediction respectively are "avg = 0.390 +/- 7.132" and "avg = 0.390 +/- 0.261." In addition, the statistics of all the regular attributes are now normalized again, i.e. "avg = 0 +/- 1."
What is really strange is that the results I got using version 4.2 with "ModelBuilder_v42.xml," which I could not duplicate using the "ModelLoader" file, are the same results I got after creating the model and loading the model using version 4.1.
Could I have corrupted the results while trying to repeat the process? Or should I have uninstalled Version 4.1 before I installed version 4.2?
Please let me know how I can transfer the xml and data file to you for verification...
Thanks again,
Darrell
Answers
First of all, a short remark: it would be nice if you posted subsequent answers to the same thread (if they are related to the same question and subject). Then it is easier to follow.
Now concerning your problem... Well, did you actually write both the preprocessing model and the regression model to a file, and then load and apply both of them to your new data? As far as I understand from your process descriptions, you normalized the data in your example set, learned a model, saved it, loaded it again and applied it (on the same example set, which was already normalized). Then of course the data is still normalized and the model is applied to the normalized data. If, however, you apply the regression model to new data which has not been normalized before, you can't expect it to be normalized. What I am trying to say is: in your first file, save the preprocessing model before applying the learner; in your second file, load (and apply) it before actually loading (and applying) the regression model. This should do the trick.
As for transferring your files: simply copy and paste the XML into the forum post and wrap it in the tags the forum supplies for program code.
Hope this was helpful,
Tobias
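To illustrate the point above outside of RapidMiner, here is a minimal Python sketch. It is not the RapidMiner API: the function names and the four-row data set are hypothetical, and the "preprocessing model" is assumed to be a plain z-score (mean and standard deviation learned on the training data). It shows that the regression model only gives sensible predictions on new raw data if the stored normalization is applied first.

```python
from statistics import mean, pstdev

def fit_zscore(values):
    """Learn the 'preprocessing model': mean and std of the training column."""
    return mean(values), pstdev(values)

def apply_zscore(values, params):
    """Apply previously learned normalization parameters to any data."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

def fit_linear(xs, ys):
    """Least-squares fit for a single attribute: y = a * x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def apply_linear(xs, model):
    a, b = model
    return [a * x + b for x in xs]

# Hypothetical raw data (one regular attribute and a label).
raw_x = [50.0, 60.0, 70.0, 80.0]
label = [-1.0, 0.0, 1.0, 2.0]

# "Builder" step: learn normalization, normalize, then learn the regression.
norm_params = fit_zscore(raw_x)
reg_model = fit_linear(apply_zscore(raw_x, norm_params), label)

# "Loader" step, done correctly: apply the stored normalization first.
good = apply_linear(apply_zscore(raw_x, norm_params), reg_model)

# "Loader" step without the preprocessing model: raw values go straight in.
bad = apply_linear(raw_x, reg_model)

print("with normalization:   ", mean(good))
print("without normalization:", mean(bad))
```

In this toy case the average of the correctly processed predictions matches the label average, while the predictions computed on raw values are far off, which is the same kind of symptom described in the question.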
You have to save, load, and apply both models as Tobias has pointed out, or alternatively you can use the GroupModel operator. I have posted an answer on how this works here:
http://rapid-i.com/rapidforum/index.php/topic,211.0.html
Cheers,
Ingo
Regarding the "normalization" issue, I used the "ModelGrouper" operator in version 4.2 as suggested, but I still can't get my model predictions to correlate between versions 4.1 and 4.2. However, using your suggestion I was able to get all of the attributes to normalize correctly, but the prediction values are still very different. I'm sure that I must have a logic error somewhere, but I just can't find it.
Below are copies of the files I used for testing with Version 4.1 and Version 4.2. While the avg and std of the prediction using version 4.1 appear correct, the avg and std of the prediction using version 4.2 appear faulty.
Copy of "ModelBuilder_v41"
Copy of "ModelLoader_v41"
After running both programs in Version 4.1, the statistics of the prediction are "avg = 0.390 +/- 0.261."
Copy of "ModelBuilder_v42"
Copy of "ModelLoader_v42"
After running both programs using Version 4.2, the statistics of the prediction are "avg = 8.846 +/- 1.677."
Therefore, I still can't figure out why the prediction average and standard deviation are so different in Version 4.2.
Thanks again for your great support and your fantastic product.
best regards,
Darrell
OK, I see. I think I found the cause of the problem. The ModelGrouper operator adds the models starting with the last one and ending with the first one. That means the prediction model is applied first and the data is normalized afterwards. You could use an IOSelector to swap the order of the two models before grouping them, as in the following process:
You can check the order by viewing the combined model. The models will be applied in the order in which they are defined in the grouped model.
Cheers,
Ingo
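The effect of the model order can be sketched in plain Python. This is not RapidMiner code: the "grouped model" is assumed to be simply a list of functions applied in sequence, and the data, slope, and intercept are hypothetical. With the wrong order the attribute ends up normalized, but the prediction has already been computed from the raw values.

```python
from statistics import mean, pstdev

def make_normalizer(train_x):
    """Build a normalization step from training statistics (z-score)."""
    mu, sigma = mean(train_x), pstdev(train_x)
    def normalize(example_set):
        out = dict(example_set)
        out["x"] = [(v - mu) / sigma for v in example_set["x"]]
        return out
    return normalize

def make_regressor(slope, intercept):
    """Build a prediction step that reads x as it is at that moment."""
    def predict(example_set):
        out = dict(example_set)
        out["prediction"] = [slope * v + intercept for v in example_set["x"]]
        return out
    return predict

def apply_grouped(models, example_set):
    """Apply the models one after another, in list order."""
    for model in models:
        example_set = model(example_set)
    return example_set

# Hypothetical raw data and a regression assumed to be trained on normalized x.
raw = {"x": [50.0, 60.0, 70.0, 80.0]}
normalizer = make_normalizer(raw["x"])
regressor = make_regressor(slope=1.0, intercept=0.5)

right = apply_grouped([normalizer, regressor], raw)  # normalize, then predict
wrong = apply_grouped([regressor, normalizer], raw)  # predict, then normalize

print("attribute avg (both orders):", mean(right["x"]), mean(wrong["x"]))
print("prediction avg, right order:", mean(right["prediction"]))
print("prediction avg, wrong order:", mean(wrong["prediction"]))
```

Both orders leave the attribute with an average of zero, so the attribute statistics alone do not reveal the problem; only the prediction statistics differ.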
Using the IOSelector operator as you described fixed the issue. Thanks again for all your great support! I don't know how long it would have taken me to figure that one out, or whether I ever would have.
best regards,
Darrell