Auto Model Rows
Hi, I am currently trying to use Auto Model with a data set which has roughly 1300 rows.
When I load the data I can see 1300 rows, and the Select Task and Prepare Target steps also show 1300 rows. However, when I get the results, choose a model, and go into Predictions, I can only see scores for around 520 rows.
Is there a reason that about half of the rows are missing or not being displayed? I wondered if it had something to do with editing the model types. Currently I am just using the default settings, i.e. Use regularisation and Automatically optimise.
I am using an academic license. I checked whether there was a row limit, but mine is unlimited, which makes sense because when I build the models manually I get results for all 1300 rows.
Thanks for any help you can offer.
-Jason
Best Answers
IngoRM:
Hi @Madcap,
Glad to hear from you. That behavior is actually what is supposed to happen. We create a 40% hold-out set from your input data to evaluate the model, which happens to be those 520 rows. Predictions are created for those rows to calculate how well the models work. See this discussion for more details: https://community.rapidminer.com/discussion/54774/auto-model-issue
By the way, there is really no point in doing this for the 60% of the data the model was trained on. For more on this, I would recommend this white paper: https://rapidminer.com/resource/correct-model-validation/
Hope this helps,
Ingo
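The arithmetic behind the 520 rows can be sketched as a plain 60/40 hold-out split: 40% of 1300 rows is 520 rows that get scored, and the other 780 are used only for training. This is a minimal illustration, not Auto Model's actual code; the shuffling, the seed, and the function name `holdout_split` are assumptions for the example (Auto Model may, for instance, sample differently).

```python
import random

def holdout_split(n_rows, test_fraction=0.4, seed=42):
    """Hypothetical helper: shuffle row indices and split them into train and hold-out sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # assumed random (non-stratified) sampling
    n_test = round(n_rows * test_fraction)
    # Hold-out rows come first; only these would receive predictions.
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(1300)
print(len(train_idx), len(test_idx))  # 780 520
```

Only the 520 hold-out rows are scored, which matches what shows up under Predictions.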
BalazsBarany:
Maybe AutoModel should switch to cross-validation on smaller datasets.
Cross-validation is more accurate in this case. You get a higher number from AutoModel, but that doesn't mean the model is better; it just means it got lucky when tested on less data.
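The "got lucky" effect can be illustrated with a small simulation. This is a hypothetical sketch, not how Auto Model computes anything: it assumes a fixed model that is right on exactly 80% of 1300 rows and only varies which 40% of rows are held out (a real comparison would also retrain the model for each split). The spread of the printed estimates shows how much a single hold-out accuracy can drift from the full-data figure.

```python
import random
import statistics

def holdout_accuracy(correct, test_fraction, rng):
    """Accuracy measured on one random hold-out subset of rows (1 = correct, 0 = wrong)."""
    n_test = round(len(correct) * test_fraction)
    sample = rng.sample(correct, n_test)
    return sum(sample) / n_test

# Hypothetical model that is right on exactly 1040 of 1300 rows (80%).
correct = [1] * 1040 + [0] * 260
rng = random.Random(0)
estimates = [holdout_accuracy(correct, 0.4, rng) for _ in range(1000)]

print(f"min={min(estimates):.3f} max={max(estimates):.3f} "
      f"stdev={statistics.stdev(estimates):.3f}")
```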
varunm1:
Hi @Madcap,
You can go with the cross-validation results since your data set is small. Auto Model might show higher accuracy because it is not training and testing on the whole data set.
Thanks,
Varun
Telcontar120:
I would argue that in all cases cross-validation is a better performance indicator (in line with the white paper Ingo references above). Any split validation sample is always subject to the idiosyncrasies of a single subset of the data and how that subset differs from the overall sample. In larger datasets this effect should diminish, but cross-validation eliminates it entirely.
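The reason cross-validation avoids depending on one particular subset is that every row is used for testing exactly once across the folds. A minimal sketch of k-fold index generation, assuming k = 10 and random shuffling (the function name `k_fold_indices` and the seed are illustrative; this is not RapidMiner's Cross Validation operator):

```python
import random

def k_fold_indices(n_rows, k=10, seed=42):
    """Hypothetical helper: yield (train, test) index lists; every row lands in exactly one test fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

scored = []
for train, test in k_fold_indices(1300, k=10):
    # A real workflow would fit a model on `train` and predict `test` here.
    scored.extend(test)

print(len(scored), len(set(scored)))  # 1300 1300
```

All 1300 rows end up scored once, so the performance estimate averages over the whole data set rather than one 40% slice.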
Answers
Just one final thing, if that is okay: which results should I use? The manual decision tree (with cross-validation), which uses all the rows, or Auto Model, which evaluates on 40% of them? The numbers are very similar, maybe only a 1-2% difference, with Auto Model having the higher accuracy.
Thanks again
-Jason
I will go with the cross-validation result, then. I am actually looking into RapidMiner for my honours project (dissertation), so all of this advice is really helpful and gives me more to write about!
Thanks
-Jason