Why does RapidMiner delete datarows when automatic feature selection is applied?
Maybe a very stupid question, but my input consists of 15577 data rows, while my output consists of only 4500 data rows when I apply automatic feature selection in data preparation.
In addition to that, can I reliably compare the confusion matrices of the baseline model (with 15577 rows) and the RapidMiner model (with roughly 4500 rows) when the sizes differ but the data is the same?
Best Answer
lionelderkrikor (RapidMiner Certified Analyst)
Hi @SanderMEs,
No, it's not a stupid question:
AutoModel splits your dataset into 2 parts:
- 60% of the data is used to train the model
- 40% of the data is used to test the model (it is a hold-out set).
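To make the 60/40 hold-out split concrete, here is a minimal Python sketch. The function name `holdout_split`, the seed, and the rounding rule are hypothetical choices for illustration; this is not RapidMiner's internal implementation, only a generic random hold-out split.

```python
import random
from typing import List, Tuple


def holdout_split(n_rows: int, train_fraction: float = 0.6,
                  seed: int = 42) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: randomly split row indices into train and test sets."""
    indices = list(range(n_rows))
    # Shuffle with a fixed seed so the split is random but repeatable.
    random.Random(seed).shuffle(indices)
    # The first train_fraction of the shuffled rows train the model;
    # the remaining rows form the hold-out test set.
    n_train = round(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = holdout_split(15577)
print(len(train_idx), len(test_idx))  # 9346 6231
```

With 15577 input rows, this sketch gives 9346 training rows and 6231 test rows; only the test rows go on to produce predictions.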
AutoModel then removes 2/7 of the rows from this test set.
Your output data (the predictions and the associated confusion matrix) are based on this final test set. That's why your output files contain roughly 4,500 rows: 15577 × 40% × 5/7 ≈ 4,450.
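The row count above can be checked with a few lines of Python. `expected_output_rows` is a hypothetical helper that simply multiplies out the fractions described in this answer (40% hold-out, 5 of 7 parts kept); AutoModel's exact rounding of each partition is not reproduced, so treat the result as an approximation.

```python
def expected_output_rows(n_rows: int, test_fraction: float = 0.4,
                         kept_parts: int = 5, total_parts: int = 7) -> float:
    """Hypothetical helper: approximate number of rows in the final test set."""
    # Rows in the 40% hold-out test set.
    test_rows = n_rows * test_fraction
    # Keep 5/7 of the test set after 2/7 of it is removed.
    return test_rows * kept_parts / total_parts


rows = expected_output_rows(15577)
print(f"{rows:.1f}")  # 4450.6
```

About 4,450 rows, which matches the roughly 4,500 rows seen in the output.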
Regards,
Lionel