Why does RapidMiner delete datarows when automatic feature selection is applied?

SanderMEs · November 2019

Maybe a very stupid question, but my input consists of 15577 data rows, while my output consists of only 4500 data rows when I apply automatic feature selection in data preparation.
In addition, can I reliably compare the confusion matrices of the baseline model (with 15577 rows) and the RapidMiner model (with +/- 4500 rows) when the sizes differ but the data is the same?

lionelderkrikor · November 2019

Hi @SanderMEs,

No, it's not a stupid question:
AutoModel splits your dataset into 2 parts:
- 60% of the data is used to train the model
- 40% of the data is used to test the model (this is the hold-out set).

Then AutoModel removes 2/7 of the rows from that test set.
Your output data (the predictions and the associated confusion matrix) are based on this final test set. That's why your output files contain roughly 4,450 rows (15577 x 40% x 5/7), which matches the +/- 4500 rows you see.
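
If you want to check the arithmetic yourself, here is a minimal Python sketch. The function name and its parameters are made up for illustration and are not part of RapidMiner. It only reproduces the calculation above, assuming the 40% hold-out and the 5/7 retained share described in this post:

```python
def expected_output_rows(total_rows, test_fraction=0.4, kept_fraction=5 / 7):
    """Estimate how many rows end up in the scored output.

    Assumptions (taken from the explanation above, not from RapidMiner code):
    - test_fraction: share of the data held out for testing (40%)
    - kept_fraction: share of the hold-out set that remains after
      2/7 of it is removed (5/7)
    """
    holdout_rows = total_rows * test_fraction
    return holdout_rows * kept_fraction


rows = expected_output_rows(15577)
print(round(rows))  # 4451
```

The result is an estimate, so the actual row count in your output can differ by a few rows depending on how the splits are rounded.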
Regards,

Lionel
