Auto Model process improvement suggestion

kypexin · April 2018

Hi guys,

I have noticed one small thing I would count as a subject for a potention improvement in the auto-generated auto-model process.

Before training the model, there's a filtering operator applied, which is used for the followiing:

"Model on cases with label value, apply the model on cases with a missing for the target column."

So later on, 'Explain predictions' is applied to 'unm' output of this filter:

The fact is, we don't necessarily have the data in this exact format (with missing labels for the examples to actually predict). On the contrary, I see much more often a use case where a whole labeled dataset is used for auto-modelling and evaluation thru 80/20 split.

So, what I do every time I save the process from auto-model is rewiring operators in this way:

Which makes much more sense for me as I train the model on 80% of data and then apply on the remaining 20% which then also used for explaining the predictions. If I don't rewire operators I am by default getting empty result from 'Explain predictions', because this filter operator always sends 100% of data to exa port and 0% to unm port.

Do you think this can be improved in some way so the process is generated in a more flexible way, depending on the initial dataset format?

sgenzer · April 2018

cc @IngoRM

IngoRM · April 2018

Hi,

One easy option would be to create the predictions and explanations for both, the cases with missing labels (as it is now) and for the 20% test cases as well. In the UI, we could always show the explanations for the test cases and those for the unlabeled examples only if there has been any.

What do you think about this approach? Keep in mind though that explain predictions can be a bit slow in some cases - so if your data set is very large, the creation of the explanations for the 20% test cases might take a while...

Best,

Ingo

kypexin · May 2018

Hi @IngoRM,

Yes, I think this can be an easy option - to explain predictions for both unlabeled and 20% test examples.

Another thought I had would be a bit more complicated solution where Auto Model would automatically detect dataset structure it works with (is it fully labeled or with missing labels) and generate process depending on that.

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Auto Model process improvement suggestion

Fixed and Released · Last Updated May 2019

Comments