Irresolute outcome (usage of trained data model to classified data)

birteisabel_kuh · January 2019

Dear Rapidminer-Community,

At the moment we're doing a project at university, and our results are making us feel a little desperate.
Our task is to build a model from training data in order to categorize the classified data. It's about the return rates of customers. We know the categories of the training data (H, N and U); altogether there are, for example, ~1300 U out of 20,000 records. When we finished applying the model to the classified data, there were 0 U with the setting criterion: gain_ratio and only 24 U with the setting criterion: accuracy.
Since this doesn't seem logical, we don't know where the mistake could be.
We used the following operators:
Retrieve Train Data, Set Role, Split Data, Decision Tree and Apply Model; at the end we added Retrieve classified data to the model.

Is there an "easy" way to solve this problem? Or are there any special settings for the decision tree that we have to use to get clearer results?

We would be very happy about a helpful answer!
Thanks in advance!

varunm1 · January 2019

Hi @birteisabel_kuh

Just to clarify: when you classify the data set, your model is unable to predict the U label? If that is the case, I see that you are using a random split, which can produce an unrepresentative sample. Can you try the Cross Validation operator with 5 folds and see how the model performs? Since this uses all of the data for both training and testing, you can see whether the performance improves.

Thanks
Varun
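
To illustrate the suggestion above, here is a minimal Python sketch of the idea behind a 5-fold split that keeps the class proportions stable in every fold. It is not RapidMiner's implementation, and it assumes stratified fold assignment. Only the figure of roughly 1300 U out of 20,000 comes from the thread; the H/N counts and the helper name stratified_folds are made up for illustration. The last lines also show why a rare class can vanish from the predictions: a model that never predicts U is still correct on every non-U row.

```python
import random
from collections import Counter

# Hypothetical label counts: only "about 1300 U out of 20,000" comes from the
# thread; the H/N split below is invented for illustration.
labels = ["U"] * 1300 + ["H"] * 9350 + ["N"] * 9350


def stratified_folds(labels, k=5, seed=0):
    """Assign each row index to one of k folds, class by class, so every
    fold keeps roughly the same class proportions as the full data."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled rows of this class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


folds = stratified_folds(labels, k=5)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
for number, counts in enumerate(fold_counts, start=1):
    print(f"fold {number}: {dict(counts)}")

# Baseline: a model that never predicts U is still right on every non-U row.
baseline_accuracy = 1 - labels.count("U") / len(labels)
print(f"accuracy when never predicting U: {baseline_accuracy:.3f}")
```

With these assumed counts, overall accuracy alone says little about the U class, so it may be worth looking at the per-class results as well when comparing the two criterion settings.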
