The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Irresolute outcome (usage of trained data model to classified data)
birteisabel_kuh
Member Posts: 1 Learner I
in Help
Dear Rapidminer-Community,
at the moment we're doing a project in university and our results make us feel a little desperate.
We have a task to build a model out of training data to categeorize classified data. It's about quotes of return of customers. We are able to figure out the category of the train data (H, N + U) and alltogether there are for example ~1300 U out of 20,000 datasets. When we finished applying the model on the classified data, there are 0 U with the setting
criterion: gain_ratio and only 24 U with the setting criterion: accuracy.
Since it doesn't seem to be logical, we don't know where the mistake could be.
We used the following operators:
Retrieve Train Data, Set Role, Split Data, Decision Tree, Apply Model and in the end we added Retrieve classified data to the model.
Are there any "easy" possibilities to solve this problem? Or are there any special settings for the decision tree we have to use to get clearer results?
We would be very happy about a helpful answer!
Thanks in advance!
at the moment we're doing a project in university and our results make us feel a little desperate.
We have a task to build a model out of training data to categeorize classified data. It's about quotes of return of customers. We are able to figure out the category of the train data (H, N + U) and alltogether there are for example ~1300 U out of 20,000 datasets. When we finished applying the model on the classified data, there are 0 U with the setting
criterion: gain_ratio and only 24 U with the setting criterion: accuracy.
Since it doesn't seem to be logical, we don't know where the mistake could be.
We used the following operators:
Retrieve Train Data, Set Role, Split Data, Decision Tree, Apply Model and in the end we added Retrieve classified data to the model.
Are there any "easy" possibilities to solve this problem? Or are there any special settings for the decision tree we have to use to get clearer results?
We would be very happy about a helpful answer!
Thanks in advance!
0
Answers
Just want a clarification, so when you are classifying the data set your model is unable to predict U Label? If this is the case, I see that you are using random split which can get data that is bad. Can you try the cross-validation operator with 5 folds and see how the model is performing. As this uses all the data from both training and testing you can see if the performance increases.
Thanks
Varun
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing