Predicting Performance with Random Forest
I am working on a university project, and it is my first time using RapidMiner.
I am trying to predict whether people will get vaccinated, so that we can avoid sending letters to people who will not get vaccinated and thus minimize mailing costs.
I have a large dataset with more than 400 attributes, so I need to rank the attributes and remove the useless ones. I used the Random Forest, Apply Model and Performance (Classification) operators, but when I check the performance results, the class recall is always 0% for one class and 100% for the other. I tried another model that I have often seen online, k-NN, and it does not have this problem. So I suspect the problem lies with Random Forest.
Does anyone know why the model always predicts the same value?
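A class recall of exactly 100% for one class and 0% for the other is the signature of a model that predicts the same value for every example. Class recall is the share of actual members of a class that were predicted as that class, so a constant prediction gets every member of the predicted class right and every member of the other class wrong. The minimal sketch below shows this arithmetic in plain Python; it is an illustration, not RapidMiner's Performance (Classification) implementation, and the labels are hypothetical.

```python
from collections import Counter

def class_recall(actual, predicted):
    """Return {class: recall}, where recall = correctly predicted members / all actual members."""
    totals = Counter(actual)
    # Count only the examples whose prediction matches the true label.
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}

# Hypothetical labels: 8 people who will not get vaccinated, 2 who will.
actual = ["no"] * 8 + ["yes"] * 2
# A model that always predicts the same value ("no").
predicted = ["no"] * len(actual)

print(class_recall(actual, predicted))  # {'no': 1.0, 'yes': 0.0}
```

With a k-NN model that varies its predictions, the two recalls move away from 100% and 0%, which matches what was observed.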
Answers
You should really watch a few introductory videos on validation to understand what is happening here:
https://academy.rapidminer.com/learn/video/introduction-to-model-validation
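The core idea of validation is that a model must be scored on examples it was not trained on. K-fold cross-validation does this by splitting the data into k folds and using each fold once as the test set while training on the rest. The sketch below is a plain-Python illustration of that splitting step under the assumption of a simple random (non-stratified) split; in RapidMiner the Cross Validation operator handles this for you, and the function name `k_fold_indices` is hypothetical.

```python
import random

def k_fold_indices(n_examples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        # Train on every fold except the one held out for testing.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(test_idx))  # 8 2 for every fold
```

Averaging the performance over all k test folds gives a more trustworthy estimate than a single score on the training data.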
Also, think about optimizing your model. Look at the model output: did the random forest create trees that contain no decisions at all, or overly complex ones? You could have underfitting (trees without any meaningful splits) or overfitting (overly complex trees that memorize the training data instead of learning the underlying rules).
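One rough way to tell the two cases apart is to compare training accuracy, validation accuracy and the accuracy of always predicting the majority class. The sketch below encodes that common rule of thumb; the function names and the `margin` and `gap` thresholds are hypothetical choices for illustration, not values from RapidMiner.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of a model that always predicts the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def diagnose_fit(train_acc, test_acc, baseline_acc, margin=0.02, gap=0.10):
    """Rough heuristic; margin and gap are illustrative assumptions."""
    if train_acc <= baseline_acc + margin:
        # Trees without real decisions end up here: no better than a constant guess.
        return "underfitting"
    if train_acc - test_acc > gap:
        # Much better on seen data than on unseen data: memorizing, not learning.
        return "overfitting"
    return "no obvious problem"

baseline = majority_baseline(["no"] * 8 + ["yes"] * 2)  # 0.8
print(diagnose_fit(0.80, 0.80, baseline))  # underfitting
print(diagnose_fit(0.99, 0.70, baseline))  # overfitting
```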
For a data set with many attributes, Naive Bayes and Support Vector Machines can be helpful.
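Naive Bayes copes well with many attributes because it assumes the attributes are independent given the class, so each attribute contributes its own per-class likelihood and training cost grows only linearly with the number of attributes. The from-scratch sketch below assumes all attributes are categorical and uses Laplace smoothing; it is not RapidMiner's Naive Bayes operator, and the class name, attributes and data are hypothetical.

```python
import math
from collections import Counter, defaultdict

class CategoricalNaiveBayes:
    """Minimal Naive Bayes for categorical attributes with Laplace smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # value_counts[class][attribute_index][value] = count
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen per attribute
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_class, best_score = None, -math.inf
        for cls, count in self.class_counts.items():
            score = math.log(count / self.n)  # log prior P(class)
            for i, value in enumerate(row):
                # Laplace smoothing: +1 on the count, +number of distinct values on the total
                numerator = self.value_counts[cls][i][value] + 1
                denominator = count + len(self.values[i])
                score += math.log(numerator / denominator)  # log P(value | class)
            if score > best_score:
                best_class, best_score = cls, score
        return best_class

# Hypothetical attributes: (age_group, had_flu_shot_last_year) -> will get vaccinated?
rows = [("young", "no"), ("young", "no"), ("old", "yes"),
        ("old", "yes"), ("old", "no"), ("young", "yes")]
labels = ["no", "no", "yes", "yes", "yes", "no"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("old", "yes")))  # yes
```

Summing log-probabilities instead of multiplying probabilities avoids numeric underflow when there are hundreds of attributes.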
You could also try AutoModel and RapidMiner Go, they would automatically determine the best modeling algorithm for your data.
Regards,
Balázs