Is it possible to get 100% prediction?
Joannach0ng
Member Posts: 7 Learner I
in Help
Hi everyone, I was told by my tutor to get a 100% accuracy prediction with my split validation, so I was wondering if that is even possible. I have tried split ratios from 0 to 1 but could not reach 100%. Can adding some operator get me to 100%? Thank you!
Comments
I would be very concerned if I got 100% accuracy on almost any real data set. The first thing I would do is try to figure out where I made a mistake. Randomness can also distort your perception of a model: even if you do hit 100% once, it could still be by chance.
Getting higher accuracy is not bad, and yes, you might get 100 percent. What most of us are trying to tell you is that you should be extra cautious and investigate why your model is 100 percent accurate. Remember the saying "nothing's perfect": accuracies like these are rare in real-world scenarios, which is why most of us are surprised to see them. There can be multiple reasons for this.
1. No difference between training and testing datasets: If your training and testing datasets are the same, you might get 100 percent accuracy. This is because the model has already seen the data and is making predictions on exactly the examples it was trained on.
2. Highly correlated column: This is one case I regularly see with highly accurate models. Your dataset might contain a feature/attribute that correlates very strongly with the target variable (label). The issue is that some complex models easily pick up on this and simply use that one column to make their predictions.
3. Confounding relations: Some datasets contain confounding (hidden) relationships between samples, which can also lead to very high accuracy. I encountered this myself: I ran cross-validation on a dataset with multiple samples per subject, so some of a subject's samples landed in training and some in testing. The model learned to recognize the subjects rather than the task and gave 99.99 percent accuracy. Something felt odd, so I retested with leave-one-subject-out cross-validation, and the model performed only just above chance. This matters because the earlier high-accuracy results were misleading.
4. Type of validation: It is very important to select the right validation scheme for your model. There are many good practices, but one common method is to split the data into a 90:10 or 80:20 ratio, apply k-fold cross-validation on the 90 percent, and then apply the resulting model to the remaining 10 percent hold-out data. You can then verify your performance from both the cross-validation and the 10 percent hold-out dataset.
5. Finally, I recommend you try Auto Model in RapidMiner, which will help you compare different models and also provides good validation out of the box.
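To illustrate point 1 above with a toy sketch (plain Python, made-up data): a model that simply memorizes its training rows scores 100% when "tested" on those same rows, which tells you nothing about how it handles new data.

```python
def train_memorizer(rows):
    """'Train' by memorizing (features -> label) pairs."""
    return {tuple(x): y for x, y in rows}

def accuracy(model, rows):
    hits = sum(1 for x, y in rows if model.get(tuple(x)) == y)
    return hits / len(rows)

train = [((1, 0), "yes"), ((0, 1), "no"), ((1, 1), "yes")]
test_same = train                               # same rows as training
test_new = [((0, 0), "no"), ((1, 0), "no")]     # unseen / conflicting rows

model = train_memorizer(train)
print(accuracy(model, test_same))  # 1.0 -- perfect, but meaningless
print(accuracy(model, test_new))   # 0.0 on rows it never memorized
```

A real learner is less extreme than this lookup table, but the principle is the same: testing on the training data rewards memorization, not generalization.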
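For point 2, a quick sanity check you can run outside RapidMiner is to compute each feature's correlation with the label (plain Python, hypothetical data; the "leaked_id" column is an invented example of a column that secretly encodes the answer):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

label = [0, 1, 0, 1, 1, 0]
features = {
    "score":     [2.1, 3.9, 3.8, 2.2, 3.4, 2.5],  # a genuine, noisy feature
    "leaked_id": [0, 1, 0, 1, 1, 0],              # identical to the label!
}

for name, col in features.items():
    r = pearson(col, label)
    flag = "  <-- suspicious, investigate!" if abs(r) > 0.95 else ""
    print(f"{name}: r = {r:.2f}{flag}")
```

A correlation with |r| near 1 does not prove leakage, but it is exactly the kind of column worth investigating before trusting a near-perfect model.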
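Point 3 can be reproduced on synthetic data. The sketch below (scikit-learn, assumed installed; the data is invented) gives each "subject" several near-identical samples with a randomly assigned label, so there is nothing real to learn. Plain k-fold still scores near 100% because the model recognizes subjects across the train/test split, while leave-one-subject-out drops to roughly chance:

```python
import numpy as np
from sklearn.model_selection import cross_val_score, KFold, LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_subjects, samples_per_subject = 20, 5

# Each subject has a fixed "signature" vector plus tiny noise; the label
# is assigned at random per subject, so it is genuinely unlearnable.
signatures = rng.normal(size=(n_subjects, 4))
labels_per_subject = rng.integers(0, 2, size=n_subjects)

X, y, groups = [], [], []
for s in range(n_subjects):
    for _ in range(samples_per_subject):
        X.append(signatures[s] + rng.normal(scale=0.01, size=4))
        y.append(labels_per_subject[s])
        groups.append(s)
X, y, groups = np.array(X), np.array(y), np.array(groups)

clf = KNeighborsClassifier(n_neighbors=1)
naive = cross_val_score(clf, X, y,
                        cv=KFold(n_splits=5, shuffle=True, random_state=0))
grouped = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())

print("naive k-fold accuracy: ", naive.mean())    # typically near 1.0 -- misleading
print("leave-one-subject-out:", grouped.mean())   # typically near chance
```

The same idea applies in RapidMiner: make sure all samples from one subject end up on the same side of every split.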
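The split-then-cross-validate recipe in point 4 looks like this as a sketch (scikit-learn, assumed installed, using one of its bundled demo datasets): hold out 10% first, run k-fold cross-validation on the remaining 90%, then confirm on the untouched hold-out set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# 90:10 split -- the 10% hold-out is never touched during model building.
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_scores = cross_val_score(clf, X_dev, y_dev, cv=10)  # k-fold on the 90%
clf.fit(X_dev, y_dev)
holdout = clf.score(X_hold, y_hold)                    # final sanity check

print(f"10-fold CV accuracy: {cv_scores.mean():.3f}")
print(f"hold-out accuracy:   {holdout:.3f}")
```

If the cross-validation score and the hold-out score roughly agree, you can be more confident the estimate is honest; a large gap is another red flag.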
I also see that you asked about ways to improve accuracy. There are different ways you can try.
1. Model selection: Identify the model that best suits your data. You can do this with the help of Auto Model in RapidMiner, by trial and error with different models, or by visualizing your raw data and identifying patterns to decide whether linear or nonlinear models are more appropriate.
2. Feature selection: Not all features in your dataset may be useful for prediction. To pick the features/attributes that are actually relevant, you can apply different feature selection techniques such as forward selection, backward elimination, automatic feature engineering, etc.
3. Hyperparameter tuning: This is important; most models have several hyperparameter settings. For example, a random forest has the number of trees, pruning settings, etc. It is important to test different hyperparameter settings and see whether performance improves or degrades. You can do this with the Optimize Parameters operator to find the best model parameters for your dataset.
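For point 1, here is a rough sketch of the trial-and-error approach (scikit-learn, assumed installed; Auto Model does something similar from the GUI): score a few candidate model families on the same cross-validation and compare.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree":       DecisionTreeClassifier(random_state=0),
    "naive bayes":         GaussianNB(),
}

# Same 5-fold CV for every candidate, so the comparison is fair.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in candidates.items()}

for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.3f}")
```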
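The forward selection mentioned in point 2 can be sketched like this (scikit-learn, assumed installed): start from zero features and greedily add whichever feature improves cross-validated accuracy most.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Greedy forward selection: grow the feature set until 2 features remain.
selector = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward")
selector.fit(X, y)

X_reduced = selector.transform(X)
print("selected feature mask:", selector.get_support())
print("reduced shape:", X_reduced.shape)  # (150, 2): 2 of 4 features kept
```

Backward elimination is the same call with `direction="backward"`; in RapidMiner the corresponding operators are Forward Selection and Backward Elimination.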
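And for point 3, a grid search is the code-level analogue of the Optimize Parameters operator (scikit-learn, assumed installed; the parameter values below are just an illustrative grid, not a recommendation):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_wine(return_X_y=True)

param_grid = {
    "n_estimators": [10, 50],   # number of trees
    "max_depth": [2, None],     # tree depth (a pruning-like control)
}

# Try every combination, each evaluated with 5-fold cross-validation.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

Be aware that the best CV score found this way is slightly optimistic, which is one more reason to keep a hold-out set as described earlier in the thread.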
These are some points that came to my mind. There can be many other aspects as well.
Hope this helps.
PS: One suggestion about posting on the community: please don't post multiple questions on the same topic; you can continue the discussion in a single thread. This recommendation is based on the thread you already created, linked below. This answer covers both threads.
https://community.rapidminer.com/discussion/55923/is-it-possible-to-get-100-for-split-validation-accuracy#latest
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing