Do I always need to apply a normalization (z-transformation) to compare data with each other and to apply an ML model?
Dear all,
First of all, I am a beginner in using RM and data science techniques, so please be patient with me. I am using the attached NBA data set from Kaggle for a university project / exam.
In general, do I always need to apply a normalization (z-transformation) before I compare values within my data set with each other (e.g. the NBA statistics in columns L - Q and W - AB) and before I apply a machine learning model, e.g. Naive Bayes or linear/logistic regression?
Is outlier detection a real machine learning model, or is it more of a technique to filter out outliers? From what number of detected outliers is it worthwhile to apply outlier detection, e.g. 10 or more?
I would be very grateful if someone could help me.
Regards,
Michael
Best Answer
Mike0985 Member Posts: 9 Learner III
Hello Martin,
Referring to your first comment: "Well, it depends on the algorithm you are using. In general, normalization never hurts and it can help quite a bit. Some algorithms, like a decision tree, simply don't care. Then you lose interpretability but not predictive power."
I still do not know exactly which ML model to use for my data set; I'm still working on this. I ran the data set through the Auto Model function, and judging by e.g. the accuracy, different ML models such as Naive Bayes or a regression could be suitable. Would you therefore recommend running Auto Model both with and without normalization and comparing which fits best?
Referring to your second comment: "Outlier techniques can be used in several ways. They can be used to:"
I applied the outlier detection to my data set (more than 21,000 rows). It removed fewer than 10 outliers, but it took more than 30 minutes. Would you say outlier detection is still useful in this case, or is it better to skip it and save the 30 minutes in the data science process?
Thanks in advance.
Regards,
Michael
Answers
Well, it depends on the algorithm you are using. In general, normalization never hurts and it can help quite a bit. Some algorithms, like a decision tree, simply don't care. Then you lose interpretability but not predictive power.
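To make the z-transformation concrete, here is a minimal Python sketch outside of RM. The column names and values are made up for illustration and are not taken from the attached data set, and using the population standard deviation (`pstdev`) is an assumption; a tool may use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def z_transform(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical columns on very different scales.
points_per_game = [8.0, 12.0, 20.0, 28.0]
field_goal_pct = [0.40, 0.45, 0.50, 0.55]

points_z = z_transform(points_per_game)
pct_z = z_transform(field_goal_pct)

print(points_z)
print(pct_z)
```

After the transformation both columns are on the same scale, so they can be compared directly. The rank order of the values inside each column is unchanged, which is why a threshold-based learner such as a decision tree finds the same splits with or without it.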
Outlier techniques can be used in several ways. They can be used to
It all depends on how you use it.
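As one illustration of the "filter" use, here is a minimal Python sketch that flags values by their z-score and drops them. This is a simple univariate rule chosen for the example, not necessarily what the outlier operators in RM do; the threshold of 3 and the data are assumptions.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=3.0):
    """Return True for each value whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No spread means nothing can be flagged.
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


# Hypothetical column: 20 ordinary values and one extreme value.
minutes = [30.0] * 20 + [95.0]

flags = flag_outliers(minutes)
filtered = [v for v, is_outlier in zip(minutes, flags) if not is_outlier]

print(sum(flags), len(filtered))
```

A rule like this needs only one pass over each column, so it stays cheap even on tens of thousands of rows, but it looks at one column at a time and can miss rows that are unusual only in combination.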
Cheers,
Martin
Dortmund, Germany