Question about data mining
Hey, I'm working on an assignment for a data mining course. The teacher gave me two training CSVs and one test CSV. In the training files the feat 8 column (the category) is filled in, but in the test file it is not, and the goal is to predict that value. He said I could use RapidMiner for the work.
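To make the task concrete, here is a minimal sketch of the overall setup: stack the two training files, learn from the rows where feat 8 is known, and predict feat 8 for the test rows. It assumes both training CSVs share the same columns and that every column except feat 8 is numeric. The file contents and the feat 1 / feat 2 columns are invented, and the 1-nearest-neighbour classifier is only a placeholder baseline, not the model the teacher is asking you to justify.

```python
import csv
import io
import math

# Hypothetical stand-ins for the two training CSVs and the test CSV.
# Values and the feat 1 / feat 2 columns are invented; "feat 8" is the label.
TRAIN_A = """feat 1,feat 2,feat 8
1.0,2.0,A
1.2,1.8,A
"""
TRAIN_B = """feat 1,feat 2,feat 8
5.0,6.0,B
5.5,5.8,B
"""
TEST = """feat 1,feat 2,feat 8
1.1,2.1,
5.2,6.1,
"""

LABEL = "feat 8"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def to_vector(row, features):
    """Turn the selected columns of a row into a list of floats."""
    return [float(row[f]) for f in features]


def predict_1nn(train, test, features):
    """Give each test row the label of its closest training row (Euclidean distance)."""
    predictions = []
    for row in test:
        x = to_vector(row, features)
        nearest = min(train, key=lambda t: math.dist(x, to_vector(t, features)))
        predictions.append(nearest[LABEL])
    return predictions


train = read_rows(TRAIN_A) + read_rows(TRAIN_B)  # stack both training files
test = read_rows(TEST)
features = [c for c in train[0] if c != LABEL]  # every column except the label
print(predict_1nn(train, test, features))
```

In RapidMiner the same flow is roughly: combine the two training sets, train a learner on them, and apply the model to the test set.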
When I showed my teacher what I had learned from the tutorials and from answers here in the community, he said that before building the model I would have to work out the relationships between the other features of the training CSV, in order to know which model to use.
Can someone help me understand what he meant, and roughly what I have to do, so that I can move forward with the assignment and explain why I chose the model I end up using?
Thanks
André
Best Answer
yyhuang (RapidMiner employee, RM Data Scientist)
Hi @andre5007,
You wrote: "my teacher said that before I could build the model I would have to work out the relationships between the other features of the training CSV in order to know which model to use."
My understanding is that he wants you to explore the underlying relationships (for example, correlations) between the predictors/features and your label. This is the so-called feature engineering step.
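One way to start that exploration, sketched under assumptions: the features are numeric and feat 8 is a categorical label, so instead of correlating features with the label directly, the code computes (a) pairwise Pearson correlations between features, which flag redundant columns, and (b) the mean of each feature per class, which hints at which features separate the classes. The rows and the feat 1 to feat 3 columns are invented. RapidMiner's Correlation Matrix operator gives a similar view of feature pairs.

```python
import math
from itertools import combinations

# Hypothetical training rows (invented values): three numeric features
# plus the categorical label "feat 8".
rows = [
    {"feat 1": 1.0, "feat 2": 2.0, "feat 3": 7.0, "feat 8": "A"},
    {"feat 1": 2.0, "feat 2": 4.1, "feat 3": 3.0, "feat 8": "A"},
    {"feat 1": 3.0, "feat 2": 6.2, "feat 3": 9.0, "feat 8": "B"},
    {"feat 1": 4.0, "feat 2": 7.9, "feat 3": 1.0, "feat 8": "B"},
]
LABEL = "feat 8"
FEATURES = ["feat 1", "feat 2", "feat 3"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def feature_correlations(rows, features):
    """Correlation of every pair of features; values near +/-1 hint at redundancy."""
    cols = {f: [r[f] for r in rows] for f in features}
    return {(a, b): pearson(cols[a], cols[b]) for a, b in combinations(features, 2)}


def class_means(rows, features, label):
    """Mean of each feature per label value; large gaps hint at a useful predictor."""
    groups = {}
    for r in rows:
        groups.setdefault(r[label], []).append(r)
    return {c: {f: sum(r[f] for r in g) / len(g) for f in features} for c, g in groups.items()}


for pair, r in feature_correlations(rows, FEATURES).items():
    print(pair, round(r, 3))
print(class_means(rows, FEATURES, LABEL))
```

On this toy data feat 1 and feat 2 are almost perfectly correlated, so one of them may be redundant, and feat 3 has the same mean in both classes (which does not rule out other kinds of relationship with the label).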
https://academy.rapidminer.com/learn/video/feature-engineering-intro
You have some useful operators to choose from:
- "Generate Attribute", e.g. feat 1 * feat 2, feat 1 + feat 4, feat 1 / feat 5
- "Auto Feature Engineering", https://rapidminer.com/resource/automatic-feature-engineering/
- Feature selection/reduction by "Forward Selection" or "Backward Elimination", or something similar
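To make the generation and selection ideas concrete, here is a hand-written sketch: derive new columns (product, sum and ratio, in the spirit of the Generate Attribute examples, but built only from two hypothetical columns, feat 1 and feat 2, to keep the toy data small) and then run greedy forward selection over raw and derived columns. The data is invented, and scoring with leave-one-out accuracy of a 1-nearest-neighbour classifier is an assumption chosen for brevity; in practice you would score with the learner and validation you actually plan to use.

```python
import math

LABEL = "feat 8"

# Hypothetical toy data (invented): the label depends on feat 1 * feat 2,
# not on either raw column alone.
RAW = [
    {"feat 1": 1.0, "feat 2": 1.0, "feat 8": "A"},
    {"feat 1": 4.0, "feat 2": 0.25, "feat 8": "A"},
    {"feat 1": 0.5, "feat 2": 2.0, "feat 8": "A"},
    {"feat 1": 2.0, "feat 2": 2.0, "feat 8": "B"},
    {"feat 1": 4.0, "feat 2": 1.0, "feat 8": "B"},
    {"feat 1": 0.5, "feat 2": 8.0, "feat 8": "B"},
]


def generate_attributes(row):
    """Return a copy of the row with derived columns (product, sum, ratio)."""
    new = dict(row)
    new["feat 1 * feat 2"] = row["feat 1"] * row["feat 2"]
    new["feat 1 + feat 2"] = row["feat 1"] + row["feat 2"]
    new["feat 1 / feat 2"] = row["feat 1"] / row["feat 2"]  # assumes feat 2 is never 0
    return new


def loo_accuracy(rows, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `features`."""
    hits = 0
    for i, row in enumerate(rows):
        others = rows[:i] + rows[i + 1:]
        x = [row[f] for f in features]
        nearest = min(others, key=lambda o: math.dist(x, [o[f] for f in features]))
        hits += nearest[LABEL] == row[LABEL]
    return hits / len(rows)


def forward_selection(rows, candidates):
    """Greedily add the single best feature until the score stops improving."""
    selected, best = [], 0.0
    while True:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        scores = {f: loo_accuracy(rows, selected + [f]) for f in remaining}
        feature, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:
            break  # no remaining feature improves the score
        selected.append(feature)
        best = score
    return selected, best


rows = [generate_attributes(r) for r in RAW]
candidates = [c for c in rows[0] if c != LABEL]  # raw and derived columns
selected, best = forward_selection(rows, candidates)
print(selected, best)
```

On this toy data neither raw column separates the classes on its own, but the derived feat 1 * feat 2 column does, so forward selection keeps only that column with a leave-one-out accuracy of 1.0. Results like these are the kind of evidence you can use to explain why you chose a feature, and then a model.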
Useful documentation can be found here: https://rapidminer.com/blog/data-prep-feature-generation-selection/
HTH!
YY