Question about data mining

andre5007 · April 2021

Hey, I'm doing an assignment for a data mining course. The teacher gave me two training CSV files and one test CSV file. In the training files the "feat 8" column is filled in, but in the test file it is not, and the goal is to predict that value. He said I could use RapidMiner for the work.
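For reference, the task described here is ordinary supervised classification: learn "feat 8" from the rows where it is known, then predict it for the test rows. The plain-Python sketch below is not RapidMiner and not a recommended model. It only illustrates the workflow under stated assumptions: each file is a CSV with a header row, every column except the label is numeric, the label header is literally "feat 8", and a 1-nearest-neighbour rule stands in for whatever model you end up choosing. The inline strings are made-up stand-ins for the real files.

```python
import csv
import io
import math

LABEL = "feat 8"  # assumption: the label column header is literally "feat 8"


def read_rows(text):
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def to_vector(row, feature_names):
    """Convert the chosen feature columns of one row to floats."""
    return [float(row[name]) for name in feature_names]


def predict_1nn(train_rows, test_rows, label=LABEL):
    """Predict the label of each test row with a 1-nearest-neighbour rule."""
    # Every column except the label is treated as a numeric feature.
    features = [name for name in train_rows[0] if name != label]
    train = [(to_vector(r, features), r[label]) for r in train_rows]
    predictions = []
    for row in test_rows:
        x = to_vector(row, features)
        # Pick the training row with the smallest Euclidean distance to x.
        _, best_label = min(train, key=lambda pair: math.dist(pair[0], x))
        predictions.append(best_label)
    return predictions


# Made-up stand-ins for the two training files and the test file.
train_a = "feat 1,feat 2,feat 8\n0,0,A\n1,1,A\n"
train_b = "feat 1,feat 2,feat 8\n10,10,B\n"
test = "feat 1,feat 2,feat 8\n9,9,\n0.5,0.2,\n"

train_rows = read_rows(train_a) + read_rows(train_b)  # combine both training files
print(predict_1nn(train_rows, read_rows(test)))
```

Concatenating the two training files into one training set, as done here, assumes they share the same columns.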

When I showed my teacher what I had understood from the tutorials and from the answers here in the community, he said that before building the model I would first have to establish the relationships between the other features in the training CSV, in order to know which model to use.

Can someone help me better understand what he meant, and roughly what I need to do, so that I can move forward with the assignment and explain why I chose the model I build?

Thanks

André

yyhuang · April 2021

Hi @andre5007,

my teacher said that before building the model I would first have to establish the relationships between the other features in the training CSV, in order to know which model to use.

My understanding is that he means exploring the underlying relations/correlations between the predictors (features) and your label, so-called feature engineering:
https://academy.rapidminer.com/learn/video/feature-engineering-intro
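One concrete way to start that exploration is a pairwise correlation table. The sketch below computes the Pearson correlation, which only captures linear relationships, for every pair of numeric columns. It assumes the label has been encoded as numbers, and the column values are made up for illustration. Strongly correlated feature pairs hint at redundancy, while features strongly correlated with the label hint at useful predictors.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


def correlation_table(columns):
    """Return {(a, b): r} for every unordered pair of named numeric columns."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Made-up example data; the label is assumed to be numerically encoded.
columns = {
    "feat 1": [1.0, 2.0, 3.0, 4.0],
    "feat 2": [2.0, 4.0, 6.0, 8.0],  # exactly 2 * feat 1
    "feat 8": [4.0, 3.0, 2.0, 1.0],  # label
}
for pair, r in correlation_table(columns).items():
    print(pair, round(r, 3))
```

Here feat 1 and feat 2 come out at 1.0 (one is a multiple of the other), and both correlate at -1.0 with the label.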

You have some useful operators to choose from:

"Generate Attribute", e.g. feat1 * feat2, feat1 + feat4, feat1 / feat5

"Auto Feature Engineering", https://rapidminer.com/resource/automatic-feature-engineering/

Feature selection/reduction with "Forward Selection", "Backward Elimination" or similar operators
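The "Generate Attribute" idea amounts to computing new columns from existing ones. A minimal Python sketch, assuming each row is a dict of floats and reusing the hypothetical column names feat1, feat2, feat4 and feat5 from the examples above (replace them with your real headers):

```python
import math


def generate_attributes(row):
    """Return a copy of one row (dict of floats) with derived columns added.

    The column names are hypothetical placeholders.
    """
    derived = dict(row)  # keep the original attributes
    derived["feat1*feat2"] = row["feat1"] * row["feat2"]
    derived["feat1+feat4"] = row["feat1"] + row["feat4"]
    # Guard the ratio: a zero denominator yields NaN instead of raising.
    derived["feat1/feat5"] = row["feat1"] / row["feat5"] if row["feat5"] != 0 else math.nan
    return derived


row = {"feat1": 2.0, "feat2": 3.0, "feat4": 1.0, "feat5": 4.0}
print(generate_attributes(row))
```

Whether such derived columns actually help is an empirical question; comparing model performance with and without them is one way to check.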

Useful documentation can be found here: https://rapidminer.com/blog/data-prep-feature-generation-selection/
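Forward selection, mentioned in the list above, is a greedy wrapper method: start with no features, repeatedly add the single feature that most improves a model's validated score, and stop when no addition helps. Backward elimination is the mirror image, starting from all features and removing one at a time. The sketch below is a plain-Python illustration of that loop, not RapidMiner's operator. It assumes numeric features and, as a stand-in scorer, uses leave-one-out accuracy of a 1-nearest-neighbour classifier; the toy data is made up.

```python
import math


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier using only the given feature indices."""
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other row, measured on the selected features only.
        best_j = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist([X[j][f] for f in features], [xi[f] for f in features]),
        )
        correct += y[best_j] == y[i]
    return correct / len(X)


def forward_selection(X, y, score=loo_1nn_accuracy):
    """Greedily add the feature that most improves the score; stop when none helps."""
    selected = []
    best_score = 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        # Try each remaining feature; ties go to the lowest index.
        candidate, cand_score = max(
            ((f, score(X, y, selected + [f])) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if cand_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = cand_score
    return selected, best_score


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 0.0]]
y = ["A", "A", "B", "B"]
print(forward_selection(X, y))
```

On this toy data the loop keeps only feature 0 and returns `([0], 1.0)`, because adding the noisy feature lowers the leave-one-out accuracy.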

HTH!

YY


Altair RapidMiner Community
