Letting an operator in RapidMiner know the dependencies between variables?
hi,
my dataset has several attributes that are constructed from other attributes...
E.g. I have the radius, diameter, circumference and area of something. All of those values can be calculated from the radius alone, so each derived attribute adds no new information; it only acts as extra weight on the base attribute (radius).
Is there any way to tell an operator about the relationships between such attributes, for example by expressing them as formulas?
The operator could then take these relationships into account and give better results, or select more relevant features.
Can RapidMiner do something that intelligent?
Answers
What would be the purpose of that? Letting the learner know? The learner uses not only correlations but all dependencies between the attributes. Good learners will incorporate those dependencies on their own. That's the trick.
~Martin
Dortmund, Germany
Adding to what @mschmitz said, machine learning algorithms already take the inter-relationships between your features into account. You can, of course, build a correlation matrix using the Correlation Matrix operator (read the operator's help for the correlation formula it applies) and then export the resulting feature weights. Then, using a Select by Weights operator, you can select the top 5 or 10 features and feed them into another machine learning algorithm.
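To make the correlation idea concrete outside of RapidMiner, here is a minimal plain-Python sketch using the radius example from the question. It is not what the Correlation Matrix or Select by Weights operators do internally: the `noise` attribute, the `label` values, and the choice of "absolute Pearson correlation with the label" as the weight are all assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

radius = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
data = {
    "radius": radius,
    "diameter": [2 * r for r in radius],
    "circumference": [2 * math.pi * r for r in radius],
    "area": [math.pi * r * r for r in radius],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],  # hypothetical unrelated attribute
}
names = list(data)

# Full pairwise correlation matrix as a nested dict.
corr = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

def select_top_k(weights, k):
    """Return the k attribute names with the largest weights."""
    return sorted(weights, key=weights.get, reverse=True)[:k]

# Hypothetical numeric label; the weight is |correlation with the label|.
label = [0.9, 2.1, 2.9, 4.2, 5.1, 5.8]
weights = {name: abs(pearson(data[name], label)) for name in names}
top = select_top_k(weights, 3)

for name in names:
    print(name, [round(corr[name][other], 3) for other in names])
print("top 3 by weight:", top)
```

On this toy data the correlation between radius and diameter (or circumference) is 1 up to rounding, while radius and area correlate slightly less because the relation is quadratic. The top 3 by weight are radius, diameter and circumference, i.e. three copies of the same information, which is exactly the redundancy the question is about.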
Another suggestion is to investigate the "Weight by ..." operators. They use an algorithm to estimate how heavily each feature influences your target label. There are some great ones, such as Weight by SVM, Weight by Tree, or Weight by Relief. All are worth investigating.
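For a rough idea of what a Relief-style weighting computes, here is a small plain-Python sketch of the basic two-class Relief scheme. It is an illustration only, not RapidMiner's Weight by Relief implementation; the function name `relief_weights` and the four-row dataset are made up for the example.

```python
def relief_weights(rows, labels):
    """Basic Relief for two classes, visiting every row exactly once.

    Assumes each class contains at least two rows, so that every row
    has both a nearest hit and a nearest miss.
    """
    n_features = len(rows[0])

    # Attribute ranges normalise per-attribute differences to [0, 1].
    ranges = []
    for f in range(n_features):
        values = [row[f] for row in rows]
        ranges.append((max(values) - min(values)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / ranges[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(n_features))

    weights = [0.0] * n_features
    for i, row in enumerate(rows):
        hits = [r for j, r in enumerate(rows) if j != i and labels[j] == labels[i]]
        misses = [r for j, r in enumerate(rows) if labels[j] != labels[i]]
        nearest_hit = min(hits, key=lambda r: distance(row, r))
        nearest_miss = min(misses, key=lambda r: distance(row, r))
        # Reward attributes that differ on the nearest miss,
        # penalise attributes that differ on the nearest hit.
        for f in range(n_features):
            weights[f] += (diff(f, row, nearest_miss) - diff(f, row, nearest_hit)) / len(rows)
    return weights

# Hypothetical data: attribute 0 separates the classes, attribute 1 does not.
rows = [(0.0, 0.0), (0.1, 1.0), (0.9, 0.1), (1.0, 0.9)]
labels = [0, 0, 1, 1]
weights = relief_weights(rows, labels)
print(weights)
```

Here the first attribute ends up with a positive weight and the second with a negative one, because the second attribute varies more within a class than between classes.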
Yeah, they are all worth investigating, but that's the problem...
Every time I try a different weighting or feature selection algorithm on my attributes, I get a different selection. Which of those selections should I choose? There are plenty of possibilities, and it's hard to try every combination out on my learner...
Or is that somehow possible? Is there some integrated approach that tries out all weighting/selection operators on one or several models and picks the ones with the best performance (accuracy)?
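The kind of integrated loop asked about here could look roughly like the following plain-Python sketch: several weighting schemes, several values of k, one learner, and the selection with the best validated accuracy wins. Everything in it is an assumption for illustration (the synthetic data, the two weighting functions, the 1-nearest-neighbour learner, leave-one-out validation); it does not use or mirror any RapidMiner operator.

```python
import math
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def weight_by_correlation(columns, labels):
    return {name: abs(pearson(col, labels)) for name, col in columns.items()}

def weight_by_mean_gap(columns, labels):
    # Gap between the two class means, scaled by the attribute's range.
    weights = {}
    for name, col in columns.items():
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        spread = (max(col) - min(col)) or 1.0
        weights[name] = abs(sum(pos) / len(pos) - sum(neg) / len(neg)) / spread
    return weights

def top_k(weights, k):
    return sorted(weights, key=weights.get, reverse=True)[:k]

def loo_accuracy(columns, labels, selected):
    """Leave-one-out accuracy of a 1-nearest-neighbour learner."""
    rows = list(zip(*(columns[name] for name in selected)))
    hits = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(row, rows[j])),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(rows)

# Synthetic data: radius depends on the class, the rest is derived or noise.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
radius = [rng.gauss(2.0 + y, 0.5) for y in labels]
columns = {
    "radius": radius,
    "diameter": [2 * r for r in radius],
    "area": [math.pi * r * r for r in radius],
    "noise": [rng.gauss(0.0, 5.0) for _ in labels],
}

schemes = {"correlation": weight_by_correlation, "mean_gap": weight_by_mean_gap}
results = {}
for scheme_name, scheme in schemes.items():
    weights = scheme(columns, labels)
    for k in (1, 2, 3):
        selected = top_k(weights, k)
        results[(scheme_name, k)] = (loo_accuracy(columns, labels, selected), selected)

best_key = max(results, key=lambda key: results[key][0])
best_accuracy, best_attributes = results[best_key]
print(best_key, best_attributes, round(best_accuracy, 3))
```

One caveat with this sketch: the weights are computed on the full dataset before validation, so the accuracy estimates are optimistic. Computing the weights inside each validation fold would give a more honest comparison.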