Procedures
Hey guys,
I am currently writing a paper and I came across a table listing some procedures, and I'm not 100% sure what they mean. Can anyone help me with what they are and what they do? They are: descriptor scaling, descriptor selection, and parameter optimization. Here's a link to the image if it helps.
Thanks in advance.
Best Answer
IngoMierswa Member Posts: 1 Learner III
Hi,
Here we go:
- Descriptor Scaling: Can mean different things. Most often it means normalization, i.e. rescaling the columns in your data so that they all lie in the same value range or follow similar statistics (e.g. all have mean 0 and standard deviation 1). It can also mean feature weighting followed by rescaling, e.g. calculating feature weights with information gain and applying those weights as scaling factors to your data columns. Finally, I have heard some people use the term for feature space transformations like PCA (although this does not make a lot of sense in my opinion). The first usage is certainly the most frequent meaning. BTW, RapidMiner supports all three (and more) functions.
- Descriptor Selection = Feature Selection, e.g. using greedy heuristics like backward elimination or forward selection, or other optimization schemes like genetic algorithms, to find out which features are the most important for your learning task. Generally there are two approaches: the filter approach and the wrapper approach. The filter approach only takes the data into account, while the wrapper approach optimizes the set of features for a specified machine learning method. Guess what: all of those are of course also supported in RM ;-)
- Parameter Optimization: automatically find the optimal parameters for a machine learning method (or in general for anything which offers parameters). Typical parameters to be optimized are the depth of decision trees, the value "k" for k-nearest neighbors, or the error-complexity trade-off "C" of a support vector machine. In general the optimization technique runs different parameter combinations for a machine learning method and evaluates which of those deliver the best result in terms of prediction accuracy. At this point I probably do not need to point out that RM supports multiple different schemes for this ;-)
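To make the most common meaning of descriptor scaling concrete, here is a minimal sketch in plain Python (not RapidMiner) that rescales every column to mean 0 and standard deviation 1. The function name `zscore_columns` and the toy data are made up for illustration, and using the population standard deviation is an assumption; tools differ on that detail.

```python
import statistics

def zscore_columns(rows):
    """Rescale each column of `rows` to mean 0 and standard deviation 1.

    `rows` is a list of equally long lists of numbers (one list per example).
    Hypothetical helper for illustration only.
    """
    columns = list(zip(*rows))  # transpose: one tuple per descriptor
    scaled_columns = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)  # assumption: population standard deviation
        if std == 0:
            # A constant column carries no information; map it to zeros.
            scaled_columns.append([0.0 for _ in col])
        else:
            scaled_columns.append([(v - mean) / std for v in col])
    # Transpose back to one list per example.
    return [list(row) for row in zip(*scaled_columns)]

# Two descriptors on very different value ranges (toy data).
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = zscore_columns(data)
```

After scaling, both columns contribute on the same footing to anything distance-based, such as k-nearest neighbors.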
You will find plenty of process examples in the Sample repository which is part of every RapidMiner installation.
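If you want to see the idea behind descriptor selection outside of RapidMiner, the wrapper approach with greedy forward selection could be sketched roughly like this in plain Python. Everything here is an assumption for illustration: the learner is a simple 1-nearest-neighbor classifier, the quality measure is leave-one-out accuracy, and the names `loo_accuracy` and `forward_selection` as well as the toy data are hypothetical.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of 1-nearest-neighbor using only `features`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # never compare an example with itself
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)

def forward_selection(rows, labels):
    """Greedily add the descriptor that improves accuracy the most."""
    remaining = set(range(len(rows[0])))
    selected, best_score = [], 0.0
    while remaining:
        candidate, candidate_score = None, best_score
        for f in sorted(remaining):
            score = loo_accuracy(rows, labels, selected + [f])
            if score > candidate_score:
                candidate, candidate_score = f, score
        if candidate is None:
            break  # no remaining descriptor improves the result
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score

# Toy data: descriptor 0 separates the classes, descriptor 1 is noise.
rows = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
        [1.0, 1.2], [1.1, 8.8], [1.2, 5.1]]
labels = ["a", "a", "a", "b", "b", "b"]
selected, score = forward_selection(rows, labels)
```

This is a wrapper because the learner itself is used to judge each candidate set. A filter approach would instead score each descriptor from the data alone, e.g. with information gain.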
General note: "Descriptor" is just another term for attribute, feature, (independent) variable, dimension, or influence factor (or any of the myriad other terms used in our field for the same thing, which is most frequently just a column in a data table). In machine learning you train a model based on those attributes (or: descriptors) in order to predict an outcome (called "label" in RapidMiner, but you will also find the terms target, class, or dependent variable in the literature).
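Putting descriptors, label, and parameter optimization together, here is a rough plain-Python sketch (again not RapidMiner) of a grid search over "k" for k-nearest neighbors. The function names, the candidate values, and the toy data are all made up, and leave-one-out accuracy is just one possible way to estimate prediction accuracy.

```python
from collections import Counter

def knn_predict(train_rows, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest rows."""
    neighbors = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(rows, labels, k):
    """Hold out each example once and predict it from all the others."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        correct += knn_predict(train_rows, train_labels, rows[i], k) == labels[i]
    return correct / len(rows)

def grid_search_k(rows, labels, candidates):
    """Try every candidate k and keep the one with the best accuracy."""
    scores = {k: leave_one_out_accuracy(rows, labels, k) for k in candidates}
    best_k = max(candidates, key=lambda k: scores[k])  # first best wins ties
    return best_k, scores

# One descriptor per example; two labels with one noisy example in each group.
rows = [[0.0], [1.0], [2.0], [3.0], [2.5],
        [10.0], [11.0], [12.0], [13.0], [11.5]]
labels = ["a", "a", "a", "a", "b",
          "b", "b", "b", "b", "a"]
best_k, scores = grid_search_k(rows, labels, [1, 3, 5])
```

On this toy data k=1 gets fooled by the two noisy examples, while larger values of k vote them down, which is exactly the kind of difference a parameter optimization is meant to find.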
Hope that helps,
Ingo
Answers
Hello Robin,
The table seems to be some sort of comparison between RapidMiner and other platforms.
Interestingly, we support R as a scripting language and also incorporate many Weka algorithms.
You should explore them in our Marketplace, which appears as a menu in the RapidMiner Studio client.
I still need to know what descriptor scaling, descriptor selection, and parameter optimization mean.