"Feature selection stability validation"
A RapidMiner user wants to know the answer to this question: Are there any tutorials or best practices for feature selection stability validation?
Sincerely,
özge

Answers
Can you check the link below and see if it is helpful?
https://rapidminer.com/blog/multi-objective-optimization-feature-selection/
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Unfortunately, I have no chance to find the book immediately. Actually, some resources indicate that the operator works the same way as a cross validation. The problem is, I cannot figure out which operator/model should be applied inside the stability operator. If an example answers that, could you please help me?
Regards,
Not only the performance of a feature selection is important, but also the stability of the selection has to be taken into account. The stability indicates how much the choice of a good attribute set is independent of the particular sample of examples. If the subsets of features chosen on the basis of different samples are very different, the choice is not stable. The difference between feature sets can be expressed by statistical indices.
Such a stability validation is provided by the Feature Selection extension for RapidMiner. The operator itself is named Feature Selection Stability Validation. This operator is somewhat similar to a usual cross validation. It performs an attribute weighting on a predefined number of subsets and outputs two stability measures. Detailed options as well as the stability measures will be explained later in this section.
In order to reliably estimate the stability of a feature selection, one should loop over the number of attributes selected by a specific algorithm. For the problem at hand, the process again commences with two Read AML operators that are appended to form a single set of examples. This single example set is then connected to the input port of a Loop Parameters operator. The settings of this operator are rather simple, and are depicted in Figure 16.10.
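To make the procedure concrete outside of RapidMiner, here is a minimal Python sketch of the same idea: loop over the number of selected attributes, run an attribute weighting on several random subsets, and compare the resulting top-k attribute sets with a set-overlap measure (the Jaccard index, explained below). The dataset, the ten subsets, and the f_classif weighting are illustrative assumptions, not the exact configuration of the process described here.

```python
# Sketch of a stability validation loop, assuming scikit-learn is available.
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

X, y = make_classification(n_samples=1000, n_features=50, random_state=0)
n_subsets = 10                  # analogous to FSSV's predefined number of subsets
rng = np.random.default_rng(0)

for k in range(5, 45, 5):       # analogous to Loop Parameters over k
    subsets = []
    for _ in range(n_subsets):
        idx = rng.choice(len(X), size=len(X) // 2, replace=False)
        scores, _ = f_classif(X[idx], y[idx])          # attribute weighting
        subsets.append(set(np.argsort(scores)[-k:]))   # keep top-k attributes
    # average pairwise Jaccard index over all subset pairs (the "log")
    jac = np.mean([len(a & b) / len(a | b)
                   for a, b in combinations(subsets, 2)])
    print(f"k={k:2d}  mean Jaccard={jac:.3f}")
```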
The Feature Selection Stability Validation (FSSV) operator is placed inside the Loop Parameters operator, accompanied by a simple Log operator (see Figure 16.11). The two output ports of the FSSV are connected to the input ports of the Log operator. A Log operator stores any selected quantity. For the problem at hand, these are the Jaccard index [13] and Kuncheva's index [14]. The Jaccard index S(Fa, Fb) computes the ratio of the intersection and the union of two feature subsets, Fa and Fb:
S(Fa, Fb) = |Fa ∩ Fb| / |Fa ∪ Fb|
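As a rough illustration, both indices are easy to compute for two equally sized attribute subsets. The Kuncheva implementation below assumes the standard definition of the consistency index from [14], with n the total number of attributes and k the size of both subsets; unlike the Jaccard index, it corrects for the overlap that two random subsets of size k would show by chance.

```python
def jaccard(fa: set, fb: set) -> float:
    """Jaccard index: |Fa ∩ Fb| / |Fa ∪ Fb|."""
    return len(fa & fb) / len(fa | fb)

def kuncheva(fa: set, fb: set, n: int) -> float:
    """Kuncheva's consistency index for two subsets of equal size k,
    drawn from n attributes in total: (r*n - k^2) / (k * (n - k)),
    where r = |Fa ∩ Fb|. Undefined for k = 0 or k = n."""
    k, r = len(fa), len(fa & fb)
    return (r * n - k * k) / (k * (n - k))

# Example: two top-5 subsets out of 50 attributes
print(jaccard({1, 2, 3, 4, 5}, {1, 2, 3, 4, 6}))       # 0.667
print(kuncheva({1, 2, 3, 4, 5}, {1, 2, 3, 4, 6}, 50))  # 0.778
```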
Each log entry consists of four fields, the first one being the column name. Entries can be added and removed using the Add Entry and Remove Entry buttons, respectively. The entry for the column name can be basically anything; it is helpful to document the meaning of the logged values by a mnemonic name. The second field offers a drop-down menu from which any operator of the process can be selected. Whether a value that is computed during the process or a process parameter shall be logged is selected from the drop-down menu in the third field. The fourth field offers the selection of output values or process parameters, respectively, for the selected operator.
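For example, a log entry for the Jaccard index might look as follows (the column name is free-form; the value name shown here is hypothetical, since the actual names offered in the fourth field depend on the values the FSSV operator exposes):

column name: jaccard_index
operator: Feature Selection Stability Validation
type: value
value: jaccard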
An operator for attribute weighting is placed inside the FSSV. For the problem at hand, the same attribute weighting operator as before is used here as well.
As can be seen, the process for selecting features in a statistically valid and stable manner is quite complex. However, it is also very effective. Here, for a number of attributes between 30 and 40, both stability measures, the Jaccard index and Kuncheva's index, lie well above 0.9. Both indices reach their maximum of 1.0 if only one attribute is selected. This indicates that there is one single attribute for the separation of signal and background that is selected under all circumstances. Since other attributes also enhance the learning performance, about 30 more attributes are selected. This substantially decreases the original number of dimensions.