Solved: How to implement sequential floating forward selection
Hi,
I have a binary classification problem with many features. I don't want to use plain forward selection but a floating method that combines forward selection and backward elimination, repeated multiple times in a row. Is that possible? When I construct it myself, after the first backward elimination I cannot fix the already chosen features and continue forward selection with the remaining ones.
Thanks in advance
Kind regards,
Daniel
Answers
You can use Select by Weights on the exa and wei outputs of the Backward Elimination operator to remove the unselected features.
Best regards,
Marius
thanks for the reply. You are right, I can use the Select by Weights operator after the BE. However, the problem is that I want the second Forward Selection to start from the last optimal subset of the previous BE: doing this, I can overcome the disadvantage of the greedy algorithms. Here is an example process:
Consider a total set of 100 attributes.
1. Start with a Forward Selection. Let's assume it ends after 30 attributes.
2. Since bad features may have been chosen that the greedy approach cannot remove, I run BE on the 30 features. Let's say it ends up with 25 of the 30 features.
3. Now I want to run Forward Selection again, STARTING already with the 25 features, with the remaining 75 features available for selection. Let's assume it selects 7 further features.
4. I run BE again on the 32 features from that subset.
5. And so on, until a possible global optimum is found.
Point 3 is the problem. How can I let the FS start with the 25 features already selected and leave the remaining 75 open for selection?
This is also important for the case where I start with a random set of features and then do SFFS or SFBE.
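The floating scheme in the numbered steps above can be sketched in plain Python. Everything here is an illustration, not RapidMiner code: the `score` function is a hypothetical stand-in for the cross-validated model performance that the FS/BE operators would compute.

```python
def sffs(features, score, max_rounds=10):
    """Sketch of sequential floating forward selection:
    alternate greedy forward steps and greedy backward steps,
    each round starting from the subset the previous round kept."""
    selected = set()
    best = score(frozenset(selected))
    for _ in range(max_rounds):
        changed = False
        # Forward step: add the feature with the best improvement, repeatedly.
        while True:
            gains = {f: score(frozenset(selected | {f}))
                     for f in features if f not in selected}
            if not gains:
                break
            f, s = max(gains.items(), key=lambda kv: kv[1])
            if s <= best:
                break
            selected.add(f)
            best = s
            changed = True
        # Backward step: drop any feature whose removal improves the score.
        while len(selected) > 1:
            losses = {f: score(frozenset(selected - {f})) for f in selected}
            f, s = max(losses.items(), key=lambda kv: kv[1])
            if s <= best:
                break
            selected.remove(f)
            best = s
            changed = True
        if not changed:  # converged: neither adding nor dropping helps
            break
    return selected, best

# Toy example: features "a" and "b" help, the others hurt (hypothetical score).
features = ["a", "b", "c", "d"]
score = lambda subset: len(subset & {"a", "b"}) - 0.5 * len(subset - {"a", "b"})
selected, best = sffs(features, score)
print(selected, best)
```

The point of the floating part is exactly step 3 above: each forward pass starts from the subset the last backward pass kept, instead of from scratch.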
Thanks in advance
Daniel
now I got it. Step 3 is not possible out of the box, but you can work around it. Please see the attached process for reference. The interesting part happens after the Backward Elimination: the dataset after the BE is "remembered", then the FS is started as usual. Inside the FS, however, we "recall" the stored data and join it to the features currently tested by the FS. By checking "remove_duplicate_features" in the Join operator, we avoid duplicate attributes.
The FS, however, only delivers the attributes selected by the operator itself, not the ones we added artificially in its subprocess. Thus, we have to join the data again after the FS.
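As a rough analogue of what the Join with "remove_duplicate_features" checked does, here each example set is modeled as a dict mapping attribute names to columns. This is only a sketch of the idea, not actual RapidMiner code.

```python
def join_features(base, extra):
    """Join two example sets (modeled as {attribute: column} dicts),
    skipping attributes already present in the base set -- analogous to
    RapidMiner's Join with remove_duplicate_features checked."""
    merged = dict(base)
    for name, column in extra.items():
        if name not in merged:  # duplicate attribute: keep the base copy
            merged[name] = column
    return merged

# The remembered BE subset joined with the features the FS currently tests
# (hypothetical attribute names; "f2" appears in both sets only once after).
be_subset = {"f1": [1, 2], "f2": [3, 4]}
fs_candidates = {"f2": [3, 4], "f9": [5, 6]}
joined = join_features(be_subset, fs_candidates)
print(sorted(joined))
```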
Hope this helps, and if you have any questions just post them here.
Best regards,
Marius
thank you for this interesting approach.
I have a few comments.
- I have changed the first selection algorithm to a GA, since it is easier to demonstrate with this operator.
- The connection from the multiplier to the Forward Selection is wrong, because it feeds the set selected after the BE into the Forward Selection again. That is useless, since you have to provide additional new features. Instead, you have to deliver the original example set from before the first selection. The Forward Selection then sees both the BE-selected features and the original features (which contain the BE-selected ones), and the inner Join operator of the FS together with the final Join operator after the FS ensures that each attribute occurs only once.
- Why do you use the Generate ID operator? That does not make sense to me, since it only assigns a number to every instance. Where could instances get lost in this process, so that this control would be needed?
You need the ID to be able to apply the Join operators.
Best regards,
Marius