Compound search for attribute selection
Compound search combines intermediate search results to search the space of all possible attribute subsets more effectively.
For example, in a dataset with 100 attributes where the optimal subset contains 10 attributes, forward search can only find this optimal subset after 10 generations.
Compound search instead puts all promising attributes into the second generation.
So if attributes 1 to 10 each have some good accuracy and the rest have 0% accuracy, it will test combinations of attributes 1 to 10 in the second generation.
A paper explaining this idea in more detail, with several experiments showing that it is indeed a good idea:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.4640
robotics.stanford.edu/~ronnyk/fssWrapper.ps
Answers
Thanks for the hint, sounds interesting. Are there any implementations yet? I would be happy to incorporate that into the RM core.
Cheers,
Simon
It is just a search algorithm, but my Java experience is minor.
I have taken two university courses on Java programming, but somehow that does not seem to be enough.
So far I only have the algorithm.
As a start, you can look into the forward selection:
com.rapidminer.operator.features.selection.ForwardSelectionOperator
If you want to do it, you can check out RM from SVN and build it directly into the core, so you don't have to bother with making a plugin in the first place. You can still do that later, or, if you want, you can contribute the code and we would probably build it into the core.
Cheers,
Simon
The algorithm needs the following pieces (see the sketch below):
A HashMap (all) containing all states searched so far.
An array (top) containing the top-N states.
A function to sort top by fitness.
A function mutate, which randomly inverts the on/off flag of one attribute and creates a new state.
A function crossover, which randomly copies each attribute's on/off flag from source A or B and creates a new state.
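To make this concrete, here is a minimal Java sketch of those pieces. The names (State, CompoundSearchState, TOP_N) are just my own placeholders, not RapidMiner classes:

import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Random;

// One evaluated attribute subset: the on/off flags plus its fitness.
class State {
    final BitSet selected;
    final double fitness; // e.g. accuracy of the inner learner

    State(BitSet selected, double fitness) {
        this.selected = selected;
        this.fitness = fitness;
    }
}

class CompoundSearchState {
    static final int TOP_N = 10;
    final HashMap<BitSet, Double> all = new HashMap<>(); // all states searched so far
    final ArrayList<State> top = new ArrayList<>();      // the top-N states
    final Random random = new Random();

    // Sort top by fitness, best first, and truncate to TOP_N.
    void sortTop() {
        top.sort(Comparator.comparingDouble((State s) -> s.fitness).reversed());
        while (top.size() > TOP_N) {
            top.remove(top.size() - 1);
        }
    }

    // Randomly invert the on/off flag of one attribute and create a new state.
    BitSet mutate(BitSet parent, int numberOfAttributes) {
        BitSet child = (BitSet) parent.clone();
        child.flip(random.nextInt(numberOfAttributes));
        return child;
    }

    // Randomly copy each attribute's on/off flag from source a or source b.
    BitSet crossover(BitSet a, BitSet b, int numberOfAttributes) {
        BitSet child = new BitSet(numberOfAttributes);
        for (int k = 0; k < numberOfAttributes; k++) {
            child.set(k, random.nextBoolean() ? a.get(k) : b.get(k));
        }
        return child;
    }
}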
Now I will see if I can somehow modify the forward selection operator and recompile it.
To check that it works, I guess I can download some datasets from UCI and compare kappa score and runtime.
But since I need to remember every state that was ever visited, the memory size of this structure is going to be a problem.
boolean[] selected = new boolean[numberOfAttributes];
memory size: numberOfAttributes * 8 bits + array overhead (Java stores each boolean in an array as one byte)
I think it's better to use a BitSet:
BitSet selected = new BitSet(numberOfAttributes);
memory size: numberOfAttributes * 1 bit + BitSet overhead
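A side note on the HashMap of visited states: since BitSet is mutable, a snapshot has to be stored as the key. BitSet implements equals() and hashCode() by bit content, so this works directly. A minimal sketch (my own class name, not RapidMiner API):

import java.util.BitSet;
import java.util.HashMap;

class VisitedStates {
    private final HashMap<BitSet, Double> all = new HashMap<>();

    // Returns true if this attribute subset has not been evaluated before.
    boolean firstVisit(BitSet state, double fitness) {
        BitSet key = (BitSet) state.clone(); // snapshot: keys must not mutate
        return all.putIfAbsent(key, fitness) == null;
    }

    Double fitnessOf(BitSet state) {
        return all.get(state);
    }
}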
I think the part below is the heuristic / fitness part / performance?
innerExampleSource.deliver(exampleSet); // change attributes in dataset?
getSubprocess(0).execute(); // makes call to learning algorithm?
boolean[] selected = new boolean[numberOfAttributes];
PerformanceVector bestPerformance = null;
for (int i = 0; i < numberOfSteps; i++) {
    int bestIndex = 0;
    boolean gain = false;
    for (int current = 0; current < numberOfAttributes; current++) {
        if (!selected[current]) {
            // switching on: temporarily add the candidate attribute
            attributes.addRegular(attributeArray[current]);
            // evaluate performance of the enlarged subset
            innerExampleSource.deliver(exampleSet);
            getSubprocess(0).execute();
            PerformanceVector performance = innerPerformanceSink.getData();
            if (bestPerformance == null || performance.compareTo(bestPerformance) > 0) {
                bestIndex = current;
                bestPerformance = performance;
                gain = true;
            }
            // switching off: remove the candidate again
            attributes.remove(attributeArray[current]);
        }
    }
    if (!gain) {
        break; // no candidate improved the performance, stop the search
    }
    // permanently keep the best attribute found in this step
    selected[bestIndex] = true;
    attributes.addRegular(attributeArray[bestIndex]);
}
I'm not sure if RM is compatible.
To store all search results, the states need to be optimized for memory.
The trick is to use information from previous search results to guide the search.
This is done by combining the top search results, which is a lot like crossover in genetic search algorithms.
The big difference is that there is no random selection of pairs to generate offspring; instead, best-first selection is used. A rough sketch of this pairing follows below.
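Here is a minimal sketch of that best-first pairing under my reading of the idea (the names and the uniform crossover are illustrative, not RapidMiner code):

import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

// Cross the top states with each other in fitness order instead of sampling
// random pairs as a genetic algorithm would. topStates is assumed to be
// sorted best-first, so the most promising subsets are combined first.
class BestFirstPairing {
    private final Random random = new Random();

    // Uniform crossover: copy each attribute flag from parent a or parent b.
    BitSet crossover(BitSet a, BitSet b, int numberOfAttributes) {
        BitSet child = new BitSet(numberOfAttributes);
        for (int k = 0; k < numberOfAttributes; k++) {
            child.set(k, random.nextBoolean() ? a.get(k) : b.get(k));
        }
        return child;
    }

    List<BitSet> nextGeneration(List<BitSet> topStates, int numberOfAttributes) {
        List<BitSet> offspring = new ArrayList<>();
        for (int i = 0; i < topStates.size(); i++) {
            for (int j = i + 1; j < topStates.size(); j++) {
                offspring.add(crossover(topStates.get(i), topStates.get(j),
                        numberOfAttributes));
            }
        }
        return offspring;
    }
}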
All of what you say makes a lot of sense. Looks like you have already made some progress.
However, I don't understand what you mean by "I'm not sure if RM is compatible." Compatible with what?
Apart from that, I don't see any questions in your post :-) except these two:
I think the part below is the heuristic / fitness part / performance?
innerExampleSource.deliver(exampleSet); // change attributes in dataset?
getSubprocess(0).execute(); // makes call to learning algorithm?
Yes. The first pushes the example set to the inner OutputPorts of the subprocess (the ones on the left side). It can then be used by the subprocess, which typically contains the learner and an evaluator. This is executed by the second line of code. After that, you can collect the results from the inner sinks (InputPorts).
Cheers,
Simon