"feature selection loop"
I'm new to RapidMiner, so maybe that is why I couldn't find a way to do this.
Let's suppose the ranking of the features by significance is known:
t100, t12, t25, t16, ...
Now I want to iterate such that:
t(i) = features(1:i):
t(1) = the first element of the ordered features -> t100
t(2) = the first two elements of the ordered features -> t100, t12
i = 2
repeat until classification performance stops improving {
    p1 = classification performance with t(i)
    p2 = classification performance with t(i-1)
    if (p1 <= p2) {
        stop and keep t(i-1)
    }
    i = i + 1
}
There are three problems here:
1- Making the loop.
2- My classifier is a neural network, which is sensitive to initial conditions, so each iteration above should be repeated several times (say 100 times), and the loop should break if the mean of the performances violates the condition.
3- How can I get a table of the mean accuracy for each step at the end?
Please help me as much as possible.
Thanks
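For readers who want to see the requested logic outside RapidMiner, here is a minimal Python sketch of the three points above (the loop, the repeated runs, and the results table). It is not a RapidMiner process. The `evaluate` callback and the `toy_evaluate` stand-in are hypothetical: in a real setup `evaluate` would train the neural network once on the given features and return its accuracy. The toy accuracies are made up for illustration.

```python
import random
from statistics import mean


def forward_prefix_search(ranked, evaluate, n_repeats=100, seed=0):
    """Grow the feature set one ranked feature at a time.

    ranked    -- feature names, most significant first
    evaluate  -- hypothetical callback: (features, rng) -> accuracy of one run
    n_repeats -- number of runs averaged for each feature subset
    Returns (best_size, table); table rows are (size, features, mean_accuracy).
    """
    rng = random.Random(seed)
    table = []
    best_size, best_mean = 0, float("-inf")
    for i in range(1, len(ranked) + 1):
        subset = ranked[:i]
        # Average over repeated runs to smooth out initialization effects.
        avg = mean(evaluate(subset, rng) for _ in range(n_repeats))
        table.append((i, subset, avg))
        if avg <= best_mean:
            break  # mean performance did not improve: stop the loop
        best_size, best_mean = i, avg
    return best_size, table


def toy_evaluate(features, rng):
    # Stand-in for one neural network training run (made-up accuracies).
    base = {1: 0.65, 2: 0.67, 3: 0.69}.get(len(features), 0.68)
    return base + rng.uniform(-0.005, 0.005)


best_size, table = forward_prefix_search(
    ["t100", "t12", "t25", "t16"], toy_evaluate, n_repeats=100
)
for size, subset, avg in table:
    print(size, ",".join(subset), round(avg, 3))
print("best number of features:", best_size)
```

The table keeps one row per evaluated subset, including the last one that failed to improve, so the stopping point is visible in the output.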
Answers
I don't know which RM version you are using, but supposing you're using RM5 the answer is rather easy:
Ad (1) The loop that adds features according to their weight is "Optimize Selection (Weight-Guided)", which can be found at Data Transformation/Attribute Set Reduction and Transformation/Selection/Optimization. Unfortunately the documentation doesn't match its actual parameters, but I think it will work the way you want.
Ad (2) Use "Loop and Average" (Process Control/Loop)
Ad (3) Use "Log" (Utility/Logging)
If you're using RM4: as far as I remember, all those operators existed in previous versions. Unfortunately they had different names, which I don't remember right now.
Hope I could help,
chero
Thanks for the reply.
"Optimize Selection (Weight-Guided)" uses forward and backward feature selection methods, while I already have the list of best features in order. My question is about the minimum number of them that produces the best performance.
Graham
Well, it uses forward selection. In my documentation backward elimination is not mentioned (perhaps different builds?).
Anyhow, isn't that what you want? You want to add features one after another (in a given sequence) while your performance improves. Apart from the part in brackets, that is the definition of sequential forward selection, imho. With this operator, the sequence of feature addition can be given as attribute weights. If your ranking isn't some kind of attribute weights, you can use the operator Weight by User Specification (Modeling/Attribute Weighting) to create attribute weights that suit your needs.
Best regards,
chero
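To illustrate the idea of expressing a fixed ranking as attribute weights, here is a small Python sketch. It only shows the mapping from order to weights. The function name `ranking_to_weights` and the weight values are assumptions chosen for illustration, and the sketch says nothing about how Weight by User Specification expects its input.

```python
def ranking_to_weights(ranked):
    """Map an ordered feature list to strictly decreasing weights in (0, 1].

    Only the ordering of the weights matters here; the values themselves
    are an arbitrary choice made for this sketch.
    """
    n = len(ranked)
    return {name: (n - idx) / n for idx, name in enumerate(ranked)}


weights = ranking_to_weights(["t100", "t12", "t25", "t16"])
for name, weight in weights.items():
    print(name, weight)
```

Sorting the features by descending weight recovers the original ranking, which is the property a weight-guided selection would rely on.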
Thanks a lot for the reply.
I found a solution to my problem with your hint, but I don't know how to run this procedure 100 times and average the performances and the confusion matrix table.
With best regards
Graham
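As a language-neutral illustration of what "run it 100 times and average" means, here is a minimal Python sketch that averages accuracy over repeated runs and accumulates one confusion matrix across all of them. The `run_once` callback and the `toy_run` stand-in are hypothetical: a real `run_once` would train and test the classifier once and return the (true label, predicted label) pairs. The labels and the 30% error rate are made up.

```python
import random
from collections import Counter
from statistics import mean, stdev


def repeat_and_average(run_once, n_repeats=100, seed=0):
    """Repeat a run and summarize it.

    run_once -- hypothetical callback: rng -> list of (true, predicted) pairs
    Returns (mean_accuracy, accuracy_stdev, confusion), where confusion
    counts every (true, predicted) pair summed over all runs.
    """
    rng = random.Random(seed)
    accuracies = []
    confusion = Counter()
    for _ in range(n_repeats):
        pairs = run_once(rng)
        confusion.update(pairs)  # add this run's pairs to the summed matrix
        accuracies.append(sum(t == p for t, p in pairs) / len(pairs))
    return mean(accuracies), stdev(accuracies), confusion


def toy_run(rng):
    # Stand-in for one train/test run: each prediction is wrong with p = 0.3.
    truth = ["yes", "no"] * 10
    flip = {"yes": "no", "no": "yes"}
    return [(t, t if rng.random() > 0.3 else flip[t]) for t in truth]


avg, spread, confusion = repeat_and_average(toy_run, n_repeats=100)
print("mean accuracy:", round(avg, 3), "stdev:", round(spread, 3))
for (true_label, predicted), count in sorted(confusion.items()):
    print(f"true={true_label} predicted={predicted}: {count}")
```

Summing the confusion matrices keeps the counts as integers; dividing each count by the number of runs gives the averaged matrix if that form is preferred.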
Greetings,
chero
You are right, IteratingPerformanceAverage was exactly what I was looking for.
But about my main question:
I ran the process but the output is wrong. It stops after selecting 3 features, while I know that performance improves with at least 10 features.
features: att1 performance: .65
features: att1,att2 performance: .67
features: att1,att2,att3 performance: .69
But the program stops here.
Any idea?
Regards
Graham
I expect that adding a fourth feature doesn't improve your performance, so the algorithm stops correctly.
First of all, thank you for your consideration.
I checked it manually: as features are added, performance keeps improving until the 18th feature. That is the point the process should report,
but it reports the 3rd feature and I don't know why.
As a side point, I increased the number of generations without improval, but that changes the order of the best features and removes some features, while I want to evaluate the performance in a way that depends on the features in the order I listed.
Thanks again.
With the best
Graham
Actually, I don't know why the order is changed, nor why features are removed (after being added). Maybe one of the developers can answer your question.
Nevertheless, it would be good to have some real data to reproduce your results. Perhaps you can provide some anonymized data.
Best regards,
chero
I got it. Suppose the best number of features is 15 and you start a loop from 2 features: there is no guarantee that performance improves at every step of adding a new feature up to the 15th. Adding features step by step and looking at the performance only finds a local optimum, while the data should be considered as a whole.
Now, is there any way in RM to add given features one by one and save their performances, continuing until the last feature without stopping when performance drops?
This is my code, but it stops because it is WeightGuidedFeatureSelection. Any idea?
thanks
Graham
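The behaviour asked for here (evaluate every prefix of the ranked list, never stop early, then pick the best) can be sketched in a few lines of Python. This is not a RapidMiner process. The `evaluate` callback is hypothetical, and only the first three scores come from the thread; the remaining scores and the names att4 to att6 are made up to show a dip followed by a later improvement.

```python
def sweep_all_prefixes(ranked, evaluate):
    """Score every prefix of the ranked feature list without early stopping.

    evaluate -- hypothetical callback: features -> (mean) performance
    Returns (best_size, best_score, table); table rows are (size, score).
    On ties, the smallest feature count wins.
    """
    table = [(i, evaluate(ranked[:i])) for i in range(1, len(ranked) + 1)]
    best_size, best_score = max(table, key=lambda row: row[1])
    return best_size, best_score, table


# First three values are from the thread; the rest are invented.
toy_scores = [0.65, 0.67, 0.69, 0.68, 0.71, 0.70]
ranked = ["att1", "att2", "att3", "att4", "att5", "att6"]

best_size, best_score, table = sweep_all_prefixes(
    ranked, lambda subset: toy_scores[len(subset) - 1]
)
for size, score in table:
    print(size, score)
print("best:", best_size, "features with performance", best_score)
```

With these toy numbers a stop-at-first-drop rule would settle on 3 features, while the full sweep finds the better result at 5, which is the local-optimum problem described above.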
I'm not quite sure if this fits in here, but we have a forward attribute selection, which adds one attribute after another until a stopping criterion such as a performance decrease is fulfilled. There is one version in the core, and we have a plugin that delivers an extended and more efficient solution.
If you need to be able to define the starting attribute set, we would have to extend this version with that function, but this would be possible for a relatively small fee. If you are interested, feel free to contact me.
If I missed the topic completely, because I have only read the last post, just ignore this reply.
Greetings,
Sebastian