Need to check whether my solution is correct (machine learning techniques with an ensemble approach)
twilight_baya
Dear all,
I am a complete beginner with RapidMiner. My task is to apply three classification techniques (ANN, DT, and SVM) and then combine the three models with an ensemble technique to improve accuracy. (I am supposed to report a final score in %.)
Could someone have a look at my solution and let me know whether it is correct?
Thank you very much for your help.
Hi @twilight_baya
It would help all community members if you could also post your dataset and the formulation of the problem itself. In other words, we need to understand what kind of data you are working with and what exactly you need to predict.
Vladimir
http://whatthefraud.wtf
Thank you for your reply, kypexin.
Here is my dataset. It has one dependent variable, YGPA (pass = 1, fail = 0), and 16 independent variables: gender (male, female) and X1 to X15.
X1 is a score out of 100; X2 to X15 are scores out of 5.
I would like to apply classification techniques (ANN, DT, and SVM) to this dataset and then combine the three with a stacking ensemble.
Thomas_Ott kindly provided me with the process XML, but I am not sure which parts of it I need to change to fit my data.
The XML provided by Thomas_Ott is attached here.
Hi @twilight_baya
Your attached process XML does not seem to be valid, so I can't open it in RapidMiner. Could you please copy the XML source of the process directly from RapidMiner's 'XML' tab and post it here? Thanks.
Vladimir
I received this process from Thomas_Ott in a private message.
I will copy and paste it here.
Hi @twilight_baya
I have modified the process for you and built an all-in-one process with the following logic:
Vladimir
So I took a look at @twilight_baya's homework and I'd call it wrong, in the sense that I would not analyze it that way. @kypexin's approach is the correct one IMHO: he uses cross-validation (CV) for each algorithm and then produces an overall 'performance' result.
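To make the "CV for each algorithm" idea concrete, here is a minimal plain-Python sketch (not RapidMiner code) of k-fold cross-validation. It assumes a hypothetical "learner" interface: a function that takes training rows and labels and returns a predict function. The names `k_fold_indices`, `cross_validated_accuracy` and `majority_class_learner` are made up for illustration. Folds are contiguous and unshuffled to keep it short; a real setup would normally shuffle or stratify.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k contiguous, nearly equal folds (no shuffling)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(learner, X, y, k=5):
    """Train on k-1 folds, test on the held-out fold, and average the k fold accuracies."""
    fold_accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = learner([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        fold_accuracies.append(correct / len(test_idx))
    return sum(fold_accuracies) / len(fold_accuracies)


def majority_class_learner(X_train, y_train):
    """Hypothetical baseline learner: always predicts the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda row: majority
```

You would run the same function once per algorithm (ANN, DT, SVM, the ensemble) and compare the resulting scores. For example, with ten rows labelled seven 1s followed by three 0s, the majority-class baseline scores 0.7 over 5 folds.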
The process I shared with you in private is just a simple ensemble application within a cross-validation. You just used it and didn't even ask what this Cross Validation is about. And why did you use Vote? Voting and Stacking combine the models differently: Vote takes a majority vote over the base models' predictions, while Stacking trains an additional model on those predictions.
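A rough plain-Python sketch of the Vote vs. Stacking difference follows. The meta-learner here is deliberately trivial and hypothetical: a lookup table that maps each combination of base-model predictions to the most common true label seen with it. In RapidMiner you would choose a real learner for the stacking model instead. The base-model predictions are assumed to be out-of-fold predictions, so the meta-learner does not simply learn which base model overfits best. All function names are made up for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(base_preds):
    """Vote: each base model casts one vote per row; the most common label wins."""
    n_rows = len(base_preds[0])
    return [Counter(model[r] for model in base_preds).most_common(1)[0][0]
            for r in range(n_rows)]


def fit_stacking_meta(base_preds, y):
    """Stacking: learn a second-level model from base-model predictions to the true label.

    The meta-learner is a lookup table keyed by the tuple of base predictions for a row.
    """
    table = defaultdict(Counter)
    for r, label in enumerate(y):
        key = tuple(model[r] for model in base_preds)
        table[key][label] += 1
    learned = {key: counts.most_common(1)[0][0] for key, counts in table.items()}

    def predict(row_preds):
        key = tuple(row_preds)
        if key in learned:
            return learned[key]
        # Combination never seen during training: fall back to a plain majority vote.
        return Counter(row_preds).most_common(1)[0][0]

    return predict
```

With made-up predictions where ANN and SVM always say 1 and only DT is right, voting follows the two wrong models. Stacking instead learns to trust DT:

```python
y = [1, 0, 1, 0, 1, 0]
ann, dt, svm = [1] * 6, [1, 0, 1, 0, 1, 0], [1] * 6
majority_vote([ann, dt, svm])                  # [1, 1, 1, 1, 1, 1]
meta = fit_stacking_meta([ann, dt, svm], y)
[meta(p) for p in zip(ann, dt, svm)]           # [1, 0, 1, 0, 1, 0]
```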
My first reaction to the ANN model is that it's overfitting. The DT *might be overfitting* because the recall for Atrisk is piss-poor, and the SVM is god-awful. The worst mistake a new machine learning practitioner can make, IMHO, is to rely solely on accuracy. At a minimum I always evaluate Area Under the Curve (AUC, if there are two classes), accuracy, and precision/recall.
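Here is why accuracy alone can mislead, as a plain-Python sketch with made-up labels. The sketch assumes the minority "fail" class (label 0) plays the role of the Atrisk class. A model that always predicts "pass" still scores 80% accuracy on this data, while its recall for the class you actually care about is zero. AUC is computed here as the probability that a randomly chosen positive row gets a higher score than a randomly chosen negative one, with ties counting half. This is a simple O(n²) version for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of rows predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the class labelled `positive` (0.0 when undefined)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores, positive):
    """Pairwise AUC: share of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1] * 8 + [0] * 2           # hypothetical: 8 pass, 2 fail
always_pass = [1] * 10
print(accuracy(y_true, always_pass))                      # 0.8
print(precision_recall(y_true, always_pass, positive=0))  # (0.0, 0.0)
```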
My two cents.
Thank you very much for your help, @kypexin. I will try out your process and let you know.
Thank you @Thomas_Ott
Well, I needed so much help with the process that I was embarrassed to ask you again :-)
I know what cross-validation is, but I couldn't apply the process to my data. I wanted to use stacking because it is often reported to perform best among ensemble techniques.
Thank you for all the other comments. I am taking them into consideration.
Best regards.
Hi @twilight_baya
You are welcome.
I don't know exactly what issue you ran into with your dataset, but my guess is that it was most likely with the NN and SVM algorithms. They cannot handle polynominal (nominal) attributes, which is 'gender' in your case, so applying them directly to the dataset would give you an error. This is why I used the 'Nominal to Numerical' transformation on the 'gender' attribute.
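As a plain-Python illustration of what a nominal-to-numerical conversion does to 'gender', here is a sketch of dummy coding. Dummy coding is one common option; RapidMiner's operator offers other coding types too. The function name `dummy_code` and the "gender = male" column naming are hypothetical choices made for this example. Each distinct value becomes its own 0/1 column, so NN and SVM only ever see numbers.

```python
def dummy_code(rows, attribute):
    """Replace a nominal attribute with one 0/1 indicator column per distinct value."""
    values = sorted({row[attribute] for row in rows})
    coded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != attribute}  # keep the other attributes
        for value in values:
            new_row[f"{attribute} = {value}"] = 1 if row[attribute] == value else 0
        coded.append(new_row)
    return coded


rows = [{"gender": "male", "X1": 72}, {"gender": "female", "X1": 85}]
print(dummy_code(rows, "gender"))
# [{'X1': 72, 'gender = female': 0, 'gender = male': 1},
#  {'X1': 85, 'gender = female': 1, 'gender = male': 0}]
```

With only two values, a single 0/1 column would carry the same information. The per-value version is shown because it generalizes to attributes with more than two categories.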
Vladimir