Predict age and gender with multi-label classification for Twitter users
Hello,
I need to predict the age and gender of Twitter users from their tweets.
I have collected more than 300 user profiles with known age and gender,
and I divided the file into 4 groups (Female over 20, Female under 20, Male over 20, Male under 20).
I have finished processing the text (tokenize, remove stopwords, stem, replace tokens).
Now, how can I do that in RapidMiner?
Answers
Did you check out this KB article? http://community.rapidminer.com/t5/Text-Analytics-in-RapidMiner/Sentiment-Analysis-as-a-supervised-learning-problem/ta-p/31827
Thanks Thomas_Ott,
This solution seems to be binary (positive or negative),
but it doesn't suit my case, because I think I first have to train a model to predict gender (Male, Female), then age (Over 20, Under 20), and the final prediction should combine the two labels, like (Male over 20, Male under 20, Female over 20, or Female under 20).
I'm sure as you have already learned in your studies, some algorithms can only be applied to binary labels, some to regression (numbers), but did you know that many algorithms can handle multiple categories? For example, a kNN algorithm can predict for all 4 categories in your label without much trouble.
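To make the kNN point concrete outside of RapidMiner, here is a minimal Python sketch of a nearest-neighbour classifier that votes among all 4 categories at once. The toy documents, the cosine similarity measure, and the names `TRAIN`, `cosine`, and `knn_predict` are assumptions made for illustration; this is not the RapidMiner k-NN operator itself.

```python
from collections import Counter
import math

# Hypothetical toy data (already tokenized and stemmed); not from the thread.
TRAIN = [
    ("homework school exam teacher".split(), "female_under20"),
    ("school exam prom dress".split(), "female_under20"),
    ("office meeting daughter recipe".split(), "female_over20"),
    ("meeting recipe mortgage yoga".split(), "female_over20"),
    ("school football xbox homework".split(), "male_under20"),
    ("xbox skateboard exam football".split(), "male_under20"),
    ("office mortgage football beer".split(), "male_over20"),
    ("beer meeting car mortgage".split(), "male_over20"),
]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(tokens, train, k=3):
    """Return the label with the largest similarity-weighted vote among the k nearest documents."""
    query = Counter(tokens)
    scored = sorted(
        ((cosine(query, Counter(doc)), label) for doc, label in train),
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]

prediction = knn_predict("xbox skateboard school".split(), TRAIN)
print(prediction)  # male_under20
```

Nothing in the voting step cares whether there are 2 labels or 4, which is why kNN handles the four-group label without any extra work.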
Have an explore of http://mod.rapidminer.com/#app and use it to help understand a small selection of the algorithms available for your solution.
(As this is RapidMiner, there are a large number of different ways to solve your problem, but first let's begin here as it is a very simple way to get you started).
Happy mining!
I think we first need to predict the first label (Gender: Male/Female); after that we can predict age (over 20 years, under 20 years).
I tried browsing your link, but I don't know the process steps to do that.
Could anyone help me, please?
As @JEdward pointed out, there are several algorithms that can handle multi-label. My link shows how the process would work.
For your example, I would make labels of male_under20, male_over20, female_under20, and female_over20. This way the label is all in one attribute column and you can test the predictions and measure the performance of the classification. Assuming the model is good, then the testing (scoring) data set will spit out those labels with confidences.
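As a rough illustration of the single-label-column idea, the Python sketch below folds gender and age into one nominal label and then scores a new document with a confidence per label. The tiny Naive Bayes learner, the sample profiles, the choice to treat an age below 20 as "under20", and the names `make_label`, `train_nb`, and `confidences` are all assumptions for this example; in RapidMiner the same steps would be done with operators rather than code.

```python
from collections import Counter, defaultdict
import math

def make_label(gender, age):
    """Fold two attributes into one nominal label (assumes age < 20 means 'under20')."""
    return f"{gender.lower()}_{'under20' if age < 20 else 'over20'}"

# Hypothetical profiles: (tokens, gender, age); not real data.
PROFILES = [
    ("school exam prom".split(), "Female", 17),
    ("office recipe yoga".split(), "Female", 34),
    ("school xbox football".split(), "Male", 16),
    ("office beer mortgage".split(), "Male", 41),
]

def train_nb(rows):
    """Collect per-label word counts for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for tokens, label in rows:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def confidences(tokens, model):
    """Return a normalized confidence for every label (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for t in tokens:
            if t in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(unnormalized.values())
    return {label: v / norm for label, v in unnormalized.items()}

rows = [(tokens, make_label(gender, age)) for tokens, gender, age in PROFILES]
model = train_nb(rows)
conf = confidences("xbox football exam".split(), model)
best = max(conf, key=conf.get)
print(best, conf)
```

Because every prediction is one of the four combined labels, accuracy and a confusion matrix can be computed directly against the known label column.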
You could also build a model that first classifies the gender via a Cross Validation, then pipes that information to another Cross Validation. You'd have to use a Set Role operator and a Select Attributes operator to remove the confidence attributes and change the label role to a regular attribute, but that seems very complicated.
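For comparison, the two-stage idea can be sketched in Python as follows: a first model predicts gender from the text, and that prediction is then appended as an ordinary feature for a second model that predicts the age group. The token-overlap classifier, the sample rows, and the names `fit`, `predict`, and `predict_both` are hypothetical stand-ins chosen to keep the example short; they do not reproduce the Cross Validation, Set Role, or Select Attributes operators.

```python
from collections import Counter

# Hypothetical rows: (tokens, gender, age_group); not real data.
ROWS = [
    ("school exam prom".split(), "female", "under20"),
    ("office recipe yoga".split(), "female", "over20"),
    ("school xbox football".split(), "male", "under20"),
    ("office beer football".split(), "male", "over20"),
]

def fit(examples):
    """examples: list of (tokens, label). Returns token counts per label."""
    counts = {}
    for tokens, label in examples:
        counts.setdefault(label, Counter()).update(tokens)
    return counts

def predict(tokens, counts):
    """Pick the label whose training tokens overlap the input the most."""
    return max(sorted(counts), key=lambda label: sum(counts[label][t] for t in tokens))

# Stage 1: gender from the text alone.
gender_model = fit([(tokens, gender) for tokens, gender, _ in ROWS])

# Stage 2: age group from the text plus gender as a regular feature token.
# Training uses the known gender; prediction uses the stage 1 output.
age_model = fit([(tokens + [f"gender={gender}"], age) for tokens, gender, age in ROWS])

def predict_both(tokens):
    gender = predict(tokens, gender_model)
    age = predict(tokens + [f"gender={gender}"], age_model)
    return f"{gender}_{age}"

result = predict_both("xbox football school".split())
print(result)  # male_under20
```

One design consequence worth noting: any mistake made in the gender stage is passed on to the age stage, which is part of why the single combined label is the simpler starting point.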
Thanks @Thomas_Ott ,
I appreciate that, but how can I apply multi-label classification with the best accuracy and performance for more than 184,000 tweets or 3,000,000 tokens? Do you have a full example that explains how to handle MLC in RapidMiner?
Yes, here is a process that uses 3 classes.