Mining binary data
neilduggan
Member Posts: 18 Contributor II
Hi
I have some opinion poll data on internet usage (26 columns, i.e. questions, and approx. 800 rows, i.e. respondents) with answers of "true", "false" or blank for 24 of the questions. I've binned the "age" column into three bins, and the final column is location / state (which has ~40 values, currently numerical).
A couple of questions:
1. At the moment, the data is set up such that "sex" contains "true" for female and "false" for male. Should this be separate true / false columns for male and female?
2. What's the best RapidMiner operator to mine this data for trends (e.g. young / old women / men are more likely to XXX)? I've tried using "w-apriori" but it gives me very basic rules. I've also tried "FP-growth" + "Create Association Rules", which works slightly better but is still not great. I've tried setting different attributes as the "label" and it makes some difference, but nothing major.
3. Is it possible to use RapidMiner to create rules in relation to the respondent's location as the data stands? Or do I need to create a column for each state with true / false for each respondent?
Apologies if these are stupid questions!!
Thanks
Neil
Answers
If you would like to train a model, I would suggest a decision tree.
But don't forget to assign the role "label" to one of your 24 question columns.
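To make the role of the label more concrete, here is a minimal sketch in plain Python (not RapidMiner) of how a decision tree might pick its first split once one column has been chosen as the label. It uses information gain, which is one common splitting criterion and not necessarily the one your operator is configured with. The column names and the four toy respondents are invented for illustration, and blank answers are not handled here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of class values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(rows, attribute, label):
    """Reduction in the entropy of `label` after splitting `rows` on `attribute`."""
    base = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy respondents; the column names are assumptions.
rows = [
    {"sex": True,  "age_bin": "young", "shops_online": True},
    {"sex": True,  "age_bin": "old",   "shops_online": False},
    {"sex": False, "age_bin": "young", "shops_online": True},
    {"sex": False, "age_bin": "old",   "shops_online": False},
]

label = "shops_online"  # the question column given the "label" role
gains = {a: information_gain(rows, a, label) for a in ("sex", "age_bin")}
best = max(gains, key=gains.get)  # attribute the tree would split on first
print(best, gains)
```

Changing `label` to a different question column changes which attributes look informative, which is why the choice of label matters.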
Do I need to change the way I've set up the "sex" column (and other columns)? Or is it OK the way it is?
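For association rules specifically, one thing to consider is that a rule miner which only treats "true" values as items would never produce a rule mentioning "male" or a particular state from the current layout. The sketch below, in plain Python rather than RapidMiner, shows the alternative layout: each respondent becomes a set of "column=value" items, which is equivalent to having one true / false column per value. The column names, state codes and rows are invented for illustration; in RapidMiner, the Nominal to Binominal operator should give a comparable result.

```python
def to_items(row):
    """Turn one respondent (a dict) into a set of 'column=value' items.

    Blank answers (None) are skipped, so they never appear in a rule.
    """
    return {f"{col}={val}" for col, val in row.items() if val is not None}


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, premise, conclusion):
    """Support of premise and conclusion together, divided by support of the premise."""
    both = set(premise) | set(conclusion)
    return support(transactions, both) / support(transactions, premise)


# Hypothetical toy data; None stands for a blank answer.
rows = [
    {"sex": "female", "state": 12, "uses_email": True},
    {"sex": "female", "state": 12, "uses_email": True},
    {"sex": "male",   "state": 12, "uses_email": False},
    {"sex": "male",   "state": 7,  "uses_email": None},
]
transactions = [to_items(r) for r in rows]

# Rule: sex=female AND state=12  ->  uses_email=True
rule_support = support(transactions, ["sex=female", "state=12", "uses_email=True"])
rule_confidence = confidence(
    transactions, ["sex=female", "state=12"], ["uses_email=True"]
)
print(rule_support, rule_confidence)
```

With ~40 states and only ~800 respondents, per-state items will each have low support, so the minimum support threshold may need to be lowered for location rules to show up at all.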