Mining binary data

neilduggan · April 2014

Hi

I have some opinion poll data on internet usage: 26 columns (i.e. questions) and approximately 800 rows (i.e. respondents). For 24 of the questions the answers are "true", "false" or blank. I've binned the "age" column into three bins, and the final column is location / state, which has ~40 values and is currently numerical.
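A minimal Python sketch of the preprocessing described above, for anyone reproducing it outside RapidMiner. It assumes blank cells should stay missing rather than count as "false". The cut points 35 and 55 and the labels young/middle/old are hypothetical, since the post does not give the actual bins.

```python
from bisect import bisect_right

# Hypothetical cut points: the post does not say where the three age bins
# start and end, so 35 and 55 are placeholders.
AGE_EDGES = [35, 55]
AGE_LABELS = ["young", "middle", "old"]


def parse_answer(raw):
    """Map a survey cell to True, False, or None (blank / missing)."""
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    return None  # blank answers stay missing instead of becoming False


def age_bin(age):
    """Return the label of the bin that `age` falls into."""
    # bisect_right counts how many edges are <= age, which is the bin index:
    # < 35 -> young, 35..54 -> middle, >= 55 -> old
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


print(age_bin(20), age_bin(40), age_bin(70))  # young middle old
```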

A couple of questions:

1. At the moment, the data is set up such that "sex" contains "true" for female and "false" for male. Should this be separate true / false columns for male and female?

2. What's the best RapidMiner operator to mine this data for trends (e.g. young / old women / men are more likely to XXX)? I've tried using "w-apriori", but it gives me very basic rules. I've also tried "FP-growth" + "Create Association Rules", which works slightly better but is still not great. I've also tried setting different attributes as the "label", and it makes some difference, but nothing major.

3. Is it possible to use RapidMiner to create rules involving the respondents' location as the data stands? Or do I need to create a column for each state with true / false for each respondent?
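On questions 2 and 3: rule miners such as FP-Growth work on items, so a numeric state code has to be treated as a nominal value first, after which each state value effectively becomes its own true/false item. The sketch below illustrates that idea in plain Python under stated assumptions: the respondent rows and the question name `uses_banking` are made up, only "true" answers become items, and support and confidence are computed by hand for one demographic rule and one location rule. In RapidMiner itself, operators such as Numerical to Polynominal followed by Nominal to Binominal should handle this conversion; check the operator reference before relying on it.

```python
# Hypothetical respondents: the real data has ~800 rows and 26 columns.
respondents = [
    {"sex": "female", "age": "young", "state": 3, "uses_banking": True},
    {"sex": "female", "age": "young", "state": 3, "uses_banking": True},
    {"sex": "male", "age": "old", "state": 7, "uses_banking": False},
    {"sex": "female", "age": "old", "state": 7, "uses_banking": False},
    {"sex": "male", "age": "young", "state": 3, "uses_banking": True},
]


def to_items(row):
    """Turn one respondent into a set of items such as 'state=3'."""
    items = set()
    for column, value in row.items():
        if value is True:
            items.add(column)  # a "true" answer becomes a single item
        elif value is None or value is False:
            continue  # false / blank answers add no item in this sketch
        else:
            # nominal values, including the numeric state code, become
            # "column=value" items, i.e. one implicit true/false per state
            items.add(f"{column}={value}")
    return items


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


transactions = [to_items(r) for r in respondents]

# Demographic rule: young women -> uses_banking
rule_lhs = {"sex=female", "age=young"}
rule_rhs = {"uses_banking"}
print(support(rule_lhs | rule_rhs, transactions))  # 0.4
print(confidence(rule_lhs, rule_rhs, transactions))  # 1.0

# Location rule: state 3 -> uses_banking
print(confidence({"state=3"}, {"uses_banking"}, transactions))  # 1.0
```

With only five made-up rows these numbers mean nothing statistically; on the real data, the minimum support and confidence thresholds decide which of these candidate rules are reported.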

Apologies if these are stupid questions!!

Thanks

Neil

neilduggan · April 2014

Anyone??

fras · April 2014

To see some trends in your data, you should try one of RapidMiner's charts.
If you would like to train a model, I would suggest a decision tree.
But don't forget to set the role "label" on one of your 24 question columns.
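To illustrate the decision tree suggestion: once one question is set as the label, the tree looks for the attribute whose split best separates that question's answers. The sketch below scores candidate root splits by information gain, which is one of several split criteria a decision tree can use (not necessarily the one RapidMiner's Decision Tree uses by default). The four rows and the label name `shops_online` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of label values."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(rows, attribute, label):
    """Entropy of the label minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[label] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical rows: here age fully explains the answer, sex explains nothing.
rows = [
    {"sex": "female", "age": "young", "shops_online": True},
    {"sex": "male", "age": "young", "shops_online": True},
    {"sex": "female", "age": "old", "shops_online": False},
    {"sex": "male", "age": "old", "shops_online": False},
]

best = max(["sex", "age"], key=lambda a: information_gain(rows, a, "shops_online"))
print(best)  # age
```

A full tree repeats this choice recursively inside each branch; the sketch only shows how the first split is picked.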

neilduggan · April 2014

Thanks fras, I'll give the decision tree a go.

Do I need to change the way I've set up the "sex" column (and other columns)? Or is it OK the way it is?
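One way to think about the "sex" column: because the attribute has only two values, a single column already holds all the information, and a separate "male" column would simply be the complement of "female" for every non-blank row. Recoding "true"/"false" to "female"/"male" mainly makes the resulting rules and tree nodes easier to read. A minimal sketch, assuming "true" means female as stated in the original post; the function names are hypothetical.

```python
def recode_sex(raw):
    """Turn the original true/false 'sex' cell into a readable nominal value."""
    value = raw.strip().lower()
    if value == "true":
        return "female"
    if value == "false":
        return "male"
    return None  # blank stays missing


def two_column_version(raw):
    """The alternative layout: separate boolean 'female' and 'male' columns."""
    sex = recode_sex(raw)
    return {"female": sex == "female", "male": sex == "male"}


# For any non-blank answer, 'male' is just the negation of 'female',
# so the second column adds no new information.
print(two_column_version("true"))
print(two_column_version("false"))
```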
