Mining binary data

neilduggan Member Posts: 18 Contributor II
edited November 2018 in Help

I have some opinion poll data on internet usage (26 columns, i.e. questions, and approx. 800 rows, i.e. respondents) with answers of "true", "false" or blank for 24 of the questions. I've binned the "age" column into three bins, and the final column is location / state (which has ~40 values, currently numerical).

A couple of questions:

1. At the moment, the data is set up such that "sex" contains "true" for female and "false" for male. Should this be separate true / false columns for male and female?

2. What's the best RapidMiner operator to mine this data for trends (e.g. young / old women / men are more likely to XXX)? I've tried using "w-apriori" but it gives me very basic rules. I've also tried "FP-growth" + "Create Association Rules", which works slightly better but is still not great. I've tried setting different attributes as "label" and it makes some difference, but nothing major.

3. Is it possible to use RapidMiner to create rules in relation to the respondent's location as the data stands? Or do I need to create a column for each state with true / false for each respondent?
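To make questions 1 and 3 concrete, here is a minimal Python sketch (outside RapidMiner, standard library only) of what "a column for each value" means for rule mining. Each respondent becomes a set of `column=value` items, so `sex=True`, `sex=False` and `state=3` each act like their own true / false column, and the support and confidence of a candidate rule can be counted directly. The column names (`uses_email`, `age_bin`) and the four rows are hypothetical stand-ins, not the real survey, and this is not a description of how FP-Growth or W-Apriori work internally.

```python
from typing import Dict, List, Optional, Set

# Hypothetical miniature of the survey; None stands for a blank answer.
rows: List[Dict[str, object]] = [
    {"sex": True,  "age_bin": "young", "state": 3, "uses_email": True},
    {"sex": False, "age_bin": "old",   "state": 7, "uses_email": False},
    {"sex": True,  "age_bin": "young", "state": 3, "uses_email": True},
    {"sex": True,  "age_bin": "old",   "state": 7, "uses_email": None},
]


def to_items(row: Dict[str, object]) -> Set[str]:
    """Turn one respondent into a set of 'column=value' items, skipping blanks."""
    return {f"{col}={val}" for col, val in row.items() if val is not None}


def support(transactions: List[Set[str]], items: Set[str]) -> float:
    """Fraction of respondents whose item set contains every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]],
               premise: Set[str],
               conclusion: Set[str]) -> Optional[float]:
    """support(premise and conclusion) / support(premise); None if the premise never occurs."""
    premise_support = support(transactions, premise)
    if premise_support == 0:
        return None
    return support(transactions, premise | conclusion) / premise_support


transactions = [to_items(r) for r in rows]

# Candidate rule: young women -> use email.
young_women = {"sex=True", "age_bin=young"}
rule_support = support(transactions, young_women | {"uses_email=True"})
rule_confidence = confidence(transactions, young_women, {"uses_email=True"})
print(rule_support, rule_confidence)
```

In this layout a numeric state code such as `3` is treated as a category name rather than a quantity, which is the point of giving each state its own item.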

Apologies if these are stupid questions!!  :o




  • neilduggan Member Posts: 18 Contributor II
    Anyone??
  • fras Member Posts: 93 Contributor II
    To see some trends in your data, you should try one of RapidMiner's charts.
    If you would like to train a model, I would suggest a decision tree.
    But don't forget to set the role "label" on one of your 24 question columns.
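As a rough illustration of why the "label" role matters, the sketch below (plain Python, not RapidMiner) picks the attribute that best separates a chosen label column using information gain, which is one common criterion for the first split of a decision tree. RapidMiner's Decision Tree operator may use a different criterion and builds a full tree; the column names and the four rows here are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def entropy(labels: List[object]) -> float:
    """Shannon entropy (in bits) of a list of label values."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows: List[Dict[str, object]], attribute: str, label: str) -> float:
    """Entropy of the label minus its weighted entropy after grouping by `attribute`.

    Rows with a blank (None) attribute or label are ignored.
    """
    usable = [r for r in rows if r[attribute] is not None and r[label] is not None]
    if not usable:
        return 0.0
    base = entropy([r[label] for r in usable])
    groups: Dict[object, List[object]] = {}
    for r in usable:
        groups.setdefault(r[attribute], []).append(r[label])
    remainder = sum(len(g) / len(usable) * entropy(g) for g in groups.values())
    return base - remainder


def best_split(rows: List[Dict[str, object]], label: str) -> str:
    """Name of the non-label attribute with the highest information gain."""
    candidates = [a for a in rows[0] if a != label]
    return max(candidates, key=lambda a: information_gain(rows, a, label))


# Hypothetical data: the label column is "shops_online".
survey = [
    {"sex": True,  "age_bin": "young", "shops_online": True},
    {"sex": False, "age_bin": "young", "shops_online": True},
    {"sex": True,  "age_bin": "old",   "shops_online": False},
    {"sex": False, "age_bin": "old",   "shops_online": False},
]

print(best_split(survey, label="shops_online"))
```

Choosing a different question as the label changes which attributes look informative, so it can be worth repeating the exercise for each question of interest.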
  • neilduggan Member Posts: 18 Contributor II
    Thanks fras, I'll give the decision tree a go.

    Do I need to change the way I've set up the "sex" column (and other columns)? Or is it OK the way it is?
