Setup and picking the correct learner
I'm a beginner with data mining in general and RapidMiner in particular, so I hope you will forgive me if I am less than clear about exactly what help I need. The truth is I'm not sure. I'll be watching this thread so I can answer any questions.
I've got a data set of survey responses on breastfeeding from women at four different hospitals, with 30 or so variables that are either nominal or ordinal, and an ordinal outcome variable recorded at 6 months for most of the participants, with three levels: no breastfeeding, any breastfeeding, and exclusive breastfeeding (no formula). I say most because I have some survey data for participants who couldn't be found 6 months later, or who stopped breastfeeding earlier and were dropped from the study. I've set the outcome variable as the label attribute.
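To make the missing-outcome situation concrete, here is a minimal sketch in plain Python of separating participants with a 6-month outcome from those without one, then counting the outcome levels among the labelled rows. The rows, the field names `id` and `outcome`, and the level names are all invented for illustration; they are not taken from the actual survey data.

```python
from collections import Counter

# Hypothetical toy rows (not real survey data).
# None marks a participant with no recorded 6-month outcome.
rows = [
    {"id": 1, "outcome": "none"},
    {"id": 2, "outcome": "any"},
    {"id": 3, "outcome": None},
    {"id": 4, "outcome": "exclusive"},
    {"id": 5, "outcome": "none"},
    {"id": 6, "outcome": None},
]

# Only rows with a known outcome can be used to train a learner.
labelled = [r for r in rows if r["outcome"] is not None]
unlabelled = [r for r in rows if r["outcome"] is None]

# How many labelled participants fall into each outcome level.
class_counts = Counter(r["outcome"] for r in labelled)

print(len(labelled), len(unlabelled), dict(class_counts))
```

A count like this also shows how unevenly the three outcome levels are represented, which matters for any learner that falls back on the majority class.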
I'm a SAS user, so the first thing I did was run a logistic regression, removing non-significant variables one by one. That left four significant variables. I would have liked to do a survival analysis, but unfortunately the date data was badly coded.
Still, with so many variables, many of them somewhat correlated (like language and country of origin), my supervisor suggested, based on some papers she had read, that a signal detection methodology that automatically establishes cutpoints for variables would be helpful for understanding the data. I eventually realized that this was a form of data mining, and that led me to RapidMiner, which appears to be a great program. I figured out how to get SAS data into it correctly, but now I'm somewhat stuck.
I've encountered some difficulty in using RapidMiner to understand the data I have. I've tried turning the ordinal outcome into a nominal variable, both with three levels and with two, and using a decision tree, but it doesn't produce a tree, just a single bar showing one of the values of the label variable.
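One possible explanation worth ruling out for the single-bar result: a tree learner that requires a minimum gain before splitting will return only a majority-class leaf when no attribute clears that threshold. The sketch below illustrates the mechanism in plain Python. It is not RapidMiner's implementation; the toy rows, the attribute name `language`, the function names, and both threshold values are assumptions chosen only to show the effect.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, label):
    """Entropy reduction from splitting rows on one nominal attribute."""
    parent = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[label])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted

def root_decision(rows, attributes, label, minimal_gain):
    """Split on the best attribute, or return a majority leaf if its gain is too small."""
    gains = {a: information_gain(rows, a, label) for a in attributes}
    best = max(gains, key=gains.get)
    if gains[best] < minimal_gain:
        majority = Counter(r[label] for r in rows).most_common(1)[0][0]
        return ("leaf", majority)
    return ("split", best)

# Hypothetical toy data: a weak association and an unbalanced label.
rows = (
    [{"language": "A", "outcome": "none"}] * 8
    + [{"language": "A", "outcome": "any"}] * 2
    + [{"language": "B", "outcome": "none"}] * 7
    + [{"language": "B", "outcome": "any"}] * 3
)

strict = root_decision(rows, ["language"], "outcome", minimal_gain=0.1)
lenient = root_decision(rows, ["language"], "outcome", minimal_gain=0.001)
print(strict, lenient)
```

With the stricter threshold the root never splits and the "tree" is just the majority class; with the looser one the same data yields a split. If the Decision Tree operator in use has a comparable minimum-gain or pruning setting, lowering it is a cheap first check.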
I've removed all variables except the ones that logistic regression in SAS indicated were significant, and still didn't get a tree. In any case, I'm not even sure that a decision tree is the learner I want, except that it seemed closest to what was used in the papers my supervisor suggested I look at.
Any guidance on how to proceed with this analysis, which assumptions I need to check, and so on would be much appreciated.