Predictions based on US baby names data?
I'm new to RapidMiner and predictive analytics. I'm trying to move beyond the tutorials (which are great!) by using the US baby names (state-by-state) data set found on Kaggle. I'm able to load a random sample (1000 records) of the state-by-state data with these attributes:
- id (ID type)
- name (nominal type)
- gender (binominal type)
- state (nominal type)
- year (integer type)
- count (weight type)
Then I use another random selection to get 20 records without the state attribute. I'd like to predict birth state based on name, gender, and birth year. I'm sure this is a contrived example, but I thought I'd give it a try. Alternatively, I'd like to predict birth year given name, gender, and state. What would be some interesting models to try in this case?
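To make the setup concrete, here is a rough Python sketch of the sampling I described. It is not the RapidMiner process itself: the rows are made-up placeholders rather than real Kaggle records, the names `make_rows` and `split_sample` are just for illustration, and I'm assuming the 20 test records should not overlap the training ones.

```python
import random

def make_rows(n, seed=0):
    """Build made-up placeholder rows (NOT real Kaggle data) with the same attributes."""
    rng = random.Random(seed)
    names = ["Emma", "Liam", "Olivia", "Noah", "Ava"]
    states = ["CA", "TX", "NY", "WA"]
    rows = []
    for i in range(n):
        rows.append({
            "id": i,
            "name": rng.choice(names),
            "gender": rng.choice(["F", "M"]),
            "state": rng.choice(states),
            "year": rng.randint(1950, 2000),  # arbitrary range for the sketch
            "count": rng.randint(5, 500),
        })
    return rows

def split_sample(rows, n_train=1000, n_test=20, seed=1):
    """Draw a training sample and a separate test sample with the state hidden."""
    rng = random.Random(seed)
    picked = rng.sample(rows, n_train + n_test)  # sampling without replacement
    train = picked[:n_train]
    held_out = picked[n_train:]
    # Keep the true state aside so predictions can be checked afterwards.
    truth = {r["id"]: r["state"] for r in held_out}
    test = [{k: v for k, v in r.items() if k != "state"} for r in held_out]
    return train, test, truth

rows = make_rows(5000)
train, test, truth = split_sample(rows)
print(len(train), len(test))
```

Keeping `truth` separate is only so the predicted states can be compared against the real ones after applying the model.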
I've tried using Decision Tree to generate a model from the training data, then Apply Model on the random test data. As best I can tell, Decision Tree only splits on year and gender and ignores name. Is there any way to get this model to consider name? Perhaps the issue is that I can't train on more than 1000 records due to licensing?
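One way I could check whether the sample size is the problem is to count how many names occur only once in the training sample. A small Python sketch of that check, where `singleton_share` is a hypothetical helper and the name list is made up:

```python
from collections import Counter

def singleton_share(names):
    """Fraction of distinct names that appear exactly once in the sample."""
    counts = Counter(names)
    if not counts:
        return 0.0
    singles = sum(1 for c in counts.values() if c == 1)
    return singles / len(counts)

# Made-up example values, not taken from the Kaggle data.
sample_names = ["Emma", "Liam", "Emma", "Noah", "Ava", "Zed"]
print(singleton_share(sample_names))
```

If most names turn out to be singletons, each branch of a split on name would have very few records behind it, which might be why the tree prefers year and gender. That is only a guess on my part.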
Process so far: the Decision Tree splits on year, then sometimes on gender.
Thanks in advance,