Predicting ints on titanic dataset.

VenomSwitch Member Posts: 5 Contributor I
edited March 2020 in Help
Please excuse my beginner's level of understanding; I'm brand new.
I am trying to predict mortality on the Titanic dataset using cross-validation and linear regression. Since linear regression only works with numeric attributes, I have converted selected attributes (such as survived) using the 'Nominal to Numerical' operator. Looking at the data, the model seems to work most of the time: if I round the predicted value, it usually matches the 1 or 0 in the label. However, the predicted value comes back as a double, so nothing matches exactly and the result shows 0 correct predictions.

I suppose my question is: how do I make RapidMiner return an int instead of a double? I have tried using the 'Real to Integer' operator, but it doesn't like me putting it anywhere!
Open to any suggestions.
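
To illustrate the symptom described above outside of RapidMiner, here is a minimal Python sketch. The labels, scores, and the 0.5 threshold are made-up assumptions for illustration, not values from the poster's process. It shows why comparing raw regression outputs to 0/1 labels gives 0% accuracy, and how thresholding the prediction first changes that.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that exactly equal their label."""
    matches = sum(1 for y, p in zip(labels, predictions) if y == p)
    return matches / len(labels)


def to_class(score, threshold=0.5):
    """Map a real-valued regression output to 1 or 0 (threshold is an assumption)."""
    return 1 if score >= threshold else 0


# Hypothetical 0/1 labels and real-valued regression outputs.
labels = [1, 0, 1, 0, 1]
raw_scores = [0.83, 0.12, 0.61, 0.47, 0.30]

# Exact comparison of reals against 0/1 labels: nothing matches.
raw_accuracy = accuracy(labels, raw_scores)

# Threshold the predictions (not the labels) before scoring.
rounded = [to_class(s) for s in raw_scores]
rounded_accuracy = accuracy(labels, rounded)

print(raw_accuracy, rounded, rounded_accuracy)
```

Only the prediction is converted here; the label column is left untouched.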

Answers

  • VenomSwitch Member Posts: 5 Contributor I
    edited March 2020
    When I say 'double' I actually mean 'real'.
  • VenomSwitch Member Posts: 5 Contributor I
    Hi Martin,

    This is brilliant, GLM is exactly what I needed!
    I wanted the integer because it was giving me values with a decimal point, and because they didn't exactly match the '1'/'0' in the survived column, it just told me every one was wrong, with 0% accuracy (as the predictions weren't rounded to the '1'/'0' format in the dataset).
    I couldn't figure out where to place the Generate Attributes operator, but it doesn't really matter as GLM has sorted out my problem.

    A very handy operator.

    Cheers!
    Joel
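
For readers wondering why a GLM can sidestep the rounding step, here is a rough Python sketch. It assumes the GLM was fitted as a binomial (logistic) model, which the thread does not confirm; the function names and the 0.5 threshold are hypothetical. A logistic model passes its linear score through a sigmoid, so the output is a probability between 0 and 1 that can be mapped directly to a class.

```python
import math


def sigmoid(z):
    """Squash an unbounded linear score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(z, threshold=0.5):
    """Turn the linear score z into a 0/1 class via its probability."""
    return 1 if sigmoid(z) >= threshold else 0


# Hypothetical linear scores for three passengers.
for score in (-2.0, 0.0, 2.0):
    print(score, round(sigmoid(score), 3), predict_class(score))
```

Because the class comes out as a discrete value, it can be compared with the label directly, with no separate rounding operator.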
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    You would basically put the Generate Attributes operator after each and every Apply Model operator you are using. Great that the GLM worked.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • VenomSwitch Member Posts: 5 Contributor I
    I have got it working but now it seems to have a 100% accuracy rate which seems suspicious. I'm just going to stick with the GLM process I had before I think! If it isn't broke, don't fix it haha.
    You've helped me out today though! :smiley:

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Are you sure you applied the round on the prediction and not on the label attribute? That would explain it :)
  • VenomSwitch Member Posts: 5 Contributor I
    Here is my current process, using Generate Attributes with linear regression instead of GLM.
    My label is 'Survived = Yes'.
    I tried using the same operator inside the cross-validation as well, but got the same result: 100% correct predictions.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Then the other idea: are you sure that 'Survived = No' is not part of the training data? That would also explain the good results.
    cheers,
    Martin
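
The leakage idea above can be checked mechanically. The following Python sketch is only an illustration: the column names `survived_yes` and `survived_no`, the sample rows, and the helper `leaky_columns` are hypothetical stand-ins for the dummy columns that a nominal-to-numerical conversion might produce. It flags any feature that equals the label, or its 0/1 complement, on every row.

```python
def leaky_columns(rows, label):
    """Return feature names that equal the label or its 0/1 complement on every row."""
    features = [name for name in rows[0] if name != label]
    leaks = []
    for name in features:
        same = all(row[name] == row[label] for row in rows)
        flipped = all(row[name] == 1 - row[label] for row in rows)
        if same or flipped:
            leaks.append(name)
    return leaks


# Hypothetical rows after dummy coding; survived_no is the complement of the label.
rows = [
    {"age": 22, "survived_no": 1, "survived_yes": 0},
    {"age": 38, "survived_no": 0, "survived_yes": 1},
    {"age": 26, "survived_no": 0, "survived_yes": 1},
    {"age": 35, "survived_no": 1, "survived_yes": 0},
]

leaks = leaky_columns(rows, "survived_yes")
print(leaks)
```

If such a column stays among the training attributes, a model can reconstruct the label from it, which would be consistent with a suspicious 100% accuracy.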