Predicting ints on the Titanic dataset

VenomSwitch
VenomSwitch New Altair Community Member
edited November 2024 in Community Q&A
Excuse my noob level of understanding please, I'm brand new.
I am trying to predict mortality on the Titanic dataset using cross-validation and linear regression. Since linear regression only works with numbers, I have converted selected attributes (such as Survived) using the 'Nominal to Numerical' operator. Looking at the data, the model seems to work most of the time if I round the prediction to 1 or 0. However, the predicted value comes back as a double, so it never exactly matches the label and the result shows 0 correct predictions.

I suppose my question is: how do I make RapidMiner return an int instead of a double? I have tried using the 'Real to Integer' operator, but it doesn't like me putting it anywhere!
Open to any suggestions.


Answers

  • VenomSwitch
    VenomSwitch New Altair Community Member
    edited March 2020
    When I say 'double' I actually mean 'real'.
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Answer ✓
    Hi,

    First, if you use a GLM operator, it can handle binominal data. It uses the same trick you are doing here, but without any hassle for you.

    Then, why exactly do you want an int over a double?

    Anyway, one way to do it is to use Generate Attributes with

    round([prediction(Survived=Yes)])

    Best,
    Martin
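To make the rounding suggestion concrete outside RapidMiner, here is a minimal Python sketch of the same idea. The labels and predictions are made-up toy values, not Titanic results, and the 0.5 threshold is an assumption standing in for what `round(...)` does to a value between 0 and 1.

```python
# Toy 0/1 labels and real-valued regression outputs (invented for illustration).
labels = [1, 0, 0, 1, 1]
predictions = [0.83, 0.12, 0.41, 0.67, 0.35]


def accuracy(predicted, actual):
    """Fraction of rows where the predicted value equals the label exactly."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Comparing raw real values with 0/1 labels: nothing matches exactly.
raw_accuracy = accuracy(predictions, labels)

# Roughly what rounding the prediction does: 0.5 and above becomes 1, else 0.
rounded = [1 if p >= 0.5 else 0 for p in predictions]
rounded_accuracy = accuracy(rounded, labels)

print(raw_accuracy, rounded_accuracy)
```

With these toy numbers the raw comparison scores 0.0 and the rounded one 0.8, which mirrors the "0 correct predictions" symptom from the question.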
  • VenomSwitch
    VenomSwitch New Altair Community Member
    Hi Martin,

    This is brilliant, GLM is exactly what I needed!
    I wanted the integer because it was giving me values with a decimal point, and because they didn't exactly match the '1'/'0' in the Survived column, it just told me every one was wrong, with 0% accuracy (as the prediction wasn't rounded to the '1'/'0' format in the dataset).
    I couldn't figure out where to place the Generate Attributes operator, but it doesn't really matter as GLM has sorted out my problem.

    A very handy operator.

    Cheers!
    Joel
  • MartinLiebig
    MartinLiebig
    Altair Employee
    You would basically put it after each and every Apply Model operator you are using. Great that the GLM worked.

    Best,
    Martin
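As a rough illustration of where the rounding step sits, here is a Python sketch of a cross-validation loop with the rounding placed directly after the "apply model" step in every fold. The helper names `train`, `apply_model` and `cross_validate`, the single made-up feature, and the toy data are all hypothetical; this is not how RapidMiner implements its operators.

```python
def train(xs, ys):
    """Fit a least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_model(model, xs):
    """Return real-valued predictions for xs."""
    slope, intercept = model
    return [intercept + slope * x for x in xs]


def cross_validate(xs, ys, k):
    """Return one accuracy value per fold, rounding after each model application."""
    fold_accuracies = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        model = train([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        raw = apply_model(model, [xs[i] for i in test_idx])
        # The rounding step: applied to the prediction, right after apply_model.
        rounded = [1 if p >= 0.5 else 0 for p in raw]
        correct = sum(1 for p, i in zip(rounded, test_idx) if p == ys[i])
        fold_accuracies.append(correct / len(test_idx))
    return fold_accuracies


# Toy data: one invented numeric feature and a 0/1 label.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
fold_accuracies = cross_validate(xs, ys, 4)
print(fold_accuracies)
```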
  • VenomSwitch
    VenomSwitch New Altair Community Member
    I have got it working, but now it seems to have a 100% accuracy rate, which seems suspicious. I'm just going to stick with the GLM process I had before, I think! If it isn't broke, don't fix it haha.
    You've helped me out today though! :smiley:

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Are you sure you applied the round on the prediction and not on the label attribute? That would explain it :)
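One way to read this suggestion, sketched in Python with made-up toy values: if the rounded attribute is derived from the label column rather than from the prediction column, it matches the label on every row by construction. The variable names here are hypothetical.

```python
# Toy 0/1 labels and real-valued model outputs (invented for illustration).
labels = [1, 0, 0, 1, 1]
predictions = [0.83, 0.12, 0.41, 0.67, 0.35]


def to_class(value):
    """Map a real value to 0 or 1 using a 0.5 threshold."""
    return 1 if value >= 0.5 else 0


def accuracy(predicted, actual):
    """Fraction of rows where predicted equals actual."""
    return sum(1 for p, a in zip(predicted, actual) if p == a) / len(actual)


# Mistake: the rounded column is built from the label itself.
from_label = [to_class(v) for v in labels]
# Intended: the rounded column is built from the model's prediction.
from_prediction = [to_class(v) for v in predictions]

mistaken_accuracy = accuracy(from_label, labels)
intended_accuracy = accuracy(from_prediction, labels)
print(mistaken_accuracy, intended_accuracy)
```

The mistaken version scores 1.0 regardless of how good the model is, which is why a perfect result is worth double-checking.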
  • VenomSwitch
    VenomSwitch New Altair Community Member
    Here is my current process using Generate Attributes with linear regression instead of GLM.
    My label is 'Survived = Yes'.
    I tried using the same operator inside the cross-validation as well, but got the same result: 100% correct predictions.
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Then the other idea: are you sure that 'Survived = No' is not part of the training data? That would also explain the good results.
    Cheers,
    Martin
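A small Python sketch of why a leftover 'Survived = No' column would give perfect results, assuming it is the exact complement of the 'Survived = Yes' label (as dummy coding would produce). The data is a made-up toy sample and `fit_line` is a hypothetical helper, not a RapidMiner function.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys = intercept + slope * xs for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy label column and its complement (assumption: No = 1 - Yes on every row).
survived_yes = [1, 0, 0, 1, 1, 0]
survived_no = [1 - s for s in survived_yes]

# Training on the complement column lets the model read the label directly.
slope, intercept = fit_line(survived_no, survived_yes)
predicted = [1 if intercept + slope * x >= 0.5 else 0 for x in survived_no]
leak_accuracy = sum(
    1 for p, a in zip(predicted, survived_yes) if p == a
) / len(survived_yes)

print(slope, intercept, leak_accuracy)
```

The fitted line is simply label = 1 - feature (slope -1, intercept 1), so every row is predicted correctly. Excluding that column from the training attributes removes the leak.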
