Logistic Regression

emann Member Posts: 2 Newbie
edited June 17 in Help
Has anyone got a clue how to run logistic regression on the Titanic dataset? I've tried this literally all day, but I don't think I'm getting the right accuracy, so I must be missing a step. In Set Role my attribute name is Sex, in Split Data my ratios are 0.1 and 0.1 for the two partitions, and I'm getting 64.53% accuracy. The same test run in Orange gave 91.7%.

Screenshot attached.


Answers

  • varunm1 Moderator, Member Posts: 918   Unicorn
    Hello @emann

    Just for clarification: you are trying to predict the attribute Sex from the Titanic data set. You then split the data in a 90:10 ratio (train:test), applied logistic regression, and got an accuracy of 64.53 percent on the test data. Am I correct?

    I tried something similar to what you did and got 35 percent accuracy. It depends on how your data was split. I assume that you are not changing the settings in Logistic Regression in RapidMiner; my results are with default settings. I am not sure which logistic regression settings you were using in Orange. Are the logistic regression settings the same in both tools?

    Also, a random (90:10) split is not recommended for comparing performance, because the train and test data vary each time you run it (my results are an example of this). You need to use cross validation with either 5 or 10 folds to test the performance of an algorithm. The settings should also be the same when you want to compare different software or algorithms.

    Thanks
    Regards,
    Varun
    Rapidminer Wisdom 2020 (User Track): Call for proposals 

    https://www.varunmandalapu.com/
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 280   Unicorn
    Hi @emann

    I suggest that you go through all the parameters of logistic regression and understand their meaning (there are quite a few!). The Help section for the operator explains them quite well.

    I reproduced exactly the same process with the following logistic regression parameters and got 80.92% accuracy 'out of the box', see below.

    Otherwise it's hard to tell without knowing your parameter settings (I also have no idea how Orange sets up logistic regression by default).




  • emann Member Posts: 2 Newbie
    edited March 26
    Hi @kypexin,

    Thanks for your input. Initially I didn't make any changes to the regression parameters, but having replicated your parameter settings, the accuracy is now 76.76%.

    Note: I'm very new to RapidMiner and data analytics in general, so I'm not too familiar with the parameters and what they should actually be. I'm currently researching this for report purposes. Attached is a screenshot of my current parameters.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,270   Unicorn
    Also, in your OP you mentioned that you put in 0.1 and 0.1 for the split, but I think you actually need 0.9 and 0.1. Perhaps it was a typo, though. If not, that could definitely be affecting your model, since you would only be using 10% of the data for training!
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
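
Editor's sketch: the two points raised in the answers (a 0.9/0.1 split rather than 0.1/0.1, and cross validation instead of a single random split) can be illustrated outside RapidMiner with a few lines of plain Python. This is only a rough illustration under stated assumptions: the function names `holdout_split` and `k_fold_splits` and the row count of 1000 are hypothetical, and this is not how RapidMiner's Split Data or Cross Validation operators, or Orange, are implemented.

```python
import random

def holdout_split(n_rows, train_ratio, seed):
    """Shuffle row indices and cut them into train/test parts by train_ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = round(n_rows * train_ratio)
    return idx[:cut], idx[cut:]

def k_fold_splits(n_rows, k, seed):
    """Yield (train, test) index lists so that every row is tested exactly once."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [row for f, fold in enumerate(folds) if f != i for row in fold]
        yield train, test

n_rows = 1000  # hypothetical row count, not the actual Titanic data

# A 0.9 training ratio leaves 900 rows for training and 100 for testing.
train, test = holdout_split(n_rows, 0.9, seed=1)
print(len(train), len(test))

# A 0.1 training ratio would train the model on only 100 rows.
small_train, _ = holdout_split(n_rows, 0.1, seed=1)
print(len(small_train))

# A different seed picks different test rows, which is why a single
# random split can report quite different accuracies from run to run.
_, test_other = holdout_split(n_rows, 0.9, seed=2)
print(sorted(test) == sorted(test_other))

# With 10-fold cross validation every row lands in a test fold exactly once,
# so the averaged accuracy does not hinge on one lucky or unlucky split.
folds = list(k_fold_splits(n_rows, k=10, seed=1))
tested = sorted(row for _, te in folds for row in te)
print(tested == list(range(n_rows)))
```

In practice you would train the model on each `train` list, score it on the matching `test` list, and average the ten scores; that average is the figure to compare across tools, provided the logistic regression settings match.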