Re:-turbo model and outlier removal

guptasha Member Posts: 8 Newbie
edited June 2019 in Help
I am analyzing the wine reviews dataset from Kaggle in RapidMiner. Please search for the dataset on Google.
1. I am not able to use the turbo model on this dataset. My laptop hangs. Is there any way to run a dataset of 150k records successfully?
2. How do I remove outliers in the price column? Any suggestions?

Comments

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hello @guptasha - it is good to have you here on the community. Let me try to help you step by step...
    Please search for the dataset on Google.
    Generally, here on the community you want to make it easy for us to help you. Asking people to google a data set themselves is not likely to get you answers from generous people. :smile:

    And by the way - the wine data set is of course already built into RapidMiner. Just type the word "wine" into the global search bar.

    I am not able to use the turbo model on this dataset.
    So I'm a little confused. We have "Turbo Prep" and "Auto Model" - which one are you referring to?

    My laptop hangs.
    It's possible for sure - especially if you have a small laptop and a large data set. Have you looked at our System Requirements for RapidMiner Studio?

    Is there any way to run a dataset of 150k records successfully?
    I run data sets with 150k records every day without problems. If your laptop is hanging, most likely your computer is either below the system requirements or close to the limit. Increasing your RAM and CPU cores can help a lot.

    How do I remove outliers in the price column? Any suggestions?
    This is hard to answer when you have not provided (a) your XML and (b) your data set. Perhaps you overlooked the posting instructions when you asked your question?

    Scott
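
Nobody in this thread names an outlier method for the price column, so the following is only one generic possibility and not a RapidMiner process: the interquartile-range (IQR) rule, which drops values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR. A minimal Python sketch, assuming each record is a dict with a `price` key; the function names, the toy prices, and the 1.5 multiplier are illustrative assumptions, not taken from the poster's data.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_price_outliers(rows, column="price", k=1.5):
    """Keep rows whose value lies inside the IQR fences.

    Rows with a missing value in `column` are dropped as well (an assumption;
    you may prefer to keep or impute them).
    """
    prices = [r[column] for r in rows if r.get(column) is not None]
    low, high = iqr_bounds(prices, k)
    return [
        r for r in rows
        if r.get(column) is not None and low <= r[column] <= high
    ]


# Hypothetical toy records standing in for the wine reviews table.
rows = [{"price": p} for p in (10, 12, 15, 18, 20, 22, 25, 3300)]
kept = drop_price_outliers(rows)
print(len(rows), "->", len(kept))
```

Whether 1.5 is the right multiplier depends on how skewed the prices are; a larger `k` removes fewer rows.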
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Also, you can always try Sample as an initial workaround. You don't really need 150k records to build a preliminary scorecard. A 10% or 20% sample should be perfectly adequate to get you going...
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
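
The sampling workaround suggested above can be illustrated outside RapidMiner with a short Python sketch. It draws a reproducible 10% random sample without replacement from a list of records. The function name `sample_fraction`, the fixed seed, and the stand-in list of 150,000 integers are assumptions made for illustration only.

```python
import random


def sample_fraction(rows, fraction=0.10, seed=42):
    """Return a random sample of `rows` without replacement.

    `fraction` is the share of rows to keep (0 < fraction <= 1).
    A fixed `seed` makes the sample reproducible between runs.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = min(len(rows), max(1, round(len(rows) * fraction)))
    rng = random.Random(seed)
    return rng.sample(rows, k)


# Stand-in for a 150k-record data set: the integers are placeholder records.
rows = list(range(150_000))
subset = sample_fraction(rows, 0.10)
print(len(subset))
```

Building the first model on `subset` and only later rerunning on all records keeps memory use low while the process is still being designed.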