Any RapidMiner experts out there want to help a n00b?

mohinipoojari16 Member Posts: 1
edited November 2018 in Help

I've been struggling with this for a few days and could use some gentle prodding in the right direction.

I have approx. 24k example rows, with 25 real (double) attributes and a nominal label. Each example represents a snapshot of scientific measurements at a moment in time (g-forces, magnetometer, etc.), and the nominal label is essentially a boolean ("was the event happening?"). I'm trying to build a model (preferably a formula) that can predict the boolean output, or provide some sort of numerical "confidence".

Here are the issues I'm having:

- Almost everything I do runs out of memory, even though I have 3GB of RAM devoted to the RapidMiner JVM.
- When I do get a model to train, I end up with something that has "97% accuracy" but always predicts the same boolean value (i.e. it's 97% accurate and not 100% accurate because it always predicts "false" and never predicts "true").
- I suspect some of my attributes are irrelevant to the boolean result, but I don't understand how to identify and eliminate them. I also think I'm wasting a lot of time trying out each model type (LibSVM, Neural Net, etc.) when the gurus would probably know which model suits this type of data and problem.

Thanks for any help.

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 783   Unicorn

    Hi @mohinipoojari16,

     

     - It seems that you have imbalanced data. In that case, you can Sample your data to balance the two classes: your overall accuracy will decrease, but your recall for the "true" class (was the event happening? = true) will increase, so your model will be able to predict some "true" values.

    Take a look at this thread.
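
To make the accuracy/recall point concrete outside RapidMiner, here is a minimal Python sketch. It assumes a synthetic label column with 3% "true" values (chosen only to mirror the 97% figure in the question), and the helper names `accuracy`, `recall`, and `downsample_majority` are made up for illustration. It shows why an "always false" model scores 97% accuracy with zero recall, and one simple way to balance the classes by randomly dropping majority examples.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of examples whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=True):
    """Fraction of actual positives that were predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def downsample_majority(rows, labels, seed=0):
    """Randomly drop majority-class examples until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Synthetic data (an assumption): 24,000 examples, 3% labelled True.
labels = [i % 100 < 3 for i in range(24000)]
rows = list(range(24000))  # stand-ins for the 25-attribute examples

# A "model" that always predicts False looks accurate but finds no events.
always_false = [False] * len(labels)
acc = accuracy(labels, always_false)
rec = recall(labels, always_false)

# Balanced set: every True example plus an equal number of random False ones.
bal_rows, bal_labels = downsample_majority(rows, labels)
```

If you try something like this, it is usually safer to balance only the training data and to measure accuracy and recall on untouched data, so the reported numbers reflect the real class distribution.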

     

     - To perform "feature selection", RapidMiner offers many operators:

    [Image: Feature_Selection_Operators.png]

    Personally, I find that the Optimize Selection (Evolutionary) operator performs well.
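
For a quick first look at which attributes matter, a much simpler idea than an evolutionary search is to rank each attribute by the absolute value of its correlation with the boolean label. The Python sketch below is only that simple filter, not what Optimize Selection (Evolutionary) does internally. The function names and the toy columns (`g_force`, `noise`, `constant`) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_attributes(columns, labels):
    """Return (attribute name, |correlation with label|) pairs, strongest first."""
    y = [1.0 if v else 0.0 for v in labels]
    scores = {name: abs(pearson(values, y)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (an assumption): one informative attribute, one unrelated, one constant.
labels = [False, False, False, True, True, True]
columns = {
    "g_force": [0.1, 0.2, 0.1, 0.9, 1.0, 0.8],
    "noise": [0.5, 0.1, 0.9, 0.5, 0.1, 0.9],
    "constant": [1.0] * 6,
}

ranking = rank_attributes(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Attributes that score near zero are candidates for removal, but this only detects linear, one-attribute-at-a-time relationships. A wrapper approach that evaluates attribute subsets with a real model can find combinations that this ranking misses.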

     

    I hope it helps,

    Regards,

    Lionel

