questions on optimization and faster processing

cyph00 Member Posts: 2 Contributor I
edited November 2018 in Help
Hi,
I have RapidMiner running on a quad-core Xeon @ 2.67 GHz with 12 GB of RAM on Windows 7 Pro.

I am reading in a CSV file, so I expect the data is all in memory at this point. I have fed in roughly 20,000 labeled records with around 50 variables. The process is as follows: CSV --> Validation (Bayesian Boost (W-Ridor)) --> (Apply Model + Performance)

I am running the Weka Ridor learner inside Bayesian Boosting, wrapped in an X-Validation (Apply Model / Performance).
Since I'm looking for the lowest error rate possible, I have bumped up the Ridor shuffle to 9, the Bayesian Boosting iterations to 10, and the number of validations to 10.

I know this is a somewhat demanding process, but I am finding run times to be slow: it has been processing for 23 hours so far. Any tips on how to speed this up? Do these processing times sound reasonable given the process and hardware?

If there are any tips at all, even switching the OS to Linux, etc., I am open to them (as long as they are not hardware related). Thanks in advance.
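
To put the settings above in perspective, here is a rough back-of-the-envelope count of how much training work they imply. It is only a sketch under assumptions: that boosting runs all of its iterations in every fold, and that each Ridor shuffle repeats roughly the same rule search. The function name `estimate_fits` is made up for illustration and is not part of RapidMiner or Weka.

```python
def estimate_fits(folds, boost_iterations, shuffles):
    """Upper bound on training passes inside the cross-validation.

    Assumptions (not confirmed by the thread): boosting never stops early,
    and every shuffle repeats about the same amount of rule-search work.
    """
    boosted_models = folds                         # one boosted ensemble per fold
    base_fits = boosted_models * boost_iterations  # Ridor models trained in total
    rule_passes = base_fits * shuffles             # rule searches across all shuffles
    return base_fits, rule_passes


base_fits, rule_passes = estimate_fits(folds=10, boost_iterations=10, shuffles=9)

# With 10 folds, each training set holds 9/10 of the 20,000 records.
records_per_fold = 20_000 * 9 // 10

print(f"Ridor models trained: {base_fits}")
print(f"Rule-search passes:   {rule_passes}")
print(f"Records per training fold: {records_per_fold}")
```

Under these assumptions the three settings multiply, so lowering any one of them cuts the total work by the same factor.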

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    Yes, the validation and the boosting steps are quite time consuming, and your data is rather large, so the timings are quite realistic. Switching the OS will not have a significant impact (if any) on RapidMiner's performance.
    If you have several analyses to do, you can easily start RapidMiner more than once and run them in parallel (or consider installing RapidAnalytics; one of its core features is running several processes in the background).
    Otherwise, you could try to reduce the size of your data: either reduce the number of examples by means of sampling, or reduce the number of features with a feature selection algorithm, e.g. Forward Selection.

    Best,
    Marius
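
The sampling idea from the answer above can be sketched outside RapidMiner as follows. This is plain Python and not a RapidMiner operator; the helper `stratified_sample`, the toy rows, and the 25% fraction are all assumptions chosen for illustration. It draws the same fraction from every class so the label distribution of the smaller set matches the original.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Return a random subset that keeps each label's share of the data."""
    rng = random.Random(seed)          # fixed seed so the sample is repeatable
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    sample = []
    for group in by_label.values():
        # Keep at least one example per class, even for tiny classes.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))

    rng.shuffle(sample)                # avoid class-ordered output
    return sample


# Toy stand-in for the 20,000 labeled records: (id, label) pairs.
rows = [(i, "yes" if i % 4 == 0 else "no") for i in range(20_000)]
subset = stratified_sample(rows, label_of=lambda r: r[1], fraction=0.25)
print(len(subset), "of", len(rows), "records kept")
```

Training on a subset like this first gives a quick feel for run time and error rate before committing to the full data.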
  • fritmore Member Posts: 90 Contributor II
    Hi

    I suppose you are running 64-bit Windows 7 and 64-bit Java to take advantage of the RAM above 4 GB?

    Also make sure you run RapidMiner with e.g. the -Xmx6000m parameter (6000 lets the process use up to about 6 GB of RAM; you can change it to any value up to the available memory).
    From my experience, around 5 GB should be enough for a 50x20000 dataset (but if it starts using e.g. 5 GB out of 5, then increase the limit).

    Another suggestion: use mightier CPU(s) ;) BUT it may not help at all, see this: http://rapid-i.com/rapidforum/index.php/topic,5470.0.html

    good luck
    f
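
As a rough check on the memory figures mentioned above, the raw size of a 50x20000 table can be estimated with a few lines of arithmetic. This is only a sketch: it assumes every value is stored as an 8-byte double, and it ignores whatever working memory the learners need on top of the data. The helper name `raw_table_bytes` is made up for illustration.

```python
def raw_table_bytes(n_rows, n_columns, bytes_per_value=8):
    """Raw storage for a dense numeric table (assumes 8-byte doubles)."""
    return n_rows * n_columns * bytes_per_value


size = raw_table_bytes(20_000, 50)

# -Xmx6000m means a maximum heap of 6000 MiB.
heap = 6000 * 1024 * 1024

print(f"Raw table: {size / 2**20:.1f} MiB")
print(f"Share of a 6000 MiB heap: {100 * size / heap:.2f}%")
```

Under that assumption the raw data is tiny compared to the heap, which suggests the long run time comes from computation and not from a memory shortage.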