
[Solved] Running RapidMiner on a Multi-Core Server

aryan_hosseinza Member Posts: 74 Contributor II
edited November 2018 in Help
Hi everybody,

I am working with a large data set (4500 attributes, 580,000 instances), and I am running RapidMiner on a server with 25 cores and 74 GB of RAM, but it still takes a long time to complete a task (e.g. the process below).

What should I do? I've already set the memory to -Xmx50GB and set the rapidminerrc file to handle 25 threads (though I am not sure whether that works). The following is the output of the `top` command on Linux:

top - 02:36:37 up 35 days, 13:18,  6 users,  load average: 1.00, 1.00, 1.04
Tasks: 228 total,   1 running, 225 sleeping,   0 stopped,   2 zombie
Cpu(s):  4.5%us,  0.1%sy,  0.0%ni, 95.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  74222664k total, 54743028k used, 19479636k free,    51392k buffers
Swap: 75485180k total,   412024k used, 75073156k free, 29113180k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
4238 user     20   0 52.2g  12g  13m S  105 17.1 103:54.10 java      

What do you think is the best thing I could do? I really need this process to finish in a "not very long" time.


Thanks,
Arian

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
   <process expanded="true" height="631" width="949">
     <operator activated="true" class="generate_massive_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Massive Data" width="90" x="45" y="75">
       <parameter key="number_examples" value="580000"/>
       <parameter key="number_attributes" value="4500"/>
       <parameter key="sparse_fraction" value="0.95"/>
     </operator>
     <operator activated="true" class="nominal_to_binominal" compatibility="5.2.008" expanded="true" height="94" name="Nominal to Binominal" width="90" x="179" y="75">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="label"/>
       <parameter key="include_special_attributes" value="true"/>
     </operator>
     <operator activated="true" class="weka:W-ReliefFAttributeEval" compatibility="5.1.001" expanded="true" height="76" name="W-ReliefFAttributeEval" width="90" x="313" y="75">
       <parameter key="sort_direction" value="descending"/>
     </operator>
     <operator activated="true" class="weights_to_data" compatibility="5.2.008" expanded="true" height="60" name="AttributeWeights2ExampleSet (4)" width="90" x="447" y="75"/>
     <operator activated="true" class="write_csv" compatibility="5.2.008" expanded="true" height="76" name="Write CSV" width="90" x="581" y="75">
       <parameter key="csv_file" value="/home/arian/result.csv"/>
       <parameter key="quote_nominal_values" value="false"/>
       <parameter key="format_date_attributes" value="false"/>
     </operator>
     <connect from_op="Generate Massive Data" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
     <connect from_op="Nominal to Binominal" from_port="example set output" to_op="W-ReliefFAttributeEval" to_port="example set"/>
     <connect from_op="W-ReliefFAttributeEval" from_port="weights" to_op="AttributeWeights2ExampleSet (4)" to_port="attribute weights"/>
     <connect from_op="AttributeWeights2ExampleSet (4)" from_port="example set" to_op="Write CSV" to_port="input"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • earmijo Member Posts: 270 Unicorn
    You have to use operators that take advantage of multiple CPUs. Download the Parallel Processing library. I've played with the operators in this library, and they do cut the processing time enormously.
  • aryan_hosseinza Member Posts: 74 Contributor II
    But it seems that it's not available for all operators (e.g. W-ReliefFAttributeEval), right?

    Thanks,
    Arian
  • earmijo Member Posts: 270 Unicorn
    That is correct. Not all operators will benefit from parallelization. Cross-validation, trees, and some forms of search can be parallelized.
  • aryan_hosseinza Member Posts: 74 Contributor II
    Aha, OK.

    Thanks,
    Arian
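
A rough way to read the `top` figures from the question, and to see why the answers point at parallel-capable operators, is sketched below. It assumes that `top` is in its default mode, where a process's %CPU is measured against a single core (so 100 means one fully busy core). The function names and the parallel fractions fed into the Amdahl's law estimate are hypothetical illustrations, not measurements of RapidMiner or of W-ReliefFAttributeEval.

```python
# Figures copied from the `top` output in the question.
PROCESS_CPU_PERCENT = 105.0  # %CPU column of the java process
TOTAL_CORES = 25             # core count stated by the poster


def busy_cores(process_cpu_percent):
    """Cores kept busy, assuming %CPU is relative to one core (top's default)."""
    return process_cpu_percent / 100.0


def machine_share_percent(process_cpu_percent, total_cores):
    """The process's share of the whole machine, in percent."""
    return process_cpu_percent / total_cores


def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: upper bound on speedup when only part of the work is parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


cores = busy_cores(PROCESS_CPU_PERCENT)
share = machine_share_percent(PROCESS_CPU_PERCENT, TOTAL_CORES)
print(f"cores kept busy: {cores:.2f} of {TOTAL_CORES}")
print(f"share of whole machine: {share:.1f}%")

# Hypothetical parallel fractions, for illustration only.
for fraction in (0.0, 0.5, 0.9):
    bound = amdahl_speedup(fraction, TOTAL_CORES)
    print(f"parallel fraction {fraction:.1f}: speedup bound {bound:.2f}x")
```

Under that assumption, the java process keeps roughly one core busy, which is consistent with the 4.5%us figure on the Cpu(s) line and the load average of about 1.00. Configuring 25 threads does not help while the operator doing the work runs on a single thread.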