[Solved] Running RapidMiner on a MultiCore Server

aryan_hosseinza · November 2012

Hi everybody,

I am working with a large data set (4500 attributes, 580,000 instances), and I am running RapidMiner on a server with 25 cores and 74 GB of RAM, but it still takes a long time to complete a task (e.g. the process below).

What should I do? I've already set the memory to -Xmx50GB and set the rapidminerrc file to handle 25 threads (though I am not sure whether that works). The following is the output of the `top` command on Linux:

top - 02:36:37 up 35 days, 13:18, 6 users, load average: 1.00, 1.00, 1.04
Tasks: 228 total, 1 running, 225 sleeping, 0 stopped, 2 zombie
Cpu(s): 4.5%us, 0.1%sy, 0.0%ni, 95.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 74222664k total, 54743028k used, 19479636k free, 51392k buffers
Swap: 75485180k total, 412024k used, 75073156k free, 29113180k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4238 user 20 0 52.2g 12g 13m S 105 17.1 103:54.10 java

What do you think is the best thing I could do? I really need this process to run in a "not very long" time.

Thanks,
Arian


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="631" width="949">
      <operator activated="true" class="generate_massive_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Massive Data" width="90" x="45" y="75">
        <parameter key="number_examples" value="580000"/>
        <parameter key="number_attributes" value="4500"/>
        <parameter key="sparse_fraction" value="0.95"/>
      </operator>
      <operator activated="true" class="nominal_to_binominal" compatibility="5.2.008" expanded="true" height="94" name="Nominal to Binominal" width="90" x="179" y="75">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="weka:W-ReliefFAttributeEval" compatibility="5.1.001" expanded="true" height="76" name="W-ReliefFAttributeEval" width="90" x="313" y="75">
        <parameter key="sort_direction" value="descending"/>
      </operator>
      <operator activated="true" class="weights_to_data" compatibility="5.2.008" expanded="true" height="60" name="AttributeWeights2ExampleSet (4)" width="90" x="447" y="75"/>
      <operator activated="true" class="write_csv" compatibility="5.2.008" expanded="true" height="76" name="Write CSV" width="90" x="581" y="75">
        <parameter key="csv_file" value="/home/arian/result.csv"/>
        <parameter key="quote_nominal_values" value="false"/>
        <parameter key="format_date_attributes" value="false"/>
      </operator>
      <connect from_op="Generate Massive Data" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
      <connect from_op="Nominal to Binominal" from_port="example set output" to_op="W-ReliefFAttributeEval" to_port="example set"/>
      <connect from_op="W-ReliefFAttributeEval" from_port="weights" to_op="AttributeWeights2ExampleSet (4)" to_port="attribute weights"/>
      <connect from_op="AttributeWeights2ExampleSet (4)" from_port="example set" to_op="Write CSV" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

earmijo · November 2012

You have to use an operator that takes advantage of multiple CPUs. Download the Parallel Processing library. I've played with the operators in this library, and they do cut the processing time enormously.
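The `top` output in the question can be read as a rough check on this. The sketch below only redoes the arithmetic on the posted numbers; it assumes `top` is in its default mode, where 100% in the %CPU column corresponds to one fully busy core.

```python
# Numbers copied from the `top` output in the question.
TOTAL_CORES = 25            # cores reported by the poster
JAVA_CPU_PERCENT = 105.0    # %CPU column for the java process

# Assumption: top is in its default mode, where 100% equals one fully busy core.
busy_cores = JAVA_CPU_PERCENT / 100.0
machine_share = busy_cores / TOTAL_CORES

print(f"cores kept busy by java: {busy_cores:.2f} of {TOTAL_CORES}")
print(f"share of the whole machine: {machine_share:.1%}")
```

Under that assumption the java process keeps about one core busy, which is in line with the 4.5%us summary line and the load average of 1.00 in the same output.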

aryan_hosseinza · November 2012

But it seems that it's not available for all operators (e.g. W-ReliefFAttributeEval), right?

Thanks,
Arian

earmijo · November 2012

That is correct. Not all operators will benefit from parallelization. Cross-validation, trees, and some forms of search can be parallelized.
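Cross-validation is a convenient case to picture, because each fold can be evaluated without looking at the results of the others. The following is a minimal sketch of that structure only, not RapidMiner's implementation; `make_folds` and `evaluate_fold` are hypothetical names, and the "score" is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def make_folds(n_examples, k):
    """Split example indices 0..n_examples-1 into k interleaved folds."""
    return [list(range(i, n_examples, k)) for i in range(k)]


def evaluate_fold(test_indices):
    """Hypothetical stand-in for 'train on the rest, test on this fold'.

    It needs only its own fold, which is what makes the folds independent.
    """
    return len(test_indices)  # placeholder score: size of the test fold


folds = make_folds(n_examples=20, k=5)

# Each fold is a separate task, so a pool can hand them to different workers.
with ThreadPoolExecutor(max_workers=5) as pool:
    scores = list(pool.map(evaluate_fold, folds))

print(scores)
```

The thread pool here is only meant to show the shape of the work. In CPython, CPU-bound pure-Python code would normally need a process pool to actually occupy several cores. An operator that runs as one sequential loop exposes no such independent units, so extra cores stay idle.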

aryan_hosseinza · November 2012

Aha, OK.

Thanks,
Arian
