GPU slower than CPU

varunm1 Moderator, Member Posts: 1,207 Unicorn
edited January 2019 in Help
Hi,

I switched the Deep Learning operator to use the GPU instead of the CPU (1 core), but it runs slower. GPU utilization is very low (2 to 3%) while the process is running, whereas CPU utilization is approximately 70% when I run on the CPU. I am using a batch size of 32. Is the small batch size the cause?

Thanks,
Varun
Regards,
Varun
https://www.varunmandalapu.com/

Be Safe. Follow precautions and Maintain Social Distancing

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,505 RM Data Scientist
    Hi @varunm1,
    how many examples are you training on? Keep in mind that the cost of moving the data onto the GPU is fairly high for small data sets. GPUs become useful once your data gets a bit larger.

    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
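
To illustrate the point about fixed GPU costs, here is a minimal sketch of a toy cost model. It assumes every minibatch step pays a fixed overhead (transfer plus kernel launch) on top of a per-example compute cost. Every timing value, and the names `step_time_us` and `break_even_batch_size`, are hypothetical and chosen only for illustration; none of them are measurements from this thread or part of RapidMiner.

```python
import math

# All timings are HYPOTHETICAL microsecond values, picked only to show the
# shape of the trade-off; they are not measurements from this thread.
CPU = {"per_example_us": 10, "fixed_overhead_us": 0}
GPU = {"per_example_us": 1, "fixed_overhead_us": 1000}  # transfer + launch cost


def step_time_us(device, batch_size):
    """Time for one minibatch step: fixed cost plus per-example compute."""
    return device["fixed_overhead_us"] + batch_size * device["per_example_us"]


def break_even_batch_size(cpu, gpu):
    """Smallest batch size at which a GPU step is no slower than a CPU step."""
    saved_per_example = cpu["per_example_us"] - gpu["per_example_us"]
    extra_overhead = gpu["fixed_overhead_us"] - cpu["fixed_overhead_us"]
    return math.ceil(extra_overhead / saved_per_example)


for batch_size in (32, 256, 1024):
    print(batch_size, step_time_us(CPU, batch_size), step_time_us(GPU, batch_size))
print("break-even batch size:", break_even_batch_size(CPU, GPU))
```

Under these made-up numbers the GPU step loses at a batch size of 32 and only wins above roughly a hundred examples per batch. Real break-even points depend on the hardware, the network, and the framework, so they would have to be measured.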
  • hughesfleming68 Member Posts: 323 Unicorn
    edited January 2019
    I have seen this as well, but it does not seem to be specific to any particular DL software. The last time I tested this with TensorFlow, my CPU with 28 threads was 2x faster than the GPU. For my data sets, I have not found the GPU to help much, so I guess it really depends on what you are trying to do. I have also noticed the low GPU utilization; at the time I was under the impression that Windows was not reporting those stats very well.
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hi @mschmitz @hughesfleming68

    Yes, what you said is true, but the data sets have 400k and 1 million samples with 102 attributes. That is why I felt something was wrong after comparing the CPU and GPU utilization rates. One interesting observation is that earlier, for a similar data set, GPU utilization was around 30 to 40 percent.

    One more thing: the data set is sparse.

    Thanks
    Varun



  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research
    Hi @varunm1,

    could you perhaps share your network setup with us? It would be interesting to see whether there is room for improvement.

    Best,
    David
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hi @David_A

    Do you mean the XML code of the neural network process?

    Regards,
    Varun

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research
    Yes,

    with that it's easier to compare the CPU vs. GPU performance.
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited January 2019
    @David_A

    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve Subject_Assistment_Concentration_Clean_100" width="90" x="45" y="187">
            <parameter key="repository_entry" value="../../data/AIED_2019_100/Subject_Assistment_Concentration_Clean_100"/>
          </operator>
          <operator activated="true" class="concurrency:cross_validation" compatibility="9.1.000" expanded="true" height="166" name="Cross Validation" width="90" x="514" y="493">
            <parameter key="split_on_batch_attribute" value="false"/>
            <parameter key="leave_one_out" value="false"/>
            <parameter key="number_of_folds" value="5"/>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="deeplearning:dl4j_sequential_neural_network" compatibility="0.9.000" expanded="true" height="103" name="Deep Learning" width="90" x="179" y="34">
                <parameter key="loss_function" value="Cross Entropy (Binary Classification)"/>
                <parameter key="epochs" value="20"/>
                <parameter key="use_miniBatch" value="true"/>
                <parameter key="batch_size" value="32"/>
                <parameter key="updater" value="Adam"/>
                <parameter key="learning_rate" value="0.01"/>
                <parameter key="momentum" value="0.9"/>
                <parameter key="rho" value="0.95"/>
                <parameter key="epsilon" value="1.0E-6"/>
                <parameter key="beta1" value="0.9"/>
                <parameter key="beta2" value="0.999"/>
                <parameter key="RMSdecay" value="0.95"/>
                <parameter key="weight_initialization" value="ReLU"/>
                <parameter key="bias_initialization" value="0.0"/>
                <parameter key="use_regularization" value="false"/>
                <parameter key="l1_strength" value="0.1"/>
                <parameter key="l2_strength" value="0.1"/>
                <parameter key="optimization_method" value="Stochastic Gradient Descent"/>
                <parameter key="backpropagation" value="Standard"/>
                <parameter key="backpropagation_length" value="50"/>
                <parameter key="infer_input_shape" value="true"/>
                <parameter key="network_type" value="Simple Neural Network"/>
                <parameter key="log_each_epoch" value="true"/>
                <parameter key="epochs_per_log" value="10"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <process expanded="true">
                  <operator activated="true" class="deeplearning:dl4j_convolutional_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Convolutional Layer" width="90" x="45" y="340">
                    <parameter key="number_of_activation_maps" value="32"/>
                    <parameter key="kernel_size" value="102.5"/>
                    <parameter key="stride_size" value="1.1"/>
                    <parameter key="activation_function" value="ReLU (Rectified Linear Unit)"/>
                    <parameter key="use_dropout" value="true"/>
                    <parameter key="dropout_rate" value="0.5"/>
                    <parameter key="overwrite_networks_weight_initialization" value="false"/>
                    <parameter key="weight_initialization" value="Normal"/>
                    <parameter key="overwrite_networks_bias_initialization" value="false"/>
                    <parameter key="bias_initialization" value="0.0"/>
                  </operator>
                  <operator activated="true" class="deeplearning:dl4j_pooling_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Pooling Layer" width="90" x="179" y="340">
                    <parameter key="Pooling Method" value="max"/>
                    <parameter key="PNorm Value" value="1.0"/>
                    <parameter key="Kernel Size" value="2.2"/>
                    <parameter key="Stride Size" value="1.1"/>
                  </operator>
                  <operator activated="true" class="deeplearning:dl4j_dense_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Fully-Connected Layer" width="90" x="112" y="85">
                    <parameter key="number_of_neurons" value="256"/>
                    <parameter key="activation_function" value="ReLU (Rectified Linear Unit)"/>
                    <parameter key="use_dropout" value="true"/>
                    <parameter key="dropout_rate" value="0.5"/>
                    <parameter key="overwrite_networks_weight_initialization" value="false"/>
                    <parameter key="weight_initialization" value="Normal"/>
                    <parameter key="overwrite_networks_bias_initialization" value="false"/>
                    <parameter key="bias_initialization" value="0.0"/>
                    <description align="center" color="transparent" colored="false" width="126">You can choose a number of neurons to decide how many internal attributes are created.</description>
                  </operator>
                  <operator activated="true" class="deeplearning:dl4j_dense_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Fully-Connected Layer (2)" width="90" x="514" y="85">
                    <parameter key="number_of_neurons" value="2"/>
                    <parameter key="activation_function" value="Softmax"/>
                    <parameter key="use_dropout" value="false"/>
                    <parameter key="dropout_rate" value="0.25"/>
                    <parameter key="overwrite_networks_weight_initialization" value="false"/>
                    <parameter key="weight_initialization" value="Normal"/>
                    <parameter key="overwrite_networks_bias_initialization" value="false"/>
                    <parameter key="bias_initialization" value="0.0"/>
                    <description align="center" color="transparent" colored="false" width="126">The last layer needs to be setup with an activation function, that fits the problem type.</description>
                  </operator>
                  <connect from_port="layerArchitecture" to_op="Add Convolutional Layer" to_port="layerArchitecture"/>
                  <connect from_op="Add Convolutional Layer" from_port="layerArchitecture" to_op="Add Pooling Layer" to_port="layerArchitecture"/>
                  <connect from_op="Add Pooling Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer" to_port="layerArchitecture"/>
                  <connect from_op="Add Fully-Connected Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer (2)" to_port="layerArchitecture"/>
                  <connect from_op="Add Fully-Connected Layer (2)" from_port="layerArchitecture" to_port="layerArchitecture"/>
                  <portSpacing port="source_layerArchitecture" spacing="0"/>
                  <portSpacing port="sink_layerArchitecture" spacing="0"/>
                  <description align="center" color="yellow" colored="true" height="254" resized="false" width="189" x="60" y="45">First Hidden Layer</description>
                  <description align="center" color="yellow" colored="false" height="254" resized="false" width="189" x="470" y="49">Output Layer</description>
                </process>
                <description align="center" color="transparent" colored="true" width="126">Open the Deep Learning operator by double-clicking on it to discover the layer setup.</description>
              </operator>
              <connect from_port="training set" to_op="Deep Learning" to_port="training set"/>
              <connect from_op="Deep Learning" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="187">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="112" y="289"/>
              <operator activated="true" class="performance" compatibility="9.1.000" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="340">
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <operator activated="true" class="performance_classification" compatibility="9.1.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
                <parameter key="main_criterion" value="first"/>
                <parameter key="accuracy" value="true"/>
                <parameter key="classification_error" value="false"/>
                <parameter key="kappa" value="true"/>
                <parameter key="weighted_mean_recall" value="false"/>
                <parameter key="weighted_mean_precision" value="false"/>
                <parameter key="spearman_rho" value="false"/>
                <parameter key="kendall_tau" value="false"/>
                <parameter key="absolute_error" value="false"/>
                <parameter key="relative_error" value="false"/>
                <parameter key="relative_error_lenient" value="false"/>
                <parameter key="relative_error_strict" value="false"/>
                <parameter key="normalized_absolute_error" value="false"/>
                <parameter key="root_mean_squared_error" value="true"/>
                <parameter key="root_relative_squared_error" value="false"/>
                <parameter key="squared_error" value="false"/>
                <parameter key="correlation" value="false"/>
                <parameter key="squared_correlation" value="false"/>
                <parameter key="cross-entropy" value="false"/>
                <parameter key="margin" value="false"/>
                <parameter key="soft_margin_loss" value="false"/>
                <parameter key="logistic_loss" value="false"/>
                <parameter key="skip_undefined_labels" value="true"/>
                <parameter key="use_example_weights" value="true"/>
                <list key="class_weights"/>
                <description align="center" color="transparent" colored="false" width="126">Calculate model performance</description>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Multiply" from_port="output 2" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="performance 2"/>
              <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
              <portSpacing port="sink_performance 3" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Subject_Assistment_Concentration_Clean_100" from_port="output" to_op="Cross Validation" to_port="example set"/>
          <connect from_op="Cross Validation" from_port="model" to_port="result 3"/>
          <connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/>
          <connect from_op="Cross Validation" from_port="performance 2" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="45" y="40">Creating a simple neural network with one hidden layer and an output layer.</description>
          <description align="center" color="green" colored="true" height="331" resized="true" width="275" x="285" y="79">Iris is a multi-class classification problem, therefore the network loss is set to &amp;quot;multiclass cross entropy&amp;quot;.</description>
        </process>
      </operator>
    </process>
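
As a rough back-of-the-envelope check on the batch size question, the sketch below counts the minibatch updates implied by the settings in the process above (`batch_size` 32, `epochs` 20, `number_of_folds` 5) for the 400k and 1 million example counts mentioned earlier. The helper `training_steps` is hypothetical, and it assumes each fold trains on (k - 1) / k of the examples and ignores any extra model trained on the full data. The batch size of 1024 is only an illustrative comparison value, not a recommendation from the thread.

```python
import math


def training_steps(n_examples, n_folds=5, batch_size=32, epochs=20):
    """Minibatch updates across all folds of a k-fold cross validation.

    Assumes each fold trains on (k - 1) / k of the examples and ignores any
    extra model trained on the full data afterwards.
    """
    train_size = n_examples * (n_folds - 1) // n_folds
    steps_per_epoch = math.ceil(train_size / batch_size)
    return steps_per_epoch * epochs * n_folds


for n_examples in (400_000, 1_000_000):
    for batch_size in (32, 1024):
        steps = training_steps(n_examples, batch_size=batch_size)
        print(f"{n_examples} examples, batch size {batch_size}: {steps} updates")
```

With a batch size of 32, the 400k data set already implies about a million small updates, each handling only 32 rows of 102 attributes. If every update carries a fixed GPU-side cost, that would be consistent with the low utilization reported above, though this is an assumption and not something the thread confirms.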
    



  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research
    Thanks a lot.

    I'll investigate, but I can't promise anything in the short term.
    As @hughesfleming68 already mentioned, this is not specific to RapidMiner and happens with a lot of deep learning frameworks.


  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    @David_A

    Sure, no problem. I just wanted to bring it to your attention.

    Thanks,
    Varun
