GPU slower than CPU
Hi,
I switched Deep Learning to use the GPU instead of the CPU (1 core), but it runs slower. I see that GPU utilization is very low (2 to 3%) while the process is running, whereas CPU utilization is around 70% when I use the CPU. I am using a batch size of 32. Is it because of the small batch size?
Thanks,
Varun
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Answers
On how many examples are you learning? Keep in mind that the cost of moving the data onto the GPU is fairly high for small data sets. GPUs become useful once your data gets a bit larger.
BR,
Martin
Dortmund, Germany
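Martin's point can be sketched with a toy throughput model (the overhead and compute numbers below are made-up assumptions for illustration, not a benchmark of any real GPU): every batch pays a roughly fixed launch/transfer cost, and small batches never amortize it.

```python
# Toy model: GPU time per batch = fixed launch/transfer overhead
# + per-example compute. All numbers are invented for illustration.

def examples_per_second(batch_size, overhead_s=2e-3, compute_s_per_example=1e-5):
    """Estimated throughput when every batch pays a fixed overhead."""
    batch_time = overhead_s + batch_size * compute_s_per_example
    return batch_size / batch_time

print(f"batch 32:   {examples_per_second(32):,.0f} examples/s")
print(f"batch 1024: {examples_per_second(1024):,.0f} examples/s")
```

Under these assumptions a batch of 32 spends most of its wall time in fixed overhead, which would match the low GPU utilization pattern; larger batches shift the ratio toward actual compute.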
That's true, but the datasets have 400k and 1 million samples with 102 attributes. That's why I felt something was wrong after comparing the CPU and GPU utilization rates. One interesting observation: earlier, for a similar data set, GPU utilization was around 30 to 40 percent.
One more thing: the dataset is sparse.
Thanks
Varun
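The sparsity remark matters here: a standard dense layer multiplies the full matrix, zeros included, so a very sparse input wastes most of the GPU's arithmetic. A quick way to gauge this is to compute the input density (the 3x4 matrix below is hypothetical toy data, since the actual dataset isn't shown):

```python
# Fraction of non-zero entries. Dense kernels do work proportional
# to the total entry count regardless of this value.
def density(matrix):
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for v in row if v != 0.0)
    return nonzero / total

rows = [  # hypothetical sparse examples
    [0.0, 0.0, 1.5, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(f"density: {density(rows):.2%}")
```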
Could you perhaps share your network setup with us? It would be interesting to see if there is room for improvement.
Best,
David
Do you mean the XML code of the neural network process?
Regards,
Varun
With that it's easier to compare the CPU vs. GPU performance.
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve Subject_Assistment_Concentration_Clean_100" width="90" x="45" y="187"> <parameter key="repository_entry" value="../../data/AIED_2019_100/Subject_Assistment_Concentration_Clean_100"/> </operator> <operator activated="true" class="concurrency:cross_validation" compatibility="9.1.000" expanded="true" height="166" name="Cross Validation" width="90" x="514" y="493"> <parameter key="split_on_batch_attribute" value="false"/> <parameter key="leave_one_out" value="false"/> <parameter key="number_of_folds" value="5"/> <parameter key="sampling_type" value="automatic"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="deeplearning:dl4j_sequential_neural_network" compatibility="0.9.000" expanded="true" height="103" name="Deep Learning" width="90" x="179" y="34"> <parameter key="loss_function" value="Cross Entropy (Binary Classification)"/> <parameter key="epochs" value="20"/> <parameter key="use_miniBatch" value="true"/> <parameter key="batch_size" value="32"/> <parameter key="updater" value="Adam"/> <parameter key="learning_rate" value="0.01"/> <parameter key="momentum" value="0.9"/> <parameter key="rho" value="0.95"/> <parameter key="epsilon" value="1.0E-6"/> <parameter key="beta1" 
value="0.9"/> <parameter key="beta2" value="0.999"/> <parameter key="RMSdecay" value="0.95"/> <parameter key="weight_initialization" value="ReLU"/> <parameter key="bias_initialization" value="0.0"/> <parameter key="use_regularization" value="false"/> <parameter key="l1_strength" value="0.1"/> <parameter key="l2_strength" value="0.1"/> <parameter key="optimization_method" value="Stochastic Gradient Descent"/> <parameter key="backpropagation" value="Standard"/> <parameter key="backpropagation_length" value="50"/> <parameter key="infer_input_shape" value="true"/> <parameter key="network_type" value="Simple Neural Network"/> <parameter key="log_each_epoch" value="true"/> <parameter key="epochs_per_log" value="10"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <process expanded="true"> <operator activated="true" class="deeplearning:dl4j_convolutional_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Convolutional Layer" width="90" x="45" y="340"> <parameter key="number_of_activation_maps" value="32"/> <parameter key="kernel_size" value="102.5"/> <parameter key="stride_size" value="1.1"/> <parameter key="activation_function" value="ReLU (Rectified Linear Unit)"/> <parameter key="use_dropout" value="true"/> <parameter key="dropout_rate" value="0.5"/> <parameter key="overwrite_networks_weight_initialization" value="false"/> <parameter key="weight_initialization" value="Normal"/> <parameter key="overwrite_networks_bias_initialization" value="false"/> <parameter key="bias_initialization" value="0.0"/> </operator> <operator activated="true" class="deeplearning:dl4j_pooling_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Pooling Layer" width="90" x="179" y="340"> <parameter key="Pooling Method" value="max"/> <parameter key="PNorm Value" value="1.0"/> <parameter key="Kernel Size" value="2.2"/> <parameter key="Stride Size" value="1.1"/> </operator> <operator activated="true" 
class="deeplearning:dl4j_dense_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Fully-Connected Layer" width="90" x="112" y="85"> <parameter key="number_of_neurons" value="256"/> <parameter key="activation_function" value="ReLU (Rectified Linear Unit)"/> <parameter key="use_dropout" value="true"/> <parameter key="dropout_rate" value="0.5"/> <parameter key="overwrite_networks_weight_initialization" value="false"/> <parameter key="weight_initialization" value="Normal"/> <parameter key="overwrite_networks_bias_initialization" value="false"/> <parameter key="bias_initialization" value="0.0"/> <description align="center" color="transparent" colored="false" width="126">You can choose a number of neurons to decide how many internal attributes are created.</description> </operator> <operator activated="true" class="deeplearning:dl4j_dense_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Fully-Connected Layer (2)" width="90" x="514" y="85"> <parameter key="number_of_neurons" value="2"/> <parameter key="activation_function" value="Softmax"/> <parameter key="use_dropout" value="false"/> <parameter key="dropout_rate" value="0.25"/> <parameter key="overwrite_networks_weight_initialization" value="false"/> <parameter key="weight_initialization" value="Normal"/> <parameter key="overwrite_networks_bias_initialization" value="false"/> <parameter key="bias_initialization" value="0.0"/> <description align="center" color="transparent" colored="false" width="126">The last layer needs to be setup with an activation function, that fits the problem type.</description> </operator> <connect from_port="layerArchitecture" to_op="Add Convolutional Layer" to_port="layerArchitecture"/> <connect from_op="Add Convolutional Layer" from_port="layerArchitecture" to_op="Add Pooling Layer" to_port="layerArchitecture"/> <connect from_op="Add Pooling Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer" to_port="layerArchitecture"/> <connect 
from_op="Add Fully-Connected Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer (2)" to_port="layerArchitecture"/> <connect from_op="Add Fully-Connected Layer (2)" from_port="layerArchitecture" to_port="layerArchitecture"/> <portSpacing port="source_layerArchitecture" spacing="0"/> <portSpacing port="sink_layerArchitecture" spacing="0"/> <description align="center" color="yellow" colored="true" height="254" resized="false" width="189" x="60" y="45">First Hidden Layer</description> <description align="center" color="yellow" colored="false" height="254" resized="false" width="189" x="470" y="49">Output Layer</description> </process> <description align="center" color="transparent" colored="true" width="126">Open the Deep Learning operator by double-clicking on it, to discovere the layer setup.</description> </operator> <connect from_port="training set" to_op="Deep Learning" to_port="training set"/> <connect from_op="Deep Learning" from_port="model" to_port="model"/> <portSpacing port="source_training set" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="187"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="112" y="289"/> <operator activated="true" class="performance" compatibility="9.1.000" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="340"> <parameter key="use_example_weights" value="true"/> </operator> <operator activated="true" class="performance_classification" compatibility="9.1.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34"> <parameter key="main_criterion" value="first"/> <parameter 
key="accuracy" value="true"/> <parameter key="classification_error" value="false"/> <parameter key="kappa" value="true"/> <parameter key="weighted_mean_recall" value="false"/> <parameter key="weighted_mean_precision" value="false"/> <parameter key="spearman_rho" value="false"/> <parameter key="kendall_tau" value="false"/> <parameter key="absolute_error" value="false"/> <parameter key="relative_error" value="false"/> <parameter key="relative_error_lenient" value="false"/> <parameter key="relative_error_strict" value="false"/> <parameter key="normalized_absolute_error" value="false"/> <parameter key="root_mean_squared_error" value="true"/> <parameter key="root_relative_squared_error" value="false"/> <parameter key="squared_error" value="false"/> <parameter key="correlation" value="false"/> <parameter key="squared_correlation" value="false"/> <parameter key="cross-entropy" value="false"/> <parameter key="margin" value="false"/> <parameter key="soft_margin_loss" value="false"/> <parameter key="logistic_loss" value="false"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> <list key="class_weights"/> <description align="center" color="transparent" colored="false" width="126">Calculate model performance</description> </operator> <connect from_port="model" to_op="Apply Model" to_port="model"/> <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/> <connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/> <connect from_op="Multiply" from_port="output 1" to_op="Performance" to_port="labelled data"/> <connect from_op="Multiply" from_port="output 2" to_op="Performance (2)" to_port="labelled data"/> <connect from_op="Performance (2)" from_port="performance" to_port="performance 2"/> <connect from_op="Performance" from_port="performance" to_port="performance 1"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> 
<portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_test set results" spacing="0"/> <portSpacing port="sink_performance 1" spacing="0"/> <portSpacing port="sink_performance 2" spacing="0"/> <portSpacing port="sink_performance 3" spacing="0"/> </process> </operator> <connect from_op="Retrieve Subject_Assistment_Concentration_Clean_100" from_port="output" to_op="Cross Validation" to_port="example set"/> <connect from_op="Cross Validation" from_port="model" to_port="result 3"/> <connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/> <connect from_op="Cross Validation" from_port="performance 2" to_port="result 2"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> <portSpacing port="sink_result 4" spacing="0"/> <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="45" y="40">Creating a simple neural network with one hidden layer and an output layer.</description> <description align="center" color="green" colored="true" height="331" resized="true" width="275" x="285" y="79">Iris is a multi-class classification problem, therefore the network loss is set to &quot;multiclass cross entropy&quot;.</description> </process> </operator> </process>
I'll investigate it, but I can't promise anything in the short term.
As @hughesfleming68 already mentioned, that's nothing RapidMiner-specific and happens in a lot of Deep Learning frameworks.
Sure, no problem. I just wanted to bring it to your attention.
Thanks,
Varun