NVIDIA CUDA GPU support add-on to speed up RapidMiner

sukhoi47 Member Posts: 3 Contributor I
edited June 2019 in Help
Hi guys,

I am sure you are aware of the growing use of NVIDIA GPU graphics boards to speed up software that requires a lot of computing power.

Recently, nd.com added a CUDA plug-in to its NeuroSolutions software.
At this link you can see how CUDA speeds up the processing:

I would like to suggest creating a RapidMiner add-on that lets users combine the CPU with the available CUDA GPUs to increase RapidMiner's performance. But please make the plug-in compatible with ALL CUDA graphics cards, not only with heavy GPU systems like the Tesla processors.

Most students have notebooks with CUDA GPU cores. My own notebook has 32 CUDA cores; other notebooks have 96 or even more. NVIDIA graphics cards for desktops have even more CUDA cores, in the hundreds.
So please do not repeat the NeuroSolutions mistake of shipping a CUDA add-on that is compatible with only some CUDA GPU versions (the most expensive types).

MATLAB is another piece of software with CUDA support, but again it is compatible only with the most expensive CUDA systems, and students do not have access to such high-end GPU hardware.

Some information about the CUDA versions is available here:

I am sure most RapidMiner users will appreciate this new feature.



  • crappy_viking Member Posts: 16 Maven

    I would add that this should not hurt the RapidAnalytics product, since it could use the same plug-in on the server side.
  • wessel Member Posts: 537 Maven

    It is possible to compile Java code into native code and set an optimization flag so that it uses the GPU.

    Best regards,

  • sukhoi47 Member Posts: 3 Contributor I
    If I understood wessel's comment correctly, it is possible to enable GPU support as a standard feature.
    So why not include this in the next RapidMiner release?

    (Lack of) computing power is the main problem when using neural networks.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    @Wessel: How is this supposed to work? If you know a way or a program, please give us a link.

    @all: It is very complex to adapt algorithms to the CUDA environment, because you simply have to rewrite every single algorithm in a C-like, non-debuggable language. This will need an enormous amount of manpower, and I doubt we can shoulder it ourselves without being paid for it. So unless the community either wants to pay for this or wants to contribute heavily, there won't be such an extension in the near future.

    By the way: with the CUDA extension we would lose all the operating system independence we have, since it would only work on certain graphics cards supporting the most recent CUDA version, and only on operating systems for which CUDA drivers are available.


  • wessel Member Posts: 537 Maven
    Hmm, maybe I spoke too soon.

    I was in a project where we compiled Java to Native Machine Code using:
    But now that I look it up on the web, I think we also used JCuda to make use of the GPU.

    Best regards,

  • Preko Member Posts: 21 Contributor II
    Hi All,

    We are doing some work on GPU-based data analytics, but you should not expect many algorithms to be ported to the GPU. The problem is that even a simple algorithm takes months to implement and optimize in CUDA.

    We started with a Random Forest implementation, but that was too complicated for a first attempt, so we switched back to a simple Nearest Neighbor, which is basically just the calculation of a distance matrix. Even for this, there are many tweaks you can apply, and these small tweaks really make a difference. Right now, compared to the CPU, we are 2-10x faster for Euclidean distance calculations and 20-35x faster for the DTW distance. These numbers were measured with an average graphics card that you can buy today.
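
    For readers who want to see what "just the calculation of a distance matrix" amounts to, here is a minimal CPU-only sketch in plain Java. It is not the code from this project: the class and method names are made up for illustration, and it assumes rectangular input where every row has the same number of attributes.

    ```java
    // CPU reference for a cross-distance matrix; all names here are illustrative.
    class CrossDistanceSketch {

        /** Returns d[i][j] = Euclidean distance between queries[i] and references[j]. */
        static double[][] crossDistance(double[][] queries, double[][] references) {
            double[][] d = new double[queries.length][references.length];
            for (int i = 0; i < queries.length; i++) {
                for (int j = 0; j < references.length; j++) {
                    double sum = 0.0;
                    for (int k = 0; k < queries[i].length; k++) {
                        double diff = queries[i][k] - references[j][k];
                        sum += diff * diff;
                    }
                    d[i][j] = Math.sqrt(sum);
                }
            }
            return d;
        }

        /** Index of the reference row closest to query row i (1-nearest neighbor). */
        static int nearest(double[][] d, int i) {
            int best = 0;
            for (int j = 1; j < d[i].length; j++) {
                if (d[i][j] < d[i][best]) best = j;
            }
            return best;
        }

        public static void main(String[] args) {
            double[][] queries = {{0.0, 0.0}, {5.0, 5.0}};
            double[][] references = {{3.0, 4.0}, {6.0, 5.0}};
            double[][] d = crossDistance(queries, references);
            System.out.println(d[0][0]);        // distance from (0,0) to (3,4): 5.0
            System.out.println(nearest(d, 1));  // closest reference to (5,5): 1
        }
    }
    ```

    Every cell of the matrix is independent of the others, which is presumably what makes this computation a natural first candidate for a GPU port.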

    I agree with Sebastian that this is a huge effort, and it is very unlikely that someone will come up with a full solution. The only way this could happen is if some dedicated people start a separate open-source project. If this happens, we will be very happy to contribute.

    Best, Zoltan
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Zoltan, hi all,

    What do you think about the following idea?
    We could add some interfaces that define an easy-to-use plug-in mechanism for calculations such as distance calculations, matrix calculations and so on, so that we could build a RapidMiner extension in a separate project. When you have the extension installed, RapidMiner will automatically choose to use the graphics card if one is available, and everything is fully transparent to the user: he will receive the same results and can work with the same operators. If he switches to his laptop, the process will continue to work even if no supported graphics card is present.

    Would anybody be interested in that?
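
    A rough sketch of how such a transparent fallback could look in Java follows. Everything in it is hypothetical: `DistanceBackend`, `CpuBackend`, `FakeGpuBackend` and `select` are invented for illustration and are not part of RapidMiner, and the "GPU" backend is a stand-in that reuses the CPU code so the example runs on any machine.

    ```java
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch: none of these types exist in RapidMiner.
    class BackendFallbackSketch {

        /** A pluggable provider of distance calculations. */
        interface DistanceBackend {
            String name();
            boolean isAvailable();   // e.g. "is a supported graphics card present?"
            double distance(double[] a, double[] b);
        }

        /** Always-available pure-Java implementation. */
        static class CpuBackend implements DistanceBackend {
            public String name() { return "cpu"; }
            public boolean isAvailable() { return true; }
            public double distance(double[] a, double[] b) {
                double sum = 0.0;
                for (int k = 0; k < a.length; k++) {
                    double diff = a[k] - b[k];
                    sum += diff * diff;
                }
                return Math.sqrt(sum);
            }
        }

        /** Stand-in for an extension-provided GPU backend; it only simulates availability. */
        static class FakeGpuBackend extends CpuBackend {
            private final boolean present;
            FakeGpuBackend(boolean present) { this.present = present; }
            @Override public String name() { return "gpu"; }
            @Override public boolean isAvailable() { return present; }
        }

        /** Picks the first available backend; callers list preferred backends first. */
        static DistanceBackend select(List<DistanceBackend> candidates) {
            for (DistanceBackend b : candidates) {
                if (b.isAvailable()) return b;
            }
            throw new IllegalStateException("no backend available");
        }

        public static void main(String[] args) {
            DistanceBackend onLaptop = select(Arrays.asList(new FakeGpuBackend(false), new CpuBackend()));
            DistanceBackend onDesktop = select(Arrays.asList(new FakeGpuBackend(true), new CpuBackend()));
            System.out.println(onLaptop.name() + " " + onDesktop.name());
        }
    }
    ```

    The operator only ever talks to the interface, so the same process runs unchanged whichever backend is selected.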

  • Preko Member Posts: 21 Contributor II
    We are in, but I have some concerns that need to be tested.
    Copying Java memory objects to GPU memory might be quite slow, so I am not sure it makes sense to always move data back and forth. It might be better to keep it in GPU memory all the time, but then this needs a different IOObject.
    I would suggest spending significant time on testing different architecture plans, as a bad architecture can wipe out all the processing improvement that the GPU offers.
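
    One common preparation step for this kind of copy, sketched here under assumptions, is to pack the row data into a single contiguous primitive array so that it can be handed over in one piece instead of object by object. The class name is made up, no GPU call is made, and the use of `float` (which loses precision compared to `double`) is an assumption of this sketch, not something stated in the thread.

    ```java
    // Sketch only: packs row data into one contiguous array; no GPU call is made here.
    class FlattenSketch {

        /** Packs a rectangular double[][] into a row-major float[] of length rows * cols. */
        static float[] flatten(double[][] rows) {
            int cols = rows.length == 0 ? 0 : rows[0].length;
            float[] flat = new float[rows.length * cols];
            for (int i = 0; i < rows.length; i++) {
                if (rows[i].length != cols) {
                    throw new IllegalArgumentException("row " + i + " has a different length");
                }
                for (int k = 0; k < cols; k++) {
                    flat[i * cols + k] = (float) rows[i][k];  // narrowing cast: loses precision
                }
            }
            return flat;
        }

        public static void main(String[] args) {
            float[] flat = flatten(new double[][] {{1, 2, 3}, {4, 5, 6}});
            // Element (row 1, column 2) sits at index 1 * 3 + 2 in the flat array.
            System.out.println(flat.length + " values, element (1,2) = " + flat[1 * 3 + 2]);
        }
    }
    ```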
  • Preko Member Posts: 21 Contributor II
    Hi All,

    We have made great progress on this issue.
    We have implemented a GPU version of the Cross-distance operator, using JCuda to move Java objects to GPU memory, and it seems to be quite fast and reliable. We get about an order of magnitude speedup for Euclidean distance as long as the data fits into GPU memory.

    The model that works best for the graphics card is to move a large data chunk (at least several MB) to the GPU, do a significant amount of computation on it, and move the output data back in one step. If you just push small tasks to the GPU, performance degrades because the GPU is accessed so many times.
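
    The effect of chunk size on the number of transfers can be illustrated with a small simulation. This is only a sketch: the "device" is simulated on the CPU, `processChunk` and `processAll` are hypothetical names, and the per-row computation (a squared norm) is a placeholder for real work.

    ```java
    import java.util.Arrays;

    // Sketch: the "device" is simulated on the CPU so the code runs anywhere.
    class ChunkedSketch {

        static int transfers = 0;  // counts simulated host<->device copies

        /** Stand-in for "copy chunk in, compute, copy result out": squared norm per row. */
        static float[] processChunk(float[] flat, int cols, int firstRow, int rowCount) {
            transfers += 2;  // one copy in, one copy out
            float[] out = new float[rowCount];
            for (int r = 0; r < rowCount; r++) {
                float sum = 0f;
                for (int k = 0; k < cols; k++) {
                    float v = flat[(firstRow + r) * cols + k];
                    sum += v * v;
                }
                out[r] = sum;
            }
            return out;
        }

        /** Processes all rows, with at most chunkRows rows per simulated transfer. */
        static float[] processAll(float[] flat, int cols, int chunkRows) {
            int rows = flat.length / cols;
            float[] result = new float[rows];
            for (int first = 0; first < rows; first += chunkRows) {
                int count = Math.min(chunkRows, rows - first);
                float[] part = processChunk(flat, cols, first, count);
                System.arraycopy(part, 0, result, first, count);
            }
            return result;
        }

        public static void main(String[] args) {
            float[] data = new float[1000 * 4];  // 1000 rows, 4 columns
            Arrays.fill(data, 1f);
            transfers = 0;
            processAll(data, 4, 1);              // one row per transfer
            int small = transfers;
            transfers = 0;
            processAll(data, 4, 250);            // 250 rows per transfer
            System.out.println(small + " vs " + transfers + " transfers");
        }
    }
    ```

    The results are identical in both runs; only the number of round trips changes, and on real hardware each round trip carries a fixed overhead.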

    We plan to work on other distances for the Cross-distance operator, with DTW as our main target: it involves many computations, so we expect a more significant speedup. We also plan to create a GPU version of the Nearest Neighbor operator.
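
    To show why DTW is so much heavier than Euclidean distance, here is a textbook dynamic-programming version in plain Java. It is a generic sketch, not the implementation discussed in this thread; the class name is made up, and the choice of |x - y| as the local cost, with no warping window, is an assumption.

    ```java
    import java.util.Arrays;

    // Textbook dynamic time warping; not the GPU implementation discussed in the thread.
    class DtwSketch {

        /** DTW distance between two series, using |x - y| as the local cost. */
        static double dtw(double[] a, double[] b) {
            int n = a.length, m = b.length;
            double[][] cost = new double[n + 1][m + 1];
            for (double[] row : cost) Arrays.fill(row, Double.POSITIVE_INFINITY);
            cost[0][0] = 0.0;
            for (int i = 1; i <= n; i++) {
                for (int j = 1; j <= m; j++) {
                    double local = Math.abs(a[i - 1] - b[j - 1]);
                    // Cheapest way to reach this cell: diagonal match, or a step in either series.
                    double best = Math.min(cost[i - 1][j - 1],
                            Math.min(cost[i - 1][j], cost[i][j - 1]));
                    cost[i][j] = local + best;
                }
            }
            return cost[n][m];
        }

        public static void main(String[] args) {
            // The second series repeats one value; warping absorbs the repeat.
            System.out.println(dtw(new double[] {1, 2, 3}, new double[] {1, 2, 2, 3}));  // 0.0
            System.out.println(dtw(new double[] {0, 0}, new double[] {1, 1}));           // 2.0
        }
    }
    ```

    For two series of lengths n and m this fills n × m cells, whereas Euclidean distance touches each position once, so there is far more arithmetic per value copied to the device.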

    @all: What do you think the next target should be for speeding up with GPU computations? It has to be processor-intensive.

    @Sebastian: Do you think that these interfaces can be created, so we do not need to create separate operators, but each built-in operator will be able to call the GPU version if present?
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Zoltan,

    I think the requirements for the interface will be that a) it is compatible with existing distance measures and b) it supports a buffered mode for copying chunks of several MB to the graphics card and back.
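
    One way to meet both requirements, sketched here with invented names, is a batch interface plus an adapter that wraps any existing per-pair measure. `PairDistance` is a stand-in for an existing distance measure and `BatchDistance` for the buffered mode; neither is RapidMiner's actual API.

    ```java
    // Hypothetical sketch: PairDistance and BatchDistance are not RapidMiner types.
    class BatchInterfaceSketch {

        /** Stand-in for an existing per-pair distance measure. */
        interface PairDistance {
            double distance(double[] a, double[] b);
        }

        /** Buffered mode: many rows in, a full block of distances out. */
        interface BatchDistance {
            double[][] distances(double[][] queries, double[][] references);
        }

        /** Adapter: makes any per-pair measure usable through the batch interface. */
        static BatchDistance adapt(PairDistance measure) {
            return (queries, references) -> {
                double[][] out = new double[queries.length][references.length];
                for (int i = 0; i < queries.length; i++) {
                    for (int j = 0; j < references.length; j++) {
                        out[i][j] = measure.distance(queries[i], references[j]);
                    }
                }
                return out;
            };
        }

        public static void main(String[] args) {
            PairDistance manhattan = (a, b) -> {
                double sum = 0.0;
                for (int k = 0; k < a.length; k++) sum += Math.abs(a[k] - b[k]);
                return sum;
            };
            BatchDistance batch = adapt(manhattan);
            double[][] d = batch.distances(new double[][] {{0, 0}}, new double[][] {{1, 2}, {3, 4}});
            System.out.println(d[0][0] + " " + d[0][1]);  // 3.0 7.0
        }
    }
    ```

    With this shape, a GPU extension would only need to supply its own `BatchDistance`, while measures without a GPU version keep working through the adapter.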

    The problem is not implementing such interfaces, but changing all the operators that make use of them so that they can work in batch mode. Think of the Agglomerative Clustering operator, K-Means, K-Medoids or K-NN, for example.
    We have to wait until Simon is back from vacation to discuss this issue in more detail. I would suggest we then move this discussion to the development mailing list, and perhaps meet in WebEx to talk it through.

    With kind regards,
      Sebastian Land