How to run processes from data stored 100% in the cloud?

artavia_eduardo Member Posts: 3 Contributor I
Hi all. 

I've been working with RapidMiner Studio for a while now. Have a little experience working with predictive models and such.

Right now my company is asking me to analyze some medical data from real-world patients. However, because of privacy laws, I can't have this data stored on my physical computer, not even for a single minute. I know how to connect RapidMiner Studio to a SQL Server and access data in the cloud; however, when I run a process, the data gets downloaded to my computer.

How would you guys recommend I tackle this issue? Is there a way to use RM 100% in the cloud, or to have it access data that stays 100% in the cloud? I'm not sure whether RapidMiner Server would help me; I've never used it.

Thank you.


Best Answer


    SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Does that mean that you cannot explore the data at all? That would make sense, regardless of whether the data sits on disk or in memory.

    I think you could build a procedure on the database that trains a model and tests its performance, so that only the results (e.g. a confusion matrix) reach your computer. I imagine you could set up such a solution with PostgreSQL and Python, but it would need help from the data provider.
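
    A minimal sketch of that idea in Python, under loudly stated assumptions: the provider's PostgreSQL server is simulated here with Python's built-in sqlite3 module and an in-memory database, and the `patients` table, its columns, the toy rows and the midpoint-threshold "model" are all hypothetical. Training (computing a single threshold) and evaluation happen inside one SQL query, so the client only ever receives the four confusion-matrix counts, never patient rows.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical stand-in for the provider's PostgreSQL server: an in-memory
    # SQLite database with a made-up `patients` table and toy rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients ("
        "id INTEGER PRIMARY KEY, biomarker REAL, split TEXT, label INTEGER)"
    )
    rows = [
        (1, 1.0, "train", 0), (2, 2.0, "train", 0), (3, 3.0, "train", 0),
        (4, 7.0, "train", 1), (5, 8.0, "train", 1), (6, 9.0, "train", 1),
        (7, 1.5, "test", 0), (8, 6.0, "test", 0), (9, 4.0, "test", 1),
        (10, 8.5, "test", 1), (11, 2.5, "test", 0), (12, 9.5, "test", 1),
        (13, 0.5, "test", 0),
    ]
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", rows)
    return conn


# "Training" = midpoint between the class means on the training split
# (a deliberately trivial, hypothetical model). Evaluation groups the test
# split by (actual, predicted), so only aggregate counts leave the database.
QUERY = """
WITH model AS (
    SELECT (AVG(CASE WHEN label = 0 THEN biomarker END)
          + AVG(CASE WHEN label = 1 THEN biomarker END)) / 2.0 AS threshold
    FROM patients
    WHERE split = 'train'
)
SELECT p.label,
       CASE WHEN p.biomarker >= m.threshold THEN 1 ELSE 0 END,
       COUNT(*)
FROM patients AS p CROSS JOIN model AS m
WHERE p.split = 'test'
GROUP BY 1, 2
"""


def confusion_matrix_in_db(conn: sqlite3.Connection) -> dict:
    # Start every cell at 0 so missing (actual, predicted) pairs still appear.
    matrix = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for actual, predicted, n in conn.execute(QUERY):
        matrix[(actual, predicted)] = n
    return matrix


if __name__ == "__main__":
    print(confusion_matrix_in_db(build_demo_db()))
```

    The SQL itself uses only standard features (CTE, CASE, AVG, GROUP BY by position), so it should carry over to PostgreSQL through a driver such as psycopg, although that has not been checked here. In a real setup the query would more likely live in a view or stored procedure maintained by the data provider, and whether aggregate results are acceptable under the privacy rules is still the provider's call.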

    Solutions with RM Server don't seem to apply, unless the data provider is allowed to install the server locally. Once you copy data away from its originator, it makes no difference whether it sits on your computer or on an RM Server in the cloud.


    Nikouy Member Posts: 22 Contributor II
    Hi Fabian, Sebastian,

    Do you know how RapidMiner interacts with data stored in Amazon Redshift or in Azure Data Lake? Does it always pull/download this data and load it into memory in order to analyse it?

    tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
    Hi @Nikouy

    As I already wrote in my first response, no program (not only RapidMiner) can execute any analysis on data without accessing it. So if you execute RapidMiner locally, it has to load the data into memory to analyse it. Everything I wrote in my first response also applies to Amazon Redshift and Azure Data Lake. You can use our "Pay as you Go" licences for RM Server (https://rapidminer.com/pricing/, under RapidMiner Server (Cloud)). This runs an RM Server instance on either Amazon AWS or Microsoft Azure that connects to Redshift or Azure Data Lake. Then the execution happens on the AWS/Azure cloud servers.

    Best regards,