How to run processes on data stored 100% in the cloud?

artavia_eduardo (Member)
Hi all. 

I've been working with RapidMiner Studio for a while now and have a little experience working with predictive models and such.

Right now my company is asking me to analyze some medical data from real-world patients. However, because of privacy rules and laws, I can't have this data stored on my physical computer, not even for a single minute. I know how to connect RapidMiner Studio to a SQL Server and access data in the cloud. However, when I run a process, the data gets downloaded to my computer.

How would you recommend I tackle this issue? Is there a way to use RM 100% in the cloud, or to have it access data that stays 100% in the cloud? I'm not sure whether RapidMiner Server would help; I've never used it.

Thank you.



  • tftemme (Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RM Research)
    Hi @artavia_eduardo

    Is the data not even allowed to be loaded into the memory of your computer (as opposed to being stored on disk)? If even loading it into memory is not allowed, then no program running on your computer can do anything with the data, because it obviously needs to be able to access the data.
    If this is the case, I have a few suggestions that might work but would need to be investigated:
    - You could use the In-Database extension. With this extension you can build complex SQL commands that are then executed inside the SQL database. Unfortunately, you will of course be limited to the functionality SQL provides; there is no way to leverage RM-specific functionality through the SQL commands. However, you could use it to anonymise your data inside the SQL database before loading it onto your PC and applying any RM logic to it. Afterwards, you could use the In-Database extension again to update the original data, for example with scored values. I don't know whether you are allowed to use anonymised data on your computer, though.
    - You can install RM Server on the same cloud hardware where the database is located. Any RM process executed on this RM Server then runs in the same "cloud" as the data itself.
    - You can use our "Pay as you Go" licences for RM Server (listed under "RapidMiner Server (Cloud)"). This runs an RM Server instance on either Amazon AWS or Microsoft Azure. It would be in the cloud, but probably not in the same cloud infrastructure as your data.
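    To illustrate the first suggestion, here is a minimal sketch of pushing the anonymisation step into SQL so that direct identifiers never leave the database. This is plain Python, not RapidMiner or In-Database extension code: an in-memory SQLite database stands in for the remote SQL Server, and the `patients` table and its columns are hypothetical. The query drops the `name` column and coarsens birth years into decades; the `patient_id` key is kept only so scores can be written back later, which makes the result pseudonymised rather than fully anonymised.

```python
import sqlite3

# Stand-in for the remote SQL Server (assumption: in-memory SQLite,
# hypothetical table and column names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (
    patient_id INTEGER PRIMARY KEY,
    name       TEXT,
    birth_year INTEGER,
    diagnosis  TEXT,
    lab_value  REAL
);
INSERT INTO patients VALUES
    (1, 'Alice Smith', 1950, 'diabetes', 7.1),
    (2, 'Bob Jones',   1982, 'asthma',   5.4),
    (3, 'Carol White', 1967, 'diabetes', 8.3);
""")

# This query is evaluated inside the database: the name column is never
# selected, and integer division turns birth years into decade bands.
ANONYMISE_SQL = """
SELECT patient_id,
       (birth_year / 10) * 10 AS birth_decade,
       diagnosis,
       lab_value
FROM patients
ORDER BY patient_id
"""

# Only the reduced, pseudonymised rows are returned to the client.
rows = conn.execute(ANONYMISE_SQL).fetchall()
for row in rows:
    print(row)
```

    Whether such pseudonymised rows may be processed on a local PC is a legal question that the SQL alone cannot answer.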

    If loading the data into memory is allowed, just don't use Store (or Write) operators: load the data from SQL, process it, and update the SQL database again, all within one process.
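    The load, process, update pattern can be sketched as follows. This is again plain Python rather than a RapidMiner process, and it rests on assumptions: an in-memory SQLite database stands in for the SQL Server, the schema is hypothetical, and `score` is a hypothetical placeholder for whatever model the real process would apply. The point it shows is that the rows exist only in process memory between reading and writing back, and no local file is ever created.

```python
import sqlite3

# Stand-in for the remote database (assumption: in-memory SQLite,
# hypothetical schema with an empty risk_score column to fill in).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, lab_value REAL, risk_score REAL);
INSERT INTO patients (patient_id, lab_value) VALUES (1, 7.1), (2, 5.4), (3, 8.3);
""")


def score(lab_value: float) -> float:
    """Hypothetical placeholder model; a real process would apply a trained model."""
    return round(min(lab_value / 10.0, 1.0), 2)


# 1. Load: the rows are held only in memory; nothing is written to a local file.
rows = conn.execute("SELECT patient_id, lab_value FROM patients").fetchall()

# 2. Process: compute one score per row, still in memory.
updates = [(score(lab), pid) for pid, lab in rows]

# 3. Update: write the scores back to the database in a single transaction
#    (the connection context manager commits on success, rolls back on error).
with conn:
    conn.executemany("UPDATE patients SET risk_score = ? WHERE patient_id = ?", updates)

print(conn.execute("SELECT patient_id, risk_score FROM patients ORDER BY patient_id").fetchall())
```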

    Hope this helps.
    Best regards