How to run processes from data stored 100% in the cloud?
artavia_eduardo
Member Posts: 3 Learner II
in Help
Hi all.
I've been working with RapidMiner Studio for a while now, and I have a little experience working with predictive models and such.
Right now my company is asking me to analyze some medical data from real-world patients. However, because of privacy laws, I can't have this data stored on my physical computer, not even for a single minute. I know how to connect RapidMiner Studio to a SQL Server and access data from the cloud; however, when I run a process, the data gets downloaded to my computer.
How would you guys recommend I tackle this issue? Is there a way to use RM 100% in the cloud, or to have it access data that stays 100% in the cloud? I'm not sure if RapidMiner Server would help me; I've never used it.
Thank you.
Eduardo.
Best Answer
tftemme Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
Hi @artavia_eduardo
Is the data not even allowed to be loaded into the memory of your computer (that is, not stored on disk)? If even loading it into memory is not allowed, it is impossible for a program running on your computer to do anything with the data, because it obviously needs to be able to access the data.
If this is the case, I have a few suggestions that might work, but they would have to be investigated:
- You could use the In-Database extension. With this extension you can build complex SQL commands which are then executed inside the SQL database. Unfortunately, you are of course limited to the functionality SQL provides; there is no way to leverage RM-specific functionality through the SQL commands. But you could use it to anonymise your data in the SQL database before loading it onto your PC and applying any RM logic to it. After that you could use the In-Database extension again to update the original data with, for example, scored values. I don't know whether you are allowed to use anonymised data on your computer.
- You can install RM Server on the same cloud hardware where the database is located. Then the execution of any RM process on this RM Server happens in the same "cloud" as the data itself.
- You can use our "Pay as you Go" licences for RM Server (https://rapidminer.com/pricing/, under RapidMiner Server (Cloud)). This would use an RM Server instance on either Amazon AWS or Microsoft Azure. It would be in the cloud, but probably not in the same cloud infrastructure as your data.
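To make the first suggestion more concrete, here is a rough sketch in Python (not a RapidMiner process) of the general idea of letting the database do the de-identification, so that only generalised, aggregated rows ever reach the client. An in-memory SQLite database stands in for the remote SQL Server, and the `patients` table, its columns, and the sample rows are all made up for illustration. Whether this level of generalisation is enough for your regulations is something you would have to check.

```python
import sqlite3

# Stand-in for the remote SQL Server: an in-memory SQLite database.
# Table name, column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "patient_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, "
    "diagnosis TEXT, lab_value REAL)"
)
conn.executemany(
    "INSERT INTO patients (name, age, diagnosis, lab_value) VALUES (?, ?, ?, ?)",
    [
        ("Alice Example", 34, "A", 5.0),
        ("Bob Example", 37, "A", 7.0),
        ("Carol Example", 52, "B", 6.0),
    ],
)

# The database does the work: names and ids are never selected, ages are
# generalised to 10-year bands (integer division), and rows are aggregated.
ANONYMISED_QUERY = """
    SELECT (age / 10) * 10 AS age_band,
           diagnosis,
           COUNT(*)        AS n_patients,
           AVG(lab_value)  AS mean_lab_value
    FROM patients
    GROUP BY (age / 10) * 10, diagnosis
    ORDER BY (age / 10) * 10, diagnosis
"""

# Only the aggregated result crosses over to the client.
anonymised_rows = conn.execute(ANONYMISED_QUERY).fetchall()
for row in anonymised_rows:
    print(row)
```

The client only ever sees one row per age band and diagnosis; the identifying columns stay in the database.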
If you are allowed to load the data into memory, just don't use Store (or Write) operators. Load the data from SQL, process it, and update the SQL database again, all in one process.
Hope this helps
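The "all in one process" pattern can be sketched outside RapidMiner as well. The Python below is only an analogy, under a few assumptions: an in-memory SQLite database stands in for the SQL Server, the `patients` table and `risk_score` column are hypothetical, and `score` is a placeholder for whatever model you would actually apply. The point is that rows are read, scored in memory, and written straight back, with no local file in between.

```python
import sqlite3


def score(lab_value):
    """Hypothetical placeholder for a real model's prediction."""
    return 1.0 if lab_value > 6.0 else 0.0


def score_in_one_pass(conn):
    """Read rows, score them in memory, and write the scores back.

    Nothing is written to a local file between reading and updating.
    Returns the number of rows that were scored.
    """
    rows = conn.execute("SELECT patient_id, lab_value FROM patients").fetchall()
    updates = [(score(lab_value), patient_id) for patient_id, lab_value in rows]
    conn.executemany(
        "UPDATE patients SET risk_score = ? WHERE patient_id = ?", updates
    )
    conn.commit()
    return len(updates)


# Stand-in database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "patient_id INTEGER PRIMARY KEY, lab_value REAL, risk_score REAL)"
)
conn.executemany(
    "INSERT INTO patients (patient_id, lab_value) VALUES (?, ?)",
    [(1, 5.0), (2, 7.5), (3, 6.0)],
)

n_updated = score_in_one_pass(conn)
```

Note that the data still passes through the client's memory here, so this only applies if in-memory access is permitted.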
Best regards
Fabian
Answers
As I already wrote in my first response, a program (not only RapidMiner) cannot run any analysis on data without accessing it. So, if you want to execute RapidMiner locally, it has to load the data into memory to analyse it. Everything I wrote in my first response also holds for Amazon Redshift or Azure Data Lake. You can use our "Pay as you Go" licences for RM Server (https://rapidminer.com/pricing/, under RapidMiner Server (Cloud)). This would use an RM Server instance on either Amazon AWS or Microsoft Azure and connect it to Redshift or Azure Data Lake. Then the execution will happen on the cloud servers of AWS/Azure.
Best regards,
Fabian