Running concurrent processes
Can anyone confirm or deny that RM 5.3 is capable of running multiple processes concurrently?
I am integrating RM in a Java project and am having an issue when attempting to run multiple RM Processes concurrently. Here is an overview of what I am doing:
1. Main thread: initialize RapidMiner
2. Create a runnable that executes a RM Process via a File
3. Submit 20 instances of this runnable to a thread pool
4. All 20 runs return different results in their IOContainer <-- THIS IS THE PROBLEM
NOTES:
- The Process contains only static data.
- The Process produces the same result on every run when running it serially.
- The Process uses the Series extension (rmx_series-5.3.0.jar)
- The Runnables do not share any data.
"a warning: Be careful with static references. RM 6 will probably allow multiple processes opened, and RapidAnalytics certainly does."
http://rapid-i.com/rapidforum/index.php/topic,2917.msg11719.html#msg11719
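For reference, the setup described above can be sketched roughly as follows. This is a minimal, self-contained sketch and not the actual project code: the class name `ConcurrentRunCheck` and its helper methods are made up for illustration, and the RapidMiner call is replaced by a stand-in task so the sketch runs without RapidMiner on the classpath. In the real harness the task body would load and run the process file (assumed here to be something like `new Process(file).run()` after a single `RapidMiner.init()` on the main thread).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentRunCheck {

    /** Submits the same task `runs` times to a fixed pool and collects every result. */
    public static <T> List<T> runConcurrently(Callable<T> task, int runs, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                futures.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get()); // blocks until that run has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Number of distinct results; 1 means every run agreed. */
    public static <T> int distinctCount(List<T> results) {
        return new HashSet<>(results).size();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real work (ASSUMPTION): in the actual harness this
        // would run the RapidMiner process loaded from a File and return some
        // comparable summary of the resulting IOContainer.
        Callable<String> task = () -> "static-result";

        List<String> results = runConcurrently(task, 20, 4);
        System.out.println("distinct results: " + distinctCount(results));
    }
}
```

With the deterministic stand-in this prints `distinct results: 1`. The symptom reported above corresponds to that count being greater than 1 once the stand-in is replaced by the real process run.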
Answers
RapidMiner 5.3 is not capable of that.
Regards,
Marco
Most of the new frontiers in training systems (like finding interesting bits in huge problem spaces) depend on how ubiquitous parallel processing has become.
To really access RM's power on machines with 4-16+ cores, when working on embarrassingly parallel problems, clean process separation seems an absolute necessity. I haven't dug deep enough into the code to see exactly where objects (iterators, RNGs, etc?) are being shared (assuming that is the issue), but I would like to express how useful it would be to clean up those dependencies in a future release of RapidMiner.
Think of me at 2 AM, grasping for that breakthrough, the sense of achievement and pride welling up within me as the points of data plot a beautiful line... and having to fall asleep hurt, distraught and confused in the fetal position. Why RapidMiner, why?! I thought you were my friend!
Ahem. Anyway... right now I have to do some evil things, like starting up separate JVMs depending on how many cores are on the machine. It's cave-man threading. Ungabunga! Somehow I know we can do better!
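For anyone curious, the separate-JVM workaround looks roughly like the sketch below. It is only an illustration under assumptions: the class name `JvmPerCore` and its methods are hypothetical, and the worker here just prints a line where the real one would initialize RapidMiner and run its share of the processes. Each child JVM gets its own copy of all static state, which is the whole point of the trick.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class JvmPerCore {

    /** Builds the command line that starts one child JVM running `mainClass`. */
    public static List<String> childCommand(String mainClass, int workerIndex) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-cp");
        command.add(System.getProperty("java.class.path")); // reuse the parent's classpath
        command.add(mainClass);
        command.add(Integer.toString(workerIndex));
        return command;
    }

    /** Starts `workers` child JVMs, waits for all of them, returns how many failed. */
    public static int launchAll(String mainClass, int workers) throws Exception {
        List<Process> children = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            children.add(new ProcessBuilder(childCommand(mainClass, i)).inheritIO().start());
        }
        int failures = 0;
        for (Process child : children) {
            if (child.waitFor() != 0) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // Worker mode (ASSUMPTION): the real worker would run RapidMiner here.
            System.out.println("worker " + args[0] + " would run its share of the work here");
            return;
        }
        int cores = Runtime.getRuntime().availableProcessors();
        int failures = launchAll(JvmPerCore.class.getName(), cores);
        System.out.println(cores + " child JVMs finished, " + failures + " failed");
    }
}
```

The cost is obvious: every child pays JVM startup and RapidMiner initialization again, and results have to come back through files or some other inter-process channel rather than shared memory.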
If you have any ideas on where the problem(s) might be in more concrete terms, I am certainly all ears and would happily dive into the code to try and help solve the mystery...
The RapidMiner 5 design was done quite some years ago, at a time when multiple processes and multi-threading in general were not yet that big a deal (sadly).
We will, however, rectify these shortcomings in the future.
Regards,
Marco