
Running concurrent processes

weslee3 Member Posts: 1 Learner III
edited November 2018 in Help
Can anyone confirm or deny that RM 5.3 is capable of running multiple processes concurrently?

I am integrating RM in a Java project and am having an issue when attempting to run multiple RM Processes concurrently. Here is an overview of what I am doing:
1. Main thread: initialize RapidMiner
2. Create a runnable that executes an RM Process loaded from a File
3. Submit 20 instances of this runnable to a thread pool
4. The 20 runs produce different results in their resulting IOContainers <-- THIS IS THE PROBLEM

NOTES:
  • The Process contains only static data.
  • The Process produces the same result on every run when running it serially.
  • The Process uses the Series extension (rmx_series-5.3.0.jar)
  • The Runnables do not share any data.
I did come across a response from Simon Fischer in this thread which seems to imply that RM 5 is not capable of having multiple processes open:
"a warning: Be careful with static references. RM 6 will probably allow multiple processes opened, and RapidAnalytics certainly does."
http://rapid-i.com/rapidforum/index.php/topic,2917.msg11719.html#msg11719
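
The four-step setup described above can be reproduced with a small harness like the one below. This is only a sketch under stated assumptions: `ConcurrentRunCheck`, `runProcessOnce` and `distinctResults` are hypothetical names, and `runProcessOnce` is a stand-in that returns a fixed string where the real code would run the RM Process from its file and summarize the resulting IOContainer. No RapidMiner calls are shown, since the exact integration code was not posted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentRunCheck {

    // Hypothetical stand-in (assumption): the real version would load the
    // process file, run it, and reduce the IOContainer to a comparable string.
    static String runProcessOnce() {
        return "summary-of-result";
    }

    // Submits `runs` copies of the same task to a fixed pool of `threads`
    // threads and collects the distinct results that come back.
    static Set<String> distinctResults(Callable<String> task, int runs, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                futures.add(pool.submit(task));
            }
            Set<String> distinct = new HashSet<>();
            for (Future<String> future : futures) {
                distinct.add(future.get()); // blocks until that run has finished
            }
            return distinct;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 20 runs on 20 threads, mirroring the setup in the question.
        Set<String> distinct = distinctResults(ConcurrentRunCheck::runProcessOnce, 20, 20);
        System.out.println(distinct.size() == 1
                ? "all runs agree"
                : "runs disagree: " + distinct.size() + " distinct results");
    }
}
```

Calling `distinctResults` with `threads` set to 1 gives the serial baseline, so the same harness can show whether the disagreement only appears once runs overlap.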

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering
    Hi,

    RapidMiner 5.3 is not capable of that.

    Regards,
    Marco
  • polaris2600 Member Posts: 1 Learner III
    Ah.. I was afraid that was going to be the case. Same problem, same basic setup, same symptoms. Very confusing. Many late nights getting different ExampleSets each run. Traumatizing.

    Most of the new frontiers in training systems (like finding interesting bits in huge problem spaces) depend on how ubiquitous parallel processing has become.

    To really access RM's power on machines with 4-16+ cores, when working on embarrassingly parallel problems, clean process separation seems an absolute necessity. I haven't dug deep enough into the code to see exactly where objects (iterators, RNGs, etc?) are being shared (assuming that is the issue), but I would like to express how useful it would be to clean up those dependencies in a future release of RapidMiner.

    Think of me at 2 AM, grasping for that breakthrough, the sense of achievement and pride welling up within me as the points of data plot a beautiful line... and having to fall asleep hurt, distraught and confused in the fetal position. Why RapidMiner, why?! I thought you were my friend!

    Ahem. Anyway... right now I have to do some evil things, like starting up separate JVMs depending on how many cores are on the machine. It's cave-man threading. Ungabunga! ;) Somehow I know we can do better!

    If you have any ideas on where the problem(s) might be in more concrete terms, I am certainly all ears and would happily dive into the code to try and help solve the mystery...
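
The one-JVM-per-core workaround mentioned in the reply above could look roughly like this. It is a sketch, not the poster's actual code: `JvmPerCore`, `workerCommand` and `runWorkers` are hypothetical names, and the worker main class (passed as the first argument) is assumed to be a class of your own that initializes RapidMiner and runs one process. Only standard `ProcessBuilder` calls are used.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class JvmPerCore {

    // Builds the command line for one worker JVM. The worker main class is
    // an assumption: it would initialize RapidMiner and run a single process.
    static List<String> workerCommand(String classpath, String workerMainClass, int workerIndex) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(workerMainClass);
        cmd.add(Integer.toString(workerIndex)); // lets each worker pick its share of the work
        return cmd;
    }

    // Starts one JVM per worker, waits for all of them, and returns their exit codes.
    static List<Integer> runWorkers(String classpath, String workerMainClass, int workers)
            throws Exception {
        List<Process> started = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            started.add(new ProcessBuilder(workerCommand(classpath, workerMainClass, i))
                    .inheritIO() // forward worker output to this console
                    .start());
        }
        List<Integer> exitCodes = new ArrayList<>();
        for (Process p : started) {
            exitCodes.add(p.waitFor());
        }
        return exitCodes;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        String classpath = System.getProperty("java.class.path");
        if (args.length == 0) {
            // Dry run: only print the commands that would be launched.
            for (int i = 0; i < cores; i++) {
                System.out.println(workerCommand(classpath, "<worker main class>", i));
            }
            return;
        }
        System.out.println("exit codes: " + runWorkers(classpath, args[0], cores));
    }
}
```

Because each worker is its own JVM, any static state inside RapidMiner is duplicated per worker instead of shared, which is the separation the thread pool approach lacks. The cost is a full JVM and RapidMiner startup per worker.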
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering
    Hi,

    the RapidMiner 5 design was done quite some years ago, at a time when multiple processes and multi-threading in general were not yet that big a deal (sadly :( )
    We will however rectify these shortcomings in the future ;)

    Regards,
    Marco