
Parallelization Operator

cherokee Member Posts: 82 Maven
edited November 2018 in Help
Hi!

Though I only recently started writing RapidMiner Operators, I want to tackle the parallelization issue. I want to write an OperatorChain that runs its inner Operators in parallel. As I know that parallel operation brings about a lot of trouble, I want to work around it: all IOObjects shall be duplicated and each inner Operator run in a separate RapidMiner instance.

I have got two questions:
1. Would anybody besides me use such an Operator?
2. Have I missed anything, or does anybody see a reason why my idea wouldn't work?

Greetings,
Michael

Answers

  • steffen Member Posts: 347 Maven
    Hello

    this is quite an interesting idea. However, most processes I use (as an example) have a very sequential character, i.e. operator i+1 waits for the output of operator i, or the operators work on different subsets of the data (e.g. cross-validation). So... which use cases do you have in mind when you speak of duplication?

    regards,

    Steffen
  • cherokee Member Posts: 82 Maven
    Hi Steffen,

    I'm sorry, but I described it wrongly. I have just changed my earlier post.

    What I had in mind was duplicating all Operator input so that I do not have to synchronize anything. Each Operator shall then be executed in its own RapidMiner instance. Therefore I would duplicate the Operators, too, just to avoid trouble when creating the new process instances. Of course, every Operator should be executed only once; in fact, only one duplicate of each Operator would be run.
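
    The duplication idea can be illustrated with a minimal, language-agnostic sketch. It is not RapidMiner's API: the plain Python callables stand in for inner Operators, the dictionary stands in for the chain's IOObjects, and `run_inner_operators_in_parallel` is a hypothetical helper. Each operator receives its own deep copy of the input and is submitted exactly once, so no synchronization is needed. Threads are used here only for simplicity; separate RapidMiner instances would correspond more closely to separate processes (e.g. `ProcessPoolExecutor`, which additionally requires picklable, top-level callables).

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def run_inner_operators_in_parallel(operators, inputs, max_workers=None):
    """Run each (hypothetical) inner operator exactly once, in parallel.

    Every operator gets its own deep copy of `inputs`, so no two workers
    ever touch the same mutable object and nothing has to be synchronized.
    Results are returned in the order the operators were given.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(op, copy.deepcopy(inputs)) for op in operators]
        return [f.result() for f in futures]


# Two toy "operators": one mutates its copy of the data, one only reads it.
def normalize(data):
    data["values"] = [v / max(data["values"]) for v in data["values"]]
    return data["values"]


def total(data):
    return sum(data["values"])


shared = {"values": [2, 4, 8]}
results = run_inner_operators_in_parallel([normalize, total], shared)
print(results)  # the mutation in normalize() affects neither total() nor shared
```

    Because `normalize` only changes its private copy, `total` still sums the original values and `shared` is left untouched; that isolation is what removes the need for locking, at the cost of copying the data once per operator.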

    The primary use case I had in mind was grid parameter optimization. When you want to optimize your parameters and one of them ranges from 1 to 10, you could split it up into two operators, one with parameter values from 1 to 5 and the other from 6 to 10. You would probably have to compare the two results by hand, but that would do.
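
    The split of the grid described above, together with the final comparison, can be sketched as follows. This is an illustration under stated assumptions, not GridParameterOptimization itself: `evaluate` is a hypothetical stand-in for training and validating a model with one parameter value (higher score is better, and it is rigged to peak at 7), `best_in_range` and `split_range` are hypothetical helpers, and threads again stand in for separate RapidMiner instances. Since each slice returns a (score, parameter) pair, the "compare by hand" step reduces to taking the maximum of the per-slice winners.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(param):
    """Hypothetical stand-in for a full train/validate run.

    Returns a score where higher is better; this toy version peaks at 7.
    """
    return -(param - 7) ** 2


def best_in_range(params):
    """Plain grid search over one slice; returns (best score, its parameter)."""
    return max((evaluate(p), p) for p in params)


def split_range(values, parts):
    """Split `values` into `parts` contiguous chunks of nearly equal size."""
    size, rest = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < rest else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


grid = list(range(1, 11))      # the parameter runs from 1 to 10
halves = split_range(grid, 2)  # one slice per worker: 1..5 and 6..10
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_results = list(pool.map(best_in_range, halves))

# Combining the slices is just another max over (score, parameter) pairs.
best_score, best_param = max(partial_results)
print(best_param, best_score)
```

    Splitting into contiguous slices keeps each worker's job identical to an ordinary grid search; the only extra work is the final `max`, which is cheap compared with the evaluations themselves.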

    Best regards,
    Michael
  • steffen Member Posts: 347 Maven
    Hello Michael,

    ah, silly me. Here is another thought (I don't want to discourage you, I just like to play devil's advocate):
    If you just want to copy all operators and inputs etc., you could simply start RapidMiner twice (from the console, with different parameter ranges for GridParameterOptimization) and write the parameters and final performance vectors out to compare them manually (in a separate RapidMiner process).

    kind regards,

    Steffen
  • cherokee Member Posts: 82 Maven
    Hello Steffen,

    you're absolutely right. That was exactly the reason why I asked the question. Personally, I would prefer having all parts of my process in one process file.

    And perhaps somebody can think of another, even more useful use case.

    Greetings,
    Michael