
Estimate Experiment Time Feature, Amazon EC2

EricD Member Posts: 1 Learner III
edited June 2019 in Help
I am interested in a feature that would let the user estimate the time required to complete an experiment before launching it.

Many of the experiments I conduct use the 'Optimize Selection (Evolutionary)' operator with a variable number of generations. Adding the above feature would allow me to reduce the maximum number of generations in order to conduct an initial test of an idea and only add additional generations if the test is successful.

I am also working on an Amazon EC2 instance configured with Ubuntu, RapidMiner, R, and the Amazon AWS Tools, which I would provide free to the RapidMiner community. This would give those of us whose data mining problems are easy to spread across multiple instances a quick way to conduct larger-scale experiments. Being able to estimate how long an experiment would take would allow the user to determine how many instances need to be launched in order to complete the processing in the desired amount of time.
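To illustrate the instance-count arithmetic, here is a minimal Python sketch. It assumes a single-instance runtime estimate is already available and that the work divides evenly across instances, apart from an optional serial portion. The function name `instances_needed` and the `serial_fraction` parameter are hypothetical and not part of RapidMiner or the AWS tools.

```python
import math


def instances_needed(single_instance_hours, deadline_hours, serial_fraction=0.0):
    """Estimate how many instances are needed to finish within a deadline.

    Assumption: runtime on n instances is
        single_instance_hours * (serial_fraction + (1 - serial_fraction) / n)
    i.e. everything except the serial portion parallelises perfectly.
    """
    if single_instance_hours <= 0 or deadline_hours <= 0:
        raise ValueError("hours must be positive")
    if not 0.0 <= serial_fraction < 1.0:
        raise ValueError("serial_fraction must be in [0, 1)")

    # Share of the single-instance runtime left for the parallel part.
    budget = deadline_hours / single_instance_hours - serial_fraction
    if budget <= 0:
        raise ValueError("deadline cannot be met with any number of instances")

    return max(1, math.ceil((1.0 - serial_fraction) / budget))


if __name__ == "__main__":
    # Hypothetical numbers: a 40-hour experiment that should finish in 8 hours.
    print(instances_needed(40, 8))
```

Real experiments rarely scale perfectly, so a result like this is better read as a lower bound on the number of instances.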

I would be happy to provide additional details and/or help test the functionality I described above.

Regards,
Eric

Answers

  • steffen Member Posts: 347 Maven
    Hello EricD

    I like the idea ... my first idea was to build a database consisting of
    predictors: number of attributes (total and according to type), average size of attributes (i.e. number of nominal values), number of examples (and many more...)
    response: execution time for a certain pair of (operator, operator-parameter-settings)

    Problem: Since computer architectures vary, rm would have to build a separate database for each user (although one could try to build a prior using core/cpu-power + ram ... ).

    As a result, the longer a user works with rm, the better the predictions get. On the other hand, by that stage the user may already know which execution time to expect ;).

    just my 2 cents,

    steffen
  • fischer Member Posts: 439 Maven
    Hi,

    Wrong board. This doesn't go to "Feature Requests", but rather to "Research Proposals".  :-)

    Seriously, that's one of the things we are working on within the e-LICO project: www.e-lico.eu. I think that's a very interesting topic. Some operators are already annotated, e.g., with their running time as a function of the number of examples, attributes, etc. But then it's a matter of finding out the coefficients, etc. We're open to your ideas here.

    Best,
    Simon
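The replies above describe logging past runs (data set size as predictors, execution time as response) and then finding the coefficients of a runtime function. A minimal Python sketch of that idea follows. It assumes, purely for illustration, that runtime for one fixed operator configuration on one machine grows linearly with examples × attributes. Neither the model form nor the names `fit_runtime_model` and `predict_seconds` come from RapidMiner or e-LICO.

```python
def fit_runtime_model(runs):
    """Fit seconds ~ intercept + slope * (n_examples * n_attributes).

    runs: list of (n_examples, n_attributes, seconds) tuples logged for one
    operator configuration on one machine (hypothetical log format).
    Uses ordinary least squares on a single feature.
    """
    if len(runs) < 2:
        raise ValueError("need at least two logged runs")

    xs = [n_examples * n_attributes for n_examples, n_attributes, _ in runs]
    ys = [seconds for _, _, seconds in runs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("logged runs must differ in data set size")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_seconds(model, n_examples, n_attributes):
    """Predict runtime in seconds for a new data set size."""
    intercept, slope = model
    return intercept + slope * n_examples * n_attributes


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    logged = [(1000, 10, 12.0), (2000, 10, 22.0), (1000, 50, 52.0)]
    model = fit_runtime_model(logged)
    print(predict_seconds(model, 5000, 20))
```

Because the coefficients depend on the hardware, a model like this would need to be refit per machine, which matches the per-user database concern raised above.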