
Comparing multiple methods with cross validation

delen Member Posts: 4 Contributor I
edited November 2018 in Help
I would like to compare three prediction model types (e.g., ANN, SVM, and DT) all at once under an x-fold cross validation. Can I do that in RapidMiner? Or do I need to do it one at a time by replacing the modeler in the lower-level process?

Thanks,

Delen

Answers

    SebastianLoh Member Posts: 99 Contributor II
    Hi Delen,

    You could use a Multiplier followed by all the prediction models + x-validations you want to use. Then all models are applied simultaneously and you can compare the validations (see the sketch after this post).

    Maybe you can post your process if you need some more help.

    Ciao Sebastian
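
    For illustration, here is a minimal sketch of this idea outside RapidMiner, using scikit-learn as a stand-in; the library choice, data, and model settings are my assumptions, not part of Sebastian's suggestion. Each learner receives the same data and is cross-validated in turn, so the resulting performances can be compared side by side.

        # Minimal sketch of the "same data into several learners" idea
        # (assumption: scikit-learn as a stand-in for the RapidMiner process).
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.neural_network import MLPClassifier   # "ANN"
        from sklearn.svm import SVC                        # "SVM"
        from sklearn.tree import DecisionTreeClassifier    # "DT"

        # Stand-in data; in practice this would be your own example set.
        X, y = make_classification(n_samples=500, random_state=0)

        models = {
            "ANN": MLPClassifier(max_iter=2000, random_state=0),
            "SVM": SVC(),
            "DT": DecisionTreeClassifier(random_state=0),
        }
        for name, model in models.items():
            scores = cross_val_score(model, X, y, cv=10)   # 10-fold cross validation
            print(f"{name}: mean accuracy {scores.mean():.3f}")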
    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    I am not sure the Multiplier approach will work here. In any case, the following approach should work; I have just uploaded it with our new Community Extension under the name "Evaluating Multiple Models with Looped X-Validation (Loops + Macros)". Just download the myExperiment Community Extension within RapidMiner and you will get access to this process, so you can download and run it with a few clicks. Here is the website of this process: http://www.myexperiment.org/workflows/1273

    The process shows how multiple different models can be evaluated with cross validation runs. This allows for the comparison of, for example, three prediction model types (e.g., ANN, SVM, and DT) all at once under an x-fold cross validation. To be precise, the process performs the same cross validation several times, using a different modeling scheme each time. It makes use of loops, collections, subprocess selection, and macros, and is therefore also an interesting showcase for more complex process designs within RapidMiner.

    The process begins with the definition of the number of models to evaluate (Set Macro), which sets the macro "max" to 3 in this example. The next step is the generation of data; you would normally replace it by loading your own data from the repository or by some ETL subprocess.

    The interesting part begins now: the Loop operator iterates the defined number of times over its inner process, which consists of a macro definition for the current iteration and the cross validation. The cross validation itself is defined with a local random seed in its parameters to ensure that exactly the same data splitting is performed in each iteration. In each iteration, the training subprocess selects a different learner according to the value of the macro "current_iteration". Please make sure that the number of models here and the number defined in "max" are the same!

    The results are automatically collected within a Performance Vector Collection by the Loop operator and delivered as the final result.
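
    For readers who want the shape of this loop outside RapidMiner, here is a minimal sketch in scikit-learn (my own analogy with assumed names and settings, not the myExperiment process itself): a counter plays the role of the "current_iteration" macro, a fixed random seed keeps the splits identical across iterations, and the performances are collected into a list, mirroring the Performance Vector Collection.

        # Sketch of the looped evaluation (assumption: scikit-learn stand-in).
        from sklearn.datasets import make_classification
        from sklearn.model_selection import KFold, cross_val_score
        from sklearn.neural_network import MLPClassifier
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        # Stand-in for data loading / the ETL subprocess.
        X, y = make_classification(n_samples=500, random_state=0)

        # The learners selected per iteration; their count plays the role of
        # the "max" macro and must match the number of models, as stressed above.
        learners = [MLPClassifier(max_iter=2000, random_state=0),
                    SVC(),
                    DecisionTreeClassifier(random_state=0)]
        max_models = len(learners)

        # The fixed random_state mirrors the local random seed: every
        # iteration sees exactly the same data splitting.
        folds = KFold(n_splits=10, shuffle=True, random_state=1992)

        performance_collection = []                  # the collected results
        for current_iteration in range(max_models):  # the Loop operator
            model = learners[current_iteration]      # subprocess selection via the macro
            scores = cross_val_score(model, X, y, cv=folds)
            performance_collection.append((type(model).__name__, scores.mean()))

        print(performance_collection)                # delivered as the final result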

    Cheers,
    Ingo