How to perform cross-validation correctly with multi-objective feature selection?

xiaoniuniu  Member  Posts: 4  Contributor I
edited December 2018 in Help



Hello everyone, I am a master's student in China and a fan of RapidMiner. I joined the community only recently and have been teaching myself so far. Thanks to Allie from the RapidMiner staff, who showed me how to post in the RM community. This is my first question in such a warm community.

I have recently read several blog posts by Mr. Ingo: a four-part series on multi-objective optimization for feature selection and a four-part series on validating models correctly. I found them inspiring and worth studying further. My confusion concerns the fourth post on correct cross-validation (https://rapidminer.com/blog/learn-right-way-validate-models-part-4-accidental-contamination/). There, Ingo explains that feature selection can accidentally contaminate the data. To avoid this, he wraps the feature selection in an outer cross-validation, while the feature selection itself uses an inner cross-validation to evaluate candidate feature sets.
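The nesting described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Ingo's RapidMiner process: the toy data (`make_data`), the nearest-centroid learner, and `pareto_select` (random feature subsets scored on two objectives, standing in for the evolutionary multi-objective selection) are all hypothetical. What it shows is where each loop sits: the feature selection, including its inner cross-validation, only ever sees the outer training rows, and the held-out outer fold is used once, to score the chosen subset.

```python
import random
from statistics import mean


def make_data(n=120, n_features=8, seed=0):
    """Hypothetical toy data: only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 2.0 * label
        row[1] -= 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


def k_folds(n, k, seed):
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def centroid_error(X, y, train, test, feats):
    """Fit a nearest-centroid classifier on `train` using only `feats`; return test error."""
    cents = {}
    for c in set(y[i] for i in train):
        rows = [X[i] for i in train if y[i] == c]
        cents[c] = [mean(r[f] for r in rows) for f in feats]
    wrong = 0
    for i in test:
        pred = min(cents, key=lambda c: sum((X[i][f] - m) ** 2 for f, m in zip(feats, cents[c])))
        wrong += pred != y[i]
    return wrong / len(test)


def cv_error(X, y, rows, feats, k, seed):
    """Inner cross-validation that only ever sees `rows` (the outer training part)."""
    errs = []
    for fold in k_folds(len(rows), k, seed):
        held_out = set(fold)
        train = [rows[j] for j in range(len(rows)) if j not in held_out]
        test = [rows[j] for j in fold]
        errs.append(centroid_error(X, y, train, test, feats))
    return mean(errs)


def pareto_select(X, y, rows, n_features, n_candidates, k, seed, tol=0.02):
    """Hypothetical stand-in for multi-objective selection: score random subsets on
    (inner-CV error, subset size), keep the Pareto front, and return the smallest
    front member whose error is within `tol` of the best."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_candidates):
        feats = sorted(rng.sample(range(n_features), rng.randint(1, n_features)))
        scored.append((cv_error(X, y, rows, feats, k, seed), feats))
    front = [s for s in scored
             if not any(o[0] <= s[0] and len(o[1]) <= len(s[1])
                        and (o[0] < s[0] or len(o[1]) < len(s[1])) for o in scored)]
    best = min(err for err, _ in front)
    return min((s for s in front if s[0] <= best + tol), key=lambda s: len(s[1]))[1]


def nested_cv(X, y, outer_k=5, inner_k=3, n_candidates=30, seed=0):
    """Outer CV around the whole selection; returns (error estimate, subsets per fold)."""
    outer_errors, chosen = [], []
    for fold in k_folds(len(X), outer_k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        # Feature selection (with its own inner CV) runs on the outer training rows only.
        feats = pareto_select(X, y, train, len(X[0]), n_candidates, inner_k, seed)
        chosen.append(feats)
        # The untouched outer test fold scores the selected subset.
        outer_errors.append(centroid_error(X, y, train, fold, feats))
    return mean(outer_errors), chosen


if __name__ == "__main__":
    X, y = make_data()
    err, chosen = nested_cv(X, y)
    print(f"outer-CV error estimate: {err:.3f}")
    print("subsets chosen per outer fold:", chosen)
```

The outer-CV error is the estimate to report. The selected subsets may differ between outer folds; that is expected, because the outer loop validates the whole selection procedure rather than one particular feature set.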

In the third post on multi-objective feature selection (https://rapidminer.com/blog/multi-objective-optimization-feature-selection/), Ingo provides a process that runs the evolutionary feature selection directly, without an outer cross-validation. I have been wondering whether an outer cross-validation is also needed there, to avoid the contamination through feature selection that the cross-validation post warns about.
However, I do not know how to build such a process for multi-objective optimization. My questions are:
1. Is it necessary to add the correct (outer) cross-validation step? If so, how should the process be built? I hope the experts here can help me set up a correct process. (I have attached the two processes from Mr. Ingo's blog: one for multi-objective feature selection, and one for correct cross-validation that avoids accidental contamination due to feature selection. How do I merge them?)

2. If merging them is not necessary, I would also like to hear the reasons.
Many thanks.


  • xiaoniuniu  Member  Posts: 4  Contributor I

    Here are the two accompanying processes (.rmp files) for the text above. Both are Mr. Ingo's processes, and I hope to get everyone's help.

  • xiaoniuniu  Member  Posts: 4  Contributor I




    Sorry, the upload above was incomplete; here it is.

  • sgenzer  Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator  Posts: 2,959  Community Manager

    cc @IngoRM



  • xiaoniuniu  Member  Posts: 4  Contributor I

    Still no replies...

  • RNarayan  Member  Posts: 4  Contributor I
    edited May 2021
    I've struggled with understanding and applying this too. Is the outer cross-validation suggested in the Data Contamination blog post a purist view?
    While it makes logical sense, does a practical implementation of such a nested cross-validation require so many iterations that the run-time becomes prohibitive?
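To see why run time is the concern, it helps to count model trainings. The sketch below is back-of-the-envelope arithmetic under assumed settings: the population size, number of generations, and fold counts are hypothetical values, not RapidMiner defaults. It assumes an evolutionary selection that scores every individual with an inner k-fold CV, optionally repeated once per outer fold.

```python
def model_fits(population, generations, inner_k, outer_k=None):
    """Count model trainings for an evolutionary feature selection (hypothetical settings).

    Every individual in every generation is scored with an inner k-fold CV
    (inner_k trainings), and one final model is trained on the chosen subset.
    If outer_k is given, that whole selection is repeated once per outer fold.
    """
    per_selection = population * generations * inner_k + 1
    if outer_k is None:
        return per_selection
    return outer_k * per_selection


print(model_fits(20, 30, 10))              # → 6001
print(model_fits(20, 30, 10, outer_k=10))  # → 60010
```

With these hypothetical numbers, the outer 10-fold loop multiplies the cost by exactly 10, so nested validation costs roughly `outer_k` times as much as a single selection run.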

    What's more, the processes generated by AutoML don't seem to use nested CVs for either Parameter Optimisation or Feature Engineering. There is only a single inner CV within the PO and FE operators, which is consistent with all the other PO and FE examples provided.

    Can someone please clear the air on this?