
How to access train and test instances in each fold of an N-fold cross-validation

kashif_khan Member Posts: 19 Contributor II
edited June 2019 in Help
Hi Folks,

I am working on a data mining problem in RapidMiner where I have to access the instances in each fold of an N-fold cross-validation with a classifier. I can access the instances in the "Testing" subprocess of the Validation operator, because it gives me an "ExampleSet". However, I cannot access them in the "Training" subprocess, which yields a "DistributionModel". I am trying to iterate over them in my code. How can I get the instances of the train and test split for each fold separately? How can I cast a DistributionModel to an ExampleSet?

I'd really appreciate your help.

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering
    Hi,

    1) When you open the X-Validation operator of your process in the RapidMiner Studio GUI, you see a "Training" subprocess on the left and a "Testing" subprocess on the right. Notice the ports on the top right of each subprocess. If you want to access data from them in your code, they need to be connected. So if you want to access the training data, you have to pipe it to the "thr" (through) port.
    Another option is to access the input ports on the left instead of the output ports on the right. That way you can access whatever comes into each subprocess.
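    To make concrete what the training and testing subprocesses receive in each iteration, here is a plain-Java sketch of how N-fold cross-validation partitions the example indices. It is not the RapidMiner API: the class `CrossValidationFolds`, its `Fold` holder and `split` method are hypothetical names for illustration. It also assumes contiguous (linear) folds; depending on its sampling settings, RapidMiner may shuffle or stratify the examples instead.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (plain Java, not RapidMiner classes): splits the
 * example indices 0..n-1 into k contiguous folds. In fold i, part i is the
 * test set and all remaining examples form the training set.
 */
public class CrossValidationFolds {

    /** Training and test indices for one fold. */
    public static final class Fold {
        public final List<Integer> train;
        public final List<Integer> test;

        Fold(List<Integer> train, List<Integer> test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Returns k folds; the first n % k folds get one extra test example. */
    public static List<Fold> split(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Fold> folds = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            // Boundaries of the test part for fold i: [start, end).
            int start = i * (n / k) + Math.min(i, n % k);
            int end = start + n / k + (i < n % k ? 1 : 0);
            List<Integer> train = new ArrayList<>();
            List<Integer> test = new ArrayList<>();
            for (int j = 0; j < n; j++) {
                if (j >= start && j < end) {
                    test.add(j);
                } else {
                    train.add(j);
                }
            }
            folds.add(new Fold(train, test));
        }
        return folds;
    }

    public static void main(String[] args) {
        // 10 examples, 3 folds: test parts of size 4, 3 and 3.
        List<Fold> folds = split(10, 3);
        for (int i = 0; i < folds.size(); i++) {
            Fold f = folds.get(i);
            System.out.println("fold " + i + " train=" + f.train + " test=" + f.test);
        }
    }
}
```

    Running `main` prints the index lists for each of the three folds, starting with `fold 0 train=[4, 5, 6, 7, 8, 9] test=[0, 1, 2, 3]`. Every example lands in exactly one test part, which is why connecting the "thr" port (or the left-hand input ports) gives you a different training ExampleSet in each iteration.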

    2) You cannot cast a DistributionModel to an ExampleSet. An ExampleSet is your actual data (think of a database table), while a DistributionModel is a model that is used to generate predictions from your data. They are completely different things.
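    As a toy illustration of why no such cast exists, the following plain-Java sketch reduces labeled training rows to per-value label counts. The classes `Row` and `CountModel` are hypothetical and much simpler than RapidMiner's actual `DistributionModel`; the point is only that, after training, the model holds aggregated statistics that can produce predictions, not the rows themselves.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration (plain Java, not RapidMiner classes): data versus model. */
public class DataVersusModel {

    /** One training example (the "data"): a nominal feature value and its label. */
    static final class Row {
        final String outlook;
        final String label;

        Row(String outlook, String label) {
            this.outlook = outlook;
            this.label = label;
        }
    }

    /** Toy "model": only label counts per feature value survive training. */
    static final class CountModel {
        final Map<String, Map<String, Integer>> counts = new HashMap<>();

        static CountModel learn(List<Row> rows) {
            CountModel m = new CountModel();
            for (Row r : rows) {
                m.counts.computeIfAbsent(r.outlook, v -> new HashMap<>())
                        .merge(r.label, 1, Integer::sum);
            }
            return m;
        }

        /** Returns the label seen most often with this value, or null if unseen. */
        String predict(String outlook) {
            String best = null;
            int bestCount = -1;
            for (Map.Entry<String, Integer> e
                    : counts.getOrDefault(outlook, Map.of()).entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            return best;
        }
    }

    public static void main(String[] args) {
        List<Row> data = List.of(
                new Row("sunny", "no"), new Row("sunny", "no"), new Row("sunny", "yes"),
                new Row("rain", "yes"), new Row("rain", "yes"));
        CountModel model = CountModel.learn(data);
        System.out.println(model.predict("sunny")); // no
        System.out.println(model.predict("rain"));  // yes
        // No rows are stored in the model, only aggregated counts.
        System.out.println(model.counts.get("sunny").get("no")); // 2
    }
}
```

    The five rows cannot be reconstructed from `model`, so the only way to iterate over training instances is to take the ExampleSet from a port before or alongside the learner, not from the model it outputs.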

    Regards,
    Marco