RapidMiner

Automatized pipeline for attribute and model selection

zmk
Contributor II

Hi there,

I am new to RapidMiner, but I have already tried some models on my data and they worked well.

I have the following data set:

40 examples and about 100 attributes (numbers: real), 1 label (binominal).

 

My aim is to find a good model that predicts the label using just a few attributes, e.g. 4.

Following the tutorial “Finding the right Model” (https://www.youtube.com/watch?v=uN1I4yrNNuQ), I compared multiple models (Decision Tree, Naive Bayes, k-NN, Neural Net, Linear Regression…) using “Compare ROCs”. This worked well and inspired this question.

 

I want to set up a pipeline that does the following tasks:

  1. Selects attribute combinations randomly or based on weights (e.g. only 4 attributes: 2 that I selected manually and 2 that are selected randomly).
  2. Forwards them to an X-Validation function that runs multiple models on the data (Decision Tree, Naive Bayes, k-NN, Neural Net, Linear Regression...).

At the end I would get a report listing, for each tested attribute combination, the model used and its performance measurements (e.g. accuracy), ideally ordered by accuracy.
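To make step 1 concrete, here is a minimal sketch of the random variant written in Python rather than as a RapidMiner process. The function name `sample_attribute_sets` and the attribute names `att1` ... `att100` are made up for illustration, and weight-based selection is not covered.

```python
import math
import random

def sample_attribute_sets(all_attributes, fixed, n_random, n_sets, seed=0):
    """Return distinct attribute sets: the fixed attributes plus n_random random others."""
    rng = random.Random(seed)
    # Candidates for the random part are all attributes that are not fixed.
    pool = [a for a in all_attributes if a not in fixed]
    # Never ask for more distinct sets than actually exist.
    target = min(n_sets, math.comb(len(pool), n_random))
    seen = set()
    result = []
    while len(result) < target:
        extra = tuple(sorted(rng.sample(pool, n_random)))
        if extra not in seen:  # skip combinations that were already drawn
            seen.add(extra)
            result.append(list(fixed) + list(extra))
    return result

# Hypothetical attribute names; att1 and att2 play the role of the manual picks.
attributes = [f"att{i}" for i in range(1, 101)]
sets = sample_attribute_sets(attributes, fixed=["att1", "att2"], n_random=2, n_sets=10)
```

With 2 of the 4 attributes fixed, only 98 choose 2 = 4753 combinations exist, so trying all of them would also be feasible.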

Is there such a pipeline?

Does anyone know what I have to put together to build such a pipeline?

 

Thanks for your help.

2 REPLIES
Elite III

Re: Automatized pipeline for attribute and model selection

This certainly isn't anything that is already built into the software. But what you describe is something that could be built, using loops and macros. It's probably more than a trivial effort, and if the reports you are talking about are outputs to external software packages, that might be tricky. But in principle everything you have requested is something that RapidMiner can do.

 

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
zmk
Contributor II

Re: Automatized pipeline for attribute and model selection

Great. Sounds good.

So can you give me some hints on how to tackle this problem?

I will be more specific. This is what I need:

A function that performs a loop: take n (e.g. 4) features at random and forward them to a set of models, saving the names of the features and the result of each algorithm to a file (e.g. CSV).

Stop when there are no more features to select.
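One possible reading of this stopping rule is that every feature is used exactly once, i.e. the features are shuffled and cut into disjoint groups. A small Python sketch of that reading follows; the helper `disjoint_groups` and the names `att1` ... `att100` are hypothetical.

```python
import math
import random

def disjoint_groups(attributes, group_size, seed=0):
    """Shuffle the attributes and split them into consecutive, non-overlapping groups."""
    rng = random.Random(seed)
    shuffled = list(attributes)
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]

attributes = [f"att{i}" for i in range(1, 101)]
groups = disjoint_groups(attributes, 4)

print(len(groups))        # 25 groups, after which no features are left
print(math.comb(100, 4))  # 3921225 groups if every combination were tested instead
```

Under this reading the loop ends after 25 runs, whereas testing every possible combination of 4 out of 100 features would need about 3.9 million runs per model.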

 

Or even simpler: using just one model (e.g. logistic regression), take 4 random features out of 100, test them on the model (X-Validation), and save the performance together with the features used.
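The simple version can be sketched outside RapidMiner in plain Python to show the shape of the loop. Everything here is an assumption for illustration: the data is made up (40 examples, 100 attributes, only `att1` and `att2` informative), a 1-nearest-neighbour classifier stands in for logistic regression so that no extra library is needed, and the CSV is built as a string instead of being written to disk.

```python
import csv
import io
import random

def make_toy_data(n_examples=40, n_attributes=100, seed=0):
    """Made-up data: only att1 and att2 carry information about the label."""
    rng = random.Random(seed)
    names = [f"att{i}" for i in range(1, n_attributes + 1)]
    rows, labels = [], []
    for _ in range(n_examples):
        label = rng.randint(0, 1)
        row = {name: rng.gauss(0.0, 1.0) for name in names}
        row["att1"] += 3.0 * label
        row["att2"] -= 3.0 * label
        rows.append(row)
        labels.append(label)
    return names, rows, labels

def predict_1nn(train_rows, train_labels, row, attrs):
    """Return the label of the closest training example (squared Euclidean distance)."""
    def dist(other):
        return sum((row[a] - other[a]) ** 2 for a in attrs)
    best = min(range(len(train_rows)), key=lambda j: dist(train_rows[j]))
    return train_labels[best]

def cross_val_accuracy(rows, labels, attrs, n_folds=10):
    """k-fold cross-validation; fold f tests examples f, f + k, f + 2k, ..."""
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, len(rows), n_folds))
        train_rows = [r for j, r in enumerate(rows) if j not in test_idx]
        train_labels = [l for j, l in enumerate(labels) if j not in test_idx]
        for j in test_idx:
            if predict_1nn(train_rows, train_labels, rows[j], attrs) == labels[j]:
                correct += 1
    return correct / len(rows)

def run_search(names, rows, labels, n_attrs=4, n_trials=50, seed=1):
    """Draw random attribute subsets, score each one, and sort by accuracy."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        attrs = sorted(rng.sample(names, n_attrs))
        results.append((";".join(attrs), cross_val_accuracy(rows, labels, attrs)))
    results.sort(key=lambda r: r[1], reverse=True)
    return results

def to_csv(results):
    """Render the (attributes, accuracy) pairs as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["attributes", "accuracy"])
    writer.writerows(results)
    return buf.getvalue()

names, rows, labels = make_toy_data()
results = run_search(names, rows, labels, n_trials=20)
print(to_csv(results[:5]))
```

In RapidMiner the same structure would map onto a loop around a cross-validation, with the attribute names and the performance value logged on each iteration.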