Automated pipeline for attribute and model selection

zmkzmk Member Posts: 4 Contributor I
edited November 2018 in Help

Hi there,

I am new to RapidMiner, but I have already tried out some models on my data and it worked perfectly.

I have the following data set:

40 examples and about 100 attributes (numeric, real), plus 1 label (binominal).


My aim is to find a good model that predicts the label using just a few attributes, e.g. 4.

Following the tutorial “Finding the right Model” (https://www.youtube.com/watch?v=uN1I4yrNNuQ), I tried multiple models (Decision Tree, Naive Bayes, k-NN, Neural Net, Linear Regression, …) using “Compare ROCs”. This worked well and inspired this question.


I want to set up a pipeline that does the following tasks:

  1. Randomly (or weight-based) selects attribute combinations (e.g. only 4 attributes: 2 attributes that I manually selected, and the other 2 chosen at random)
  2. Forwards them to an X-Validation that evaluates multiple models on the data (Decision Tree, Naive Bayes, k-NN, Neural Net, Linear Regression, …)

At the end I want a report listing each tested attribute combination, the model used, and the model's performance measurements (e.g. accuracy), ideally sorted by accuracy.
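In RapidMiner this would be built with loop operators and macros, but the selection logic of step 1 can be sketched in plain Python. The attribute names (`att1` … `att100`) and the two manually fixed picks are made up for illustration:

```python
import random

# Hypothetical attribute names; the real data set has ~100 of them.
all_attributes = [f"att{i}" for i in range(1, 101)]
fixed = ["att1", "att2"]  # the 2 manually selected attributes (assumed names)

def pick_combination(all_attrs, fixed, size=4):
    """Return the fixed attributes plus (size - len(fixed)) more,
    drawn at random from the remaining pool."""
    pool = [a for a in all_attrs if a not in fixed]
    return fixed + random.sample(pool, size - len(fixed))

combo = pick_combination(all_attributes, fixed)
print(combo)  # e.g. ['att1', 'att2', 'att57', 'att13']
```

Each such combination would then be fed to the cross-validation step from step 2.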

Is there such a pipeline?

Does anyone know what I have to put together to realize such a pipeline?


Thanks for your help.



    Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    This certainly isn't anything that is already built into the software.  But what you describe is something that could be built using loops and macros.  It's probably more than a trivial effort, and if the reports you are talking about are outputs to external software packages, that might be tricky.  But in principle everything you have requested is something that RapidMiner can do.


    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
    zmkzmk Member Posts: 4 Contributor I

    Great. Sounds good.

    So can you give me some hints on how to tackle this problem?

    I will be more specific. This is what I need:

    An operator that performs a loop: take n (e.g. 4) randomly selected features and forward them to a set of models, saving the names of the features and the result of each algorithm to a file (e.g. CSV).

    Stop when there are no more feature combinations left to select.


    Or even simpler: using just one model (e.g. Logistic Regression), take 4 random features out of the 100, test them on the model (X-Validation), and save the performance together with the features that were used.
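    The simpler single-model loop above can be sketched in Python to show the bookkeeping (loop, result collection, sort by accuracy, CSV export). The `evaluate` function is a placeholder standing in for the real X-Validation + Logistic Regression step, and the attribute names are made up:

```python
import csv
import random

random.seed(42)
attributes = [f"att{i}" for i in range(1, 101)]  # hypothetical names

def evaluate(feature_subset):
    """Placeholder for the real cross-validated model evaluation;
    returns a random 'accuracy' so the sketch is runnable."""
    return round(random.uniform(0.5, 1.0), 3)

results = []
for _ in range(20):                      # try 20 random 4-feature subsets
    subset = random.sample(attributes, 4)
    results.append((subset, evaluate(subset)))

# Exhaustively looping over all C(100, 4) ≈ 3.9 million combinations is
# also possible via itertools.combinations(attributes, 4), just slower.

results.sort(key=lambda r: r[1], reverse=True)   # best accuracy first

with open("feature_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["features", "accuracy"])
    for subset, acc in results:
        writer.writerow([" ".join(subset), acc])
```

    In RapidMiner the same structure maps to a loop operator driving X-Validation, with macros carrying the selected attribute names and a Write CSV operator collecting the results.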

