How to build a prediction model by weight

Contributor II

Hi there,

I am a RapidMiner beginner and have the following problem:


I have a small dataset: only 11 rows, but 102 attributes. The label is binominal: 1 or 2.

The decision tree finds only one attribute that discriminates between 1 and 2 in the 11 rows with 100% accuracy, but that tree reaches an accuracy of only about 51% when tested on a second validation data set.


Using "Weight by Correlation" and a manual visual comparison of the graphs, I was able to find about 6 attributes that discriminate very well between 1 and 2.


Now I want to generate a model out of the top 6 weighted attributes and test it on an unlabeled data set.

How do I do this?
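For readers who want to see the weighting step outside of RapidMiner, here is a minimal Python sketch of ranking attributes by their absolute Pearson correlation with the label and keeping the top k. It is only an illustration of the idea, not the implementation of the "Weight by Correlation" operator; the function names and the toy data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant column carries no usable signal
    return cov / math.sqrt(vx * vy)

def top_k_by_correlation(rows, labels, k):
    """Indices of the k attributes with the largest |correlation| to the label."""
    weights = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        weights.append((abs(pearson(column, labels)), j))
    # Sort by descending weight; ties are broken by attribute index.
    weights.sort(key=lambda w: (-w[0], w[1]))
    return [j for _, j in weights[:k]]

# Hypothetical toy data: 6 rows, 4 attributes, label 1 or 2.
rows = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 4.0, 0.1, 3.0],
    [0.9, 6.0, 0.3, 3.0],
    [3.1, 5.5, 0.9, 3.0],
    [2.9, 4.5, 0.8, 3.0],
    [3.3, 5.0, 1.0, 3.0],
]
labels = [1, 1, 1, 2, 2, 2]

selected = top_k_by_correlation(rows, labels, k=2)
print(selected)
```

With the real data, k would be 6 and `rows` would hold the 11 examples with their 102 attributes.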






[Image: here is my process so far]


Re: How to build a prediction model by weight



Have a look at the last 4 videos of our getting started training: https://rapidminer.com/training/videos/

That should explain it.




Head of Data Science Services at RapidMiner
Contributor II

Re: How to build a prediction model by weight

Actually I did, and I constructed the training processes step by step (I really enjoyed the videos). Then I replaced the training data with my own data. Because the decision tree ends up using only one attribute to discriminate between my two label values, it performed really badly on the validation dataset, which the algorithm had not seen before.

So I used "Select by Weights" to visualize the data and realized that the decision tree took only the one top attribute with the highest weight value, even though the top six all look great.

So now I want to build a model that is forced to use all six attributes and test it on my validation data set.

Something like "3 out of six must be altered to predict the label".
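A "k out of n" rule like this can be sketched as a simple voting model. The Python below is a rough illustration under stated assumptions, not a RapidMiner operator: each selected attribute gets a cut point at the midpoint between the two class means in the training data, and an example is predicted as label 2 when at least `min_votes` attributes fall on the label-2 side. All names and the toy data are hypothetical.

```python
def fit_thresholds(rows, labels, attrs, positive=2):
    """For each chosen attribute, store the midpoint between the two class
    means and whether the positive class lies above that midpoint."""
    rules = []
    for j in attrs:
        pos = [r[j] for r, y in zip(rows, labels) if y == positive]
        neg = [r[j] for r, y in zip(rows, labels) if y != positive]
        mean_pos = sum(pos) / len(pos)
        mean_neg = sum(neg) / len(neg)
        rules.append((j, (mean_pos + mean_neg) / 2, mean_pos > mean_neg))
    return rules

def predict_by_votes(row, rules, min_votes, positive=2, negative=1):
    """Predict the positive label if at least min_votes attributes vote for it."""
    votes = sum(1 for j, cut, above in rules if (row[j] > cut) == above)
    return positive if votes >= min_votes else negative

# Hypothetical toy training data: 4 rows, 3 attributes, label 1 or 2.
train_rows = [
    [1.0, 0.2, 10.0],
    [1.2, 0.1, 11.0],
    [3.0, 0.9, 20.0],
    [3.2, 1.0, 21.0],
]
train_labels = [1, 1, 2, 2]

rules = fit_thresholds(train_rows, train_labels, attrs=[0, 1, 2])

# Two of three attributes look "altered": predicted as label 2.
pred_a = predict_by_votes([2.9, 0.8, 12.0], rules, min_votes=2)
# Only one of three attributes looks "altered": predicted as label 1.
pred_b = predict_by_votes([1.1, 0.9, 9.0], rules, min_votes=2)
print(pred_a, pred_b)
```

For the case described here, `attrs` would be the six selected attributes and `min_votes` would be 3. The midpoint cut is only one possible choice; with 11 rows any threshold is a rough estimate.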



I guess the tree model is too complex or too tight if it uses just one attribute. This seems to result in overfitting. I am looking for a way to improve the generalization performance.

Elite III

Re: How to build a prediction model by weight

The fundamental problems are too few examples and too many attributes. A decision tree is going to be susceptible to overfitting the training data in this circumstance. You would be better off doing a combination of dimensionality reduction and feature engineering to reduce the number of attributes, and simultaneously seeing if you can acquire more data (examples) for model building. Otherwise I think you are going to have to use a more judgmental model-building strategy.
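One practical consequence of having so few examples: if the attributes are picked using all rows and the model is then evaluated on those same rows, the estimate will look better than it should. A hedged Python sketch of the usual remedy is below: leave-one-out validation where the attribute selection is repeated inside every fold, so the held-out row never influences which attributes are chosen. The nearest-centroid classifier is a stand-in chosen for brevity, not something suggested in this thread; it works on raw, unscaled values, and all names and the toy data are hypothetical.

```python
import math

def pearson_abs(xs, ys):
    """Absolute Pearson correlation; 0.0 for constant columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def select_top_k(rows, labels, k):
    """Indices of the k attributes most correlated with the label."""
    scores = [(pearson_abs([r[j] for r in rows], labels), j)
              for j in range(len(rows[0]))]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]

def nearest_centroid_predict(train_rows, train_labels, attrs, row):
    """Predict the label whose class centroid (on attrs only) is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        centroid = [sum(r[j] for r in members) / len(members) for j in attrs]
        dist = sum((row[j] - c) ** 2 for j, c in zip(attrs, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out_accuracy(rows, labels, k):
    """Hold out each row once; select attributes on the remaining rows only."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        attrs = select_top_k(train_rows, train_labels, k)
        predicted = nearest_centroid_predict(train_rows, train_labels, attrs, rows[i])
        if predicted == labels[i]:
            correct += 1
    return correct / len(rows)

# Hypothetical toy data: 6 rows, 4 attributes, label 1 or 2.
rows = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 4.0, 0.1, 3.0],
    [0.9, 6.0, 0.3, 3.0],
    [3.1, 5.5, 0.9, 3.0],
    [2.9, 4.5, 0.8, 3.0],
    [3.3, 5.0, 1.0, 3.0],
]
labels = [1, 1, 1, 2, 2, 2]

accuracy = leave_one_out_accuracy(rows, labels, k=2)
print(accuracy)
```

The toy data is cleanly separable, so it says nothing about the real 11-row problem; the point is only the structure of the loop, with selection and model fitting both confined to the training fold.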


Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts