I am a RapidMiner beginner and have the following problem:
I have a small dataset: only 11 rows, but 102 attributes. The label is binominal: 1 or 2.
The decision tree finds only one attribute, which discriminates between 1 and 2 in the 11 rows with 100% accuracy, but it reaches only about 51% accuracy when tested on a second validation data set.
Using "Weight by Correlation" and a manual visual comparison of the graphs, I was able to find about 6 attributes that discriminate very well between 1 and 2.
Now I want to generate a model from the top 6 weighted attributes and test it on an unlabeled data set.
How do I do this?
Have a look at the last 4 videos of our Getting Started series: https://rapidminer.com/training/videos/
That should explain it.
Actually I did, and I constructed the training processes step by step (I really enjoyed the videos). Then I replaced the training data with my own data. Because the decision tree ends up with only one attribute to discriminate between my two label values, it performed really badly on the validation data set, which the algorithm had not seen before.
So I used "Select by Weights" to visualize the data and realized that the decision tree took only the one attribute with the highest weight. But the top six all look great.
So now I want to build a model that is forced to use all six attributes and test it on my validation data set.
Something like "3 out of 6 must be altered to predict the label".
I guess the tree model is too complex or tight if it uses just one value. This seems to result in overfitting. I am looking for a way to increase the generalization performance.
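The "3 out of 6" idea can be written down as a simple voting rule. The following is only a minimal sketch in Python, not a RapidMiner process: the function names, the midpoint thresholds, and the toy numbers are all assumptions made for illustration. Each selected attribute gets a threshold halfway between the two class means, and label 2 is predicted when at least `min_votes` attributes fall on the label-2 side.

```python
from statistics import mean


def fit_vote_rule(rows, labels, attrs):
    """Learn one threshold per attribute (hypothetical helper).

    The threshold is the midpoint between the class means; the boolean
    records whether values above the threshold point towards label 2.
    """
    rule = {}
    for a in attrs:
        mean_1 = mean(r[a] for r, y in zip(rows, labels) if y == 1)
        mean_2 = mean(r[a] for r, y in zip(rows, labels) if y == 2)
        rule[a] = ((mean_1 + mean_2) / 2.0, mean_2 > mean_1)
    return rule


def predict_vote(rule, row, min_votes=3):
    """Predict label 2 if at least min_votes attributes vote for it."""
    votes = 0
    for a, (threshold, higher_is_2) in rule.items():
        if (row[a] > threshold) == higher_is_2:
            votes += 1
    return 2 if votes >= min_votes else 1


# Toy data, invented for illustration: 4 rows, 3 attributes.
toy_rows = [[1.0, 5.0, 0.2], [1.2, 4.8, 0.1], [3.0, 2.0, 0.9], [3.2, 2.2, 0.8]]
toy_labels = [1, 1, 2, 2]
toy_rule = fit_vote_rule(toy_rows, toy_labels, attrs=[0, 1, 2])
print(predict_vote(toy_rule, [3.1, 2.1, 0.2], min_votes=2))
```

With the real data, `attrs` would be the column indices of the six top-weighted attributes and `min_votes` would be 3. The thresholds are still learned from only 11 rows, so the rule needs checking on the validation set just like the tree.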
The fundamental problems are too few examples and too many attributes. A decision tree is going to be susceptible to overfitting the training data in this circumstance. You would be better off combining dimensionality reduction and feature engineering to reduce the number of attributes, and simultaneously seeing if you can acquire more data (examples) for model building. Otherwise I think you are going to have to use a more judgment-based model-building strategy.
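One caveat when reducing attributes on so few rows: if the top attributes are picked using all 11 examples and the model is then scored on those same examples, the estimate will be optimistic. A rough sketch of the safer pattern in plain Python follows (not RapidMiner operators; every function name and the toy data are assumptions, and the nearest-centroid classifier is just a simple stand-in model). Attributes are ranked by absolute correlation with the label, and that ranking is redone inside each leave-one-out fold.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def top_k_attributes(rows, labels, k):
    """Indices of the k attributes with the largest |correlation| to the label."""
    n_attrs = len(rows[0])
    weights = [abs(pearson([r[a] for r in rows], labels)) for a in range(n_attrs)]
    return sorted(range(n_attrs), key=lambda a: weights[a], reverse=True)[:k]


def nearest_centroid_predict(rows, labels, attrs, query):
    """Predict the label whose class centroid (on attrs only) is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroid = [sum(r[a] for r in members) / len(members) for a in attrs]
        dist = sum((query[a] - c) ** 2 for a, c in zip(attrs, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(rows, labels, k):
    """Hold out each row once; select attributes on the remaining rows only."""
    hits = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        attrs = top_k_attributes(train_rows, train_labels, k)
        if nearest_centroid_predict(train_rows, train_labels, attrs, rows[i]) == labels[i]:
            hits += 1
    return hits / len(rows)


# Toy data, invented for illustration: attributes 0 and 2 carry the signal.
demo_rows = [
    [1.0, 0.5, 10.0, 3.0], [1.1, 0.4, 10.5, 1.0], [0.9, 0.6, 9.5, 2.0],
    [3.0, 0.5, 20.0, 2.5], [3.1, 0.6, 20.5, 1.5], [2.9, 0.4, 19.5, 2.0],
]
demo_labels = [1, 1, 1, 2, 2, 2]
print(sorted(top_k_attributes(demo_rows, demo_labels, 2)))
print(leave_one_out_accuracy(demo_rows, demo_labels, 2))
```

For the data in this thread, `k` would be 6. Even so, an accuracy estimated from 11 rows is very noisy, which is why getting more examples matters more than the choice of model.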