How to build a prediction model by weight
Hi there,
I am a RapidMiner beginner and have the following problem:
I have a small dataset: only 11 rows, but 102 attributes. The label is binominal: 1 or 2.
The decision tree finds only one attribute, which discriminates between 1 and 2 in the 11 rows with 100% accuracy, but the resulting model reaches only about 51% accuracy when tested on a second validation data set.
Using "Weight by Correlation" and a manual visual comparison of the graphs, I was able to find about 6 attributes that discriminate very well between 1 and 2.
Now I want to generate a model from the top 6 weighted attributes and test it on an unlabeled data set.
How do I do this?
Thanks,
ZMK
Here is my process so far.
Answers
Hi,
Have a look at the last 4 videos of our Getting Started training: https://rapidminer.com/training/videos/
They should explain it.
Cheers,
Martin
Dortmund, Germany
Actually I did, and I constructed the training processes step by step (I really enjoyed the videos). Then I replaced the training data with my own data. Because the decision tree ends up using only one attribute to discriminate between my two label values, it performed really badly on the validation data set, which the algorithm had not seen before.
So I used "Select by Weights" to visualize the data and realized that the decision tree took only the single attribute with the highest weight. But the top six all look great.
So now I want to build a model that is forced to use all six attributes and test it on my validation data set.
Something like "3 out of 6 must be altered to predict the label".
P.S.
I guess the tree model is too tight if it uses just one value. This seems to result in overfitting. I am looking for a way to improve the generalization performance.
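To make the "3 out of 6" idea concrete, here is a minimal sketch in plain Python, not a RapidMiner process. Everything in it is an assumption made for illustration: the attribute names (`a1`, `a2`, `a3`), the toy values, the choice of the midpoint between the two class means as the threshold, and the function names `fit_vote_rule` and `predict_vote_rule`. It uses three attributes and a minimum of two votes only to stay short; the same code works with six attributes and `min_votes=3`.

```python
from statistics import mean


def fit_vote_rule(rows, labels, attributes, min_votes):
    """Learn one threshold per attribute from labelled rows (labels are 1 or 2)."""
    rules = {}
    for a in attributes:
        m1 = mean(r[a] for r, y in zip(rows, labels) if y == 1)
        m2 = mean(r[a] for r, y in zip(rows, labels) if y == 2)
        # Threshold: midpoint of the class means (an assumed, simple choice).
        # The flag records whether class 2 tends to have the higher values.
        rules[a] = ((m1 + m2) / 2, m2 > m1)
    return rules, min_votes


def predict_vote_rule(model, row):
    """Predict 2 if at least min_votes attributes look 'altered', else 1."""
    rules, min_votes = model
    votes = sum(
        1 for a, (threshold, class2_is_higher) in rules.items()
        if (row[a] > threshold) == class2_is_higher
    )
    return 2 if votes >= min_votes else 1


# Hypothetical toy data standing in for the selected attributes.
train_rows = [
    {"a1": 1.0, "a2": 5.0, "a3": 0.2},
    {"a1": 1.2, "a2": 5.5, "a3": 0.1},
    {"a1": 3.0, "a2": 2.0, "a3": 0.9},
    {"a1": 3.4, "a2": 1.5, "a3": 0.8},
]
train_labels = [1, 1, 2, 2]

model = fit_vote_rule(train_rows, train_labels, ["a1", "a2", "a3"], min_votes=2)
print(predict_vote_rule(model, {"a1": 2.9, "a2": 4.8, "a3": 0.7}))
print(predict_vote_rule(model, {"a1": 1.1, "a2": 5.0, "a3": 0.7}))
```

A rule like this is forced to look at every selected attribute, so a single misleading attribute cannot decide the prediction alone. With only 11 rows the thresholds are still rough estimates, so the rule should be checked on data it has not seen.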
The fundamental problems are too few examples and too many attributes. A decision tree is going to be susceptible to overfitting the training data in this circumstance. You would be better off combining dimensionality reduction and feature engineering to reduce the number of attributes, and simultaneously seeing whether you can acquire more data (examples) for model building. Otherwise I think you are going to have to use a more judgmental model building strategy.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
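One caveat with the attribute reduction suggested above: if the top attributes are picked by correlation on all rows and the model is then validated on those same rows, the estimate will look too good. A rough sketch of the safer pattern, in plain Python rather than RapidMiner, repeats the selection inside each leave-one-out fold. The toy matrix `X`, the labels `y`, the nearest-centroid classifier and all function names are assumptions chosen for illustration; they are not taken from the process in this thread.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def top_k_by_correlation(X, y, k):
    """Column indices of the k attributes with the largest |correlation| to y."""
    n_attrs = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_attrs)]
    return sorted(range(n_attrs), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_predict(X, y, cols, row):
    """Predict the label whose class centroid (on cols) is closest to row."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y)):
        members = [r for r, t in zip(X, y) if t == label]
        centroid = [sum(r[j] for r in members) / len(members) for j in cols]
        dist = sum((row[j] - c) ** 2 for j, c in zip(cols, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(X, y, k):
    """Hold out each row once; attribute selection is redone without it."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        cols = top_k_by_correlation(X_train, y_train, k)
        hits += nearest_centroid_predict(X_train, y_train, cols, X[i]) == y[i]
    return hits / len(X)


# Hypothetical toy data: column 0 is informative, columns 1 and 2 are noise.
X = [
    [0.1, 5.0, 3.0],
    [0.2, 1.0, 4.0],
    [0.3, 4.0, 3.5],
    [0.9, 2.0, 3.2],
    [1.0, 5.0, 3.9],
    [1.1, 3.0, 3.4],
]
y = [1, 1, 1, 2, 2, 2]

print(top_k_by_correlation(X, y, k=1))
print(leave_one_out_accuracy(X, y, k=1))
```

In RapidMiner terms, this presumably corresponds to placing the weighting and selection operators inside the training side of a cross-validation rather than in front of it.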