Determining value for parameter

joandcruz Member Posts: 10 Contributor II

Please Help Me. I am stuck.

I am using a general decision tree as well as CHAID and ID3.

The parameters are:

- minimal size for split
- minimal leaf size
- minimal gain
- maximal depth
- confidence

My training data is 400.
My features are 6707
My amount of total text is 27910

How can I determine good values for these parameters without test runs? Test runs would take too much time because of the enormous amount of data.
Does anyone have an idea for me?

Thank you!!!

Answers

  • mafern76 Member Posts: 45 Contributor II
    What do you mean by total text?

    If you are working with text, have a lot of attributes, and are short on time, you could give Naive Bayes a try.

    You can also try pruning some of your text vectors and removing correlated attributes.
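
To make the idea of removing correlated attributes concrete, here is a minimal sketch in plain Python. It is not how RapidMiner does it internally; the helper names `pearson` and `drop_correlated`, the 0.95 threshold, and the tiny term-count table are all assumptions made for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.95):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

# Hypothetical term counts for four documents (not data from this thread).
columns = {
    "tree":   [1, 0, 2, 3],
    "trees":  [2, 0, 4, 6],  # exactly 2 * "tree", so r = 1.0
    "kernel": [0, 3, 1, 0],
}
reduced = drop_correlated(columns)
print(list(reduced))
```

The pairwise comparison grows quadratically with the number of attributes, so with several thousand attributes a dedicated operator or library routine would be the more practical choice than this loop.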
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Hello joandcruz,

    The data is not as big as you might think. Running a parameter optimization on it sounds quite reasonable. You can do this either with a grid search or with an evolutionary approach.

    If this is text mining, I would recommend an SVM. They usually score better, and in the linear case you only have one parameter to optimize (C).

    Cheers,

    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
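
As a rough illustration of what a grid-based parameter optimization costs, the sketch below enumerates a grid over the five decision tree parameters from the question and counts the model fits, then does the same for a linear SVM with only C. The candidate values, the underscore-style parameter keys, and the 10-fold cross-validation are assumptions chosen for the example, not recommended settings.

```python
from itertools import product

# Hypothetical candidate values; parameter names follow the question.
tree_grid = {
    "minimal_size_for_split": [2, 4, 8],
    "minimal_leaf_size": [1, 2, 4],
    "minimal_gain": [0.01, 0.05, 0.1],
    "maximal_depth": [5, 10, 20],
    "confidence": [0.1, 0.25, 0.5],
}

# A linear SVM has a single parameter to tune (assumed candidate values).
svm_grid = {"C": [0.01, 0.1, 1, 10, 100]}

def combinations(grid):
    """Yield one dict per parameter combination in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

folds = 10  # assumed number of cross-validation folds

tree_settings = list(combinations(tree_grid))
svm_settings = list(combinations(svm_grid))

# Each combination is trained and evaluated once per fold.
tree_fits = len(tree_settings) * folds
svm_fits = len(svm_settings) * folds

print("decision tree:", len(tree_settings), "combinations,", tree_fits, "fits")
print("linear SVM:", len(svm_settings), "combinations,", svm_fits, "fits")
```

Under these assumptions the tree grid needs 243 combinations (2430 fits), while the SVM grid needs 5 combinations (50 fits), which is one reason a model with a single parameter is cheaper to tune. An evolutionary search would sample the same space instead of enumerating it.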