Determining value for parameter

joandcruz (Member, Posts: 10, Contributor I)

Please help me, I am stuck.

I am using the general Decision Tree as well as CHAID and ID3.

The parameters are:

- minimal size for split
- minimal leaf size
- minimal gain
- maximal depth
- confidence
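For reference, the first four parameters act as stopping criteria while the tree is grown, and confidence belongs to the pruning step afterwards. Below is a minimal sketch of those checks, assuming the common semantics: a node is only split if it holds at least "minimal size for split" examples, every resulting leaf keeps at least "minimal leaf size" examples, a split must improve the criterion by at least "minimal gain", and growth stops at "maximal depth". The names `TreeParams`, `may_split` and `accept_split` and the default values are hypothetical and illustrative, not RapidMiner's code or defaults; confidence is left out because it only affects pruning.

```python
from dataclasses import dataclass


@dataclass
class TreeParams:
    # Hypothetical names mirroring the parameters listed above;
    # the values are illustrative, not RapidMiner's defaults.
    minimal_size_for_split: int = 4
    minimal_leaf_size: int = 2
    minimal_gain: float = 0.01
    maximal_depth: int = 10


def may_split(node_size: int, depth: int, params: TreeParams) -> bool:
    """Checks made before searching for a split at a node (root has depth 1)."""
    if depth >= params.maximal_depth:
        return False  # depth limit reached: this node becomes a leaf
    return node_size >= params.minimal_size_for_split


def accept_split(gain: float, child_sizes: list[int], params: TreeParams) -> bool:
    """Checks made on the best candidate split found at a node."""
    if gain < params.minimal_gain:
        return False  # improvement too small to be worth a split
    # every child must keep at least minimal_leaf_size examples
    return all(size >= params.minimal_leaf_size for size in child_sizes)
```

Larger values for the size and gain thresholds, or a smaller depth, stop growth earlier and give smaller, faster trees.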

My training set has 400 examples.
I have 6,707 features.
My total amount of text is 27,910.

How can I determine good values for these parameters without test runs? Test runs would take too much time because of the enormous amount of data.
Does anyone have an idea?

Thank you!!!

Answers

  • mafern76 (Member, Posts: 45, Contributor II)
    What do you mean by total text?

    If you are working with text and a lot of attributes and short on time you could give Naive Bayes a try.
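    A rough idea of why Naive Bayes is cheap on text: training is a single counting pass over the documents, and there are no tree parameters to tune, only a smoothing constant. Below is a minimal multinomial Naive Bayes sketch with add-alpha (Laplace) smoothing, assuming the documents are already tokenized into word lists. `train_nb`, `predict_nb` and `alpha` are illustrative names, not a RapidMiner API.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha smoothing.

    docs: list of token lists; labels: list of class labels (same length).
    """
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> token -> count
    vocab = set()
    for tokens, label in zip(docs, labels):  # single pass over the data
        token_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for c in class_counts:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        model["log_prior"][c] = math.log(class_counts[c] / len(docs))
        model["log_lik"][c] = {
            t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab
        }
    return model


def predict_nb(model, tokens):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        for t in tokens:
            if t in model["vocab"]:  # ignore tokens never seen in training
                score += model["log_lik"][c][t]
        if score > best_score:
            best_label, best_score = c, score
    return best_label
```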

    Also you can try pruning some of your text vectors and removing correlated attributes.
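    Both suggestions can be sketched in a few lines. The first function keeps only terms that occur in at least `min_docs` documents and in at most `max_fraction` of all documents (pruning by document frequency); the second greedily drops any attribute whose absolute Pearson correlation with an already kept attribute reaches a threshold. The function names and the 0.95 threshold are hypothetical choices for illustration.

```python
import math


def prune_by_document_frequency(docs, min_docs, max_fraction):
    """Keep terms found in >= min_docs documents and <= max_fraction of them."""
    n = len(docs)
    df = {}
    for tokens in docs:
        for t in set(tokens):  # count each term once per document
            df[t] = df.get(t, 0) + 1
    return {t for t, k in df.items() if k >= min_docs and k / n <= max_fraction}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(vx * vy)


def drop_correlated(columns, threshold=0.95):
    """Greedy filter over a dict name -> values: keep a column unless
    |r| >= threshold with some column that was already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

    A pairwise correlation check over 6,707 attributes means up to about 22.5 million pairs, so pruning the vocabulary first makes the correlation step much cheaper.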
  • mschmitz (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, Posts: 2,408, RM Data Scientist)
    Hello joandcruz,

    The data is not as big as you might think. Running a parameter optimization on it sounds reasonable. You can do this either with a grid search or with an evolutionary approach.
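    To make the grid idea concrete: a grid search evaluates every combination of candidate values and keeps the best one. Below is a minimal sketch, assuming a hypothetical `evaluate(params)` callback that trains a tree with those parameters and returns a score where higher is better, for example the mean cross-validated accuracy. The candidate values in `example_grid` are illustrative, not recommendations.

```python
import itertools

# Hypothetical candidate values for the five tree parameters.
example_grid = {
    "minimal_size_for_split": [2, 4, 8],
    "minimal_leaf_size": [1, 2, 4],
    "minimal_gain": [0.001, 0.01, 0.1],
    "maximal_depth": [5, 10, 20],
    "confidence": [0.1, 0.25, 0.5],
}


def grid_search(grid, evaluate):
    """Try every combination in grid (dict: name -> candidate values).

    Returns (best_params, best_score, number_of_evaluations).
    """
    names = list(grid)
    best_params, best_score, n_evals = None, float("-inf"), 0
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. mean cross-validated accuracy
        n_evals += 1
        if score > best_score:  # strict '>' keeps the first of tied combinations
            best_params, best_score = params, score
    return best_params, best_score, n_evals
```

    With three candidates for each of the five parameters this is 3^5 = 243 combinations, so 10-fold cross-validation would mean 2,430 tree fits, each on 360 of the 400 examples. An evolutionary approach instead samples and mutates parameter sets, which helps once a full grid gets too large.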

    If this is text mining, I would recommend an SVM. SVMs usually score better, and in the linear case you only have one parameter to optimize (C).
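    A toy illustration of the linear case, where C (the weight of the hinge-loss penalty against the margin term) is the only parameter to tune. This sketch trains a linear SVM by full-batch subgradient descent on 0.5*||w||^2 + C * (sum of hinge losses) and picks C on a hold-out set. The solver, the names `train_linear_svm` and `select_C`, the learning rate and the candidate C values are all assumptions for illustration; it is not the solver used by RapidMiner's SVM operators, and real text vectors would be stored sparsely rather than as dense lists.

```python
def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01):
    """Minimise 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w.x_i + b)))
    by full-batch subgradient descent. Labels y must be -1 or +1."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this example
                for j in range(d):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def accuracy(w, b, X, y):
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


def select_C(X_train, y_train, X_val, y_val, candidates=(0.01, 0.1, 1.0, 10.0)):
    """One-dimensional grid search over C using a hold-out set."""
    best_C, best_acc = None, -1.0
    for C in candidates:
        w, b = train_linear_svm(X_train, y_train, C=C)
        acc = accuracy(w, b, X_val, y_val)
        if acc > best_acc:
            best_C, best_acc = C, acc
    return best_C
```

    A one-dimensional search like this, usually over powers of ten, needs far fewer runs than a grid over five tree parameters.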

    Cheers,

    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany