Determining value for parameter

joandcruz Member Posts: 10 Contributor I

Please help me, I am stuck.

I am using a general decision tree as well as CHAID and ID3.

The parameters are

- minimal size for split
- minimal leaf size
- minimal gain
- maximal depth
- confidence

My training data size is 400.
My number of features is 6707.
My total amount of text is 27910.

How can I determine good values for these parameters without test runs? Test runs would take too much time because of the enormous amount of data.
Does anyone have an idea for me?

Thank you!!!


  • mafern76 Member Posts: 45 Contributor II
    What do you mean by total text?

    If you are working with text and a lot of attributes, and you are short on time, you could give Naive Bayes a try.

    You can also try pruning some of your text vectors and removing correlated attributes.
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,408 RM Data Scientist
    Hello joandcruz,

    The data is not as big as you might think. It sounds pretty reasonable to run a parameter optimization on it. You can do this either with a grid search or with an evolutionary approach.

    If this is text mining, I would recommend an SVM. They usually score better, and in the linear case you only have one parameter to optimize (C).


    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
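
For anyone wondering what the grid approach suggested above amounts to, here is a minimal sketch in plain Python (not a RapidMiner process). The parameter names mirror the list in the question, but the candidate values are purely illustrative assumptions, and `score` is a hypothetical placeholder for whatever cross-validated performance measure you use. The point is only to show how quickly a grid grows with five parameters, and why a single parameter such as C for a linear SVM is much cheaper to tune.

```python
from itertools import product

# Hypothetical candidate values for the five decision tree parameters
# from the question. These ranges are illustrative, not recommendations.
tree_grid = {
    "minimal_size_for_split": [2, 4, 8],
    "minimal_leaf_size": [1, 2, 4],
    "minimal_gain": [0.01, 0.05, 0.1],
    "maximal_depth": [5, 10, 20],
    "confidence": [0.1, 0.25, 0.5],
}

# For a linear SVM only C needs tuning, so the grid is one-dimensional.
# The values are again assumptions chosen for illustration.
svm_grid = {"C": [0.01, 0.1, 1, 10, 100]}


def grid_combinations(grid):
    """Yield every parameter combination in the grid as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, score):
    """Return (best_params, best_score) for a user-supplied score function.

    `score` stands in for training a model with the given parameters
    and measuring its performance, e.g. by cross-validation.
    """
    best_params, best_score = None, float("-inf")
    for params in grid_combinations(grid):
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Number of model trainings each grid would require.
n_tree_runs = sum(1 for _ in grid_combinations(tree_grid))
n_svm_runs = sum(1 for _ in grid_combinations(svm_grid))
print(n_tree_runs, n_svm_runs)
```

With three candidate values per parameter, the tree grid needs 3^5 = 243 runs, while the one-dimensional C grid needs only 5. An evolutionary search samples the same space instead of enumerating it, which helps when the full grid is too expensive.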