[SOLVED] Text mining: Does pruning make sense at all?

chaosbringer Member Posts: 21 Contributor II
edited November 2018 in Help
I have a question (of course):
The Process Documents from Text operator can create vectors using the TF-IDF measure.
It also allows pruning the text beforehand, e.g. based on how often terms occur.
So, does it make sense at all to prune frequent terms from the text when I want to use the TF-IDF measure?
Does pruning beforehand bias the resulting TF-IDF values?
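A minimal sketch of the second question, assuming one common TF-IDF definition: weight = raw term count × log(N / df), optionally L2-normalized per document. Whether RapidMiner's operator uses exactly this formula, and whether it normalizes the vectors, is an assumption here and not stated in this thread. The helpers `prune` and `tfidf` are hypothetical names for illustration.

```python
import math
from collections import Counter

def document_frequencies(docs):
    # Number of documents each term appears in (counted once per document).
    return Counter(term for doc in docs for term in set(doc))

def prune(docs, min_df=1, max_df_frac=1.0):
    """Hypothetical helper: vocabulary left after frequency-based pruning."""
    n_docs = len(docs)
    df = document_frequencies(docs)
    return {t for t, c in df.items() if c >= min_df and c / n_docs <= max_df_frac}

def tfidf(docs, vocab, l2_normalize=True):
    """Assumed TF-IDF: raw count * log(N / df), restricted to `vocab`."""
    n_docs = len(docs)
    # Pruning drops terms, not documents, so N and df come from the full corpus.
    df = document_frequencies(docs)
    vectors = []
    for doc in docs:
        counts = Counter(t for t in doc if t in vocab)
        vec = {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
        if l2_normalize:
            norm = math.sqrt(sum(w * w for w in vec.values()))
            if norm > 0:
                vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
full_vocab = prune(docs)              # no pruning: every term survives
pruned_vocab = prune(docs, min_df=2)  # drops the rare terms "mat", "log", "chased"

raw_full = tfidf(docs, full_vocab, l2_normalize=False)
raw_pruned = tfidf(docs, pruned_vocab, l2_normalize=False)
norm_full = tfidf(docs, full_vocab)
norm_pruned = tfidf(docs, pruned_vocab)

print(raw_full[0]["cat"] == raw_pruned[0]["cat"])  # True: a survivor's raw weight is unchanged
print(norm_full[0]["cat"], norm_pruned[0]["cat"])  # these differ after normalization
```

Under these assumptions, pruning leaves N and each surviving term's df untouched, so the raw weights of the remaining terms do not change. Only normalization couples the terms: dropping a rare term that carried weight (here `mat`) raises the normalized weights of the other terms in that document, while dropping a term whose idf is already zero (`the`, which occurs in every document) changes nothing.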

Thank you very much,


    MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Julian,

    often pruning does help, but there is no general answer. Just put the Process Documents operator into a Parameter Optimization and experiment with the parameter settings until you get good results.

    Best, Marius
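Marius's suggestion can be sketched outside RapidMiner as a plain grid search: each combination of pruning thresholds is scored by the leave-one-out accuracy of a simple nearest-centroid classifier. The toy corpus, the threshold names `min_df`/`max_df_frac`, the TF-IDF definition and the classifier are all illustrative assumptions, not what the Parameter Optimization operator does internally.

```python
import itertools
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (tokens, label) pairs.
DATA = [(text.split(), label) for text, label in [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("win cash now buy", "spam"),
    ("meeting agenda for monday", "ham"),
    ("monday project meeting notes", "ham"),
    ("notes from the project agenda", "ham"),
]]

def prune_vocab(docs, min_df, max_df_frac):
    """Keep terms whose document frequency lies within the pruning bounds."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t for t, c in df.items() if c >= min_df and c / n <= max_df_frac}

def tfidf(docs, vocab, train_docs):
    """Assumed TF-IDF: count * log(N / df), L2-normalized; N and df from train_docs."""
    n = len(train_docs)
    df = Counter(t for d in train_docs for t in set(d))
    vectors = []
    for d in docs:
        counts = Counter(t for t in d if t in vocab)
        vec = {t: c * math.log(n / df[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else vec)
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def loo_accuracy(data, min_df, max_df_frac):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (test_doc, test_label) in enumerate(data):
        train = [pair for j, pair in enumerate(data) if j != i]
        train_docs = [d for d, _ in train]
        # Prune on the training fold only, so the held-out document cannot leak in.
        vocab = prune_vocab(train_docs, min_df, max_df_frac)
        test_vec = tfidf([test_doc], vocab, train_docs)[0]
        # Summed class vectors point in the same direction as the class centroids.
        centroids = {label: defaultdict(float) for _, label in train}
        for vec, (_, label) in zip(tfidf(train_docs, vocab, train_docs), train):
            for t, w in vec.items():
                centroids[label][t] += w
        predicted = max(sorted(centroids), key=lambda lab: cosine(test_vec, centroids[lab]))
        correct += predicted == test_label
    return correct / len(data)

# Grid over hypothetical pruning thresholds.
results = {
    (min_df, max_df_frac): loo_accuracy(DATA, min_df, max_df_frac)
    for min_df, max_df_frac in itertools.product([1, 2], [0.5, 1.0])
}
best = max(results, key=results.get)
print("best (min_df, max_df_frac):", best, "accuracy:", results[best])
```

Pruning is recomputed inside each fold so the held-out document never influences the vocabulary. Because the best score is the maximum over several tries, it is optimistically biased, which is the data-dredging concern raised in this thread; a separate held-out set (or nested cross-validation) gives a fairer estimate for the chosen setting.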
    chaosbringer Member Posts: 21 Contributor II
    Thank you for your answer.
    It seems to me that this is a bit like fishing/dredging for data, but obviously I have to live with that. Thank you.
