Auto Categorization of documents

Contributor II

Any updates?

Elite III

Hi @sangeet,


@Thomas_Ott and I have already given some direction on how we'd approach the problem. Based on the use case you described, we recommended the following:

  1. Process your text documents to create a word vector and a wordlist.
  2. Cluster your documents (Tom provided a sample process that covers these first 2 steps).
  3. Once you have those clusters defined, assign the cluster attribute as the label (use the "Set Role" operator for that).
  4. Use that same dataset to train a supervised learning model that predicts the clusters (as I noted earlier, there are plenty of tutorials available for this step).
  5. Store that model and apply it to any new documents. You'll need to run the same text processing on the new documents and use the same wordlist, so that they end up with the same attributes the model was trained on.
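
If it helps to see the five steps end to end, here is a rough sketch in plain Python rather than RapidMiner. It is not the process Tom posted. Every name in it (`build_wordlist`, `vectorize`, `kmeans`, `train_centroid_classifier`, `predict`) is made up for illustration, the four sample documents are invented, and I've assumed a simple k-means with fixed starting documents plus a nearest-centroid classifier as the supervised model. In RapidMiner you would use the text processing, clustering, and learner operators instead.

```python
import math
from collections import Counter

def tokenize(text):
    # Minimal text processing: lowercase, split on whitespace, keep alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def build_wordlist(docs):
    # Step 1: the wordlist fixes which word sits at which vector position.
    return sorted({w for d in docs for w in tokenize(d)})

def vectorize(doc, wordlist):
    # Step 1: length-normalized term counts; words not in the wordlist are ignored.
    counts = Counter(tokenize(doc))
    vec = [float(counts[w]) for w in wordlist]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    return min(range(len(centroids)), key=lambda i: dist2(vec, centroids[i]))

def kmeans(vectors, seeds, iterations=10):
    # Step 2: plain k-means, started from the documents at the given indices
    # (fixed seeds are an assumption to keep the example deterministic).
    centroids = [list(vectors[i]) for i in seeds]
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for c in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(v, centroids) for v in vectors]

def train_centroid_classifier(vectors, labels):
    # Steps 3-4: treat the cluster ids as labels and learn one mean vector per label.
    model = {}
    for label in set(labels):
        members = [v for v, l in zip(vectors, labels) if l == label]
        model[label] = [sum(col) / len(members) for col in zip(*members)]
    return model

def predict(model, vec):
    return min(model, key=lambda label: dist2(vec, model[label]))

docs = [
    "invoice payment due amount",
    "payment invoice overdue amount",
    "football match score goal",
    "goal score football league",
]
wordlist = build_wordlist(docs)
vectors = [vectorize(d, wordlist) for d in docs]
cluster_labels = kmeans(vectors, seeds=[0, 2])
model = train_centroid_classifier(vectors, cluster_labels)

# Step 5: a new document goes through the same processing with the SAME wordlist.
new_doc = "late invoice payment"
predicted = predict(model, vectorize(new_doc, wordlist))
```

The point to notice is step 5: `new_doc` is vectorized with the stored `wordlist`, not a fresh one, which is why the word "late" is simply dropped and the remaining words still line up with what the model saw in training.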


So I'm not sure what else you are expecting at this point. Did you have a more specific question, or did you run into a problem when you tried the steps above? Please remember that this is a free user community forum. If you are interested in a more detailed consulting project, feel free to PM me.



Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts