Using RapidMiner to mine service ticket data to produce top customer questions

dcarissimi Member Posts: 3 Contributor I
I would like to use RapidMiner to mine the raw text entered in help desk service tickets to produce a list of top customer questions/issues.

Does anyone have some good process maps I can duplicate to get me started?  I'm a newbie with all the text processing algorithms.

What I'm looking for is a way to pull out the common phrases related to customer issues, so we can use this data to produce better training materials for subjects we have not yet covered.  Unfortunately, our current ticket management system is not really designed to capture good customer question data.  I'd be grateful for any assistance that could be provided.



  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Dominic,

    To understand the basic concepts of text processing in RapidMiner, I'd suggest the Vancouver Data videos on this topic. They are linked in the video tutorials section of our website.

    To get the customer questions out of your ticketing system you have at least two, maybe three possibilities:
    1. connect RapidMiner directly to the ticket system's database and extract the data from the respective tables
    2. use the web extension to crawl the web interface of the ticket system
    3. if the ticket system has a web API, the best way is to use this API, again via RapidMiner's web extension
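
As a rough illustration of the first option outside RapidMiner, the sketch below reads the free-text field of each ticket from a database with Python. The in-memory SQLite database, the `tickets` table, and the `description` column are all assumptions made for the example; a real ticket system will have its own schema and driver.

```python
import sqlite3

# Stand-in for the ticket system's database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO tickets (description) VALUES (?)",
    [("Cannot reset password",), ("  Printer offline ",), ("",)],
)

# Pull the free-text field and drop empty entries before text mining.
rows = conn.execute("SELECT description FROM tickets ORDER BY id")
texts = [text.strip() for (text,) in rows if text and text.strip()]
conn.close()

print(texts)
```
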

    Best regards,
  • dcarissimi Member Posts: 3 Contributor I
    Thanks for the reply!

    Yes, I have looked at VancouverData's videos and they were very helpful.  I was able to mine out/discover specific data categories.  However, I'm now looking more closely at key phrases or semantics related to the same data.  Our training team loved getting the categories pertaining to our service calls, but now they want the root cause issues associated with those categories.  I think what I also need to do is train RapidMiner on some data that I've manually sorted, then let it interpret the rest of the calls and bucket tickets based on my manual review.  If someone has some good process steps to share for pulling semantic or common phrase information out of raw text data, that would be killer.

    Thanks again....
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    As far as I understand, until now you don't have any categories/labels attached to your tickets. So reasonable steps would be:

    - transform the documents into a structured format with Process Documents (most certainly you have already done so)
    - run a clustering algorithm to detect/find groups of similar documents
    - transform the cluster attribute into a label
    - if you have k clusters, train k SVMs, where the i-th SVM discriminates the i-th cluster from all other documents

    By inspecting the weights the SVM assigns to each term you can determine which term has which impact. The most interesting terms are those with a very high or very low weight (in other words, where the absolute weight is far away from zero).

    Of course, you can also perform the last step with manually assigned or predefined categories.
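
The steps above can be sketched in plain Python, as a rough illustration of the idea rather than the RapidMiner process itself. The sketch assumes a tiny made-up ticket list (`TICKETS`), a simple TF-IDF weighting standing in for Process Documents, spherical k-means with a fixed seeding standing in for a clustering operator, and a linear SVM trained by subgradient descent. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical ticket texts; real ones would come from the ticket system.
TICKETS = [
    "cannot reset password after account lockout",
    "password reset email never arrives",
    "account locked need password reset",
    "printer offline and will not print",
    "printer driver install fails before print",
    "print jobs stuck in printer queue",
]

def tfidf(docs):
    """Turn raw texts into L2-normalised {term: weight} vectors."""
    tokens = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter(t for toks in tokens for t in set(toks))
    vectors = []
    for toks in tokens:
        vec = {t: c * (math.log(len(docs) / df[t]) + 1.0)
               for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def dot(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def cluster(vectors, k, iterations=10):
    """Spherical k-means with deterministic farthest-first seeding."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        # Next seed: the document least similar to every existing seed.
        far = min(vectors, key=lambda v: max(dot(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda i: dot(v, centroids[i]))
                  for v in vectors]
        for i in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == i:
                    total.update(v)
            if total:  # keep the old centroid if a cluster went empty
                norm = math.sqrt(sum(w * w for w in total.values()))
                centroids[i] = {t: w / norm for t, w in total.items()}
    return labels

def train_svm(vectors, targets, epochs=50, lr=0.1, lam=0.01):
    """Linear SVM via subgradient descent on the regularised hinge loss."""
    w, b = Counter(), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, targets):
            violated = y * (dot(x, w) + b) < 1.0
            for t in w:
                w[t] *= 1.0 - lr * lam  # L2 shrinkage
            if violated:
                for t, v in x.items():
                    w[t] += lr * y * v
                b += lr * y
    return w

def top_terms(weights, n=3):
    """Terms with the highest positive weight for the target cluster."""
    ranked = sorted(weights.items(), key=lambda kv: -kv[1])
    return [t for t, _ in ranked[:n]]

vectors = tfidf(TICKETS)
labels = cluster(vectors, k=2)  # the cluster id becomes the label
top_by_cluster = {}
for i in sorted(set(labels)):
    # One-vs-rest: cluster i against all other documents.
    targets = [1 if lab == i else -1 for lab in labels]
    top_by_cluster[i] = top_terms(train_svm(vectors, targets))
    print(i, top_by_cluster[i])
```

Terms with the most negative weights can be read off the same way by sorting in the other direction; they indicate words that argue against membership in the cluster.
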

    Best regards,
  • dcarissimi Member Posts: 3 Contributor I
    Thanks a million for the reply!  I will look at the steps you've outlined and give it a try. 
