mine service tickets

DataFighter Member Posts: 3 Contributor I
edited November 2018 in Help

We have an old ticket management system that has very few structured fields.

The only fields that hold valuable information are Summary, Remarks, and a Memo field that contains a detailed description of the ticket (problem, observable cause, failure modes, planning details, execution details, and the worker's feedback).

I'm looking for a way to spit out the main causes for these tickets as well as other types of information.

Any ideas on how I can do this using RapidMiner?


P.S.: I'm new to machine learning.  So please don't be too hard on me!  ;)


  • Options
    bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist

    Hello DataFighter,


    This is a blog that talks about how to do text mining with RapidMiner.


    It's pretty detailed and should cover all the aspects of text mining that you will need for your case.

    Please let us know how you progress.


    Additionally, there are dozens of other resources on text mining available when you search. I haven't looked at them, but they could be handy.




  • Options
    Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    You can try this sample process. It uses word clustering and association rules.


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.001">
      <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
        <process expanded="true">
          <!-- Input: replace the repository entry with your own data -->
          <operator activated="true" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve StoredMinutesPDF (2)" width="90" x="45" y="34">
            <parameter key="repository_entry" value="../data/StoredMinutesPDF"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="7.1.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="30">
            <parameter key="attribute_name" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="7.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="value_type"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="7.1.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="447" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="sentiment"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="7.1.001" expanded="true" height="103" name="Multiply" width="90" x="514" y="210"/>
          <!-- Branch 1: transpose so that words become rows, then cluster them with X-Means -->
          <operator activated="true" class="transpose" compatibility="7.1.001" expanded="true" height="82" name="Transpose" width="90" x="715" y="30"/>
          <operator activated="true" class="x_means" compatibility="7.1.001" expanded="true" height="82" name="X-Means" width="90" x="849" y="30"/>
          <!-- Branch 2: binarize the numeric attributes, then mine frequent item sets and association rules -->
          <operator activated="true" class="select_attributes" compatibility="7.1.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="648" y="300">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value="cluster"/>
            <parameter key="attributes" value="|sentiment|cluster"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="numerical_to_binominal" compatibility="7.1.001" expanded="true" height="82" name="Numerical to Binominal" width="90" x="648" y="390"/>
          <operator activated="true" class="fp_growth" compatibility="7.1.001" expanded="true" height="82" name="FP-Growth" width="90" x="648" y="480">
            <parameter key="min_number_of_itemsets" value="10"/>
            <parameter key="max_items" value="5"/>
          </operator>
          <operator activated="true" class="create_association_rules" compatibility="7.1.001" expanded="true" height="82" name="Create Association Rules" width="90" x="782" y="480"/>
          <operator activated="true" class="item_sets_to_data" compatibility="7.1.001" expanded="true" height="82" name="Item Sets to Data" width="90" x="916" y="544"/>
          <connect from_op="Retrieve StoredMinutesPDF (2)" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Select Attributes (3)" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="X-Means" to_port="example set"/>
          <connect from_op="X-Means" from_port="cluster model" to_port="result 1"/>
          <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
          <connect from_op="Create Association Rules" from_port="rules" to_port="result 2"/>
          <connect from_op="Create Association Rules" from_port="item sets" to_op="Item Sets to Data" to_port="frequent item sets"/>
          <connect from_op="Item Sets to Data" from_port="example set" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>

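
    For readers who want to see the association-rule idea outside RapidMiner, here is a minimal Python sketch. It is not the process above: it brute-forces term pairs instead of running FP-Growth, and the ticket tokens, function names, and thresholds are all invented for illustration. It only shows what "binarize term occurrences, then look for terms that appear together" amounts to.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already tokenized ticket texts (invented for illustration).
tickets = [
    ["pump", "leak", "seal"],
    ["pump", "leak", "seal", "replace"],
    ["motor", "overheat"],
    ["pump", "leak"],
]


def frequent_pairs(docs, min_support):
    """Count how many documents contain each term and each term pair.

    Returns the raw counters plus the pairs whose support (share of
    documents containing both terms) is at least min_support.
    """
    n = len(docs)
    term_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))  # binarize: presence only, not frequency
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    pairs = {p: c / n for p, c in pair_counts.items() if c / n >= min_support}
    return term_counts, pair_counts, pairs


def rules(term_counts, pair_counts, pairs, min_confidence):
    """Derive rules premise -> conclusion from the frequent pairs.

    confidence = documents containing both terms / documents containing premise
    """
    out = []
    for a, b in pairs:
        for premise, conclusion in ((a, b), (b, a)):
            confidence = pair_counts[(a, b)] / term_counts[premise]
            if confidence >= min_confidence:
                out.append((premise, conclusion, confidence))
    return sorted(out)


term_counts, pair_counts, pairs = frequent_pairs(tickets, min_support=0.5)
for premise, conclusion, confidence in rules(term_counts, pair_counts, pairs, 0.9):
    print(f"{premise} -> {conclusion} (confidence {confidence:.2f})")
```

    On real tickets the tokens would come from the text-processing step (tokenizing, stop word removal, stemming), and the thresholds would need tuning to the data.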
  • Options
    DataFighter Member Posts: 3 Contributor I

    Thanks TBone,

    What format is the attribute "label"?

    Without sample data, it's hard for me to understand what's going on and what I should be using in which operators.


    Sorry, as I mentioned earlier, I'm new to machine learning and text mining.

  • Options
    DataFighter Member Posts: 3 Contributor I

    Thanks Bhupendra_patil,


    I've looked at some of the videos and I got stuck at stemming.

    Our database is in French.


    Are there any stemming operators made for the French language?

    ... Never mind, I just found Snowball stemming!

  • Options
    JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    As mentioned, start with text mining and clustering to get summaries of the problems, grouped into categories.


    One thing you don't mention having is timestamps for the tickets. If you do have them, you could also use association rules or clustering to see which problems tend to happen around certain times, and then investigate potential correlations and causes (for example, on humid days the computers' electronics run slowly and crash more often).
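
    As a rough illustration of the timestamp idea, here is a small Python sketch that counts problem categories per weekday. Everything in it is an assumption: the timestamps and category names are invented, and in practice the category would come from the clustering step rather than being given.

```python
from collections import Counter, defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical tickets: (ISO timestamp, problem category). Invented data.
tickets = [
    ("2018-07-02T09:15:00", "crash"),
    ("2018-07-02T14:40:00", "crash"),
    ("2018-07-03T10:05:00", "printer"),
    ("2018-07-09T16:20:00", "crash"),
    ("2018-07-10T11:30:00", "printer"),
]


def categories_by_weekday(rows):
    """Count how often each problem category occurs on each weekday."""
    counts = defaultdict(Counter)
    for stamp, category in rows:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday
        weekday = WEEKDAYS[datetime.fromisoformat(stamp).weekday()]
        counts[weekday][category] += 1
    return counts


counts = categories_by_weekday(tickets)
for weekday in WEEKDAYS:
    if weekday in counts:
        print(weekday, dict(counts[weekday]))
```

    The same grouping works for hour of day or month; a category that piles up in one time slot is a candidate for a closer look, not proof of a cause.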


    Have a look at the website www.rapidprom.org for inspiration on what you'll be able to do when you have the tickets all cleaned up.  It might give you some nice ideas for the next step of your internal ticket management systems.
