Extracting data and learning

Legacy User Member Posts: 0 Newbie
edited November 2018 in Help

What we have is mostly text consisting of customer opinions. For example, a customer writes "...bla bla and I had a problem with the credit card thing...". This information is stored in different databases (for example Excel) and sorted into categories such as "problems with paying" and "problems with webpage".
So what we want to do is get more information out of the user opinions. To do that, we can either read all of them or perhaps data mine them.
But how? How can I tell a piece of software that everything under "problem with payment" should additionally be sorted by whether it contains the word "credi", to get all credit card problems, and by whether it contains the word "paypal", to get all PayPal problems?

Do you know what I mean? Seeing that I have 500 problems with payment does not tell me whether that is 499 credit card problems and 1 PayPal problem, or 499 PayPal problems and 1 credit card problem. I have to read them all.
In my opinion, one way could be to tell the software to sort by "credit" + "credit card" + "visa" + "american express" to catch, hopefully, all problems regarding credit cards.
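
To make the idea concrete, here is a minimal sketch in plain Python (outside RapidMiner) of sorting comments by keyword groups. The group names, the keyword lists and the sample comments are all made up for illustration; a real list would need many more terms and spelling variants. Matching is by simple substring, so "credit" also catches "credit card", but a short keyword like "visa" could match inside unrelated words.

```python
from collections import Counter

# Hypothetical keyword groups; the terms are illustrative, not a complete list.
KEYWORD_GROUPS = {
    "credit card": ["credit", "visa", "american express"],
    "paypal": ["paypal"],
}


def tag_comment(text, groups=KEYWORD_GROUPS):
    """Return the sorted names of all groups with a keyword occurring in text."""
    lowered = text.lower()
    return sorted(
        name
        for name, keywords in groups.items()
        if any(keyword in lowered for keyword in keywords)
    )


def count_groups(comments, groups=KEYWORD_GROUPS):
    """Count how many comments fall into each group; others go to 'unmatched'."""
    counts = Counter()
    for comment in comments:
        tags = tag_comment(comment, groups)
        if tags:
            counts.update(tags)
        else:
            counts["unmatched"] += 1
    return counts


# Made-up sample comments from the "problem with payment" category.
comments = [
    "I had a problem with the credit card thing",
    "PayPal did not accept my payment",
    "My Visa was rejected twice",
    "The payment page froze",
]

for group, number in count_groups(comments).most_common():
    print(group, number)
```

The "unmatched" count shows how many comments the keyword lists miss, which tells me how many I would still have to read by hand.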

I have a lot of information (8,000 entries a month), far too much to read it all. I have to sort it, data mine it, whatever!
Any good ideas, please? I was able to get the Excel data into RapidMiner, but now I am stuck. What do I have to do next? Or is RapidMiner the wrong tool for something like that?

Kind regards
Michael E.


    TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Michael,

    well no, RapidMiner is definitely not the wrong tool for that. On the contrary, RapidMiner is quite a capable solution for the problem you are facing. Assuming you cannot spontaneously attend our text mining training course on Monday/Tuesday next week, I will try to sketch what you will have to do:

    First of all, you have to download and install the text plugin for RapidMiner. This will allow you to extract information from and analyse unstructured data. Having loaded your Excel data, you have to transform it into what we call word vectors, which are basically statistics of word occurrences. This can be done e.g. by using the StringTextInput operator of the text plugin. Once you have accomplished that, you should consider clustering the texts on the basis of the word vectors. This can be done with an appropriate clustering algorithm in RM. The clustering algorithm determines groups of texts that are similar within a group but different between groups. Hence, in your example, texts containing complaints about payment might be put into one cluster.
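
    To illustrate the two steps outside RapidMiner, here is a rough plain-Python sketch. It is not what the StringTextInput operator or RM's clustering operators do internally: the word vectors are raw word counts, the similarity is cosine similarity, and the clustering is a very simple greedy scheme chosen for brevity. The function names, the threshold of 0.3 and the sample texts are assumptions made for this example.

```python
import math
import re
from collections import Counter


def word_vector(text):
    """Bag-of-words vector: lower-cased word -> number of occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word vectors (0.0 if either one is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_clusters(texts, threshold=0.3):
    """Greedy clustering: a text joins the first cluster whose first member
    is at least `threshold` similar to it, otherwise it opens a new cluster.
    Returns clusters as lists of indices into `texts`."""
    vectors = [word_vector(text) for text in texts]
    clusters = []
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            if cosine(vector, vectors[cluster[0]]) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


# Made-up sample texts.
texts = [
    "payment with credit card failed",
    "credit card payment was rejected",
    "webpage does not load",
    "the webpage is slow to load",
]

for cluster in greedy_clusters(texts):
    print([texts[i] for i in cluster])
```

    With these four texts, the two payment complaints end up in one group and the two webpage complaints in another. On real data you would usually also remove very common words and use a proper clustering algorithm, since this greedy scheme depends on the order of the texts.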

    Well, that is basically how it works. I know that you probably will not be able to set up such a process just from the description I gave you. Nevertheless, I wanted to point out that it is possible in general, that RM is absolutely the right choice for performing such an analysis, and that it is not even that complicated to set up. If you want to get started on your own, just download the text plugin and have a look at the example processes shipped with it (and the online tutorial shipped with RM for the general part of data mining with RM). If you would like help, we can offer you training courses for text mining with RM, data mining with RM in general, etc., as well as consulting services. I am quite sure that this would be the quickest way to gain experience with the analyses you want to perform, or even to obtain a working solution to your problem.

    The following links to our training courses and our consulting services, respectively, might be interesting in that context (the sites are in German):



    If you would like more information on our services, please let me know and write to malbrecht@rapid-i.com or sales@rapid-i.com. If you first want to try it yourself, the forum is of course the right (and free ;)) place to ask questions about specific problems.

    Legacy User Member Posts: 0 Newbie
    Wow, thanks. That was fast, and it looks like more than I expected! :-)
    But I need some time to read it and try it out.

    As you are from Germany (aren't you?), you might know Quelle.de ;-) We have a lot of customer input from phone calls, chats, the forum, etc. that contains a lot of good information about where the problems are and what the customers want.
    So far it is just sorted into roughly 10 categories and spread across different databases. And not every entry has a category, because the support team sometimes could not assign one for lack of time or information.

    Now I am doing the MacGyver job and trying to find a good and cheap :-( way to get the information out of this data.
    But the information is very unstructured, as the customers use their own words and phrases.

    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    You might be interested in the work we did for mobilkom austria in the field of mail classification. This was a large-scale automatic e-mail routing problem which probably has a lot in common with your problem:


    After a short project, which kept total costs really low, the system was ready for production. If you are interested, you can of course contact us at sales@rapid-i.com to discuss details or ideas.

    Legacy User Member Posts: 0 Newbie
    Please understand that so far I am on my own. I would like to implement something like data mining, but first I have to prove that it is necessary and that it works. It may sound strange, but I just want to look around for a few more days, try to get some other people interested, and then start to get serious.

    So far I have managed to get a word list which also counts the words. That is a nice thing. I now know how often, for example, the word "problem" appears. So I am on the right track.
    The next step is to see whether data set no. 1 includes the words "problem" + "payment". If yes, then maybe sort it or mark it as a "payment problem". And so on.
    It looks like this is no "rocket science" and I can solve it even as a "dumb management guy" ;-)
    But it needs some time, some googling, etc.!
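
    Just to write down what I mean, here is a small plain-Python sketch of both steps (not RapidMiner): first the counted word list, then marking every record that contains both "problem" and "payment". The function names, the label text and the sample records are invented for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a text into lower-cased words."""
    return re.findall(r"[a-z]+", text.lower())


def word_list(records):
    """Count how often each word appears across all records."""
    counts = Counter()
    for record in records:
        counts.update(tokenize(record))
    return counts


def mark(records, required=("problem", "payment"), label="payment problem"):
    """Pair each record with `label` if it contains all required words,
    otherwise with None."""
    marked = []
    for record in records:
        words = set(tokenize(record))
        hit = all(word in words for word in required)
        marked.append((record, label if hit else None))
    return marked


# Made-up sample records.
records = [
    "I had a problem with the payment",
    "The webpage has a problem",
    "Payment problem again with PayPal",
]

print(word_list(records)["problem"])
for record, label in mark(records):
    print(label, "|", record)
```

    A rule like this only looks for whole words, so "payments" or "paying" would be missed unless they are added as extra rules.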

    I'll keep you posted!