GSP Output Format

swissruss RapidMiner Certified Expert, Member Posts: 11 Contributor II
edited November 2018 in Help

I got the GSP operator running nicely with pre-processed data from the transaction data generator. My question is: how can I use the generated patterns? For Frequent Item Sets, a new attribute can be generated for each item set, flagging whether an example supports that item set. Is there an equivalent for the GSP sequential patterns, or can the patterns currently only be output to the results?

Thanks for any help anyone can offer!
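For comparison with the Frequent Item Sets flags, here is a minimal Python sketch (not a RapidMiner feature) of what an equivalent flag for sequential patterns would compute: for each customer sequence, whether it contains a given pattern. The names `supports` and `pattern_flags` are hypothetical, and the sketch assumes no window size or gap constraints.

```python
# Hypothetical sketch: flag which customer sequences contain each GSP pattern.
# Assumptions: each sequence is an ordered list of transactions (itemsets), and
# a pattern is an ordered list of itemsets; no time constraints are applied.


def supports(sequence, pattern):
    """Return True if `pattern` is contained in `sequence`.

    Each pattern element must be a subset of a transaction that comes strictly
    after the transaction matched for the previous element. Greedily taking the
    earliest matching transaction is sufficient when there are no gap or window
    constraints.
    """
    pos = 0
    for element in pattern:
        # Skip transactions until one contains every item of this element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next element must come from a later transaction
    return True


def pattern_flags(sequences, patterns):
    """Build one True/False flag per (customer, pattern), like itemset flags."""
    return {
        customer: {name: supports(seq, pat) for name, pat in patterns.items()}
        for customer, seq in sequences.items()
    }


A, B = frozenset({"TypA"}), frozenset({"TypB"})
sequences = {
    "customerA": [A, A, B],
    "customerB": [B, A, B],
    "customerC": [A, A, B],
}
patterns = {"<TypA> <TypA>": [A, A], "<TypA> <TypB>": [A, B], "<TypB> <TypA>": [B, A]}

for customer, flags in pattern_flags(sequences, patterns).items():
    print(customer, flags)
```

The resulting dictionary maps directly onto one boolean column per pattern and one row per customer, which is the shape the item set flags have.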



    StefanGSP Member Posts: 4 Contributor I
    Hi Russ,

    Unfortunately I have no answer to your question, but I will run into the same question once I get GSP running on my data.

    This is exactly why I am addressing you directly: you seem to have quite some experience with GSP already. I hope you don't mind.

    I just started working with the GSP algorithm and ran into a problem with displaying the results.

    I ran a rather simple main process consisting of a Retrieve operator and the GSP operator.

    Regarding the data format, from what I have seen in earlier discussions I believe my data is in the right format for GSP:

    Customer, Time_label, Sequence
    customerA, 1, TypA
    customerA, 2, TypA
    customerA, 3, TypB
    customerB, 1, TypB
    customerB, 2, TypA
    customerB, 3, TypB
    customerC, 1, TypA
    customerC, 2, TypA
    customerC, 3, TypB
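Conceptually, GSP first turns rows like these into one time-ordered sequence of transactions per customer. A minimal Python sketch of that grouping step, assuming the column order Customer, Time_label, item and assuming that rows sharing a customer and time label belong to the same transaction; `build_sequences` is a hypothetical name, not part of RapidMiner:

```python
# Hypothetical sketch: turn long-format (customer, time, item) rows into one
# time-ordered sequence of transactions per customer, the structure GSP mines.
from collections import defaultdict

rows = [
    ("customerA", 1, "TypA"), ("customerA", 2, "TypA"), ("customerA", 3, "TypB"),
    ("customerB", 1, "TypB"), ("customerB", 2, "TypA"), ("customerB", 3, "TypB"),
    ("customerC", 1, "TypA"), ("customerC", 2, "TypA"), ("customerC", 3, "TypB"),
]


def build_sequences(rows):
    """Group rows by customer; rows sharing a time label form one transaction."""
    by_customer = defaultdict(lambda: defaultdict(set))
    for customer, time_label, item in rows:
        by_customer[customer][time_label].add(item)
    # Order each customer's transactions by time label.
    return {
        customer: [frozenset(items) for _, items in sorted(transactions.items())]
        for customer, transactions in by_customer.items()
    }


for customer, sequence in build_sequences(rows).items():
    print(customer, [sorted(t) for t in sequence])
```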

    Although the GSP operator seems to work fine (no error messages), the resulting patterns do not seem to be displayed correctly. All I get is:

    0.320: <Sequence>  <Sequence>
    0.135: <Sequence>  <Sequence>  <Sequence>
    0.040: <Sequence>  <Sequence>  <Sequence>  <Sequence>
    0.012: <Sequence>  <Sequence>  <Sequence>  <Sequence>  <Sequence>

    What I expected to be displayed instead is, for example:

    0.320: <TypA>  <TypA>
    0.135: <TypA>  <TypA>  <TypB>

    Why is only the column name displayed in the patterns instead of the actual values?
    What do I have to adjust or change?
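One possible explanation, which is an assumption and not something confirmed in this thread: if the GSP operator treats every attribute other than the customer and time attributes as an item, and reports an item by the attribute's name whenever that attribute holds its positive value, then a single nominal Sequence attribute would produce exactly the <Sequence> patterns shown. The Python sketch below pivots the rows into one true/false attribute per item value (TypA, TypB), which is the shape such an operator would expect. The function name `to_binominal` is hypothetical; inside RapidMiner, the Nominal to Binominal operator is probably the closest built-in step, though its generated attribute names may differ.

```python
# Hypothetical sketch: pivot (customer, time, item) rows into one true/false
# attribute per distinct item, keeping the customer and time columns.

rows = [
    ("customerA", 1, "TypA"), ("customerA", 2, "TypA"), ("customerA", 3, "TypB"),
    ("customerB", 1, "TypB"), ("customerB", 2, "TypA"), ("customerB", 3, "TypB"),
    ("customerC", 1, "TypA"), ("customerC", 2, "TypA"), ("customerC", 3, "TypB"),
]


def to_binominal(rows):
    """Return (item_names, records) with one 'true'/'false' column per item."""
    items = sorted({item for _, _, item in rows})
    records = []
    for customer, time_label, item in rows:
        record = {"Customer": customer, "Time_label": time_label}
        for name in items:
            record[name] = "true" if name == item else "false"
        records.append(record)
    return items, records


items, records = to_binominal(rows)
print(items)
for record in records[:3]:
    print(record)
```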

    Any help is highly appreciated. Thank you in advance.

    swissruss RapidMiner Certified Expert, Member Posts: 11 Contributor II
    Hi Stefan,

    Very strange! Your input data looks correct to me. What are your parameter settings? Be warned that I currently have a bug report open for GSP (http://bugzilla.rapid-i.com/show_bug.cgi?id=936), because I can't make sense of the results I'm getting! But I'm sure the guys at Rapid-I will set me straight, or fix the bug if it is one. So let me know your parameter values and I'll try to get you as far as I am!



    P.S. If it's OK with you, we can continue in this thread. Could you link to it from yours? Thanks.
    StefanGSP Member Posts: 4 Contributor I
    Hi Russ,

    I got the results described above with the following parameters:
    window size = 11 (my data is a sequence of 11 years)
    max gap = 11
    min gap = 0
    min support = 0.01
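To make the gap parameters easier to reason about, here is a minimal Python sketch of how min gap and max gap restrict which occurrences count toward support. It follows the definitions from Srikant and Agrawal's original GSP paper (each element must occur more than min gap and at most max gap time units after the previous one) and leaves out the window size entirely, so it approximates what RapidMiner or Weka compute rather than reproducing either; whether those tools treat the bounds as strict or inclusive is an assumption here. All function names are hypothetical.

```python
# Hypothetical sketch of GSP-style support counting with min gap / max gap,
# following the definitions in Srikant & Agrawal's GSP paper; window size is
# NOT modelled (each pattern element must come from a single transaction).


def supports_with_gaps(sequence, pattern, min_gap, max_gap):
    """sequence: list of (time, itemset) pairs; pattern: list of itemsets.

    Consecutive pattern elements must be matched at times t_prev < t with
    min_gap < t - t_prev <= max_gap. Uses backtracking, because with a max
    gap the earliest match is not always the one that succeeds.
    """
    def match(index, prev_time):
        if index == len(pattern):
            return True
        for time, items in sequence:
            if prev_time is not None:
                gap = time - prev_time
                if gap <= 0 or gap <= min_gap or gap > max_gap:
                    continue
            if pattern[index] <= items and match(index + 1, time):
                return True
        return False

    return match(0, None)


def support(sequences, pattern, min_gap, max_gap):
    """Fraction of customer sequences that contain the pattern."""
    hits = sum(supports_with_gaps(s, pattern, min_gap, max_gap) for s in sequences.values())
    return hits / len(sequences)


A, B = frozenset({"TypA"}), frozenset({"TypB"})
sequences = {
    "customerA": [(1, A), (2, A), (3, B)],
    "customerB": [(1, B), (2, A), (3, B)],
    "customerC": [(1, A), (2, A), (3, B)],
}

print(support(sequences, [A, A, B], min_gap=0, max_gap=11))
print(support(sequences, [B, B], min_gap=0, max_gap=1))
print(support(sequences, [B, B], min_gap=0, max_gap=2))
```

With max gap 1, the pattern `[A, B]` only matches customerA by skipping the TypA at time 1 and using the one at time 2, which is why the sketch backtracks instead of matching greedily.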

    I also tried various other parameter values; however, the way the results are displayed has not changed.

    Another thing that strikes me:
    meanwhile I managed to get GSP running in Weka. There, even with a higher min support, several hundred patterns (all of which seem reasonable) are returned. With RapidMiner I only get four patterns, which I don't understand.

    Do you have any ideas?

