
Join step in SPM for post-processing

jpbeier (Member, Posts: 1, Learner I)
Hi Folks,

I recently ran GSP to identify frequent sequential patterns in a dataset, and I would like to run some post-processing on the results.

I strongly suspect that some of my resulting frequent sequences are nested within parent sequences (e.g., sequence <a, b, c> and sequence <b, c, d> are actually parts of the same sequence <a, b, c, d>).
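To make "nested" concrete, here is a minimal Python sketch (not RapidMiner output). It assumes each pattern is a flat, ordered tuple of single items and that gaps between items are allowed; `is_subsequence` is a hypothetical helper name, not part of any GSP library.

```python
def is_subsequence(candidate, parent):
    """Return True if the items of `candidate` occur in `parent`
    in the same order (gaps between items are allowed)."""
    remaining = iter(parent)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in candidate)


parent = ("a", "b", "c", "d")
print(is_subsequence(("a", "b", "c"), parent))  # True
print(is_subsequence(("b", "c", "d"), parent))  # True
print(is_subsequence(("c", "a"), parent))       # False
```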

How can I either 1) visually inspect the resulting patterns to identify which rows of my dataset were included in each frequent sequence, or 2) run a post-processing step that joins daughter sequences so that only the parent sequence remains in the results?
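For option 1, this is roughly the kind of lookup I have in mind, sketched in Python outside of RapidMiner. The data, the `sequences` layout (one ordered tuple of items per id), and the helper names are all hypothetical; this is not how the GSP operator stores its results.

```python
def is_subsequence(candidate, parent):
    """True if `candidate` occurs in `parent` in order (gaps allowed)."""
    remaining = iter(parent)
    return all(item in remaining for item in candidate)


def supporting_rows(pattern, sequences):
    """Return the ids of all input sequences that contain `pattern`."""
    return [sid for sid, seq in sequences.items() if is_subsequence(pattern, seq)]


# Hypothetical input: sequence id -> ordered items for that id.
sequences = {
    1: ("a", "b", "c", "d"),
    2: ("a", "b", "c"),
    3: ("b", "c", "d"),
    4: ("d", "a"),
}

for pattern in [("a", "b", "c"), ("b", "c", "d"), ("a", "b", "c", "d")]:
    rows = supporting_rows(pattern, sequences)
    support = len(rows) / len(sequences)
    print(pattern, rows, support)
```

Listing the supporting ids next to each pattern would make it possible to see whether two shorter patterns are backed by the same rows as a longer one.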

In other words, how do I:

1) Print the results of the GSP analysis so that I can review the rows from my dataset that were identified as part of each frequent sequence, allowing me to anecdotally identify and eliminate subsequences that are part of parent sequences.

2) Run a post-processing step that joins the daughter sequences before running the same process that I had previously written to check whether the sequences meet the appropriate criteria (support, etc.). This follows advice by Perera and colleagues (2008) <Apparently I am too novice to link the article (or even leave the URL). Therefore, the title of the article is: "Clustering and Sequential Pattern Mining of Online Collaborative Learning Data" in IEEE>. The intention of this join step is to eliminate subsequences from the results so that only parent sequences remain. To quote Perera et al., "A sequence s1 joins with s2 if the subsequence obtained by dropping the first item of s1 is the same as the subsequence obtained by dropping the last item of s2. For example, <a; b; c; d> is a 4-sequence candidate of the 3-sequences <a; b; c> and <b; c; d>." (p. 766).
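As I understand the quoted rule, the join and the subsequent clean-up could be sketched in Python as below. This is only my reading of the quote, assuming patterns are flat tuples of single items; the function names (`join`, `join_candidates`, `keep_maximal`) are hypothetical and not taken from the paper or from RapidMiner.

```python
def is_subsequence(candidate, parent):
    """True if `candidate` occurs in `parent` in order (gaps allowed)."""
    remaining = iter(parent)
    return all(item in remaining for item in candidate)


def join(s1, s2):
    """Apply the quoted rule: s1 joins with s2 if s1 without its first
    item equals s2 without its last item. Returns the joined sequence,
    or None when the two sequences do not join."""
    if s1[1:] == s2[:-1]:
        return s1 + s2[-1:]
    return None


def join_candidates(frequent):
    """Build all (k+1)-sequence candidates from a list of k-sequences."""
    joined = (join(s1, s2) for s1 in frequent for s2 in frequent)
    return {seq for seq in joined if seq is not None}


def keep_maximal(patterns):
    """Drop every pattern that is contained in another, different pattern."""
    return [
        p for p in patterns
        if not any(p != q and is_subsequence(p, q) for q in patterns)
    ]


frequent_3 = [("a", "b", "c"), ("b", "c", "d"), ("x", "y", "z")]
candidates = join_candidates(frequent_3)
print(candidates)  # {('a', 'b', 'c', 'd')}
print(keep_maximal(frequent_3 + sorted(candidates)))
# [('x', 'y', 'z'), ('a', 'b', 'c', 'd')]
```

Note that the join only produces candidates: a joined sequence would still have to be checked against the data for support before it replaces its daughter sequences.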

Any advice or guidance is appreciated. If RapidMiner is not an appropriate tool for such an analysis, I am happy to receive advice on using R or another tool.

Furthermore, if this question has already been answered elsewhere, please accept my apologies and link the appropriate page.

Thank you!
Joel


