Generalized Sequential Patterns (GSP) dataset format
Hello,
I have seen some posts about this subject, but I didn't see any good answer.
Can anyone tell me the format of the input dataset for GSP?
The only format that has given me any results (bad ones) looks like this:
Client_id, time , feature 1, feature 2, ....
1,1,0,1,0,...
1,2,1,1,1,....
2,1,0,0,0
Answers
This is already the correct format. You only need to turn the feature 1, feature 2, ... attributes into binominal ones. Use the Numerical to Binominal operator for this.
Greetings,
Sebastian
Can you post the XML of how you got your data in the format:
Client_id, time , feature 1, feature 2, ....
1,1,0,1,0,...
1,2,1,1,1,....
2,1,0,0,0
Every time I try to pivot my data from this format:
Customer, Time, Item
1,1,a
1,1,b
1,2,a
2,1,c
etc
I fail to get your format.
Thanks,
Will
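For readers following along outside RapidMiner, the pivot Will describes can be sketched in plain Python. This is only an illustration of the target layout, not the RapidMiner process from this thread; the function name `pivot_transactions` and the lowercase column names are made up for the example.

```python
from collections import defaultdict

def pivot_transactions(rows):
    """Turn (customer, time, item) rows into one wide row per (customer, time).

    Each item becomes its own column holding True if the customer bought
    that item at that time and False otherwise.
    """
    items = sorted({item for _, _, item in rows})
    baskets = defaultdict(set)
    for customer, time, item in rows:
        baskets[(customer, time)].add(item)
    header = ["customer", "time"] + items
    table = [
        [customer, time] + [item in basket for item in items]
        for (customer, time), basket in sorted(baskets.items())
    ]
    return header, table

# The long-format sample from the post above.
rows = [(1, 1, "a"), (1, 1, "b"), (1, 2, "a"), (2, 1, "c")]
header, table = pivot_transactions(rows)
print(header)
for row in table:
    print(row)
```

Because every cell is filled with an explicit True or False, there are no missing values left to replace afterwards.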
Best regards,
Marius
Thanks for the timely response, I will examine the code you provided.
Will
I actually applied your logic in my SQL and concatenated the fields before loading the data into RapidMiner, which speeds up processing.
The trouble I have now is that when I pivot and attempt to replace missing values, the replacement doesn't work.
The process runs without errors (green light), but I still have '?' values in my pivot table.
Example of my data:
Time_Customer Item Count
1_9 a 1
2_9 b 1
3_9 c 1
3_9 d 1
3_9 e 1
3_9 f 1
3_9 e 1
3_9 b 1
4_9 c 1
4_9 b 1
1_22 c 1
1_27 c 1
1_27 a 1
1_27 g 1
2_27 c 1
2_27 h 1
2_27 g 1
3_27 c 1
My code is below:
I greatly appreciate any help you all can offer.
Will
Please examine your Replace Missing Values operator. You are replacing the values of only one attribute, but in reality you probably want to replace missing values in *all* attributes, right?
Best regards,
Marius
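The difference Marius points out can be shown with a small Python sketch. It does not reproduce the Replace Missing Values operator; `replace_missing`, its `columns` parameter, and the sample rows are assumptions made for illustration, with '?' standing in for a missing value as in the result view.

```python
MISSING = "?"

def replace_missing(table, columns=None, fill=0):
    """Return a copy of table (a list of dicts) with MISSING replaced by fill.

    columns=None means "every column"; otherwise only the named columns
    are touched, which mirrors selecting a single attribute.
    """
    out = []
    for row in table:
        new_row = dict(row)
        for name, value in row.items():
            if (columns is None or name in columns) and value == MISSING:
                new_row[name] = fill
        out.append(new_row)
    return out

# A pivoted table where most cells are missing, as after a pivot.
pivoted = [
    {"id": "1_9", "a": 1, "b": "?", "c": "?"},
    {"id": "2_9", "a": "?", "b": 1, "c": "?"},
]

only_a = replace_missing(pivoted, columns={"a"})  # '?' survives in b and c
all_cols = replace_missing(pivoted)               # no '?' left anywhere
print(only_a)
print(all_cols)
```

Restricting the replacement to one column leaves '?' in every other column, which matches the symptom described above.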
Thank you for your help, I got it to work. The code for reference is provided below. I do have one more snag: the output of the GSP Set works in a Mac OS X install but not in Windows 7.
On Windows 7, I see summary data in the results overview tab, but when I move to the GSPSet(GSP) tab, all I see are the annotation options. On Mac OS X, everything appears as one would expect.
Not sure if I should submit a bug report or what.
Thanks for your help!
Will
Are you using RapidMiner 5.3.7 on both of your machines?
Best regards,
Marius
Will
Best regards,
Marius
Thank you for your assistance!
Will
Another question concerning GSP: I receive the same result sets regardless of my Window, Min Gap and Max Gap settings.
My raw data uses days between events as the time element.
Is this a function of the same bug we previously found?
Thanks,
Will
Did you inspect your data and make sure that the entered values actually would make a difference?
Best regards,
Marius
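One way to do the inspection Marius suggests is to list the time gaps that actually occur between consecutive transactions of each customer. The sketch below is plain Python, not part of RapidMiner, and it makes no claim about how the GSP operator interprets its gap parameters; `gaps_per_customer` is a hypothetical helper, and the sample rows are the (customer, time) pairs read off Will's example data.

```python
from collections import defaultdict

def gaps_per_customer(rows):
    """Return the sorted gaps between consecutive distinct times per customer.

    rows is an iterable of (customer, time) pairs.
    """
    times = defaultdict(set)
    for customer, time in rows:
        times[customer].add(time)
    gaps = []
    for customer_times in times.values():
        ordered = sorted(customer_times)
        gaps.extend(later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return sorted(gaps)

# (customer, time) pairs taken from the Time_Customer keys posted earlier.
rows = [(9, 1), (9, 2), (9, 3), (9, 3), (9, 4),
        (22, 1), (27, 1), (27, 2), (27, 3)]
gaps = gaps_per_customer(rows)
print("smallest gap:", min(gaps), "largest gap:", max(gaps))
```

If every observed gap is the same, as in this sample, gap limits on either side of that value would not be expected to separate any transactions, so identical result sets would not be surprising.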
We've just fixed the "empty GSP results" bug. You can either check out the latest SVN version (see here, updated around midnight) and build RapidMiner yourself, or wait for the next release.
Regards,
Marco
Thanks for the response, I'll check my updates!
Will
My empty GSP results problem still exists. How can I update my RapidMiner? Or do I need to wait until the next official update? Could anyone tell me when that will be?
Thank you!
Thanks,
Will
Best regards,
Marius
I have combined the time (in day-of-year format) with my customer ID per your instructions. I have a column for item and a binominal value for the "qty".
When I import the Excel sheet, pivot, replace the missing values with the value "false" and then split, everything looks good.
When I attempt to convert the split columns for time and customer from nominal to numerical, per the GSP operator requirements, my pivot is ruined.
I expect:
Customer, time, item a, item b, ......
1,1,TRUE, FALSE
1,3,TRUE, FALSE
2,4, FALSE, FALSE
etc
However, it turns time into multiple columns within the pivot as well.
I can provide larger example data if required for troubleshooting.
Any help that can be provided is appreciated.
Will
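As a point of comparison, here is a minimal Python sketch of the layout Will expects: the combined key is split back into two integer columns and the item columns are left alone. It assumes the key has the form time_customer (for example "3_27"), as in the sample data posted earlier; `split_key` and `add_numeric_ids` are hypothetical names, and this does not reproduce or diagnose the RapidMiner process.

```python
def split_key(key):
    """Split a 'time_customer' key such as '3_9' into (customer, time) integers."""
    time_text, customer_text = key.split("_")
    return int(customer_text), int(time_text)

def add_numeric_ids(rows):
    """Replace the combined key with numeric customer and time columns.

    Item columns are copied unchanged, so time stays a single column.
    """
    out = []
    for row in rows:
        customer, time = split_key(row["Time_Customer"])
        new_row = {"customer": customer, "time": time}
        new_row.update({k: v for k, v in row.items() if k != "Time_Customer"})
        out.append(new_row)
    return sorted(out, key=lambda r: (r["customer"], r["time"]))

# A few already pivoted rows, using items a and c from the sample data.
wide = [
    {"Time_Customer": "3_27", "a": False, "c": True},
    {"Time_Customer": "1_9", "a": True, "c": False},
    {"Time_Customer": "1_27", "a": True, "c": True},
]
result = add_numeric_ids(wide)
for row in result:
    print(row)
```

The key point of the sketch is that the conversion to numbers touches only the two ID columns, so it cannot spread time across several columns.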
Do we have any operator to apply GSP rules?
Thanks
This is a really good question.