Sequential Supervised Learning

nurman Member Posts: 8 Contributor II
Hi all,

I need help on how to model this particular problem in RapidMiner. Here's sample data I'm trying to model in RapidMiner:

id sequence rank
1 1020110201 40
2 0010120100 34
3 2110100110 18
4 0120010110 -13
5 0101010020 -98
6 0101210010 -21

As you can see, each sequence consists of 10 digits drawn from '0', '1', and '2', and each sequence is associated with a single rank value (either positive or negative), e.g. 1020110201->40. The intention is to learn from all past sequences and their corresponding ranks, so that given a new sequence such as 0101211010, the classifier can predict its rank.

What is the best way to model this in RapidMiner? Right now (following the neural trend tutorial) I have assigned rank as the label and used 10 different attributes to capture a single sequence, but I'm not sure this is the correct way because, in my case, the sequence string exhibits significant sequential correlation.

Your help is very much appreciated.

regards,
nurman
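
The encoding described in the question (one attribute per digit, rank as the label) can be sketched outside RapidMiner as follows. This is only an illustration in plain Python, not a RapidMiner process. The attribute names `pos_1` to `pos_10` and the extra `pair_*` attributes are assumptions made for the example; the neighbouring-digit pairs are one hypothetical way to expose some of the sequential correlation to a row-based learner.

```python
# Sample rows from the post: (id, sequence, rank).
ROWS = [
    (1, "1020110201", 40),
    (2, "0010120100", 34),
    (3, "2110100110", 18),
    (4, "0120010110", -13),
    (5, "0101010020", -98),
    (6, "0101210010", -21),
]


def position_attributes(sequence):
    """One nominal attribute per digit position (pos_1 .. pos_10)."""
    return {f"pos_{i}": digit for i, digit in enumerate(sequence, start=1)}


def adjacent_pair_attributes(sequence):
    """Hypothetical extra attributes: each pair of neighbouring digits."""
    return {f"pair_{i}": sequence[i - 1:i + 1] for i in range(1, len(sequence))}


def to_example(row):
    """Turn one (id, sequence, rank) row into a flat attribute dictionary."""
    row_id, sequence, rank = row
    example = {"id": row_id}
    example.update(position_attributes(sequence))
    example.update(adjacent_pair_attributes(sequence))
    example["rank"] = rank  # the label in the setup described above
    return example


examples = [to_example(row) for row in ROWS]

if __name__ == "__main__":
    for example in examples:
        print(example)
```

Each resulting dictionary corresponds to one row of an example set: an id, 10 per-position attributes, 9 neighbouring-pair attributes, and the rank label.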

Answers

  • nurman Member Posts: 8 Contributor II
    Anyone?
  • haddock Member Posts: 849 Guru
    Hi Nurman,

    As your premise sequence can have a very large number of permutations, each representing a signed integer label, it would help if you knew whether you could at least get the sign right. So just keep it simple, 10 nominal attributes, binary label plus/minus. Also, there is a whole heap of supporting videos and tutorials to tell you how to optimise.

    In my own work I am constantly surprised at my own bias, and always go through a brutal self beat-up when reality checks in. I dread to think how many years have been spent on roads that lead nowhere, but that's another story..

  • SharTeel Member Posts: 5 Contributor II
    The fact that the user is posting on a forum indicates he has not yet stumbled upon elements from the 'heap'. I have the same problem; moreover, I am struggling with how to import my variable-length time series in the allowed CSV format without all learners stubbornly classifying row by row, despite timestamps and ids.
    So, a concrete proposal would be most welcome, Hero. We are just noobs...
    Regards,
    ST
  • haddock Member Posts: 849 Guru
    Hi there,

    Looking at your posts I think you'll find the work of Kadous and Sammut on sign language recognition rather interesting ;D RapidMiner is a sort of propositional Lego; once you know what you want you clip the bits together, but 'knowing what you want' is easier said than done! Here's the link to some really good work...

    http://www.springerlink.com/content/wp2506r752qv1623/
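
haddock's earlier suggestion (10 nominal attributes, binary plus/minus label) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not a RapidMiner process: the label names `plus` and `minus` are made up for the example, and the 1-nearest-neighbour rule on Hamming distance is only a stand-in for whichever learner you would actually clip in.

```python
# Sample rows from the original question: (sequence, rank).
TRAINING = [
    ("1020110201", 40),
    ("0010120100", 34),
    ("2110100110", 18),
    ("0120010110", -13),
    ("0101010020", -98),
    ("0101210010", -21),
]


def sign_label(rank):
    """Collapse the signed rank into a binary label (names are assumed)."""
    return "plus" if rank >= 0 else "minus"


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def predict_sign(sequence, training=TRAINING):
    """Stand-in learner: copy the sign of the closest training sequence."""
    nearest_sequence, nearest_rank = min(
        training, key=lambda row: hamming_distance(sequence, row[0])
    )
    return sign_label(nearest_rank)


# Binary-labelled training set: 10 nominal attributes plus the sign.
labelled = [(list(seq), sign_label(rank)) for seq, rank in TRAINING]

if __name__ == "__main__":
    print(predict_sign("0101211010"))
```

With only six rows this says nothing about real accuracy; the point is simply that checking the sign first gives a cheap test of whether the attributes carry any signal before attempting to predict the full rank.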