Newbie Needs Direction

dhc Member Posts: 7 Contributor I
edited November 2018 in Help

New to this. Within my data set there are subsets of rows defined by a unique ID. Each ID represents an independent event. How do I set up a scenario that first treats each ID independently and then applies models across the events?


Best Answer

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    Correct, but you can easily create one by running the "Generate ID" operator first, which assigns a unique ID to every row, and then running the "Pivot" operator. That should solve your problem!

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
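
For readers who think in code, the "Generate ID" plus "Pivot" idea can be sketched outside RapidMiner. The plain-Python sketch below is not RapidMiner's implementation. The column names (race_id, horse, odds) and the sample rows are made up for illustration, and numbering rows from 1 within each event is an assumption borrowed from the follow-up discussion in this thread, not a statement about how the operators behave.

```python
from collections import defaultdict

# Hypothetical long-format rows: several rows per event (race).
rows = [
    {"race_id": "R1", "horse": "Alpha", "odds": 2.5},
    {"race_id": "R1", "horse": "Bravo", "odds": 4.0},
    {"race_id": "R2", "horse": "Comet", "odds": 3.0},
    {"race_id": "R2", "horse": "Delta", "odds": 1.8},
    {"race_id": "R2", "horse": "Echo", "odds": 9.0},
]


def pivot_by_group(rows, group_key):
    """Return one wide row per group; columns get the row's position in its group."""
    counters = defaultdict(int)  # running position per group, starting at 1
    wide = {}                    # group value -> wide row being built
    for row in rows:
        group = row[group_key]
        counters[group] += 1
        position = counters[group]
        wide_row = wide.setdefault(group, {group_key: group})
        for name, value in row.items():
            if name != group_key:
                wide_row[f"{name}_{position}"] = value
    return list(wide.values())


wide_rows = pivot_by_group(rows, "race_id")
for wide_row in wide_rows:
    print(wide_row)
```

In this sketch a race with fewer rows simply has no value for the later positions (R1 has no horse_3), so the width of the result is set by the largest group.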

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Hello @dhc and welcome! It would help if you posted a small example of your data, so we can be sure we are interpreting your explanation properly and understand the specific structure of your data. In general, it sounds like you want one of two things. One option is to pivot the data (use the "Pivot" operator), which takes the multiple sub-events and puts them together into a single row based on the unique ID for the event, keeping all the detailed data associated with each sub-event in separate variables. Alternatively, if they are all numeric attributes and you only want certain summaries such as the sum, average, or count, you can do that with the "Aggregate" operator. Either way, you will end up with a dataset that has only as many rows as you have unique event IDs, and at that point you should be able to apply standard modeling techniques. Don't forget that for supervised learning you'll also need to define your label (outcome) variable using the "Set Role" operator.

     

    @stevefarr you may want to move this to the product help section rather than community news.

     

    Best,

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • dhc Member Posts: 7 Contributor I

    Yes, I agree I should have started this in another topic - how do I move it?

     

    Brian - thanks. Here is a screenshot (it doesn't show the label attribute).

    I'm mining horse racing data. Each value in column A represents a race, so the remaining attributes are relevant in the context of that race only.

     

     

    [Attached image: .xx.jpg]

     

     

  • dhc Member Posts: 7 Contributor I

    I just explored the Pivot operator - it looks like I need a unique identifier within each group - correct?

  • stevefarr Member Posts: 93 Maven

    Thanks @Telcontar

     

    And may I add my welcome here too, @dhc.

  • dhc Member Posts: 7 Contributor I

    The results were unwieldy. The IDs need to be seeded with 1 for each "primary key"; I added an attribute that serves the purpose. Not sure Pivot is the way to go. Anyway - thanks for the help. I'll keep trying.
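
If the pivoted table is too wide to work with, the "Aggregate" route mentioned earlier in the thread is the other way to get one row per event. Here is a minimal plain-Python sketch of that idea, not RapidMiner code. The column names (race_id, odds), the sample values, and the choice of count, average, and minimum as summaries are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows: several rows per event (race).
rows = [
    {"race_id": "R1", "odds": 2.5},
    {"race_id": "R1", "odds": 4.0},
    {"race_id": "R2", "odds": 3.0},
    {"race_id": "R2", "odds": 1.8},
    {"race_id": "R2", "odds": 9.0},
]


def aggregate_by_group(rows, group_key, value_key):
    """Collapse each group to a single summary row (count, average, minimum)."""
    grouped = defaultdict(list)  # group value -> list of numeric values
    for row in rows:
        grouped[row[group_key]].append(row[value_key])
    return [
        {
            group_key: group,
            "count": len(values),
            f"average_{value_key}": mean(values),
            f"minimum_{value_key}": min(values),
        }
        for group, values in grouped.items()
    ]


summary = aggregate_by_group(rows, "race_id", "odds")
for summary_row in summary:
    print(summary_row)
```

The result has a fixed number of columns no matter how many rows each race contributes, at the cost of losing the per-row detail that a pivot keeps.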
