[Solved] Sort by Pareto Ranking doesn't work

aryan_hosseinza Member Posts: 74 Contributor II
edited November 2018 in Help
Hi everybody,

I have a fairly large dataset and I want to order it by two attributes in a nested fashion (first by id, then by date), so I guessed Pareto must be my choice.

When I sort it with the Sort operator it finishes in a reasonable time, but with the Pareto operator it is really slow, both when I define only one attribute (ID or DATE) and when I define two.

Is there a problem with this operator in RapidMiner?

Thanks
Arian

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Arian,

    Just use two Sort operators in a row. In your case, the first one should sort by date, the second by id.

    Best regards,
    Marius
  • aryan_hosseinza Member Posts: 74 Contributor II
    Hi,

    But I want a nested sort (first by ID, then by DATE within each ID, a bit like GROUP BY in SQL). I guess two consecutive sorts won't give me what I want, right?
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Just try it, and come back to tell me if you like what you see ;)

    Seriously, here is why it works: the Sort operator in RapidMiner is stable, i.e. it keeps the existing order of rows that share the same value of the sort attribute. Thus sorting first by date, then by id results in a dataset sorted by id, where rows with identical id are sorted by date.

    Best regards,
    Marius
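
The two-pass trick described above relies only on the sort being stable, so it can be tried outside RapidMiner as well. Below is a minimal Python sketch (Python's `sorted` is guaranteed to be stable); the `rows` data and the field names `id` and `date` are made up for illustration and are not taken from the original dataset.

```python
from datetime import date

# Hypothetical example rows; the real dataset is not shown in the thread.
rows = [
    {"id": 2, "date": date(2016, 11, 7)},
    {"id": 1, "date": date(2016, 11, 9)},
    {"id": 2, "date": date(2016, 11, 5)},
    {"id": 1, "date": date(2016, 11, 8)},
]

# Pass 1: sort by the secondary key (date).
by_date = sorted(rows, key=lambda r: r["date"])

# Pass 2: sort by the primary key (id). Because the sort is stable,
# rows with the same id keep the date order from pass 1.
by_id_then_date = sorted(by_date, key=lambda r: r["id"])

# For comparison: a single sort on the composite key (id, date).
single_pass = sorted(rows, key=lambda r: (r["id"], r["date"]))

for r in by_id_then_date:
    print(r["id"], r["date"])
```

Both approaches give the same order here; the rule of thumb is to sort by the least significant key first and the most significant key last.
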
  • aryan_hosseinza Member Posts: 74 Contributor II
    Yes, it worked! So why is it necessary to have Sort by Pareto Ranking, then?

    Thanks,
    Arian
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi, Pareto ranking is actually a completely different concept that has nothing to do with your problem. It is used for multi-objective optimization problems. The closer all selected attributes are to the optimum, the lower the Pareto rank (keyword: Pareto front). In RapidMiner, the optimum is positive infinity.

    In this image (http://www.daylight.com/meetings/emug01/Gillet/img13.htm) the blue points have the same Pareto rank, since they lie on the same Pareto front: none of them is better than another in every attribute.

    Best regards,
    Marius
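
To make the difference from ordinary sorting concrete, here is a small Python sketch of Pareto ranking by repeated non-dominated filtering, with larger values treated as better (matching "the optimum is positive infinity"). It is a textbook-style illustration, not RapidMiner's implementation, which the thread does not describe; the function names, the sample points, and the choice to start ranks at 1 are assumptions.

```python
def dominates(a, b):
    """True if a is at least as good as b in every attribute and
    strictly better in at least one (larger values are better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_ranks(points):
    """Assign rank 1 to the non-dominated points, remove them,
    assign rank 2 to the next non-dominated front, and so on."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # Points in the current front are dominated by no other remaining point.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical two-attribute data.
points = [(5, 1), (3, 3), (1, 5), (2, 2), (1, 1)]
ranks = pareto_ranks(points)
print(ranks)  # [1, 1, 1, 2, 3]
```

The first three points trade one attribute off against the other, so none dominates another and they share rank 1. The pairwise comparisons in this naive version also hint at why such a ranking can be far slower than a plain sort on a large dataset, although that is only a guess about the operator's behaviour.
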
  • aliasgarscool Member Posts: 2 Contributor I

    Hi, I have a similar problem; the only difference is that I want nested sorting on 3 attributes, e.g. by ID (ascending), by date (ascending) and by amount (ascending). When I use Sort 3 times in the order amount, date, ID, the result is not a three-level nested sort; it is only nested on two levels, i.e. by ID and by date. Please guide. Thanks in advance.

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi!

    Sorting by multiple attributes is not implemented in the Sort operator in RapidMiner.

    A workaround is to create a new attribute (e.g. "sortkey") from the attributes you're trying to sort on and then use that in the sort operation.

    I frequently use Generate Attributes for this, converting numerical and date attributes using a fixed pattern (e.g. 000000 or yyyy-MM-dd HH:mm:ss) and concatenating them with an additional separator. Your sort attribute could then have values like this: 000001/2016-11-07 14:30:51/113.30.

    Regards,
    Balázs
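
The sort-key workaround can be sketched in Python to show why the fixed patterns matter. The function name `sort_key`, the sample rows, and the 10-character zero-padded amount are assumptions for illustration (the example value in the post shows the amount as 113.30); the sketch also assumes non-negative amounts and ids of at most six digits.

```python
from datetime import datetime


def sort_key(row_id, timestamp, amount):
    """Build a string whose lexicographic order matches the
    (id, date, amount) order, using fixed-width fields."""
    return f"{row_id:06d}/{timestamp:%Y-%m-%d %H:%M:%S}/{amount:010.2f}"


# Hypothetical rows: (id, timestamp, amount).
rows = [
    (2, datetime(2016, 11, 7, 14, 30, 51), 50.00),
    (1, datetime(2016, 11, 7, 14, 30, 51), 113.30),
    (1, datetime(2016, 11, 7, 14, 30, 51), 99.00),
    (1, datetime(2016, 11, 6, 9, 0, 0), 500.00),
]

# One sort on the generated key gives the three-level nested order.
by_key = sorted(rows, key=lambda r: sort_key(*r))

# Reference: sorting the tuples directly compares id, then date, then amount.
by_tuple = sorted(rows)

print(sort_key(1, datetime(2016, 11, 7, 14, 30, 51), 113.30))
# 000001/2016-11-07 14:30:51/0000113.30
```

Without the zero padding, the strings "113.30" and "99.00" would compare in the wrong order, which is why every numeric part of the key needs a fixed width.
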
