Filtering examples based on number of occurrences in attribute

Baski Member Posts: 1 Contributor I
edited November 2018 in Help


For example, I have examples that contain information about visits. Every visit is assigned to a visitor_id. I want to filter the examples (rows) where the visitor_id occurs more than 5 times. So there will be no more than 4 rows for every visitor_id. I tried filtering, but that was not helpful.

Any idea how to do this in RapidMiner?



  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder



    While I am pretty sure that the answer to this question will involve the operators "Aggregate", "Pivot", and "Filter Examples", I am unfortunately not sure I fully understood the problem. Can you give us a small data sample (original data) and show what the desired output for this sample should look like?




  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    The Filter Examples operator on its own is not going to help you here (as you saw). The way I see it, you first need to create an attribute that holds the number of occurrences, and then you can filter for n > 5 or whatever threshold you need. Personally, I would use the Aggregate operator, grouping by visitor_id and aggregating visitor_id (with a count function). Then join the result with your original data set on the visitor_id attribute.
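
The count, join, then filter idea described above can be sketched outside RapidMiner in plain Python. This is only an illustration of the logic, not a RapidMiner process: the sample rows, the `count` attribute name, the helper names, and the threshold of 5 are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample data: visitor "A" has 6 visits, visitor "B" has 2.
visits = (
    [{"visitor_id": "A", "visit": i} for i in range(6)]
    + [{"visitor_id": "B", "visit": i} for i in range(2)]
)


def add_visit_count(rows, key="visitor_id"):
    """Mimic Aggregate (group by key, count) followed by Join on key."""
    counts = Counter(row[key] for row in rows)  # occurrences per visitor_id
    # Attach the per-visitor count to every original row.
    return [{**row, "count": counts[row[key]]} for row in rows]


def keep_at_most(rows, max_count):
    """Mimic Filter Examples on the new count attribute."""
    return [row for row in rows if row["count"] <= max_count]


with_counts = add_visit_count(visits)
kept = keep_at_most(with_counts, 5)  # drop visitors occurring more than 5 times
print(sorted({row["visitor_id"] for row in kept}))  # → ['B']
```

Note that this drops every row of a visitor who exceeds the threshold. If the goal is instead to keep only the first few rows per visitor, a running per-visitor counter would be needed rather than a total count.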


