How to dynamically select columns based on a row filter?

krishnakrishna Member Posts: 6 Contributor II
edited November 2018 in Help

Hi,

 

My data is laid out like a pivot table: countries (India, China, Japan, Singapore, etc.) on the rows, the sales reps in each country on the columns, and the amount of sales each rep has made as the values (the intersection of rows and columns). When I build this as a pivot table in Excel and filter for India (Filter Rows/Examples), both the rows (India) and the columns (the sales reps in India) are filtered.

 

I've attached some sample data for reference. Can I replicate the same behavior in RapidMiner? Can someone please help?

 

Thanks

Krishna

 

Answers

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Did you check out the Aggregate and Pivot operators?

     

    With Aggregate you can group by Region and Sales Rep, then Sum or Count, etc. You can then Pivot after that. 
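
    To make the group-then-pivot idea concrete, here is a minimal sketch in plain Python rather than RapidMiner. It is not how the Aggregate or Pivot operators are implemented; the record layout, the column names (country, rep, amount) and the sample values are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per individual sale.
sales = [
    {"country": "India", "rep": "Asha", "amount": 120.0},
    {"country": "India", "rep": "Asha", "amount": 80.0},
    {"country": "India", "rep": "Ravi", "amount": 50.0},
    {"country": "Japan", "rep": "Kenji", "amount": 200.0},
]


def aggregate(rows):
    """Group by (country, rep) and sum the amounts (the 'Aggregate' step)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["country"], row["rep"])] += row["amount"]
    return dict(totals)


def pivot(totals):
    """Reshape grouped totals into country -> {rep: total} (the 'Pivot' step)."""
    table = defaultdict(dict)
    for (country, rep), total in totals.items():
        table[country][rep] = total
    return dict(table)


table = pivot(aggregate(sales))

# Looking up one country returns only the reps who have sales there,
# which mirrors the Excel behavior described in the question.
print(table["India"])
```

    Because the pivot is built from the grouped rows, a rep with no sales in a country never gets a column for that country.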

  • krishnakrishna Member Posts: 6 Contributor II

    Hey Thomas,

     

    Thanks for your reply. I'm working on a dataset similar to the one above, in which 90% of the attributes are categorical and 10% are numerical. Hence, Aggregate doesn't work.

     

    A little background on the data:

    The data comes from different sensors, each of which captures specific information. I used the Union operator to get the data into this format. I now have 278 attributes and 12500 records; 240 of the attributes are categorical and the rest are numerical. I'm trying to select particular sensors so that I can look for correlations or dependencies with other attributes and do exploratory analysis.

     

    Is there any way to get only the columns related to the sensors I filter for?

     

    P.S.: PGN-ID is the sensor name here.

     

    Thanks 

    Krishna

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Yes, in the Select Attributes operator you can do that by Subset or Regular Expression. If the sensor columns always start with "PGN", then you can just use a regular expression such as "PGN.*" (without the quotes). :)
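
    As a rough illustration outside RapidMiner, the plain-Python sketch below shows two ways to end up with only one sensor's columns. The first selects columns by name with a regular expression, as suggested above; it assumes the pattern must match the whole column name. The second is an alternative technique, not something the operator is claimed to do: filter the rows by PGN-ID, then drop every column that is entirely missing in what remains. The column names and values are hypothetical.

```python
import re

# Hypothetical union-style data: each row fills only its own sensor's columns.
records = [
    {"PGN-ID": "61444", "PGN61444_speed": 1500.0, "PGN61444_mode": "run", "PGN65276_fuel": None},
    {"PGN-ID": "61444", "PGN61444_speed": 1600.0, "PGN61444_mode": "idle", "PGN65276_fuel": None},
    {"PGN-ID": "65276", "PGN61444_speed": None, "PGN61444_mode": None, "PGN65276_fuel": 0.8},
]


def select_by_regex(rows, pattern):
    """Keep only the columns whose whole name matches the pattern."""
    regex = re.compile(pattern)
    return [{k: v for k, v in row.items() if regex.fullmatch(k)} for row in rows]


def filter_rows(rows, column, value):
    """Keep only the rows where the given column equals the given value."""
    return [row for row in rows if row[column] == value]


def drop_empty_columns(rows):
    """Keep only the columns that have at least one non-missing value."""
    if not rows:
        return []
    keep = [k for k in rows[0] if any(row[k] is not None for row in rows)]
    return [{k: row[k] for k in keep} for row in rows]


# Option 1: select by column name (requires a naming convention).
by_name = select_by_regex(records, r"PGN61444.*")

# Option 2: filter rows first, then let the data decide which columns survive.
by_content = drop_empty_columns(filter_rows(records, "PGN-ID", "61444"))

print(list(by_name[0]))
print(list(by_content[0]))
```

    The name-based option only works if the sensor is encoded in the column names; the content-based option needs no naming convention but relies on unrelated columns being completely missing for the filtered sensor.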

     
