Conditional Replace functionality?

DWJames Member Posts: 1 Learner III
edited November 2018 in Help
Hi all,
very fresh new user here :)
I'm looking to use RapidMiner to convert/translate data files provided to us by a variety of suppliers into the format our system needs.
Currently we do this manually in Excel each time, but I would like to switch to stored procedures that can be run quickly to convert the data for us each time.

Could someone help me find the function I need for this specific job, please?

An example:

We have some rows where the 4 main attributes that define the product are identical to those of other rows.
We need each row to be a unique combination of these attributes.
The reason is that the 'colour' attribute is not specific enough: there are actually 4 distinct types of Black, but they all just have colour = 'Black'.

So I need an operator with the following type of rule(s):

if 'colour' = 'Black':
    look up the value for that row in the 'name' column
    if 'name' contains '%gloss%', set 'colour' for that row to 'Gloss Black'
    if 'name' contains '%matt%', set 'colour' for that row to 'Matt Black'
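For illustration only, here is a minimal plain-Python sketch of those rules (not a RapidMiner operator). It assumes each row is a dict with "colour" and "name" keys, and it treats '%gloss%' as a case-insensitive "contains" test; the sample rows are made up.

```python
# Plain-Python sketch of the colour rules (not a RapidMiner operator).
# Assumption: each row is a dict with "colour" and "name" keys.


def refine_colour(row: dict) -> dict:
    """Return a copy of row where colour 'Black' is refined using 'name'."""
    out = dict(row)
    if out.get("colour") == "Black":
        # Case-insensitive substring test, standing in for LIKE '%gloss%'.
        name = str(out.get("name", "")).lower()
        if "gloss" in name:
            out["colour"] = "Gloss Black"
        elif "matt" in name:
            out["colour"] = "Matt Black"
    return out


# Made-up sample rows for illustration only.
rows = [
    {"name": "Marker Gloss", "colour": "Black"},
    {"name": "Marker Matt", "colour": "Black"},
    {"name": "Marker Gloss", "colour": "Red"},
]

for row in rows:
    print(row["name"], "->", refine_colour(row)["colour"])
```

Because "gloss" is checked first, a name containing both keywords ends up as Gloss Black, and a Black row with neither keyword stays Black. Note that "matt" also matches "matte".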

I've searched through the help but couldn't find an obvious answer. How would you recommend I do this?
thanks in advance,
James

Answers

  • mob Member Posts: 37 Contributor II
    Maybe you could accomplish it with a combination of the filter attribute operator and the replace operator, but maybe a guru here will show how to do it with one operator.

    Why don't you just create a new attribute called ColourType that combines the attributes for every row, and then filter the result set to process only the ones you are interested in? You could use the Generate Attributes operator.
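A rough plain-Python sketch of the ColourType idea, outside RapidMiner: build one combined attribute per row, then filter. Which attributes get combined (here the hypothetical COMBINE tuple of "colour" and "name") and the " | " separator are assumptions; the rows are made up.

```python
# Sketch of the "ColourType" idea: one combined attribute per row, then a filter.
# Assumptions: rows are dicts; COMBINE is a hypothetical choice of attributes.

COMBINE = ("colour", "name")  # hypothetical: attributes joined into ColourType


def add_colour_type(row: dict, attrs=COMBINE, sep=" | ") -> dict:
    """Return a copy of row with a ColourType attribute built from attrs."""
    out = dict(row)
    out["ColourType"] = sep.join(str(row[a]) for a in attrs)
    return out


# Made-up sample rows for illustration only.
rows = [
    {"name": "Marker Gloss", "colour": "Black"},
    {"name": "Marker Matt", "colour": "Black"},
    {"name": "Marker", "colour": "Red"},
]

# Build the new attribute for every row, then keep only the rows of interest.
typed = [add_colour_type(r) for r in rows]
black_rows = [r for r in typed if r["colour"] == "Black"]
for r in black_rows:
    print(r["ColourType"])
```

The combined value is only as specific as the attributes fed into it, so this helps spot the ambiguous rows rather than assigning Gloss/Matt by itself.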
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    I think all of this can be done with a single Generate Attributes operator in RM 7.0.

    But I need to think about it a bit. Maybe David is faster :D
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Actually, I think you can do something more thorough with a RapidMiner process.

    You have two parts to the problem. 
    The one you mentioned is applying rules to the different product lines to ensure that they are unique across the four attributes; this can definitely be solved in various ways with a single RapidMiner operator. I personally would favour splitting it up a little, so that new rules can be added and managed just by creating a new operator and dropping it into the process flow.
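To illustrate the "one rule per piece" idea outside RapidMiner, here is a plain-Python sketch in which each rule is one entry in a table, so adding a rule means adding one line. The Rule type, the RULES table and the Satin Black rule are hypothetical names for illustration, not anything RapidMiner provides.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    colour: str      # existing colour value the rule applies to
    keyword: str     # substring searched for in 'name' (case-insensitive)
    new_colour: str  # colour to assign when the rule matches


# Hypothetical rule table: adding a rule means adding one line here.
RULES = [
    Rule("Black", "gloss", "Gloss Black"),
    Rule("Black", "matt", "Matt Black"),
]


def apply_rules(row: dict, rules=RULES) -> dict:
    """Return a copy of row with the first matching rule applied."""
    out = dict(row)
    name = str(out.get("name", "")).lower()
    for rule in rules:
        if out.get("colour") == rule.colour and rule.keyword.lower() in name:
            out["colour"] = rule.new_colour
            break  # first match wins, so rule order matters
    return out


print(apply_rules({"name": "Marker Gloss", "colour": "Black"})["colour"])

# Usage: a new (hypothetical) rule is just one more entry in the list.
more_rules = RULES + [Rule("Black", "satin", "Satin Black")]
print(apply_rules({"name": "Marker Satin", "colour": "Black"}, more_rules)["colour"])
```

Keeping the rules as data rather than as nested conditions is the same trade-off as one operator per rule: each rule can be reviewed, reordered or removed on its own.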

    The second problem, which you mentioned, is that you are getting products that are not unique when broken down by those 4 main attributes (and you want them to be unique). It is possible that a new product named Extra Gloss with colour Black could be added, which would create a duplicate of the Gloss Black record.
    I think you should also add a step at the beginning of the dataset cleaning that highlights (and possibly attempts to resolve) any duplicates it finds.
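A minimal plain-Python sketch of such a duplicate check: count each combination of the key attributes and report any that occur more than once. Only "colour" is named in the thread; "brand", "range" and "size" are hypothetical stand-ins for the other three attributes, and the rows are made up to mirror the Extra Gloss example.

```python
from collections import Counter

# Hypothetical names for the four key attributes; only "colour" is named in the thread.
KEY_ATTRIBUTES = ("brand", "range", "size", "colour")


def find_duplicates(rows, key=KEY_ATTRIBUTES):
    """Return {key tuple: count} for every key combination seen more than once."""
    counts = Counter(tuple(r[k] for k in key) for r in rows)
    return {k: n for k, n in counts.items() if n > 1}


# Made-up rows: "Extra Gloss" was also mapped to Gloss Black, creating a clash.
rows = [
    {"brand": "Acme", "range": "Marker", "size": "M", "colour": "Gloss Black", "name": "Marker Gloss"},
    {"brand": "Acme", "range": "Marker", "size": "M", "colour": "Gloss Black", "name": "Marker Extra Gloss"},
    {"brand": "Acme", "range": "Marker", "size": "M", "colour": "Matt Black", "name": "Marker Matt"},
]
print(find_duplicates(rows))  # {('Acme', 'Marker', 'M', 'Gloss Black'): 2}
```

Running this after the colour rules flags clashes like Gloss Black vs. Extra Gloss for manual review instead of letting them through silently.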

    If I get time later, I'll try to put together an example process demonstrating this.