Conditional Replace functionality?

DWJames · February 2016

Hi all,
Very fresh new user here.

I'm looking to use RapidMiner to convert/translate data files provided to us by a variety of suppliers into the format we need for our system.
Currently we do this manually each time using Excel, but I would like to switch to stored procedures that can be run quickly to convert the data for us each time.

Could someone help me find the function I need for this specific job please?

An example:

We have some rows where the 4 main attributes that define the product are identical to those of other rows.
We need each row to be a unique combination of these attributes.
The reason for this is that the 'colour' attribute is not specific enough: there are actually 4 distinct types of Black, but they all just have the colour attribute = 'Black'.

So I need an operator with the following type of rule(s):

if 'colour' = 'Black'
    look up the value for that row in the 'name' column
    if 'name' contains '%gloss%' then set 'colour' for that row = 'Gloss Black'
    if 'name' contains '%matt%' then set 'colour' for that row = 'Matt Black'

I've had a search through the help but couldn't find an obvious answer. How do you recommend I do this?
Thanks in advance,
James
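
For readers who want to see the rule above spelled out, here is a minimal sketch in plain Python (not a RapidMiner operator). The function name `refine_colour`, the sample rows, and the case-insensitive matching are assumptions made for illustration; only the 'colour' and 'name' attributes and the two rules come from the question.

```python
def refine_colour(colour, name):
    """Return a more specific colour when the generic colour is 'Black'."""
    if colour != "Black":
        return colour  # the rule only applies to rows whose colour is 'Black'
    lowered = name.lower()  # assumption: match regardless of upper/lower case
    if "gloss" in lowered:
        return "Gloss Black"
    if "matt" in lowered:
        return "Matt Black"
    return colour  # no keyword found, so the colour is left unchanged


# Hypothetical sample rows, made up for illustration only.
rows = [
    {"name": "Helmet Gloss Finish", "colour": "Black"},
    {"name": "Helmet Matt Finish", "colour": "Black"},
    {"name": "Helmet Gloss Finish", "colour": "Red"},
]

for row in rows:
    row["colour"] = refine_colour(row["colour"], row["name"])
```

Note that a plain substring test on "matt" would also match names containing "matte"; whether that is wanted depends on the supplier data.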

mob · February 2016

Maybe you could accomplish it with a combination of the filter attribute operator and the replace operator, but maybe a guru here will show how to do it with 1 operator.

Why don't you just create a new attribute called ColourType that combines the attributes for every row, and then filter the result set to process only the ones you are interested in? You could use the Generate Attributes operator.

MartinLiebig · February 2016

I think all of this is doable with a single Generate Attributes operator in RM 7.0.

But I need to think about it a bit. Maybe David is faster.

JEdward · February 2016

Actually I think you can do something much deeper with a RapidMiner process.

You have two parts to the problem.
The one you mentioned is applying rules to the different product lines to ensure that they have some uniqueness across the four attributes; this can definitely be solved in various ways with a single RapidMiner operator. I personally would favour splitting it up a little so that new rules can be added and managed just by creating a new operator and dropping it into the process flow.
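
The idea of keeping each rule as its own small unit can be sketched outside RapidMiner as a list of rules applied in order. This is plain Python, not a RapidMiner process; the names `RULES` and `apply_rules` and the first-match-wins behaviour are assumptions for illustration.

```python
# Each rule: (colour it applies to, keyword looked for in the name, replacement colour).
# Adding a new rule means appending one entry, much like dropping in one more operator.
RULES = [
    ("Black", "gloss", "Gloss Black"),
    ("Black", "matt", "Matt Black"),
]


def apply_rules(row, rules=RULES):
    """Return a copy of the row with 'colour' replaced by the first matching rule."""
    name = row["name"].lower()  # assumption: keywords match regardless of case
    for colour, keyword, replacement in rules:
        if row["colour"] == colour and keyword in name:
            return {**row, "colour": replacement}
    return dict(row)  # no rule matched, row is returned unchanged
```

Because the first matching rule wins, the order of the entries in the list matters whenever a name contains more than one keyword.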

The second problem, which you also mentioned, is that you are getting products which are not unique when broken down by those 4 main attributes (and you want them to be unique). There is a possibility that a new product named Extra Gloss could be added with the colour Black, which would create a duplicate of the record Gloss Black.
I think you should also add a step at the beginning of the dataset cleaning which highlights (and possibly attempts to resolve) any duplicates that it finds.
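
A duplicate check of this kind could look roughly like the following plain-Python sketch. The thread never names the four key attributes, so `brand`, `model`, `size` and `colour` are hypothetical placeholders, as are `find_duplicates` and the sample rows.

```python
from collections import defaultdict

# Hypothetical attribute names; the real four key attributes are not given in the thread.
KEY_ATTRIBUTES = ("brand", "model", "size", "colour")


def find_duplicates(rows, keys=KEY_ATTRIBUTES):
    """Map each key combination that occurs more than once to its row indices."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row[k] for k in keys)].append(index)
    return {key: indices for key, indices in groups.items() if len(indices) > 1}


# Made-up rows after the colour rules have run: 'Gloss' and 'Extra Gloss'
# both ended up as 'Gloss Black', so the first two rows collide.
rows = [
    {"brand": "Acme", "model": "H1", "size": "M", "colour": "Gloss Black", "name": "Gloss"},
    {"brand": "Acme", "model": "H1", "size": "M", "colour": "Gloss Black", "name": "Extra Gloss"},
    {"brand": "Acme", "model": "H1", "size": "M", "colour": "Matt Black", "name": "Matt"},
]

duplicates = find_duplicates(rows)
```

The check only reports the colliding rows; deciding how to resolve them (for example with a more specific rule) is left to whoever maintains the rules.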

If I get time later I'll try to get an example process knocked up demonstrating this.

Altair RapidMiner Community
