Conditional Replace functionality?
Hi all,
Very new user here.
I'm looking to use RapidMiner to convert/translate data files provided to us by a variety of different suppliers into the format we need for our system.
Currently we're doing this manually each time using Excel, but I would like to switch to stored procedures that can be run quickly to convert the data for us each time.
Could someone help me find the function I need for this specific job, please?
An example:
We have some rows where the main 4 attributes that define the product are identical to other rows.
We need each row to be a unique combination of these attributes.
The reason for this is that the 'colour' attribute is not specific enough. So there are actually 4 distinct types of Black, but they all just have the colour attribute = 'Black'
So I need an operator with the following type of rule(s):
if 'colour' = 'Black'
    look up the value for that row in the 'name' column
    if 'name' contains '%gloss%' then set 'colour' for that row = 'Gloss Black'
    if 'name' contains '%matt%' then set 'colour' for that row = 'Matt Black'
I've had a search through the help but couldn't find an obvious answer. How do you recommend I do this?
Thanks in advance,
James
Answers
Why don't you just create a new attribute called ColourType that combines the attributes for every row, and then filter the result set to process only the rows you are interested in? You could use the Generate Attributes operator,
but I need to think a bit on it. Maybe David is faster.
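To make the idea concrete outside of RapidMiner, here is a rough Python sketch of the logic: derive a ColourType value for every row, then filter to the rows that changed. The sample rows and the case-insensitive matching are my assumptions, not something from the original data, and this is not a RapidMiner process.

```python
# Hypothetical sample rows; only the 'name' and 'colour' columns come from the question.
rows = [
    {"name": "Helmet Gloss Finish", "colour": "Black"},
    {"name": "Helmet Matt Finish", "colour": "Black"},
    {"name": "Helmet Classic", "colour": "Red"},
]

def colour_type(row):
    """Return a more specific colour for Black rows, else the original colour."""
    name = row["name"].lower()  # assumption: match case-insensitively
    if row["colour"] == "Black":
        if "gloss" in name:
            return "Gloss Black"
        if "matt" in name:
            return "Matt Black"
    return row["colour"]

# Add the derived attribute to every row, leaving 'colour' untouched.
for row in rows:
    row["ColourType"] = colour_type(row)

# Filter down to the rows the rules actually changed.
changed = [r for r in rows if r["ColourType"] != r["colour"]]
```

Keeping the original 'colour' column next to the new one makes it easy to check what the rules did before overwriting anything.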
You have two parts to the problem.
The first, which you mentioned, is applying rules to the different product lines to ensure that they are unique across the four attributes; this can definitely be solved in various ways with a single RapidMiner operator. I personally would favour splitting it up a little, so that new rules can be added and managed just by creating a new operator and dropping it into the Process flow.
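The "one rule per step" idea can be sketched in Python as a list of rules applied in order, so that adding a rule means adding one entry. This is only an illustration of the design, not a RapidMiner process; the tuple layout and the function name `apply_rules` are made up for the example.

```python
# Each rule: (colour to match, substring to look for in 'name', replacement colour).
# The two rules below are the ones from the question; the layout is an assumption.
RULES = [
    ("Black", "gloss", "Gloss Black"),
    ("Black", "matt", "Matt Black"),
]

def apply_rules(row, rules=RULES):
    """Return a copy of row with 'colour' rewritten by the first matching rule."""
    name = row["name"].lower()  # assumption: match case-insensitively
    for colour, needle, new_colour in rules:
        if row["colour"] == colour and needle in name:
            return {**row, "colour": new_colour}
    return dict(row)  # no rule matched: the row is returned unchanged

# A new supplier quirk becomes one extra entry rather than a rewritten expression.
extended = RULES + [("Black", "satin", "Satin Black")]  # hypothetical extra rule
```

Note that the first matching rule wins, so the order of the list matters once rules start to overlap.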
The second problem, which you also mentioned, is that you are getting products which are not unique when broken down by those 4 main attributes (and you want them to be unique). There is a possibility that a new product named Extra Gloss with the colour Black could be added, which would create a duplicate of the Gloss Black record.
I think you should also add a step at the beginning of the dataset cleaning that highlights (and possibly attempts to resolve) any duplicates it finds.
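As a rough sketch of such a duplicate check in Python: group the rows by the key attributes and report every group with more than one member. The thread never names the four attributes, so `brand`, `model`, `size` and `colour` below are placeholders, and the sample rows are invented.

```python
from collections import defaultdict

# Placeholder names: the thread does not say what the four key attributes are.
KEY_ATTRIBUTES = ("brand", "model", "size", "colour")

def find_duplicates(rows, keys=KEY_ATTRIBUTES):
    """Group rows by their key tuple and keep only groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}

# Invented sample: 'Extra Gloss' collides with 'Gloss' once both are Gloss Black.
sample = [
    {"brand": "Acme", "model": "X1", "size": "M", "colour": "Gloss Black", "name": "X1 Gloss"},
    {"brand": "Acme", "model": "X1", "size": "M", "colour": "Gloss Black", "name": "X1 Extra Gloss"},
    {"brand": "Acme", "model": "X1", "size": "M", "colour": "Matt Black", "name": "X1 Matt"},
]
duplicates = find_duplicates(sample)
```

The same check can be run again after the colour rules have been applied, to confirm that the rules themselves did not introduce new collisions.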
If I get time later I'll try to get an example process knocked up demonstrating this.