Replace (dictionary)

abevensee Member Posts: 4 Contributor I
edited November 2019 in Help
I have a data set with 30+ attributes. Each row holds a numerical code in each column, and each code corresponds to a classification. For example, Gender is an attribute with codes 1-3 meaning Male, Female, and Not provided, respectively. There are similar code structures for ethnicity, race, etc. I have set up a dictionary for each of these attributes so that my model can reference the specific dictionary and convert the codes to meaningful data. I have 2 questions:

1): The same code means something different for each attribute I'm converting, so I set up a separate dictionary for each. For instance, 1 means male for gender, but it also means white for race and single for marital status. Is there a way to use the Loop operator to have RM run all 30+ conversions with the different dictionaries, or do I need 30 separate "Replace (Dictionary)" operators in my process?

2): In some dictionaries there are layered codes, where one code is contained in another. For instance, in my use case:
1   = Latino/Hispanic
4   = N/A
14 = Other Hispanic or Latino

Instead of returning "Other Hispanic or Latino" for code 14, the operator returns "Latino/HispanicN/A". I've seen that the regular expression option could prevent this. However, I have the operator set up to run on a subset (the various ethnicity-related attributes) and I do not want it applied to the whole population, so I'm not sure that would work. How can I go about fixing this?
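
To make the two questions concrete, here is a minimal sketch in plain Python (not RapidMiner operators). It assumes the operator does substring replacement, which is what the "Latino/HispanicN/A" output suggests, and it uses hypothetical attribute names and dictionaries. It shows why code 14 gets mangled, how matching the whole value avoids that, and how one loop can apply a separate dictionary to each attribute.

```python
import re

# Hypothetical per-attribute dictionaries; names and codes are illustrative only.
DICTIONARIES = {
    "gender": {"1": "Male", "2": "Female", "3": "Not provided"},
    "ethnicity": {
        "1": "Latino/Hispanic",
        "4": "N/A",
        "14": "Other Hispanic or Latino",
    },
}


def replace_substrings(value, mapping):
    """Naive replacement: every occurrence of each key is replaced in turn."""
    for code, label in mapping.items():
        value = value.replace(code, label)
    return value


def replace_whole_value(value, mapping):
    """Anchored replacement: a key matches only if it is the entire value."""
    for code, label in mapping.items():
        if re.fullmatch(re.escape(code), value):
            return label
    return value  # unknown codes are left untouched


def convert_rows(rows, dictionaries):
    """Loop over attributes, applying each attribute's own dictionary."""
    converted = []
    for row in rows:
        new_row = {}
        for attr, val in row.items():
            mapping = dictionaries.get(attr)
            # Attributes without a dictionary (e.g. an id column) pass through.
            new_row[attr] = replace_whole_value(val, mapping) if mapping else val
        converted.append(new_row)
    return converted


if __name__ == "__main__":
    print(replace_substrings("14", DICTIONARIES["ethnicity"]))
    print(replace_whole_value("14", DICTIONARIES["ethnicity"]))
    print(convert_rows([{"id": "1", "gender": "1", "ethnicity": "14"}], DICTIONARIES))
```

In the naive version, "14" first has its "1" replaced and then its "4", so the "14" entry never gets a chance to match. Requiring the key to match the whole value (the equivalent of writing the key as `^14$` when a regular-expression option is available) removes the overlap, and because each attribute looks up only its own dictionary, a 1 in gender never collides with a 1 in ethnicity.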
