Replace (dictionary)

abevensee Member Posts: 3 Newbie
edited November 5 in Help
I have a data set with 30+ attributes. Every column holds numerical codes that map to a classification. For example, Gender is an attribute with codes 1-3 meaning Male, Female, and Not provided, respectively. Ethnicity, race, etc. use similar code structures. I have set up a dictionary for each of these attributes so that my process can reference the matching dictionary and convert the codes to meaningful values. I have 2 questions:

1):  The codes mean something different for each attribute I'm converting, so I set up a separate dictionary for each. For instance, 1 means male for gender, but it also means white for race and single for marital status. Is there a way to use the Loop operator to have RM run all 30+ conversions using the different dictionaries, or do I need 30 separate "Replace (Dictionary)" operators in my process?

2):  In some dictionaries there are layered codes, for instance in my use case
1   = Latino/Hispanic
4   = N/A
14 = Other Hispanic or Latino

Instead of returning "Other Hispanic or Latino" for code 14, the operator returns "Latino/HispanicN/A". I've seen that the regular expression option could prevent this. However, I have the operator set up to run on a subset (the various ethnicity-related attributes) and do not want it applied to the whole population, so I'm not sure that would work. How can I fix this?
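
For illustration, the behaviour described above can be reproduced outside RapidMiner with a short Python sketch. It assumes the operator applies each dictionary entry in turn as a plain substring replacement; that is an assumption about its behaviour, not confirmed RapidMiner internals, and the function name `replace_substrings` is hypothetical.

```python
def replace_substrings(value, dictionary):
    """Apply each (code, label) pair in order, replacing every occurrence.

    Assumption: this mimics a non-regex dictionary replacement that
    matches codes anywhere inside the value, not only the whole value.
    """
    for code, label in dictionary:
        value = value.replace(code, label)
    return value


# Dictionary entries in the order given in the question.
ethnicity = [
    ("1", "Latino/Hispanic"),
    ("4", "N/A"),
    ("14", "Other Hispanic or Latino"),
]

result = replace_substrings("14", ethnicity)
print(result)  # Latino/HispanicN/A
```

By the time the entry for 14 is reached, the value no longer contains "14", so that entry never fires.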

Answers

  • kayman Member Posts: 413   Unicorn
    That's because it first replaces 1 and then 4 rather than matching 14 as a whole.
    Changing the order (putting 14 before 1 and 4) may already fix it. With regex you can force a full match, but the standard mode replaces anything that matches, so 14 is treated as a 1 followed by a 4.
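
Both suggestions can be sketched in Python to show why they help. This is only an illustration of the idea, not RapidMiner code: the helper names `replace_longest_first` and `replace_full_match` are hypothetical, and the substring behaviour of the standard mode is assumed from the description above.

```python
import re

# Dictionary entries from the question, in their original order.
ETHNICITY = [
    ("1", "Latino/Hispanic"),
    ("4", "N/A"),
    ("14", "Other Hispanic or Latino"),
]


def replace_longest_first(value, dictionary):
    """Reordering fix: try longer codes before the shorter codes they contain."""
    ordered = sorted(dictionary, key=lambda pair: len(pair[0]), reverse=True)
    for code, label in ordered:
        value = value.replace(code, label)
    return value


def replace_full_match(value, dictionary):
    """Full-match fix: a code only counts if it matches the entire value."""
    for code, label in dictionary:
        if re.fullmatch(re.escape(code), value):
            return label
    return value  # unknown codes are left unchanged


print(replace_longest_first("14", ETHNICITY))  # Other Hispanic or Latino
print(replace_full_match("14", ETHNICITY))     # Other Hispanic or Latino
print(replace_full_match("1", ETHNICITY))      # Latino/Hispanic
```

Reordering works here only because none of the replacement labels contain digits. The full-match variant does not depend on entry order, which makes it the safer choice when codes overlap.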