How to convert numerical values in result file back to original nominal values of input

Hung_Bui_221 Member Posts: 5 Learner I
edited November 14 in Help
Hello everyone! I am a beginner who started studying RM a few months ago. I am working on a group project to detect outliers in the Bank Marketing dataset. This is my process (image below).

The dataset has more than 40,000 examples, and the outlier detection operator seems too slow when handling both nominal and numerical values, so I decided to convert all nominal values to numerical.

After running this process, I obtained a result file, and I would like to convert all the numerical values I changed back to the original nominal values, as in the input file. Converting them manually is the last resort, so I wonder whether I can do it quickly with RM operators or some other way.

Please help me find the best way to handle this case as soon as possible :# Thank you very much.

Best Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 860 Unicorn
    Solution Accepted
    Hi!

    Do you have an ID in your data? If not, you can also use the Generate ID operator to get one. Then you use Join to get back the original data and add the generated outlier score to that.

    By the way, Local Outlier Factor is a nearest-neighbor-based method, so it works best with normalized input. Use the Normalize operator before applying it; you should get better results that way. The join-based method for getting the original data back is applicable there, too.

    Regards,
    Balázs
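
For readers who want to see the ID-and-join idea outside of RapidMiner, here is a minimal sketch in plain Python. The rows, attribute values, and outlier scores are made-up placeholders (not real Bank Marketing data or real Local Outlier Factor output); the point is only that an ID added before the nominal-to-numerical conversion lets the score be attached to the untouched original rows afterwards.

```python
# Hypothetical rows standing in for the original (nominal) input data.
original = [
    {"age": 40, "job": "admin.", "marital": "married"},
    {"age": 50, "job": "technician", "marital": "single"},
    {"age": 60, "job": "retired", "marital": "married"},
]

# Step 1 (the role of Generate ID): give every row a unique id
# before any conversion or outlier detection takes place.
for i, row in enumerate(original, start=1):
    row["id"] = i

# Step 2: result of outlier detection on the converted, numeric-only copy.
# These scores are invented placeholders, keyed by the same id.
scored = [
    {"id": 1, "outlier_score": 1.02},
    {"id": 2, "outlier_score": 0.98},
    {"id": 3, "outlier_score": 2.75},
]

# Step 3 (the role of Join): look up each score by id and attach it
# to the original row, so the nominal values never need converting back.
score_by_id = {r["id"]: r["outlier_score"] for r in scored}
joined = [{**row, "outlier_score": score_by_id[row["id"]]} for row in original]

for row in joined:
    print(row)
```

Because the join uses only the ID, it does not matter how the numeric copy was transformed in between (nominal-to-numerical conversion, normalization, and so on).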
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 860 Unicorn
    Solution Accepted
    Hi!

    Normalizing changes all numeric attributes to be roughly between 0 and 1 (or -1 and 1), depending on the method.

    Nearest-neighbor methods combine the values of different attributes into a single distance. This means that an attribute with large numerical values (e.g. money amounts) would dominate all the other attributes (age in years, 0/1 from a nominal-to-numerical transformation, etc.) and determine the neighborhood alone. Normalizing avoids this and gives all attributes a fair chance to influence the distance calculations.

    Regards,
    Balázs
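
The effect described above can be illustrated with a small, self-contained Python sketch. The four customers and their values are hypothetical, and min-max scaling to 0..1 is used here as one possible normalization method, chosen for illustration. The sketch compares which customer is the nearest neighbor of customer A before and after scaling.

```python
import math

# Hypothetical customers: (age in years, account balance).
customers = {
    "A": (25, 1000),
    "B": (65, 1010),
    "C": (26, 1200),
    "D": (60, 5000),
}

def nearest(points, name):
    """Return the name of the point closest to `name` (Euclidean distance)."""
    others = (k for k in points if k != name)
    return min(others, key=lambda k: math.dist(points[name], points[k]))

def min_max(points):
    """Rescale every attribute to the range 0..1."""
    columns = list(zip(*points.values()))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return {
        k: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(vals, lows, highs))
        for k, vals in points.items()
    }

raw_neighbor = nearest(customers, "A")
scaled_neighbor = nearest(min_max(customers), "A")
print(raw_neighbor, scaled_neighbor)
```

On the raw values, B is A's nearest neighbor because their balances differ by only 10, even though their ages are 40 years apart. After scaling, C (one year older, slightly higher balance) becomes the nearest neighbor, since age now contributes to the distance on the same scale as balance. This is also why Age no longer shows whole numbers after normalization: the scaled values are real numbers, but the ordering within each attribute is preserved.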

Answers

  • Hung_Bui_221 Member Posts: 5 Learner I
    Thank you so much for replying to me. Your answer is really helpful. Can I ask you one more question?

    After I used the Normalize operator on all attributes, the data types and the values changed. For example, Age originally contained the customers' ages (40, 50, 60 years old...), but afterwards its data type and values were changed to real (attached image).

    I wonder if this affects the result.  :# Please tell me more. Thank you again.

  • Hung_Bui_221 Member Posts: 5 Learner I
    Thank you so much, Mr. Balázs. o:) Your answer is really great.