Similarity between non-identical numbers
Hello
I have a problem with the Data to Similarity block. When I feed it a text column that contains numbers, such as phone numbers, it only gives a similarity of 100% between identical numbers; for every other pair the similarity is 0.
Any ideas how I can make it detect the similarity between, for example, "7788" and "7722"?
Thanks and best regards
Best Answer
Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 1,635; Unicorn)
You need to set the distance metric according to how you want to measure similarity between nominal values, which is not necessarily intuitive. If your data has both numerical and nominal attributes and you are using the default "mixed Euclidean distance" parameter, then nominal values that are the same have a distance of zero and all other values have a distance of 1. That is why you only see similarities of 100% or 0.
If you filter your dataset down to the nominal attributes only and then switch the measure type to "nominal", you will get several other options for measuring nominal distances. You can look these up on Wikipedia to understand exactly how they work, but they will generally provide values other than simple 1/0 match logic.
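To make the difference concrete, here is a small plain-Python sketch. It is not RapidMiner code and does not reproduce what any RapidMiner measure computes internally; the function names are hypothetical and chosen only for illustration. It contrasts the 1/0 match logic described above with two graded alternatives that treat each number as a string of characters: a per-position match ratio and a similarity derived from Levenshtein edit distance.

```python
def exact_match_similarity(a: str, b: str) -> float:
    """1/0 match logic: identical strings score 1.0, everything else 0.0."""
    return 1.0 if a == b else 0.0


def positional_similarity(a: str, b: str) -> float:
    """Share of character positions that match.

    Positions beyond the end of the shorter string count as mismatches.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for name, fn in [("exact match", exact_match_similarity),
                     ("positional", positional_similarity),
                     ("edit-based", edit_similarity)]:
        print(name, fn("7788", "7722"))
```

For "7788" and "7722", the exact-match score is 0.0, while the positional and edit-based scores are both 0.5, because two of the four digits agree. Whether a character-level measure like this is appropriate depends on what "similar" should mean for your phone numbers, so treat it as one possible definition rather than the correct one.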