Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.
Similarity between mutiple tables
Hi,
Currently, I am working on a thesis research for my university to solve an entity resolution problem. Today I have tried to integrate two tables with each other through measuring the Similarity between these tables. If the threshold is above 0,9 it is considered as useful and will it be used in the second evaluation. In the second evaluation the variables will be evaluated on weight. For example, a phone number is a better unique key, than a firstname. At the end, the customer representation need to be evaluated as followed (0.9*2)+(0,8*7) = .... if the threshold is above the 0.8 (for example) it will consider as usefull and integrate the rows. I Tried to perform the similarity (with a couple of similarity measures) measure In rapid miner, but I received extreme values ( <0 or >1).
(currently, I cannot post any screenshots, since I am new)
What do I wrong?
Cheers, Robin
Currently, I am working on a thesis research for my university to solve an entity resolution problem. Today I have tried to integrate two tables with each other through measuring the Similarity between these tables. If the threshold is above 0,9 it is considered as useful and will it be used in the second evaluation. In the second evaluation the variables will be evaluated on weight. For example, a phone number is a better unique key, than a firstname. At the end, the customer representation need to be evaluated as followed (0.9*2)+(0,8*7) = .... if the threshold is above the 0.8 (for example) it will consider as usefull and integrate the rows. I Tried to perform the similarity (with a couple of similarity measures) measure In rapid miner, but I received extreme values ( <0 or >1).
(currently, I cannot post any screenshots, since I am new)
What do I wrong?
Cheers, Robin
Tagged:
0
Answers
Best Regards,
Edwin Yaqub
Scott
Currently, I made the following process in rapid miner:
I used the same data set, 1 with the correct data and the other with manipulated data (same columns). To start with the first cross distance test I selected the "initials" attribute. Within the cross distance operate I selected "nominal measures" and "JaccardSimilarity". I received the following results::
Results:
I was expecting results such as: 0,43, 0,33 etc, see below an real example: