"For each XLS row, calculate similarity among the 3 text cells in that row"
Hi everyone,
I would appreciate it if you could share any thoughts on how I could solve the problem below:
INPUT: An Excel file with multiple rows and 3 columns (say columns A, B and C). All Excel content is text.
PROBLEM: For each row, calculate the similarity among the 3 text cells in that row, then save the calculated similarities.
Example:
If Sim(x,y) is the text similarity between any two cells 'x' and 'y' in the Excel file, an ideal output would be another Excel file that follows the format below:
Sim(A1,B1) Sim(A1,C1) Sim(B1,C1)
Sim(A2,B2) Sim(A2,C2) Sim(B2,C2)
Sim(A3,B3) Sim(A3,C3) Sim(B3,C3)
Sim(A4,B4) Sim(A4,C4) Sim(B4,C4)
Sim(A5,B5) Sim(A5,C5) Sim(B5,C5)
...
Sim(An,Bn) Sim(An,Cn) Sim(Bn,Cn)
I've watched a number of RapidMiner videos to learn how to do this, but haven't succeeded yet.
Any ideas? Since I am still learning the basics, I would appreciate it if you could tell me what the entire process looks like.
Thank you in advance
Answers
The operator you might need is Cross Distances. It calculates similarity, but usually between documents that are given as examples (rows). So I think you need a Loop and a Transpose (or De-Pivot?) operator to get a vertical example set for each round.
If you could post an example set, I or another helper might find time to build an example process.
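To make the idea concrete outside RapidMiner, here is a minimal Python sketch of the per-row computation. It is not a RapidMiner process. It assumes Sim(x,y) is cosine similarity over simple word counts, which is only one possible text similarity; the function names (tokenize, cosine_similarity, row_similarities) are made up for illustration, and loading the Excel file and saving the result are left out.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(text_x, text_y):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_x = Counter(tokenize(text_x))
    counts_y = Counter(tokenize(text_y))
    # A Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(counts_x[word] * counts_y[word] for word in counts_x)
    norm_x = math.sqrt(sum(c * c for c in counts_x.values()))
    norm_y = math.sqrt(sum(c * c for c in counts_y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an empty cell is dissimilar to everything
    return dot / (norm_x * norm_y)


def row_similarities(rows):
    """For each (A, B, C) row, return [Sim(A,B), Sim(A,C), Sim(B,C)]."""
    return [
        # combinations() yields the pairs in the order (A,B), (A,C), (B,C).
        [cosine_similarity(x, y) for x, y in combinations(row, 2)]
        for row in rows
    ]
```

Each inner list corresponds to one line of the desired output, so calling row_similarities on the rows read from the spreadsheet gives a table that can be written back out with three columns.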
cheers,
Martin
Dortmund, Germany