"For each XLS row, calculate similarity among the 3 text cells in that row"

dfischer Member Posts: 2 Contributor I
edited June 2019 in Help
Hi everyone,

I would appreciate it if you could share any thoughts on how I could solve the problem below:

INPUT: An Excel file with multiple rows and 3 columns (say columns A, B and C). All Excel content is text.

PROBLEM: For each row, calculate the similarity among the 3 text cells in that row. Then save the calculated similarities.


If Sim(x,y) is the text similarity between any cells 'x' and 'y' in the Excel file, an ideal output would be another Excel file that follows the format below:

Sim(A1,B1) Sim(A1,C1) Sim(B1,C1)
Sim(A2,B2) Sim(A2,C2) Sim(B2,C2)
Sim(A3,B3) Sim(A3,C3) Sim(B3,C3)
Sim(A4,B4) Sim(A4,C4) Sim(B4,C4)
Sim(A5,B5) Sim(A5,C5) Sim(B5,C5)
Sim(An,Bn) Sim(An,Cn) Sim(Bn,Cn)

I've watched a number of RapidMiner videos to learn how to do this but haven't succeeded yet.

Any ideas? Since I am still learning the basics, I would appreciate it if you could tell me what the entire process looks like.

Thank you in advance


    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist

    The operator you might need is Cross Distances. It calculates similarity, but usually between documents that are given as examples, i.e. one document per row. So I think you need to use a Loop and a Transpose (or Depivot?) operator to get a vertical example set for each round.

    If you could post an example set, I or another helper might find time to build an example process.
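
For orientation, the loop-and-compare idea can be sketched outside RapidMiner in plain Python. This is only an illustration under stated assumptions, not a RapidMiner process: the thread does not say which measure Sim(x,y) should be, so cosine similarity over raw word counts is assumed here, and the names `tokenize`, `cosine_similarity` and `row_similarities` as well as the sample rows are made up for the example. Each row is handled on its own, its three cells are treated as three small documents, and the three pairs are compared in the order Sim(A,B), Sim(A,C), Sim(B,C).

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_x, text_y):
    """Cosine similarity between the word-count vectors of two texts.

    Assumption: plain word counts, no TF-IDF weighting or stemming.
    """
    counts_x = Counter(tokenize(text_x))
    counts_y = Counter(tokenize(text_y))
    shared = counts_x.keys() & counts_y.keys()
    dot = sum(counts_x[t] * counts_y[t] for t in shared)
    norm_x = math.sqrt(sum(c * c for c in counts_x.values()))
    norm_y = math.sqrt(sum(c * c for c in counts_y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a cell without any tokens is scored as dissimilar
    return dot / (norm_x * norm_y)


def row_similarities(rows):
    """For each (A, B, C) row, return [Sim(A,B), Sim(A,C), Sim(B,C)]."""
    return [
        [cosine_similarity(x, y) for x, y in combinations(row, 2)]
        for row in rows
    ]


# Hypothetical sample data standing in for the Excel rows.
rows = [
    ("the quick brown fox", "the quick brown fox", "a slow green turtle"),
    ("data mining", "text mining", "mining data"),
]

for sims in row_similarities(rows):
    print(["%.2f" % s for s in sims])
```

Reading the rows from the spreadsheet and writing the result back is left out to keep the sketch dependency-free. Word-count cosine ignores word order, so "data mining" and "mining data" score as identical; whether that is acceptable depends on what kind of similarity is wanted.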

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany