Document similarity between two Excel spreadsheets containing text
I've posted before about text mining, but I think my question was too vague and didn't explain what I wanted to do very well. So I've gone away, watched some (a lot of!) tutorials and tried again.
So what I've done is read in two Excel spreadsheets: one containing relevant text, keywords, etc., and one containing 504 references exported from medical databases. Both spreadsheets contain a title and an abstract, and in the second spreadsheet each of the 504 references is on its own row. The aim is to compare the two spreadsheets to find the references that are most relevant to the text in the first spreadsheet.
OK, so I've played around with this a lot and got a few things to work (Euclidean distance, cosine similarity, etc.), but it's not quite doing what I wanted. I want it to re-order the 504 references by how similar they are to the relevant text in the first Excel spreadsheet, ideally so that the most relevant references are at the top of the list and the least relevant are at the bottom, if that makes sense.
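To make the goal concrete, here is a minimal sketch of the kind of ranking I mean, written in Python with only the standard library. It is not my real code or data: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_references`) and the toy texts are made up for illustration, and the TF-IDF weighting is just one common choice that I'm assuming would be reasonable here. Each text becomes a weighted word-count vector, every reference is scored against the relevant text with cosine similarity, and the references are sorted by that score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Turn a list of strings into sparse TF-IDF vectors (dicts of term -> weight)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so a term found in every document keeps a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_references(relevant_text, references):
    """Return (score, reference) pairs sorted from most to least similar."""
    vectors = tfidf_vectors([relevant_text] + references)
    query, ref_vectors = vectors[0], vectors[1:]
    scored = [(cosine(query, v), ref) for v, ref in zip(ref_vectors, references)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy stand-ins for the two spreadsheets (hypothetical text, not real references).
relevant_text = "diabetes insulin therapy adults"
references = [
    "Knee surgery outcomes among athletes",
    "Insulin therapy for adults with type 2 diabetes",
    "Diet and diabetes risk",
]

ranked = rank_references(relevant_text, references)
for score, ref in ranked:
    print(f"{score:.3f}  {ref}")
```

In my real case, `relevant_text` would be the title and abstract text from the first spreadsheet joined together, and `references` would hold one title-plus-abstract string per row of the 504-row spreadsheet. A score near 1 would mean the reference shares a lot of vocabulary with the relevant text, and a score of 0 would mean no words in common.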
Also, just to clarify, I am no data scientist, so I don't actually know what the results mean when I run cosine similarity, Euclidean distance and so on. All I know is that I got it to work without any errors, which at the moment is a pretty good achievement for me.
Anyway, I've gone off topic. Can anyone help with ranking the documents this way? I can also post what I've done so far if you need to see it.
Thank you so much