Prentice Member Posts: 66 Maven

I have manually calculated a TF-IDF of a very simple case:

I have 2 documents:
1 The small cat is black
2 We see a black dog and a black cat

My word list contains:
cat      dog       black      small

My calculations give a TF-IDF of:
d1: [-0.707   0   -0.707   0]
d2: [-0.447   0   -0.894   0]

I've made an Excel file with these simplified sentences:
small cat black
black dog black cat

I made a small process:
Read Excel -> Nominal to Text -> Process Documents from Data (with Tokenize inside)
The Process Documents from Data operator is set to TF-IDF. When I run this, it gives this result:
d1: [0     0     0     1]
d2: [0     1     0     0]

I'm pretty sure that my calculations are right, and I don't understand the result from RapidMiner: how is it that small and dog have value 1 for documents 1 and 2 respectively? Something is not right here and I do not know what.
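
For anyone comparing the two sets of numbers, here is a minimal Python sketch, not RapidMiner's actual code. It assumes relative term frequency, idf = log(N / df), and L2 normalization of each document vector. Under those assumptions "cat" and "black" occur in both documents, so their idf is log(2/2) = 0 and they drop out; "small" and "dog" are the only terms left with any weight, and normalization scales them to 1. The `smooth` flag is a hypothetical addition: with idf = log(N / (1 + df)) the same function gives the manual figures above, which suggests (as an inference from the numbers, not something stated in the post) that the hand calculation used that variant. The function name `tfidf` and the flag are illustrative only.

```python
import math

# The two simplified sentences from the Excel file, already tokenized.
docs = [
    "small cat black".split(),
    "black dog black cat".split(),
]
# Same column order as the word list in the post.
vocab = ["cat", "dog", "black", "small"]


def tfidf(docs, vocab, smooth=False):
    """Return L2-normalized TF-IDF rows, one per document.

    smooth=False: idf = log(N / df)
    smooth=True:  idf = log(N / (1 + df))  (assumed variant, see text)
    Assumes every vocabulary term occurs in at least one document.
    """
    n_docs = len(docs)
    rows = []
    for doc in docs:
        raw = []
        for term in vocab:
            df = sum(1 for d in docs if term in d)  # documents containing term
            idf = math.log(n_docs / (df + 1 if smooth else df))
            tf = doc.count(term) / len(doc)  # relative term frequency
            raw.append(tf * idf)
        norm = math.sqrt(sum(x * x for x in raw))
        rows.append([x / norm if norm else 0.0 for x in raw])
    return rows


for label, smooth in (("log(N/df)", False), ("log(N/(1+df))", True)):
    for i, row in enumerate(tfidf(docs, vocab, smooth), start=1):
        print(label, f"d{i}:", [round(x, 3) for x in row])
```

The unsmoothed run prints the vectors RapidMiner reported, and the smoothed run prints the manually calculated ones, so the difference between the two results comes down to which idf formula is used.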


Best Answer


  • Prentice Member Posts: 66 Maven
    Oh yes, this helped. I must have overlooked it, thanks!