Calculate TFIDF

barthos · April 2011

Hello,
I would like to calculate the TF-IDF of words in different documents, in order to determine the words that are the most significant for each document.
I use the "Create Document" operator for each new document and add an attribute (a name) to each one. I then use a "Documents to Data" operator to generate an example set, followed by a "Process Documents from Data" operator to compute the TF-IDF (which I selected in the parameters panel).
The problem is that I don't get TF-IDF values, only the number of occurrences of each word and the number of documents in which it appears. Moreover, I no longer see the label of each document, so I cannot tell the documents apart.
Can somebody help me?
Thanks a lot,
Barthélémy

colo · April 2011

Hi Barthélémy,

It sounds like you are only looking at the word list output (where the word occurrences are shown). Take a look at the example set output of the "Process Documents" operator as well. There you will see the TF-IDF values and also each document's label.

Instead of chaining "Documents to Data" and "Process Documents from Data", you can use the single operator "Process Documents".
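
To make the difference between raw occurrence counts (the word list) and TF-IDF values (the example set) concrete, here is a minimal Python sketch of one common TF-IDF definition: term frequency within a document times the logarithm of (number of documents / number of documents containing the word). The function name `tf_idf` and the sample documents are hypothetical, and RapidMiner's operator may normalize its vectors differently, so the numbers will not necessarily match its output.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF per document.

    docs: mapping of document label -> text.
    Returns a mapping of label -> {word: tf-idf score}.
    Assumes simple lowercase whitespace tokenization (illustration only).
    """
    tokens = {label: text.lower().split() for label, text in docs.items()}
    n_docs = len(tokens)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    scores = {}
    for label, words in tokens.items():
        counts = Counter(words)  # raw occurrences, like the word list
        total = len(words)
        scores[label] = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return scores


# Hypothetical sample documents, keyed by their label.
docs = {
    "doc_a": "the cat sat on the mat",
    "doc_b": "the dog chased the cat",
    "doc_c": "the bird sang",
}

scores = tf_idf(docs)
for label, word_scores in scores.items():
    # Words sorted from most to least significant for this document.
    ranked = sorted(word_scores, key=word_scores.get, reverse=True)
    print(label, ranked[:3])
```

A word such as "the" that occurs in every document gets a score of zero, while words that are frequent in one document but rare elsewhere rank highest, which is what makes the values useful for finding the most significant words per document.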

Best regards
Matthias

barthos · May 2011

Thanks, Matthias!
Barth
