Calculate TFIDF

barthos Member Posts: 20  Maven
edited November 2018 in Help
Hello,
I would like to calculate the TF-IDF of words in different documents, in order to determine the words that are the most significant for each document.
I use a "Create Document" operator for each new document and add an attribute (a name) to each one. I then use a "Documents to Data" operator to generate an example set, followed by a "Process Documents from Data" operator to compute the TF-IDF (which I selected in the parameters).
The problem is that I don't get TF-IDF values, only the number of occurrences of each word and the number of documents in which it appears. Moreover, I no longer see the label of each document, so I cannot tell the documents apart.
Can somebody help me?
Thanks a lot,
Barthélémy

Answers

  • colo Member Posts: 236  Guru
    Hi Barthélémy,

    it sounds like you are only looking at the word list output (where the word occurrences are shown). Also take a look at the example set output of the "Process Documents" operator. There you will see the TF-IDF values and also each document's label.

    Instead of chaining "Documents to Data" and "Process Documents from Data", you can use the single "Process Documents" operator.

    Best regards
    Matthias
  • barthos Member Posts: 20  Maven
    Thanks Matthias !
    Barth
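
For readers who want to see how the two outputs discussed above differ, here is a rough sketch in plain Python (not RapidMiner code). The `word_list` function mimics the word list output (total occurrences and document occurrences per word), while `tf_idf` mimics the example set output (one score per word per labelled document). It assumes the textbook formula tf × log(N / df); RapidMiner's own weighting and normalization may differ, and the function names and sample documents are made up for illustration.

```python
import math
from collections import Counter


def word_list(docs):
    """Per word: (total occurrences, number of documents containing it).

    This corresponds roughly to what a word list shows: corpus-wide counts,
    with no per-document information and no document labels.
    """
    total = Counter()
    doc_freq = Counter()
    for tokens in docs.values():
        total.update(tokens)          # every occurrence counts
        doc_freq.update(set(tokens))  # each document counts once per word
    return {word: (total[word], doc_freq[word]) for word in total}


def tf_idf(docs):
    """Per document label: {word: tf * log(N / df)} (textbook variant, assumed)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for label, tokens in docs.items():
        counts = Counter(tokens)
        scores[label] = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


# Hypothetical sample documents, keyed by their label.
docs = {
    "doc_a": "the cat sat on the mat".split(),
    "doc_b": "the dog chased the cat".split(),
    "doc_c": "the bird sang".split(),
}

print(word_list(docs)["the"])  # corpus-wide counts for one word
for label, weights in tf_idf(docs).items():
    # Words sorted from most to least significant for this document.
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    print(label, ranked[:3])
```

A word that appears in every document (here "the") gets a score of zero under this formula, which is why the per-document scores, rather than the raw counts, point to the words that are most significant for each document.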