Calculate TFIDF

barthos Member Posts: 20 Contributor II
edited November 2018 in Help
Hello,
I would like to calculate the TF-IDF of words in different documents, in order to determine the words that are the most significant for each document.
I use a "Create Document" operator for each new document and add an attribute (a name) to each one. I then use a "Documents to Data" operator to generate an example set, followed by a "Process Documents from Data" operator to compute the TF-IDF (which I selected in the parameters panel).
The problem is that I don't get the TF-IDF values, only the number of occurrences of each word and the number of documents in which it appears. Moreover, I can no longer see the document labels, so I cannot tell the documents apart.
Can somebody help me?
Thanks a lot,
Barthélémy

Answers

  • colo Member Posts: 236 Maven
    Hi Barthélémy,

    It sounds like you are only looking at the word list output (which shows the word occurrences). Take a look at the example set output of the "Process Documents" operator as well: there you will see the TF-IDF values and also each document's label.

    Instead of chaining "Documents to Data" and "Process Documents from Data", you can use the single operator "Process Documents".

    Best regards
    Matthias
  • barthos Member Posts: 20 Contributor II
    Thanks, Matthias!
    Barth
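
For readers who want to see how the two outputs discussed above differ, here is a minimal Python sketch. It is not RapidMiner code and is not claimed to reproduce RapidMiner's exact weighting or normalization; it assumes one common TF-IDF variant (term frequency as count divided by document length, inverse document frequency as log(N / df)). The corpus, the labels and the function names (word_list, tf_idf, top_words) are all hypothetical and chosen for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the keys stand in for the name attribute
# attached to each document.
documents = {
    "doc_a": "the cat sat on the mat",
    "doc_b": "the dog chased the cat",
    "doc_c": "the bird sang",
}


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def word_list(docs):
    """Corpus-level counts: total occurrences and document frequency per word.

    This is the kind of information a word list shows; it carries no
    per-document labels.
    """
    total = Counter()
    doc_freq = Counter()
    for text in docs.values():
        tokens = tokenize(text)
        total.update(tokens)          # every occurrence counts
        doc_freq.update(set(tokens))  # each document counts once per word
    return total, doc_freq


def tf_idf(docs):
    """Per-document TF-IDF scores, keyed by document label.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    _, doc_freq = word_list(docs)
    n_docs = len(docs)
    table = {}
    for label, text in docs.items():
        tokens = tokenize(text)
        counts = Counter(tokens)
        table[label] = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return table


def top_words(docs, k=3):
    """Return the k highest-scoring words for each document label."""
    return {
        label: sorted(scores, key=scores.get, reverse=True)[:k]
        for label, scores in tf_idf(docs).items()
    }


if __name__ == "__main__":
    for label, words in top_words(documents).items():
        print(label, words)
```

The word_list function returns only corpus-wide counts, which is why the labels seem to disappear when looking at that output alone, while tf_idf keeps one row of scores per labelled document, analogous to the example set output. Under this variant a word that occurs in every document scores zero, since log(N / N) = 0.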