[SOLVED] How is term frequency calculated?

kasper2304 Member Posts: 28 Contributor II
edited November 2018 in Help
Dear Rapid forum

Sorry for raising this question again, but I simply cannot figure out how RapidMiner calculates term frequency.

The question has been addressed here before:
http://rapid-i.com/rapidforum/index.php/topic,4825.0.html

I have set up a really simple example in RapidMiner to try this out. One document contains 5 different terms and 5 terms in total. This yields a tf score of 0.447, and I simply cannot figure out how that happens. It should not be that difficult, but apparently it is...

Best
Kasper

Answers

  • RWingerter Member Posts: 38 Contributor II
    It's explained in this video:

    http://www.youtube.com/watch?v=ToxzfYECxOU

    Roland
  • kasper2304 Member Posts: 28 Contributor II
    Hi Roland.


    He only talks about the TF-IDF score, not the TF score. I know they are closely related, but I think I figured it out in the meantime:

    In my case I have 5 terms, meaning that the total number of terms is 5. Each term occurs only once, which gives the following equation for tf:

    tf(term_i) = count(term_i) / sqrt(total number of terms) = 1 / sqrt(5) ≈ 0.447


    I think my problem was that I was trying to read the source code but did not fully understand it.

    Anyway, thanks for the hint.

    I consider the question answered.

    Kasper
  • RWingerter Member Posts: 38 Contributor II
    Hello Kasper,

    Glad you figured it out. Thanks for the details.

    Roland
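
The formula Kasper arrives at can be checked with a small Python sketch. This is only an illustration of the arithmetic in the thread, not RapidMiner's source code. The function names and the sample tokens are made up for the example. The second function is an assumption added for comparison (dividing by the Euclidean length of the count vector); the thread does not say which of the two RapidMiner actually uses.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """tf as described in the thread: count(term) / sqrt(total number of terms)."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)  # total number of terms in the document
    return {term: count / math.sqrt(total) for term, count in counts.items()}


def l2_normalized_counts(tokens):
    """Alternative (an assumption, not confirmed in the thread):
    count(term) / Euclidean length of the count vector."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


# Kasper's case: 5 different terms, each occurring exactly once.
doc = ["alpha", "beta", "gamma", "delta", "epsilon"]
tf = term_frequencies(doc)
print(round(tf["alpha"], 3))  # 0.447
```

When every term occurs exactly once, as here, both functions return the same values, so this example alone cannot tell the two normalizations apart. A document with a repeated term, such as `["a", "a", "b"]`, gives different results for each and would be a better test case inside RapidMiner.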