jaskiemr Member Posts: 8 Contributor II
I ran TF-IDF on some text, four files.

1) alpha bravo
2) alpha bravo
3) alpha bravo charlie delta
4) alpha bravo charlie delta

How is the "statistic" field in the Meta Data View output calculated here? Is the mean shown there the result of the tf/idf measure, f[ij] / f[dj] * log( D / f )?

When I look at "charlie" from above, RapidMiner gives 0.354. When I do the calculation by hand, 1/4 * log( 4 / 2 ), I get 0.075. Is the value normalized somehow, or is the log the natural log or base 2?

Thank you for any input.
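
For reference, the hand calculation above can be checked against the three usual log bases. This is only a small sketch of the arithmetic in the question; the variable names are made up for illustration and nothing here is taken from RapidMiner itself.

```python
import math

tf = 1 / 4        # term frequency used in the hand calculation above
ratio = 4 / 2     # total documents / documents containing "charlie"

# Same formula, three different log bases
by_base = {
    "log10": tf * math.log10(ratio),
    "ln": tf * math.log(ratio),
    "log2": tf * math.log2(ratio),
}

for name, value in by_base.items():
    print(f"{name}: {value:.3f}")
```

Only the base-10 log gives the 0.075 from the question, and none of the three bases gives 0.354, so the choice of log base alone does not seem to explain the difference.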


  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    As I already explained in another topic, the mean is simply the statistical mean of all values in this attribute. Please take a look at the other topic for more information.
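
    One way to reproduce the 0.354 is sketched below. It rests on an assumption that is not confirmed anywhere in this thread: that each document's TF-IDF vector is scaled to unit (Euclidean) length before the per-attribute statistics are computed. Under that assumption, the mean of the "charlie" column over the four documents comes out at about 0.354. The helper `tfidf_row` and all variable names are hypothetical.

```python
import math

# The four example documents from the question
docs = [
    ["alpha", "bravo"],
    ["alpha", "bravo"],
    ["alpha", "bravo", "charlie", "delta"],
    ["alpha", "bravo", "charlie", "delta"],
]

vocab = sorted({term for doc in docs for term in doc})
n_docs = len(docs)
# Document frequency: in how many documents each term occurs
df = {term: sum(term in doc for doc in docs) for term in vocab}


def tfidf_row(doc):
    """TF-IDF vector for one document, scaled to unit length (assumed step)."""
    raw = [
        doc.count(term) / len(doc) * math.log(n_docs / df[term])
        for term in vocab
    ]
    norm = math.sqrt(sum(v * v for v in raw))
    # Documents whose terms all have idf = 0 stay all-zero
    return [v / norm if norm else 0.0 for v in raw]


rows = [tfidf_row(doc) for doc in docs]
col = vocab.index("charlie")
charlie_values = [row[col] for row in rows]

# "Mean" in the statistical sense: average of the attribute over all documents
charlie_mean = sum(charlie_values) / n_docs
print([round(v, 3) for v in charlie_values], round(charlie_mean, 3))
```

    With unit-length scaling the log base cancels out, so the result is the same for natural log, base 2, or base 10. "alpha" and "bravo" occur in every document, so their idf is log(4/4) = 0, which leaves documents 3 and 4 with only "charlie" and "delta" carrying weight, each at about 0.707.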
