[Solved] Problem with TF IDF calculation

danieldaniel Member Posts: 12 Contributor II
edited November 2018 in Help
Hello everyone,

I am currently working on a task where I have to resample some data. Because I am unsure whether it is okay to apply a method like SMOTE to already calculated TF-IDF weights, I wanted to calculate the term occurrences in RapidMiner, export the data and apply SMOTE to it, and then re-import it and calculate the TF-IDF weights.

When I ran a test without the SMOTE step, I came across the following behaviour, which I just can't find an explanation for.

1. For the first copy of the data, I just calculate the TF-IDF weights using the "Process Documents" operator.
2. Then, for the same input data, I calculate only the term occurrences, binary term occurrences, and term frequency, and then use the "Generate TFIDF" operator to calculate the TF-IDF weights.

However, none of the combinations from step 2 produces the same values as the calculation in step 1. Am I missing something?
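To make the comparison concrete, here is a minimal sketch in plain Python of calculating TF-IDF weights from raw term occurrences. It is not RapidMiner's implementation: the idf formula (log(N / df)), the length-relative term frequency, and the optional unit-length normalization are all assumptions, and the function name `tfidf_from_counts` and the sample counts are made up for illustration. It only shows that two TF-IDF calculations on the same counts can give different values when one of them applies an extra normalization step, which may or may not be what happens between the two operators.

```python
import math

def tfidf_from_counts(counts, normalize=False):
    """Compute TF-IDF weights from raw term occurrences.

    counts: list of documents, each a list of raw term counts.
    normalize: if True, scale every document vector to unit length.
    """
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Assumed idf variant: log(N / df). Other tools may use a different one.
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    weights = []
    for row in counts:
        total = sum(row)
        # Term frequency relative to the document length, times idf.
        vec = [(c / total if total else 0.0) * idf[j] for j, c in enumerate(row)]
        if normalize:
            norm = math.sqrt(sum(w * w for w in vec))
            vec = [w / norm if norm else 0.0 for w in vec]
        weights.append(vec)
    return weights

# Hypothetical term occurrences: 3 documents, 3 terms.
counts = [
    [2, 1, 0],
    [0, 1, 3],
    [1, 0, 0],
]
plain = tfidf_from_counts(counts)
unit = tfidf_from_counts(counts, normalize=True)
print(plain[0])
print(unit[0])
```

The two printed vectors differ even though they come from the same counts, so when comparing two tools it is worth checking which term frequency, which idf variant, and which normalization each one applies.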

Does anyone have an answer to this?

Okay, the problem seems to be that the "Process Documents" and "Generate TFIDF" operators don't work together.