
# [Solved] Problem with TF-IDF calculation

Hello everyone,

I am currently working on a task where I have to resample some data. Because I am unsure whether it's okay to apply a method like SMOTE to already calculated TF-IDF weights, I wanted to calculate the term occurrences in RapidMiner, export the data and apply SMOTE to it, then import it again and calculate the TF-IDF weights.
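For reference, SMOTE creates each synthetic example by interpolating between a minority-class example and one of its k nearest minority-class neighbours. The sketch below applies that idea to a small, made-up term-occurrence matrix using plain NumPy. The function name `smote` and the data are hypothetical, and this is not RapidMiner's or any library's implementation.

```python
import numpy as np

def smote(minority, n_synthetic, k=2, seed=0):
    """Minimal SMOTE sketch: each synthetic row lies on the segment between a
    random minority row and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all minority rows.
    diffs = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    out = np.empty((n_synthetic, minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(len(minority))           # pick a minority row
        nb = neighbours[base, rng.integers(k)]       # pick one of its neighbours
        gap = rng.random()                           # uniform in [0, 1)
        out[i] = minority[base] + gap * (minority[nb] - minority[base])
    return out

# Hypothetical term occurrences of three minority-class documents (columns = terms).
counts = np.array([[3, 0, 1, 2],
                   [1, 2, 0, 2],
                   [2, 1, 0, 2]], dtype=float)

synthetic = smote(counts, n_synthetic=5, k=2)
print(synthetic)
```

Worth noting: the synthetic rows generally contain fractional counts, and a term that occurs in only one of the two interpolated documents usually gets a non-zero count in the synthetic one, which changes document frequencies if TF-IDF is computed afterwards.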

When I ran a test without the SMOTE step, I came across the following behaviour, for which I can't find an explanation.

1. For the first piece of data, I calculate the TF-IDF weights directly using the "Process Documents" Operator.

2. Then, for the same input data, I calculate only the term occurrences, binary term occurrences, and term frequency, and then use the "Generate TFIDF" Operator to calculate the TF-IDF weights from each of them.
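To see why the choice of input in step 2 matters on its own, here is a minimal sketch that computes TF-IDF from one term-occurrence matrix using three different TF inputs. It assumes the textbook definition idf = ln(N / df); whether RapidMiner's operators use exactly this formula (or a different logarithm base, smoothing, or normalization) is an assumption not confirmed in this thread. The matrix is made up.

```python
import numpy as np

# Hypothetical term-occurrence matrix: rows = documents, columns = terms.
occurrences = np.array([[2, 1, 0],
                        [0, 1, 3],
                        [1, 1, 1]], dtype=float)

def idf(occ):
    """Textbook IDF, ln(N / df); assumes every term occurs in at least one document."""
    n_docs = occ.shape[0]
    df = (occ > 0).sum(axis=0)  # number of documents containing each term
    return np.log(n_docs / df)

def tfidf(tf, occ):
    """Multiply a TF matrix by IDF values derived from the raw occurrences."""
    return tf * idf(occ)

tf_raw = occurrences                                             # term occurrences
tf_binary = (occurrences > 0).astype(float)                      # binary term occurrences
tf_freq = occurrences / occurrences.sum(axis=1, keepdims=True)   # term frequency

for name, tf in [("occurrences", tf_raw), ("binary", tf_binary), ("frequency", tf_freq)]:
    print(name)
    print(np.round(tfidf(tf, occurrences), 3))
```

Each TF input yields a different weight matrix, so at most one of them could line up with another pipeline's output. A term that occurs in every document (the middle column here) gets weight 0 under this IDF definition, whatever the TF input.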

However, none of the combinations from step 2 produces the same values as the calculation in step 1. Am I missing something?
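If you export the weights from both pipelines, one quick check is whether the rows differ only by a per-document scale factor, for example because one pipeline normalizes each document vector to unit length and the other does not. The sketch below tests that with NumPy. Whether either RapidMiner operator actually normalizes is an assumption, and the helper names and the small matrices are hypothetical stand-ins for the exported results.

```python
import numpy as np

def l2_normalize(m):
    """Scale each row (document) to unit Euclidean length."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero documents unchanged
    return m / norms

def same_up_to_row_scaling(a, b, atol=1e-9):
    """True if a and b agree after normalizing each row, i.e. they differ
    only by a positive scale factor per document."""
    return np.allclose(l2_normalize(a), l2_normalize(b), atol=atol)

# Hypothetical exports: step1 rows already have unit length, step2 rows do not.
step1 = np.array([[0.8, 0.6, 0.0],
                  [0.0, 0.0, 1.0]])
step2 = np.array([[1.2, 0.9, 0.0],
                  [0.0, 0.0, 0.4]])
# Different weights that are not a rescaling of step1.
step3 = np.array([[0.6, 0.8, 0.0],
                  [0.0, 0.0, 1.0]])

print(same_up_to_row_scaling(step1, step2))  # True
print(same_up_to_row_scaling(step1, step3))  # False
```

A True result would point to normalization as the only difference; a False result means the TF or IDF definitions themselves differ, which this check cannot tell apart.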

Does anyone have an answer to this?

Okay, the problem seems to be that the "Process Documents" and "Generate TFIDF" Operators don't work together.
