Term Frequencies greater than 1
Dear all,
I use "Text Processing - Process Documents From Files" to calculate word vectors for documents.
As I read here: http://rapid-i.com/rapidforum/index.php?PHPSESSID=0aba344304fbb94614ad24f236d974e4&;topic=3728.0
term frequencies are normalized (as I expected).
To me, this means that term frequencies should always have values < 1.
In my case, I use TF-IDF for vector creation as proposed, and I get some term frequencies in the range of 1E+10 to 1E+11.
Looking at the corresponding documents, they appear to be "normal".
Any ideas why this happens? What am I not understanding?
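For anyone else puzzling over this: one possible reading is that "normalized" applies to the whole TF-IDF vector rather than to each individual tf × idf product. Below is a minimal Python sketch under assumed textbook definitions (relative term frequency, idf = log(N / df), Euclidean normalization of each document vector). It is not RapidMiner's implementation, and `tfidf` is a hypothetical helper. It shows that the raw product can exceed 1 even when tf itself is at most 1, while no component of a unit-length vector can. It does not explain values as large as 1E+10, which would require looking at the actual process.

```python
import math
from collections import Counter


def tfidf(docs, normalize=True):
    """Return one term -> weight dict per tokenized document.

    Assumed scheme (may differ from RapidMiner's exact formula):
    tf = count / document length, idf = log(N / df), and, if
    normalize is True, each vector is scaled to Euclidean length 1.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Raw tf * idf: tf <= 1, but idf can exceed 1 for rare terms.
        raw = {t: (c / len(doc)) * math.log(n_docs / df[t])
               for t, c in counts.items()}
        if normalize:
            norm = math.sqrt(sum(v * v for v in raw.values()))
            # Guard against all-zero vectors (every term in every document).
            raw = {t: (v / norm if norm > 0 else 0.0) for t, v in raw.items()}
        vectors.append(raw)
    return vectors
```

For example, `tfidf([["rare"], ["a"], ["a"], ["a"]], normalize=False)` gives the term `rare` a weight of log(4) ≈ 1.386, although its term frequency is exactly 1; with `normalize=True` the same weight becomes 1.0.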
Answers
Is my reasoning wrong?
Can term frequencies be greater than 1?
Are there circumstances where it is better to use another method for vector creation?
Under which circumstances is each method for vector creation most appropriate?
Many thanks in advance for any hints.
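To compare the vector creation options side by side, here is a minimal sketch assuming the four schemes offered by the operator's vector creation parameter (TF-IDF, Term Frequency, Term Occurrences, Binary Term Occurrences). The formulas are common textbook definitions chosen for illustration, not taken from RapidMiner's source, and `vectorize` and its scheme names are hypothetical.

```python
import math
from collections import Counter


def vectorize(doc, docs, scheme):
    """Return a term -> weight dict for one tokenized document.

    Hypothetical helper: formulas are assumed textbook definitions,
    not RapidMiner's implementation. `docs` is the whole corpus,
    needed only for the IDF part of "tfidf".
    """
    counts = Counter(doc)
    if scheme == "binary":
        # Binary Term Occurrences: 1 if the term appears at all.
        return {t: 1 for t in counts}
    if scheme == "occurrences":
        # Term Occurrences: raw counts, which grow with document length.
        return dict(counts)
    if scheme == "frequency":
        # Term Frequency: count / document length, always in (0, 1].
        return {t: c / len(doc) for t, c in counts.items()}
    if scheme == "tfidf":
        # TF-IDF (unnormalized): relative tf times log(N / df).
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        return {t: (c / len(doc)) * math.log(n / df[t])
                for t, c in counts.items()}
    raise ValueError(f"unknown scheme: {scheme}")
```

In this sketch only Term Frequency is guaranteed to stay within (0, 1]. Term Occurrences grows with document length, Binary Term Occurrences discards counts entirely, and unnormalized TF-IDF can exceed 1 for rare terms while giving a weight of 0 to terms that occur in every document. As a general rule of thumb rather than an official recommendation, relative frequencies or TF-IDF suit corpora whose documents differ greatly in length, and binary occurrences suit short texts where repetition carries little information.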
Dortmund, Germany
Thanks a lot for your hints.
I'll try them and see.
BR