Re: Text analysis of single words
Hi! The default way to create attributes in a text mining context is TF-IDF: Term Frequency, Inverse Document Frequency. Term Frequency: how often a word (token) occurs in a document. Inverse Document Frequency: a weight based on how many documents contain the word (token); the more documents, the lower the weight. You can select another method in the "vector creation" parameter of… -
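As a rough illustration of the two definitions above, here is a minimal Python sketch. It assumes one common textbook variant (tf = count / document length, idf = log(N / df)); the exact formula and normalization RapidMiner applies may differ, and the function name `tf_idf` and the sample documents are made up for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {token: weight} dict per tokenized document.

    Assumed variant (not necessarily RapidMiner's):
    tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            token: (count / len(doc)) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return weights

# Hypothetical toy corpus, already tokenized.
docs = [
    "cheap flights cheap hotels".split(),
    "cheap car rental".split(),
    "luxury hotels".split(),
]
weights = tf_idf(docs)
```

With this variant, a token that occurs in every document gets a weight of 0, because log(N / N) = 0.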
interpreting the sum of TF-IDF scores of words across documents
Hi guys! After clustering a list of documents with k-means, I would like to analyze the words in each cluster (in order to correlate them with other attributes). To do this, I added up the TF-IDF values for each word, but I think this solution may be wrong. Could it be more correct to use term… -
Term Frequencies greater than 1
Dear all, I use "Text Processing - Process Documents From Files" to calculate word vectors for documents. As I read here: http://rapid-i.com/rapidforum/index.php?PHPSESSID=0aba344304fbb94614ad24f236d974e4&topic=3728.0 term frequencies are normalized (as I expected). For me this means that term frequencies always have… -
A question about naive bayes based text classification
Hi, I am testing Naive Bayes (NB) for text classification. To my understanding, the result should not be affected by the TF-IDF vector of the text, because NB considers the frequency of each term (t) in each category (c), i.e., p(t | c), and this information is stored in the WordList, not the term vectors (i.e., the… -
Re: TF-IDF
-
Re: fp-growth and association rules cannot run
Hi Thomas, Thanks for your advice. I changed min_support to 0.05, but FP-Growth still finds no items, so there are no association rules either. I also changed the vector creation to term occurrence, TF-IDF and term frequency; they all produce "no items found". I wonder if it is due to the problems of files… -
Re: Document similarity of 2 excel spreadsheets containing text
-
Term Occurrences and Frequency - I have to be missing something
I am following along with this post because I wanted to ensure my intuition was correct, because I was seeing results that didn't make sense, to me anyway. https://community.rapidminer.com/discussion/46333/term-frequencies-and-tf-idf-how-are-these-calculated The only difference that I see in my process to start is that I… -
PSO showing error "incompatible number of attributes (821 != 4260)!"
Hello, I am using RM for sentiment analysis of the movie review dataset. I have tokenised the reviews and calculated the term frequency and TF-IDF for the words. For classification I want to use a 10-fold cross-validated SVM-PSO, but after the 11th execution the tool returns the error "incompatible number of attributes… -
Process Documents multiple times to get TF-IDF and TO in one output file
Hi, this is my first post, so hello all. OK, I sorted that out using Multiply, but I need the term frequency per document, not the total occurrences. So if the word "cheap" appears in both documents, I need to get the number of occurrences in document A and the number of occurrences in document B, and NOT the combined total off…