Search
Re: "Creating SVDs in X-Validation operator very slow"
Sebastian, I agree, a warning would be nice. In addition, another thing to consider is changing the TFIDFFilter class to set zeros for columns without any counts. Although the missing values can currently be changed to zeros with the Replace Missing Values operator, this (1) requires the use of another operator and (2)… -
[Solved] Problem with TF IDF calculation
Hello everyone, I am currently working on a task where I have to resample some data. Because I am unsure whether it's okay to use a method like SMOTE on already calculated TF-IDF weights, I wanted to calculate the term occurrences in RapidMiner, export and SMOTE the data, and later import it and calculate the TF-IDF weights. When I… -
Re: Extracting the most representative 10 keywords from web page
As @Thomas_Ott suggests, this is definitely possible, but it will require a series of operators. Working with text from web pages can be quite tricky because of all the extra HTML and formatting. It also depends on what you mean by "10 most representative" words. Many times, the most frequent words are not necessarily the… -
Process Document from Data
Hello, everyone! I am a complete beginner in RapidMiner and am doing a sentiment analysis on tweets. I have a problem at a basic level. I am using the Process Documents from Data operator to generate TF-IDF vectors and word counts after cleaning the tweets. I have opened an Excel file containing 2000 tweets with reading Excel… -
Text clustering and labeling
Hi, I'm using RapidMiner for text clustering (k-means) and then labeling the clusters. We usually have around 2000 documents, and the texts are in German. The texts are short (title and short description of news or articles), and so far RapidMiner is working nicely! In the text processing phase, I use Term Frequency vectors,… -
Re: Text Pre-processing
You need to download and install the free text mining extension from the marketplace. The operator "Process Documents" will generate a word vector using term frequency if you set that as the option in the parameters (TF-IDF is the default), and it will also automatically generate the bag of words for you if you use the… -
[SOLVED] Rename regular attributes generated by Text Processing
Hi all, I'm a newbie at using RapidMiner. I hope I'm posting my issue in the right place. But first of all, let me congratulate the support team on launching this forum. I hope I can also contribute to solving other issues. Going back to my problem: I'm using the Text Processing module in order to create term vector… -
Re: [SOLVED] Transform Document-Term matrix to flat table?
Hi Marcin, thanks for your reply. Here is my example data and my simple process. The input is a list of user queries (query_id, query, frequency), which is processed with "Process Documents from Data". The result is a word list and a document-term matrix. In addition, I would like to get a term-document table with Term,… -
LOF on Text Data
Hello Team, I am fairly new to RM and am currently conducting some research on online text. In particular, I am trying to detect outliers in a set of documents by using the LOF operator. I am having some trouble, since the LOF for each document is very close to 1, no matter how I set MinPtsUB and MinPtsLB. Basically I… -
"Generate pivoted example set from word vector"
Hi, I have some text entries in an Excel worksheet that I would like to text mine to find associations (if any) between some words. My initial thinking was to process the text with Process Documents from Data -> Convert WordList to Data and then pivot it. The problem is that after processing the documents, I only get a word…
36 results