Process document from data

barthos · April 2011

Hello,
I'm a complete beginner with RapidMiner and am following the text mining video tutorials at http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-part-3.html
I have a problem at the very basic level.
I want to use the "Process Documents from Data" operator to compute binary word vectors.
To do so, I load an Excel file with the built-in Read Excel operator. My file has a single column with 500 rows, each containing text data. I then send this to the "exa" input of the Process Documents from Data operator. Inside the operator, I do some basic processing (tokenize, case conversion, word filter and token filter). Then I connect the "exa" output of the operator to the results connector.
The problem is that I don't get vectors, only a two-column table: the first column holds the row numbers (1, 2, etc.) and the second is titled "text" but its cells are empty. The description of the data is: ExampleSet (437 examples, 1 special attribute, 0 regular attributes). What can I do?

When I put a breakpoint after the Read Excel operator, the results show a two-column table: the first column is Row No. and the second contains the rows from my Excel file. So it looks like the file is read properly...
Help!
Thanks,
Barthélémy

colo · April 2011

Hi Barthélémy,

You have to tell the "Process Documents from Data" operator which attribute should be treated as text. With the similar operators for files or documents this is clear: the document or file body is used as the text. With an example set, however, any of several attributes could contain the text, so you have to set this before the processing starts (even if you only have a single attribute). To do so, use the "Nominal to Text" operator after "Read Excel". The attribute with type text is then used as the document content for the processing inside the following operator.
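
For readers who want to see what the resulting binary word vectors look like, here is a minimal Python sketch of the same idea outside RapidMiner. It is not RapidMiner code and does not use any RapidMiner API. The stopword list, the sample rows, the minimum token length and the function names are all made up for illustration. The explicit `text_column` argument plays the role of marking one attribute as text.

```python
import re

# Hypothetical stopword list; not taken from RapidMiner or from the thread.
STOPWORDS = {"the", "a", "is", "and", "of"}


def tokenize(text):
    """Lowercase the text and split it on anything that is not an ASCII letter."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def filter_tokens(tokens, min_len=3):
    """Drop stopwords and tokens shorter than min_len characters."""
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_len]


def binary_vectors(rows, text_column):
    """Build one 0/1 vector per row from the column named text_column.

    Naming the column explicitly mirrors the point above: the processing
    step has to be told which attribute holds the document text.
    """
    docs = [filter_tokens(tokenize(row[text_column])) for row in rows]
    vocab = sorted({t for doc in docs for t in doc})
    vectors = [[1 if term in doc else 0 for term in vocab] for doc in docs]
    return vocab, vectors


# Made-up sample data standing in for the rows of the Excel file.
rows = [
    {"text": "The cat is on the mat"},
    {"text": "A dog and a cat"},
]

vocab, vectors = binary_vectors(rows, text_column="text")
print(vocab)    # ['cat', 'dog', 'mat']
print(vectors)  # [[1, 0, 1], [1, 1, 0]]
```

Each vector has one entry per vocabulary word, set to 1 if the word occurs in that row and 0 otherwise.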

Best regards
Matthias

barthos · April 2011

Fantastic!
Thanks a lot Matthias, you saved me about two days of work!
I'd like to offer you a beer. I'm in Paris, what about you?
Thanks again,
Barthélémy

mrfabrittzio · December 2016

You sir are brilliant, thx so much!

laurahajnalka · October 2018

Dear Matthias,

I have a similar problem, but not quite the same. I have CSV files with two columns: one contains the words of a document, the other contains the number of occurrences of each word. I would like to filter out the rows I do not need for my model. I used the "Nominal to Text" operator, but I still cannot filter the stopwords, because the "Process Documents from Data" operator does not seem to work: whatever I put inside it, the result has 0 rows. What can I do?

Thank you in advance!

Laura
