"Data to text Analysis"

bkruger · January 2011

Hi,

I have data in this format:

Code Text
A This is some text that could be anything.
B This is some other text relating to something else.
A This is more text.
A Yet more text
C Another line with more text.

I import the data with Code=Label and Text=Text, then process it with the DataToDocuments operator followed by ProcessDocuments. In the end, I want to know:

What is characteristic of A, B and C? In other words, what defines each code in terms of the word frequencies in its text? I don't know RapidMiner well enough to work out this last part.
Can anyone please point me in the right direction?
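
To make the question concrete, here is a rough sketch in plain Python (not RapidMiner) of the per-code word frequencies being asked about. The `rows` list simply copies the sample data above; `tokenize` and `word_counts_by_code` are hypothetical helper names, and the tokenizer is a deliberately crude assumption (lowercase, letters only).

```python
import re
from collections import Counter, defaultdict

# Sample rows copied from the post: (code, text)
rows = [
    ("A", "This is some text that could be anything."),
    ("B", "This is some other text relating to something else."),
    ("A", "This is more text."),
    ("A", "Yet more text"),
    ("C", "Another line with more text."),
]

def tokenize(text):
    # Assumption: lowercase and keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())

def word_counts_by_code(rows):
    # Accumulate one word counter per code (label).
    counts = defaultdict(Counter)
    for code, text in rows:
        counts[code].update(tokenize(text))
    return counts

counts = word_counts_by_code(rows)
for code in sorted(counts):
    print(code, counts[code].most_common(3))
```

Raw counts like these are dominated by words that occur everywhere ("this", "is", "text"), which is why some kind of term weighting is probably needed on top.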

Much appreciated.
B

SebastianLoh · January 2011

Hi B,

What you would like to do is an interesting but also tough data mining task.

Maybe this works:

After your document processing (probably with filtering, pruning, TF-IDF, etc.), you can try applying Weight by SVM or Weight by Value to find the descriptive terms (the attributes produced by the document processing) for each class. Do not expect perfect results; you may need to filter afterwards and experiment with the document processing settings.

The Weights to Data operator transforms the weight list into an ExampleSet, which you can process further with the usual operators.
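
The idea can be sketched outside RapidMiner in plain Python. This is only an illustration under stated assumptions: it does not use an SVM or any RapidMiner operator, but a much simpler stand-in weight (mean TF-IDF of a term inside a class minus its mean TF-IDF in all other classes). All function names are hypothetical, and the resulting list of (class, term, weight) rows only mimics the kind of table a weights-to-data step would give you.

```python
import math
import re
from collections import Counter

# Sample rows from the original question: (code, text)
rows = [
    ("A", "This is some text that could be anything."),
    ("B", "This is some other text relating to something else."),
    ("A", "This is more text."),
    ("A", "Yet more text"),
    ("C", "Another line with more text."),
]

def tokenize(text):
    # Assumption: lowercase and keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(texts):
    # One sparse TF-IDF vector (dict) per document.
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    # Iterating a Counter yields each distinct term once,
    # so this counts the documents containing each term.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in doc.items()}
        )
    return vectors

def class_term_weights(rows):
    # Stand-in weight (not an SVM): mean TF-IDF in the class
    # minus mean TF-IDF in the remaining documents.
    labels = [code for code, _ in rows]
    vectors = tfidf_vectors([text for _, text in rows])
    vocab = sorted({t for v in vectors for t in v})
    table = []
    for label in sorted(set(labels)):
        inside = [v for v, l in zip(vectors, labels) if l == label]
        outside = [v for v, l in zip(vectors, labels) if l != label]
        for term in vocab:
            mean_in = sum(v.get(term, 0.0) for v in inside) / len(inside)
            mean_out = (
                sum(v.get(term, 0.0) for v in outside) / len(outside)
                if outside else 0.0
            )
            table.append((label, term, mean_in - mean_out))
    return table

def top_terms(table, label, k=3):
    # Highest-weighted terms for one class.
    ranked = sorted(
        (r for r in table if r[0] == label), key=lambda r: r[2], reverse=True
    )
    return [term for _, term, _ in ranked[:k]]

table = class_term_weights(rows)
for label in ("A", "B", "C"):
    print(label, top_terms(table, label))
```

With only five short lines the ranking is of course very noisy; terms that occur in every document (here "text") get a weight of zero, which is the pruning effect you want from the TF-IDF step.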

Ciao Sebastian

P.S. Does anybody have better/other ideas?
