"Data to text Analysis"

bkruger Member Posts: 17 Maven
edited May 2019 in Help

I have data in this format:

Code        Text

A                This is some text that could be anything.
B                This is some other text relating to something else.
A                This is more text.
A                Yet more text
C                Another line with more text.

I import the data with Code=Label and Text=Text. I process this with the DataToDocuments operator followed by ProcessDocuments. You get the idea. Now, in the end, I want to know:

What is typical for each of A, B and C? In other words, what defines A, B and C in terms of the word frequencies in the text for each? I don't know RapidMiner well enough to work out the last part.
Can anyone please point me in the right direction?

Much appreciated.


  • SebastianLoh Member Posts: 99 Contributor II
    Hi B,

    what you would like to do is an interesting but also tough data mining task.

    Maybe this works:

    After your document processing (probably with filtering, pruning, TF-IDF, etc.), you can try applying Weight by SVM or Weight by Value to find the descriptive terms (= attributes after the document processing) for each class. Do not expect perfect results; you might need to filter afterwards and experiment with the document processing.

    The Weights to Data operator transforms the weight list into an ExampleSet, which you can process further with the usual operators.
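
    To make the idea of "descriptive terms per class" concrete outside RapidMiner, here is a minimal Python sketch. It is not what Weight by SVM computes: it swaps in a much simpler class-level TF-IDF, where all texts of one code are pooled into a single "document" and terms shared by every code score zero. The sample rows are modeled on the question, and the function names `class_tfidf` and `top_terms` are made up for this illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Sample rows modeled on the question: (Code, Text)
rows = [
    ("A", "This is some text that could be anything."),
    ("B", "This is some other text relating to something else."),
    ("A", "This is more text."),
    ("A", "Yet more text"),
    ("C", "Another line with more text."),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def class_tfidf(rows):
    """Return {code: {term: score}} using a class-level TF-IDF.

    All texts with the same code are pooled into one bag of words.
    TF is the term's share of that bag; IDF is log(number of codes /
    number of codes containing the term), so a term that occurs in
    every code gets a score of 0.
    """
    counts = defaultdict(Counter)
    for code, text in rows:
        counts[code].update(tokenize(text))

    # In how many codes does each term occur at least once?
    class_freq = Counter()
    for bag in counts.values():
        class_freq.update(bag.keys())

    n_classes = len(counts)
    scores = {}
    for code, bag in counts.items():
        total = sum(bag.values())
        scores[code] = {
            term: (n / total) * math.log(n_classes / class_freq[term])
            for term, n in bag.items()
        }
    return scores


def top_terms(term_scores, n=3):
    """Return the n highest-scoring (term, score) pairs, ties broken alphabetically."""
    ranked = sorted(term_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    scores = class_tfidf(rows)
    for code in sorted(scores):
        print(code, [term for term, _ in top_terms(scores[code])])
```

    With only five short rows the ranking mostly picks out words that appear under a single code, so treat it as an illustration of the scoring idea rather than a meaningful result. On real data you would still want stop-word filtering and pruning first, as described above.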

    Ciao Sebastian

    P.S. Does anybody have better/other ideas?