
"How to visualize text mining results"

maria_godric (Member, Posts: 20, Maven)
edited May 2019 in Help
Hi,
How can I visualize text mining results? Which operator can be used for that?
Thanks
Maria

Answers

  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn)
    Hi,
    what exactly do you mean by "text mining results"? Usually you have only one outcome per text, which is simply noted in the table as the prediction.

    Greetings,
      Sebastian
  • maria_godric (Member, Posts: 20, Maven)
    Hi,
    Using other Text Mining operators I got TF-IDF values. Are there any RapidMiner operators that can graphically represent the words and their TF-IDF values?
    Thanks
    Maria
  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn)
    Hi,
    in the new TextProcessing Extension for RapidMiner 5.0, you will be able to visualize the word list, together with the occurrence counts, using the rich selection of plotters available in RapidMiner.

    If you want to see the source document of a word vector, you can use the TextExampleVisualizer: wherever you click on an example in the plotter, you will see the original text.
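    The underlying idea (words paired with their TF-IDF weights and drawn as bars) can be sketched in plain Python for illustration. This sketch assumes the textbook weighting tf(t, d) * log(N / df(t)), with tf as the relative frequency of a word in its document; RapidMiner's own TF-IDF may normalize the vectors differently, and the tokenize, tfidf and bar_chart helpers are hypothetical, not part of any RapidMiner API.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Keep runs of letters only; digits, underscores and punctuation act as separators
    return re.findall(r"[^\W\d_]+", text.lower())

def tfidf(docs):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each word occurs at least once
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({w: (c / total) * math.log(n / df[w]) for w, c in tf.items()})
    return weights

def bar_chart(weights, width=40):
    """Render a crude horizontal bar chart of word -> weight, highest first."""
    top = max(weights.values(), default=0) or 1  # avoid dividing by zero
    lines = []
    for word, w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{word:>12} {'#' * round(width * w / top)} {w:.3f}")
    return "\n".join(lines)

docs = ["text mining finds words", "words and more words", "mining data"]
print(bar_chart(tfidf(docs)[1]))
```

    Words that occur in every document get a weight of zero, which is why common words vanish from such a chart.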

    Greetings,
      Sebastian
  • derchief (Member, Posts: 5, Contributor II)
    Hi,

    "you will be able to visualize the word list with the values of occurrences with the rich selection of plotters available"

    I tried a lot of combinations but couldn't get any list, let alone a visualization, out of the word list I tried to create with the following process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logfile" value="C:\Dokumente und Einstellungen\cniemann\Eigene Dateien\bla.log"/>
        <parameter key="resultfile" value="C:\Dokumente und Einstellungen\cniemann\Eigene Dateien\blub.res"/>
        <process expanded="true" height="521" width="681">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="first data"/>
          </operator>
          <operator activated="true" class="nominal_to_text" expanded="true" height="76" name="Nominal to Text" width="90" x="179" y="120">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Text"/>
          </operator>
          <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="120">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Text"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" expanded="true" height="76" name="Process Documents from Data" width="90" x="447" y="165">
            <list key="specify_weights"/>
            <process expanded="true" height="623" width="710">
              <operator activated="true" class="text:tokenize" expanded="true" height="60" name="Tokenize" width="90" x="306" y="197"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
          <connect from_op="Nominal to Text" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="word list" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Where is my mistake?

    Regards, chris
  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn)
    Hi Chris,
    you are definitely right. I forgot that the counts aren't shown when you don't have a label. This is in fact a small bug, which I have now fixed. Unfortunately, the latest update of the Text Extension had already been rolled out, so the fix didn't make it in time. But there's a workaround:
    Simply add a label: insert a Generate Attributes operator and set the role of the new attribute to label (here with a Set Role operator):
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="442" width="948">
          <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="75">
            <list key="function_descriptions"/>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="380" y="75">
            <parameter key="name" value="label"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>

    The following process shows how you can turn the word list into an example set and use any plotter on it:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logfile" value="C:\Dokumente und Einstellungen\cniemann\Eigene Dateien\bla.log"/>
        <parameter key="resultfile" value="C:\Dokumente und Einstellungen\cniemann\Eigene Dateien\blub.res"/>
        <process expanded="true" height="521" width="748">
          <operator activated="true" class="text:create_document" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
            <parameter key="text" value="Hi,&#10;in the new TextProcessing Extension of RapidMiner 5.0 there you will be able to visualize the word list with the values of occurrences with the rich selection of plotters available in RapidMiner.&#10;&#10;If you want to see the source document of a word vector, you might use the TextExampleVisualizer. Whereever you click on an example in the plotter, you will see the original text.&#10;&#10;Greetings,&#10;  Sebastian"/>
            <parameter key="add label" value="true"/>
            <parameter key="label_value" value="label"/>
          </operator>
          <operator activated="true" class="text:create_document" expanded="true" height="60" name="Create Document (2)" width="90" x="45" y="120">
            <parameter key="text" value="Hi,&#10;Using other Text Mining operators i got TFIDF.Is there any other RM operators to graphically represent the words and its TFIDF.&#10;Thanks&#10;Maria"/>
            <parameter key="add label" value="true"/>
            <parameter key="label_value" value="label"/>
          </operator>
          <operator activated="true" class="text:documents_to_data" expanded="true" height="94" name="Documents to Data" width="90" x="313" y="30">
            <parameter key="text_attribute" value="text"/>
            <parameter key="label_attribute" value="labelAttribute"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" expanded="true" height="76" name="Process Documents from Data" width="90" x="447" y="30">
            <list key="specify_weights"/>
            <process expanded="true" height="623" width="710">
              <operator activated="true" class="text:tokenize" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="text:wordlist_to_data" expanded="true" height="76" name="WordList to Data" width="90" x="581" y="30"/>
          <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Create Document (2)" from_port="output" to_op="Documents to Data" to_port="documents 2"/>
          <connect from_op="Documents to Data" from_port="example set" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="word list" to_op="WordList to Data" to_port="word list"/>
          <connect from_op="WordList to Data" from_port="word list" to_port="result 1"/>
          <connect from_op="WordList to Data" from_port="example set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
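    The table that WordList to Data produces can be pictured with the plain-Python sketch below: one row per word, with the number of documents containing it, its total count and, when a label exists, its count per label value. The column names (word, in_documents, total, in_class_<label>) are hypothetical stand-ins rather than the operator's exact attribute names, and tokenization is assumed to split at non-letter characters without changing case.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Runs of letters; digits, underscores and punctuation act as separators
    return re.findall(r"[^\W\d_]+", text)

def word_list_to_rows(labeled_docs):
    """labeled_docs: list of (text, label) pairs.
    Returns one dict per word, like the rows of an example set (hypothetical column names)."""
    total = Counter()
    in_docs = Counter()
    per_class = defaultdict(Counter)  # label -> word -> occurrences
    for text, label in labeled_docs:
        tokens = tokenize(text)
        total.update(tokens)
        in_docs.update(set(tokens))
        per_class[label].update(tokens)
    rows = []
    for word in total:
        row = {"word": word, "in_documents": in_docs[word], "total": total[word]}
        for label, counts in per_class.items():
            row[f"in_class_{label}"] = counts[word]
        rows.append(row)
    # Most frequent words first, ties broken alphabetically, ready for a bar plot
    rows.sort(key=lambda r: (-r["total"], r["word"]))
    return rows

docs = [("Hi Maria text mining", "label"), ("text plotters text", "label")]
for row in word_list_to_rows(docs)[:3]:
    print(row)
```

    With a single constant label, as in the workaround, the per-class count simply equals the total count.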
    With the next update, the display bug will be fixed and the table will show all columns, including those that exist but are currently hidden.
    Following the normal update cycle, the next update should come in about a month, unless an enterprise customer requests an earlier one.

    Greetings,
      Sebastian
  • derchief (Member, Posts: 5, Contributor II)
    Hello again,

    you said, "you can turn the word list into an example set and use arbitrary plotter". However, my results workspace shows just one tab (Result Overview). It only shows the metadata of the example set from WordList to Data (Role, Name, Type, Range, etc.), but no word list. In screenshots of RapidMiner I've seen tabs like "ExampleSet(Retrieve)". I guess the results would normally appear in such tabs, but in my case they are not shown.

    You also said I should "add a label by inserting a generate attribute operator and setting this attribute to role label". The Generate Attributes operator requires a list entry with an attribute name (what is it for?) and a function expression. I don't know what I should enter here.

    best regards
    chris
  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn)
    Hi Chris,
    sorry, I forgot to add example values to the parameter list:
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="442" width="948">
          <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="75">
            <list key="function_descriptions">
              <parameter key="label" value="&quot;any string value&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="380" y="75">
            <parameter key="name" value="label"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    The attribute name is the name of the attribute that will be added to your example set. The expression is evaluated for each example; please see the operator documentation in the Help view for details. Here the expression boils down to a constant, such as the string I entered in the process above. This string becomes the value of every example in the new attribute column "label".
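    To make the semantics concrete, here is a plain-Python sketch of what the two operators do to the table, assuming an example set can be modeled as a list of dicts. The functions generate_attribute and set_role are hypothetical illustrations, not RapidMiner API; in RapidMiner the expression is written in the operator's own expression syntax, not as a Python function.

```python
def generate_attribute(rows, name, expression):
    """Return new rows with attribute `name`, computed per row by `expression(row)`.
    Hypothetical stand-in for Generate Attributes; the input rows are not modified."""
    return [{**row, name: expression(row)} for row in rows]

def set_role(roles, name, role):
    """Record the special role of an attribute (hypothetical stand-in for Set Role)."""
    return {**roles, name: role}

examples = [{"text": "first document"}, {"text": "second document"}]

# A constant expression: every example gets the same string, as in the process above
labeled = generate_attribute(examples, "label", lambda row: "any string value")
roles = set_role({}, "label", "label")

# A non-constant expression is evaluated per example, e.g. the length of the text
with_length = generate_attribute(examples, "length", lambda row: len(row["text"]))

print(labeled)
print(roles)
print(with_length)
```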

    Greetings,
      Sebastian