HOWTO ExcelExampleSetWriter in Classification Output

mdc Member Posts: 58 Maven
edited October 2019 in Help
Hi,

I am trying the text classification example and want to write the results, i.e. the prediction and confidence columns, to Excel. Is this possible? I tried the process below, but it wrote only the attributes.

Thanks,
Matthew

<operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#h3#ygt#Loading and applying a text classifier#ylt#/h3#ygt##ylt#p#ygt#This experiments shows how to load a text classifier and how to apply it a new set of texts.#ylt#/p#ygt##ylt#p#ygt##ylt#b#ygt#Important note:#ylt#/b#ygt#You have to load the wordlist stored in the experiment that created the text classification model. Otherwise, the TextInput will not know which dimensions to use for the vector space and the learned model and the new text representations will not match. #ylt#/p#ygt#"/>
    <operator name="TextInput" class="TextInput" expanded="yes">
        <list key="texts">
          <parameter key="graphics" value="../data/newsgroup/graphics"/>
        </list>
        <parameter key="default_content_language" value="english"/>
        <parameter key="input_word_list" value="../data/training_words.list"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="3"/>
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="ModelLoader" class="ModelLoader">
        <parameter key="model_file" value="../data/training_model.mod"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
    <operator name="ExcelExampleSetWriter" class="ExcelExampleSetWriter">
        <parameter key="excel_file" value="C:\Documents and Settings\matthew_garong\My Documents\Matthew\TM_Workspace\rapidminer-text-4.4-examples\04_Learning\apply_output_temp.xls"/>
    </operator>
</operator>

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Matthew,
    I think the problem here is a limitation of the Excel .xls format. It supports only a small number of columns: 256, labeled A to IV. Text mining in particular produces a large number of attributes; more than 10,000 is usual.
    If you are interested solely in the prediction and confidence columns, you can filter out the other attributes before writing to Excel.
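To make the column limit concrete, here is a minimal sketch in Python (outside RapidMiner, purely for illustration). It assumes the legacy .xls limit of 256 columns per sheet, whose last column is labeled IV. The names `column_index`, `XLS_MAX_COLUMNS`, and `fits_in_xls` are hypothetical helpers, not part of any RapidMiner or Excel API.

```python
def column_index(letters: str) -> int:
    """Convert an Excel column label such as 'A' or 'IV' to its 1-based index."""
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        # Column labels are a bijective base-26 number: A=1 ... Z=26, AA=27, ...
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


# Assumption: a legacy .xls sheet ends at column IV.
XLS_MAX_COLUMNS = column_index("IV")


def fits_in_xls(n_regular: int, n_special: int = 0) -> bool:
    """Check whether regular plus special attributes fit into one .xls sheet."""
    return n_regular + n_special <= XLS_MAX_COLUMNS
```

A word-vector example set with thousands of regular attributes fails this check, while one reduced to the prediction and confidence columns passes it easily, which is why filtering before the writer helps.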

    You only need to insert the AttributeFilter operator before the Excel writer, as in the following process:
    <operator name="Root" class="Process" expanded="yes">
        <description text="#ylt#h3#ygt#Loading and applying a text classifier#ylt#/h3#ygt##ylt#p#ygt#This experiments shows how to load a text classifier and how to apply it a new set of texts.#ylt#/p#ygt##ylt#p#ygt##ylt#b#ygt#Important note:#ylt#/b#ygt#You have to load the wordlist stored in the experiment that created the text classification model. Otherwise, the TextInput will not know which dimensions to use for the vector space and the learned model and the new text representations will not match. #ylt#/p#ygt#"/>
        <operator name="TextInput" class="TextInput" expanded="yes">
            <list key="texts">
              <parameter key="graphics" value="../data/newsgroup/graphics"/>
            </list>
            <parameter key="default_content_language" value="english"/>
            <parameter key="input_word_list" value="../data/training_words.list"/>
            <list key="namespaces">
            </list>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
            <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
            </operator>
            <operator name="TokenLengthFilter" class="TokenLengthFilter">
                <parameter key="min_chars" value="3"/>
            </operator>
            <operator name="PorterStemmer" class="PorterStemmer">
            </operator>
        </operator>
        <operator name="ModelLoader" class="ModelLoader">
            <parameter key="model_file" value="../data/training_model.mod"/>
        </operator>
        <operator name="ModelApplier" class="ModelApplier">
            <list key="application_parameters">
            </list>
        </operator>
        <operator name="AttributeFilter" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="parameter_string" value=".*"/>
            <parameter key="invert_filter" value="true"/>
        </operator>
        <operator name="ExcelExampleSetWriter" class="ExcelExampleSetWriter">
            <parameter key="excel_file" value="C:\Documents and Settings\matthew_garong\My Documents\Matthew\TM_Workspace\rapidminer-text-4.4-examples\04_Learning\apply_output_temp.xls"/>
        </operator>
    </operator>
    Greetings,
      Sebastian
  • mdc Member Posts: 58 Maven
    It worked.

    Thanks.