Write excel

JonB Member Posts: 3 Contributor I
edited July 2019 in Help
I apologise for asking such a basic question, but I have searched through support pages and cannot find anything that answers my query at the basic level I function at!

I followed instructions from the Vancouver blog to create a basic series of operators to extract text, tokenise and generate n-grams.
Now I want to output the results as an Excel, with values for each word/phrase identified in each text file so I can use other software to analyse it.
So, I added a "write excel" operator.
It runs fine and produces a "File Write excel" tab in my results. All that says is "memory buffered file".
Has something gone wrong?

Answers

  • Nils_Woehler Member Posts: 463 Maven
    Hi,

    if you want to store your results as an Excel file, you need to specify the file by setting the 'excel file' parameter of the 'Write Excel' operator.
    The file output port only emits a file object that can be used by other operators.

    Here is an example:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="235" width="346">
          <operator activated="true" class="retrieve" compatibility="5.2.003" expanded="true" height="60" name="Retrieve" width="90" x="246" y="165">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="write_excel" compatibility="5.2.003" expanded="true" height="76" name="Write Excel" width="90" x="440" y="155">
            <parameter key="excel_file" value="REPLACE_ME"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Write Excel" to_port="input"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
    Best,
    Nils
  • JonB Member Posts: 3 Contributor I
    Thanks Nils,

    is there any way to do this without editing the xml?
    The code generated by the current process is below.
    Sorry to be ignorant, but how do I go about modifying that?

    Thanks,

    Jon
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="431" width="882">
          <operator activated="true" class="text:process_document_from_file" compatibility="5.2.001" expanded="true" height="76" name="Process Documents from Files" width="90" x="179" y="300">
            <list key="text_directories">
              <parameter key="Free" value="/Volumes/MyBook/Ethometrix/Text mining abstracts/ESVCE Free txt"/>
              <parameter key="Themed" value="/Volumes/MyBook/Ethometrix/Text mining abstracts/ESVCE Themed txt"/>
            </list>
            <parameter key="create_word_vector" value="false"/>
            <parameter key="add_meta_information" value="false"/>
            <parameter key="keep_text" value="true"/>
            <parameter key="prune_method" value="absolute"/>
            <parameter key="prune_below_absolute" value="2"/>
            <parameter key="prune_above_absolute" value="999"/>
            <process expanded="true" height="799" width="917">
              <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="112" y="75"/>
              <operator activated="true" class="text:transform_cases" compatibility="5.2.001" expanded="true" height="60" name="Transform Cases" width="90" x="112" y="210"/>
              <operator activated="true" class="text:filter_stopwords_english" compatibility="5.2.001" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="112" y="345"/>
              <operator activated="true" class="text:stem_porter" compatibility="5.2.001" expanded="true" height="60" name="Stem (Porter)" width="90" x="246" y="75"/>
              <operator activated="true" class="text:generate_n_grams_terms" compatibility="5.2.001" expanded="true" height="60" name="Generate n-Grams (Terms)" width="90" x="246" y="210">
                <parameter key="max_length" value="3"/>
              </operator>
              <operator activated="true" class="text:filter_by_length" compatibility="5.2.001" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="246" y="345">
                <parameter key="min_chars" value="2"/>
                <parameter key="max_chars" value="999"/>
              </operator>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
              <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
              <connect from_op="Stem (Porter)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
              <connect from_op="Generate n-Grams (Terms)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
              <connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="write_excel" compatibility="5.2.003" expanded="true" height="76" name="Write Excel" width="90" x="514" y="345"/>
          <connect from_op="Process Documents from Files" from_port="example set" to_op="Write Excel" to_port="input"/>
          <connect from_op="Process Documents from Files" from_port="word list" to_port="result 1"/>
          <connect from_op="Write Excel" from_port="through" to_port="result 2"/>
          <connect from_op="Write Excel" from_port="file" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    the problem is that you have an outgoing connection on the "file" output of the Write Excel operator. You can either delete that connection, which makes the "excel_file" parameter appear in the operator parameters, or connect a Write File operator to the "file" output to write the Excel file to disk or into the repository.
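
    For illustration, here is a rough sketch of how the Write Excel part of the process posted above might look after the first option (deleting the connection and setting the parameter in the Parameters panel). It is a fragment, not a full process: the Process Documents from Files operator stays as it is, and REPLACE_ME is only a placeholder for whatever output path you choose.

    ```xml
    <!-- Fragment only: replaces the Write Excel operator and its connections in the process above. -->
    <operator activated="true" class="write_excel" compatibility="5.2.003" expanded="true" height="76" name="Write Excel" width="90" x="514" y="345">
      <!-- REPLACE_ME is a placeholder: set it to the path of the Excel file to create. -->
      <parameter key="excel_file" value="REPLACE_ME"/>
    </operator>
    <connect from_op="Process Documents from Files" from_port="example set" to_op="Write Excel" to_port="input"/>
    <connect from_op="Process Documents from Files" from_port="word list" to_port="result 1"/>
    <connect from_op="Write Excel" from_port="through" to_port="result 2"/>
    <!-- The connection from the "file" port to "result 3" has been deleted. -->
    ```

    You do not have to edit the XML by hand to get there: removing the connection in the process view and filling in the parameter should produce something along these lines.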

    All the best,
    Marius