
"Filter Stopwords (Dictionary): how to connect the dictionary"

Krystyna Member Posts: 2 Contributor I
edited June 2019 in Help
Hello everybody,

I have RapidMiner 5.2.003, where the Filter Stopwords (Dictionary) module differs from the previous version. I can no longer connect the file with stopwords. Earlier I just selected the txt file. Now there is an input-file parameter. I tried to use Retrieve, Read from... etc., but it doesn't work. Could you please advise?

Thanks a lot!

My best
Krystyna

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    RapidMiner 5.2.3 was released more than 7 months ago. Please update both RapidMiner and the Text Processing extension to the latest version. If the problem still occurs, please give a detailed problem description with an example process, as described in the post linked in my signature.

    Best, Marius
  • Krystyna Member Posts: 2 Contributor I
    Hi Marius,

    My software is up to date. In the video tutorials I have only seen examples for the older version, where the Filter Stopwords (Dictionary) module had a different structure. This is my process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" breakpoints="after" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="476" width="547">
          <operator activated="true" class="retrieve" compatibility="5.2.003" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="Nachfrager 2012-07_Lexikon"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" compatibility="5.2.004" expanded="true" height="76" name="Process Documents from Data" width="90" x="179" y="210">
            <parameter key="keep_text" value="true"/>
            <parameter key="prunde_below_percent" value="2.0"/>
            <parameter key="prune_above_percent" value="100.0"/>
            <list key="specify_weights"/>
            <process expanded="true" height="763" width="785">
              <operator activated="true" class="text:transform_cases" compatibility="5.2.004" expanded="true" height="60" name="Transform Cases" width="90" x="112" y="120"/>
              <operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize" width="90" x="246" y="210"/>
              <operator activated="true" class="text:stem_german" compatibility="5.2.004" expanded="true" height="60" name="Stem (German)" width="90" x="380" y="300"/>
              <connect from_port="document" to_op="Transform Cases" to_port="document"/>
              <connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Stem (German)" to_port="document"/>
              <connect from_op="Stem (German)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Your example process does not help much, since it does not even contain the Filter Stopwords operator (that's what we call "modules" in RapidMiner: "operators"). However, if you disconnect the file input port, the option to select a text file will reappear. The file input port is meant to be used together with the Open File operator, which can also read from web resources and thus makes operators that rely on file input more flexible. But as I said, just disconnect the port to get the old behaviour back.

    Happy Mining!
    ~Marius
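
For illustration, the inner subprocess of the process posted above might look roughly like this once a Filter Stopwords (Dictionary) operator is added with its file input port left disconnected. This is only a sketch under assumptions: the operator class `text:filter_stopwords_dictionary`, the parameter key `file`, the layout coordinates, and the path `/path/to/stopwords.txt` are not taken from the thread and should be checked against what your own RapidMiner installation exports.

```xml
<process expanded="true" height="763" width="785">
  <operator activated="true" class="text:transform_cases" compatibility="5.2.004" expanded="true" height="60" name="Transform Cases" width="90" x="112" y="120"/>
  <operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize" width="90" x="246" y="210"/>
  <!-- Assumed class name and parameter key; the path is a placeholder.
       The file input port is left unconnected, so the text file is
       chosen through the operator's own parameter. -->
  <operator activated="true" class="text:filter_stopwords_dictionary" compatibility="5.2.004" expanded="true" height="76" name="Filter Stopwords (Dictionary)" width="90" x="380" y="255">
    <parameter key="file" value="/path/to/stopwords.txt"/>
  </operator>
  <operator activated="true" class="text:stem_german" compatibility="5.2.004" expanded="true" height="60" name="Stem (German)" width="90" x="514" y="300"/>
  <connect from_port="document" to_op="Transform Cases" to_port="document"/>
  <connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
  <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (Dictionary)" to_port="document"/>
  <connect from_op="Filter Stopwords (Dictionary)" from_port="document" to_op="Stem (German)" to_port="document"/>
  <connect from_op="Stem (German)" from_port="document" to_port="document 1"/>
  <portSpacing port="source_document" spacing="0"/>
  <portSpacing port="sink_document 1" spacing="0"/>
  <portSpacing port="sink_document 2" spacing="0"/>
</process>
```

The stopword filter is placed before stemming here on the assumption that the dictionary contains unstemmed words. To use the Open File route instead, an Open File operator would be added to the same subprocess and its output connected to the filter's file input port.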