Problem with StopwordFilterFile

nguyenxuanhau Member Posts: 22 Contributor II
edited November 2018 in Help
My XML process file is:

<process version="4.6">

  <operator name="Root" class="Process" expanded="yes">
      <description text="Text Hau"/>
      <parameter key="logverbosity" value="init"/>
      <parameter key="random_seed" value="2001"/>
      <parameter key="send_mail" value="never"/>
      <parameter key="process_duration_for_mail" value="30"/>
      <parameter key="encoding" value="UTF-8"/>
      <operator name="TextInput" class="TextInput" expanded="yes">
          <list key="texts">
            <parameter key="graphics" value="dulieu"/>
          </list>
          <parameter key="default_content_type" value=""/>
          <parameter key="default_content_encoding" value="utf-8"/>
          <parameter key="default_content_language" value=""/>
          <parameter key="prune_below" value="-1"/>
          <parameter key="prune_above" value="-1"/>
          <parameter key="vector_creation" value="TermOccurrences"/>
          <parameter key="use_content_attributes" value="false"/>
          <parameter key="use_given_word_list" value="false"/>
          <parameter key="return_word_list" value="false"/>
          <parameter key="id_attribute_type" value="short"/>
          <list key="namespaces">
          </list>
          <parameter key="create_text_visualizer" value="false"/>
          <parameter key="on_the_fly_pruning" value="-1"/>
          <parameter key="extend_exampleset" value="false"/>
          <operator name="StringTokenizer" class="StringTokenizer">
          </operator>
          <operator name="StopwordFilterFile" class="StopwordFilterFile">
              <parameter key="file" value="dulieu/stopword.txt"/>
              <parameter key="case_sensitive" value="true"/>
          </operator>
      </operator>
  </operator>

</process>

When I run this process, it doesn't filter out stopwords that are encoded in UTF-8.

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    If you switch the parameters view in RapidMiner to expert mode, you will see that there is an encoding parameter. Set this parameter to UTF-8 and the process will work.

    Greetings,
    Sebastian
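
For readers who edit the process file by hand, the suggestion above might look roughly like the fragment below. This is only a sketch: it assumes the expert-mode parameter belongs to the StopwordFilterFile operator and is stored under the key "encoding", neither of which the thread confirms. The safest route is to set the value in the GUI and then check what RapidMiner writes to the XML.

```xml
<operator name="StopwordFilterFile" class="StopwordFilterFile">
    <parameter key="file" value="dulieu/stopword.txt"/>
    <parameter key="case_sensitive" value="true"/>
    <!-- Assumption: the expert-mode encoding parameter is saved under the key "encoding" -->
    <parameter key="encoding" value="UTF-8"/>
</operator>
```

The stopword file itself (dulieu/stopword.txt) also has to be saved as UTF-8 for the entries to match the tokens.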