"Text Mining beginner HELP"

ayaghci Member Posts: 5 Contributor II
edited June 2019 in Help

I am quite new to the text mining process. I am trying to use a user-defined external dictionary but am having a problem.

My question is: when I create a user-defined dictionary (in Notepad), what should its structure be? For instance,
I created my file as follows, using the open WordNet dictionary:
artier, arty
artiest, arty

but I am getting an error.

I would appreciate any comments or suggested references (books, websites).



  • Nils_Woehler Member Posts: 463 Maven

    With which operator do you want to use your external dictionary? Depending on the operator, the structure of the dictionary may change.

  • ayaghci Member Posts: 5 Contributor II
    Hi Nils

    I am trying to use the [Stem (Dictionary)] operator. My intention is to (1) generate a dictionary (in txt form) and (2) stem the tokens.

    Thanks in advance

  • Nils_Woehler Member Posts: 463 Maven

    Your dictionary has to look like this:

    and your process may look like this:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.007">
      <operator activated="true" class="process" compatibility="5.2.007" expanded="true" name="Process">
        <process expanded="true" height="235" width="547">
          <operator activated="true" class="text:read_document" compatibility="5.2.001" expanded="true" height="60" name="Read Document" width="90" x="132" y="152">
            <parameter key="file" value=""/>
          </operator>
          <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="313" y="165"/>
          <operator activated="true" class="text:stem_dictionary" compatibility="5.2.001" expanded="true" height="60" name="Stem (Dictionary)" width="90" x="447" y="165">
            <parameter key="file" value=""/>
          </operator>
          <connect from_op="Read Document" from_port="output" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Stem (Dictionary)" to_port="document"/>
          <connect from_op="Stem (Dictionary)" from_port="document" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
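
For a word list kept as `word, stem` pairs like the one in the question, a small script can regroup the entries into one rule per stem. The sketch below assumes the Stem (Dictionary) operator expects lines of the form `stem:pattern1 pattern2 ...`. That format is an assumption here and should be checked against the operator's help text before use. The function name `to_stem_rules` is hypothetical.

```python
from collections import defaultdict


def to_stem_rules(lines):
    """Group 'word, stem' pairs into one 'stem:word1 word2' rule per stem.

    ASSUMPTION: the target format 'stem:pattern1 pattern2 ...' is not
    confirmed by the thread; verify it against the operator documentation.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if "," not in line:
            raise ValueError(f"expected 'word, stem' but got: {line!r}")
        word, stem = (part.strip() for part in line.split(",", 1))
        if word not in groups[stem]:
            groups[stem].append(word)  # keep first-seen order, drop duplicates
    return [f"{stem}:{' '.join(words)}" for stem, words in groups.items()]


if __name__ == "__main__":
    # The two entries from the question.
    pairs = ["artier, arty", "artiest, arty"]
    for rule in to_stem_rules(pairs):
        print(rule)
```

The printed lines could then be saved to a plain text file and selected in the operator's `file` parameter, which is left empty in the process above.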