Open WordNet Dictionary and Extract Sentiment

pix123 Member Posts: 27 Contributor I
edited January 16 in Help
Hi there,

I am using the Open WordNet Dictionary operator together with the Extract Sentiment (English) operator, and I have set the dictionary path to point to the correct folder. I have a couple of hundred text files that I want to analyze. If I analyze just 8 of those files, the process runs fine. However, if I try to run it against 9 or more files, I get an I/O error saying that the resource can't be read and parsed.

Is it possible to have the WordNet operator run once and remember the list of words, instead of running again each time the Process Documents from Files operator reads a new file?

If this is not possible, is there a way to overcome this issue?

Many Thanks.

Answers

  • pix123 Member Posts: 27 Contributor I
    Anyone able to assist with this query?
  • Maerkli Member Posts: 84   Unicorn
    Hello pix123,
    Could you perhaps post your process as XML?
    Maerkli
  • pix123 Member Posts: 27 Contributor I
    @Maerkli please see the attached XML process. Any help would be much appreciated. Thanks.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="text:process_document_from_file" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Files" width="90" x="246" y="34">
            <list key="text_directories">
              <parameter key="fx_bukley_reviews" value="C:Tripadvisor Text Files 1"/>
            </list>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="85"/>
              <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="246" y="85"/>
              <operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="380" y="85">
                <parameter key="min_chars" value="3"/>
              </operator>
              <operator activated="true" class="text:stem_porter" compatibility="8.1.000" expanded="true" height="68" name="Stem (Porter)" width="90" x="581" y="85"/>
              <operator activated="true" class="wordnet:open_wordnet_dictionary" compatibility="5.3.000" expanded="true" height="68" name="Open WordNet Dictionary" width="90" x="581" y="187">
                <parameter key="directory" value="C:\Desktop\WordNet-3.0\dict"/>
              </operator>
              <operator activated="true" class="wordnet:find_sentiment_wordnet" compatibility="5.3.000" expanded="true" height="82" name="Extract Sentiment (English)" width="90" x="782" y="85"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
              <connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
              <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
              <connect from_op="Stem (Porter)" from_port="document" to_op="Extract Sentiment (English)" to_port="document"/>
              <connect from_op="Open WordNet Dictionary" from_port="dictionary" to_op="Extract Sentiment (English)" to_port="dictionary"/>
              <connect from_op="Extract Sentiment (English)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.0.003" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
            <list key="function_descriptions">
              <parameter key="Recommend" value="if(sentiment&lt;0,&quot;NO&quot;,&quot;YES&quot;)"/>
            </list>
          </operator>
          <connect from_port="input 1" to_op="Process Documents from Files" to_port="word list"/>
          <connect from_op="Process Documents from Files" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


  • Maerkli Member Posts: 84   Unicorn
    Thanks for sharing your XML file. I can't reproduce what you see by running the process, because I don't have your input files.
    @lionelderkrikor, may I ask you to have a look, please?
    Maerkli

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 731   Unicorn
    Hi @pix123,

    Does the error always occur on the same .txt file? Can you run some tests to check?

    So that we can reproduce what you observe, can you share:
     - your .txt files (at least 9 .txt files)
     - your dictionary

    Regards,

    Lionel
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 731   Unicorn
    Hi again @pix123,

    I executed your process on 12 of my own .txt files (with the WordNet 3.0 dictionary) and had no problem: your process
    works fine...
    So my hypothesis is that one of your .txt files is causing the problem...

    Regards,

    Lionel
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 731   Unicorn
    Hi again @pix123,

    Maybe part of an answer:
    Try setting file pattern = *.txt (instead of file pattern = *) in the Process Documents from Files parameters, so that only the .txt files in the folder are read.
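    For reference, a sketch of how that setting might look in the saved process XML. The parameter key file_pattern is an assumption on my part; compare it with what RapidMiner actually writes after you change the setting in the GUI. The rest is copied from the process posted above, with the inner subprocess left out.

    ```xml
    <operator activated="true" class="text:process_document_from_file" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Files" width="90" x="246" y="34">
      <list key="text_directories">
        <parameter key="fx_bukley_reviews" value="C:Tripadvisor Text Files 1"/>
      </list>
      <!-- assumed key name: only files matching *.txt are read (instead of the pattern *) -->
      <parameter key="file_pattern" value="*.txt"/>
      <process expanded="true">
        <!-- inner operators (Tokenize through Extract Sentiment (English)) unchanged -->
      </process>
    </operator>
    ```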


    Hope it helps,

    Regards,

    Lionel
  • pix123 Member Posts: 27 Contributor I
    edited December 2018
    @Maerkli @lionelderkrikor thank you for the suggestions so far. I have tried a random sample of 10 files but still get an error after the 8th file has been processed. I also tried your suggestion of changing the file pattern to *.txt, but I still get the I/O error.

    Attached are the dictionary and some sample .txt files. I'd appreciate any help in resolving this.

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 731   Unicorn
    Hi @pix123,

    I'm not able to reproduce the I/O error: your process works fine on my computer with the 16 .txt files you shared.
    Can you describe the I/O error you encountered in more detail? Can you share the RapidMiner log file?

    Regards,

    Lionel
  • pix123 Member Posts: 27 Contributor I
    @lionelderkrikor thank you for your assistance thus far, attached are both the log file and a screenshot of the error I encounter.
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 731   Unicorn
    Hi @pix123,

    Which OS are you running?  

    Regards,

    Lionel
  • pix123 Member Posts: 27 Contributor I
    @lionelderkrikor I am running on Windows 10
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 731   Unicorn
    Hi @pix123,

    I'm running Windows 10 too, and with your files and your WordNet dictionary the process works fine here.
    I don't know what to make of it...
    Try updating RapidMiner to the latest release (RM 9.1) and, if needed, the WordNet extension.

    Does anyone have an idea?

    Regards,

    Lionel 
  • pix123 Member Posts: 27 Contributor I
    @lionelderkrikor thank you, I tried another computer also running Windows 10 and got the same error.

    If anyone else has suggestions they are appreciated. Thanks 
  • Maerkli Member Posts: 84   Unicorn
    Hello pix123,
    Lionel has done an amazing job, as usual, and I can't add anything. Have you tried using breakpoints? They can help you check the execution flow.
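    To illustrate the breakpoint idea: Open WordNet Dictionary sits inside the Process Documents from Files loop, so a breakpoint after it should pause once per document, which lets you count how many files are processed before the I/O error appears. The snippet assumes that a breakpoint set with "Breakpoint After" in the GUI is saved as a breakpoints="after" attribute on the operator element; treat it as a sketch based on my understanding of the process format, not a reference.

    ```xml
    <!-- assumed attribute form: breakpoints="after" pauses execution each time this operator finishes -->
    <operator activated="true" breakpoints="after" class="wordnet:open_wordnet_dictionary" compatibility="5.3.000" expanded="true" height="68" name="Open WordNet Dictionary" width="90" x="581" y="187">
      <parameter key="directory" value="C:\Desktop\WordNet-3.0\dict"/>
    </operator>
    ```

    When execution pauses, the operator's output is shown in the Results view; resume the process to move on to the next document.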
    Maerkli