[SOLVED] Dictionary Stemmer doesn't work

claudioluciodov Member Posts: 2 Contributor I
edited December 2019 in Help
Hi folks,
I am trying to build a simple process to test the dictionary stemmer. My input is a text file, teste.txt, with the following content:
   
someday
other
day
                     

I created a dictionary file for the stemmer, stem_teste.txt, with the following content:

weekday : .*day            

When I run the process, it doesn't work. I expected "someday" to be replaced by "weekday".



XML File:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
   <parameter key="logverbosity" value="all"/>
   <parameter key="logfile" value="C:\log.rm"/>
   <process expanded="true" height="280" width="413">
     <operator activated="true" class="text:read_document" compatibility="5.2.000" expanded="true" height="60" name="Read Document" width="90" x="45" y="30">
       <parameter key="file" value="D:\teste.txt"/>
     </operator>
     <operator activated="true" class="text:tokenize" compatibility="5.2.000" expanded="true" height="60" name="Tokenize" width="90" x="179" y="165">
       <parameter key="mode" value="linguistic tokens"/>
     </operator>
     <operator activated="true" class="text:stem_dictionary" compatibility="5.2.000" expanded="true" height="60" name="Stem (Dictionary)" width="90" x="313" y="165">
       <parameter key="file" value="D:\stem_teste.txt"/>
     </operator>
     <connect from_op="Read Document" from_port="output" to_op="Tokenize" to_port="document"/>
     <connect from_op="Tokenize" from_port="document" to_op="Stem (Dictionary)" to_port="document"/>
     <connect from_op="Stem (Dictionary)" from_port="document" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    The operator help is a bit misleading. Your dictionary file must not contain any spaces around the colon. The following dictionary file should work:
    weekday:.*day
    Best, Marius
  • claudioluciodov Member Posts: 2 Contributor I
    Thanks a lot !!!
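
To illustrate why the spaces around the colon might matter, here is a minimal Python sketch. It is not RapidMiner's implementation: the helper names parse_rule and stem are hypothetical, and it assumes that each dictionary line is split on the first colon without trimming and that the pattern has to match the whole token. Under those assumptions, "weekday : .*day" yields the pattern " .*day", which requires a leading space and therefore never matches a token such as "someday".

```python
import re


def parse_rule(line, trim):
    """Split a 'target:pattern' line on the first colon (hypothetical helper).

    With trim=False the spaces around the colon stay in the target and
    in the pattern, which is the assumed source of the problem.
    """
    target, pattern = line.split(":", 1)
    if trim:
        target, pattern = target.strip(), pattern.strip()
    return target, re.compile(pattern)


def stem(token, rules):
    """Return the target of the first rule whose pattern matches the whole token."""
    for target, pattern in rules:
        if pattern.fullmatch(token):
            return target
    return token  # no rule matched, keep the token unchanged


tokens = ["someday", "other", "day"]

# Rule as written in the question, kept verbatim (assumed behaviour: no trimming).
untrimmed = [parse_rule("weekday : .*day", trim=False)]
# Same rule with the spaces removed, equivalent to "weekday:.*day".
trimmed = [parse_rule("weekday : .*day", trim=True)]

untrimmed_result = [stem(t, untrimmed) for t in tokens]
trimmed_result = [stem(t, trimmed) for t in tokens]

print(untrimmed_result)
print(trimmed_result)
```

In this sketch the untrimmed rule leaves all three tokens unchanged, while the trimmed rule maps both "someday" and "day" to "weekday", because ".*" can also match an empty prefix.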