"Stopwords Dictionary Won't Work"
james_hickman
Member Posts: 1 Learner III
I can't get the Filter Stopwords (Dictionary) operator to work.
I would like to use it to treat whole sentences as stopwords. (I am processing emails, and some of them contain text that is a reply to standard emails.)
However, I have simplified things as much as possible to try to understand the operator.
I have a .txt file with 11 single words, one per line. I use this as the input file for the Filter Stopwords (Dictionary) operator.
I then created a text file containing these 11 words and a few random words. I use this as the document input for the Filter Stopwords (Dictionary) operator.
When I run the process, all the words are still present. XML below:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="open_file" compatibility="5.3.008" expanded="true" height="60" name="Open File" width="90" x="179" y="165">
        <parameter key="filename" value="C:\Users\james.hickman\Desktop\RTStopWordDictionary.txt"/>
      </operator>
      <operator activated="true" class="text:read_document" compatibility="5.3.000" expanded="true" height="60" name="Read Document" width="90" x="179" y="75">
        <parameter key="file" value="C:\Users\james.hickman\Desktop\testdoc.txt"/>
        <parameter key="encoding" value="UTF-8"/>
      </operator>
      <operator activated="true" class="text:filter_stopwords_dictionary" compatibility="5.3.000" expanded="true" height="76" name="Filter Stopwords (Dictionary)" width="90" x="380" y="120">
        <parameter key="file" value="C:\Users\james.hickman\Desktop\RTStopWordDictionary.txt"/>
      </operator>
      <connect from_op="Open File" from_port="file" to_op="Filter Stopwords (Dictionary)" to_port="file"/>
      <connect from_op="Read Document" from_port="output" to_op="Filter Stopwords (Dictionary)" to_port="document"/>
      <connect from_op="Filter Stopwords (Dictionary)" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Answers
After the Filter Stopwords (Dictionary) operator, use a Process Documents operator to turn the document into an example set.
Here's a simple document.
Here's a simple stopword file containing sentences.
Here's a process to use them.
Andrew
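
For readers who want to see the idea of a sentence-level stopword dictionary outside RapidMiner, here is a minimal Python sketch. It is not how the Filter Stopwords (Dictionary) operator is implemented; the function names `load_stop_entries` and `remove_stop_sentences` are hypothetical, and the sketch assumes one dictionary entry per line, case-insensitive matching, and sentences that end in `.`, `!` or `?`.

```python
import re


def load_stop_entries(lines):
    """Return the non-empty dictionary entries, stripped and lower-cased.

    Assumption: one entry (a word or a whole sentence) per line.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_sentences(text, stop_entries):
    """Drop every sentence of `text` that exactly matches a dictionary entry."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if s.strip().lower() not in stop_entries]
    return " ".join(kept)


# Hypothetical dictionary and email text, for illustration only.
stop_entries = load_stop_entries([
    "Thank you for contacting support.\n",
    "\n",
])
email = "Hello team. Thank you for contacting support. My printer is broken."
cleaned = remove_stop_sentences(email, stop_entries)
print(cleaned)
```

The point of the sketch is that an entry is removed only when it equals a whole unit of the split text. If the text is never split into units, nothing can match a dictionary entry.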