Error in Web Crawl: The example set must contain at least one text attribute
Hi everybody.
In the following simple example, where I process a few pages retrieved by a web crawl, I keep seeing the error "The example set must contain at least one text attribute". The "Nominal to Text" operator, which is suggested as a solution to this very same problem in other posts, does not remove the error. Interestingly, there does seem to be a 'text' attribute in the example set, and the error does not seem to affect the execution of the process. What am I missing?
Thanks!
Marco
[code]
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="510" width="909">
      <operator activated="true" class="web:crawl_web" compatibility="5.2.003" expanded="true" height="60" name="Crawl Web" width="90" x="45" y="345">
        <parameter key="url" value="http://www.cnn.com"/>
        <list key="crawling_rules"/>
        <parameter key="write_pages_into_files" value="false"/>
        <parameter key="add_pages_as_attribute" value="true"/>
        <parameter key="max_pages" value="10"/>
        <parameter key="domain" value="server"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="5.2.008" expanded="true" height="76" name="Nominal to Text" width="90" x="179" y="345"/>
      <operator activated="true" class="multiply" compatibility="5.2.008" expanded="true" height="94" name="Multiply" width="90" x="380" y="345"/>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.2.004" expanded="true" height="76" name="Process Documents from Data" width="90" x="514" y="255">
        <parameter key="vector_creation" value="Term Occurrences"/>
        <list key="specify_weights"/>
        <process expanded="true" height="510" width="909">
          <operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize (2)" width="90" x="246" y="30"/>
          <operator activated="true" class="text:generate_n_grams_terms" compatibility="5.2.004" expanded="true" height="60" name="Generate n-Grams (2)" width="90" x="447" y="30"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_op="Generate n-Grams (2)" to_port="document"/>
          <connect from_op="Generate n-Grams (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Crawl Web" from_port="Example Set" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_port="result 2"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
[/code]
Answers
I am not getting any errors with the latest development version, so the error should be fixed in the next release of RapidMiner and/or the text extension.
All the best,
Marius
Is the development version available? I couldn't find it on SourceForge.
Thanks
Happy compiling!
~Marius