Error in Web Crawl: The example set must contain at least one text attribute

turicum Member Posts: 15 Contributor II
edited August 2019 in Help
Hi everybody.

In the following simple example, where I process a few pages retrieved by a web crawl, I keep seeing the error "The example set must contain at least one text attribute". The "Nominal to Text" operator, which is suggested as a solution to this very same problem in other posts, does not remove the error. Interestingly, there does seem to be a 'text' attribute in the example set, and the error does not seem to affect the execution of the process. What am I missing?

Thanks!
Marco

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
   <process expanded="true" height="510" width="909">
     <operator activated="true" class="web:crawl_web" compatibility="5.2.003" expanded="true" height="60" name="Crawl Web" width="90" x="45" y="345">
       <parameter key="url" value="http://www.cnn.com"/>
       <list key="crawling_rules"/>
       <parameter key="write_pages_into_files" value="false"/>
       <parameter key="add_pages_as_attribute" value="true"/>
       <parameter key="max_pages" value="10"/>
       <parameter key="domain" value="server"/>
     </operator>
     <operator activated="true" class="nominal_to_text" compatibility="5.2.008" expanded="true" height="76" name="Nominal to Text" width="90" x="179" y="345"/>
     <operator activated="true" class="multiply" compatibility="5.2.008" expanded="true" height="94" name="Multiply" width="90" x="380" y="345"/>
     <operator activated="true" class="text:process_document_from_data" compatibility="5.2.004" expanded="true" height="76" name="Process Documents from Data" width="90" x="514" y="255">
       <parameter key="vector_creation" value="Term Occurrences"/>
       <list key="specify_weights"/>
       <process expanded="true" height="510" width="909">
         <operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize (2)" width="90" x="246" y="30"/>
         <operator activated="true" class="text:generate_n_grams_terms" compatibility="5.2.004" expanded="true" height="60" name="Generate n-Grams (2)" width="90" x="447" y="30"/>
         <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
         <connect from_op="Tokenize (2)" from_port="document" to_op="Generate n-Grams (2)" to_port="document"/>
         <connect from_op="Generate n-Grams (2)" from_port="document" to_port="document 1"/>
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="0"/>
         <portSpacing port="sink_document 2" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Crawl Web" from_port="Example Set" to_op="Nominal to Text" to_port="example set input"/>
     <connect from_op="Nominal to Text" from_port="example set output" to_op="Multiply" to_port="input"/>
     <connect from_op="Multiply" from_port="output 1" to_op="Process Documents from Data" to_port="example set"/>
     <connect from_op="Multiply" from_port="output 2" to_port="result 2"/>
     <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    I am not getting any errors on the latest development version, so the error should be fixed with the next release of RapidMiner and/or the text extension.

    All the best,
    Marius
  • turicum Member Posts: 15 Contributor II
    Hi Marius

    Is the development version available? I couldn't find it on SourceForge.

    Thanks
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hey, what I call the development version is pulled and compiled directly from SVN. The source code is available on SourceForge, and our website provides a manual on how to compile RapidMiner from source.

    Happy compiling!
    ~Marius