RapidMiner does not display any errors but still fails

Kausty88 Member Posts: 2 Contributor I
edited June 19 in Help
Hi All,

Though I am new here, I have used other web scraping software with ease, because I was able to pinpoint any issue using the logs. Here I am finding it very difficult to get through. I followed the video to do a simple scrape of this site:

http://www.altusinsite.com/index_en.php?page=searchengine&attri_40_1641=4230&attri_20_11%5B%5D=920&attri_20_11%5B%5D=921&attri_20_11%5B%5D=922&location=Greater+Vancouver+%2F+Downtown+Vancouver&UpdateCompany2=&format=&contact=&attri_40_1740_1=1&attri_40_1740_2=100%2C000&searchbasicbtn1=Find+Space

I expected this to be a simple web crawl, but it fails and I am not getting any error message either. Can anyone please help me with that? I also don't see the green dot at the bottom lighting up.

Here is my XML code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <parameter key="logverbosity" value="all"/>
    <parameter key="logfile" value="D:\Rapidminer\Scrape.txt"/>
    <parameter key="parallelize_main_process" value="true"/>
    <process expanded="true" height="145" width="145">
      <operator activated="true" class="web:crawl_web" compatibility="5.2.003" expanded="true" height="60" name="Crawl Web" width="90" x="45" y="75">
        <parameter key="url" value="http://www.altusinsite.com/index_en.php?page=searchengine&amp;attri_40_1641=4230&amp;attri_20_11[]=920&amp;attri_20_11[]=921&amp;attri_20_11[]=922&amp;location=Greater+Vancouver+/+Downtown+Vancouver&amp;UpdateCompany2=&amp;format=&amp;contact=&amp;attri_40_1740_1=1&amp;attri_40_1740_2=100,000&amp;searchbasicbtn1=Find+Space"/>
        <list key="crawling_rules">
          <parameter key="store_with_matching_url" value=".+suiteid.+"/>
          <parameter key="follow_link_with_matching_url" value=".+pagenum.+|.+suiteid.+"/>
        </list>
        <parameter key="output_dir" value="D:\Rapidminer"/>
        <parameter key="extension" value="html"/>
        <parameter key="max_depth" value="1"/>
        <parameter key="delay" value="100"/>
        <parameter key="user_agent" value="Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20100101 Firefox/17.0"/>
        <parameter key="really_ignore_exclusion" value="true"/>
      </operator>
      <connect from_op="Crawl Web" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
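Since the log gives me nothing to go on, one thing that could be checked outside RapidMiner is whether the url value and the two crawling rules behave as intended. The following is only a minimal sketch in Python, not anything RapidMiner provides: the helper names and the two sample links are made up for illustration, and I am assuming the rules are matched against the whole URL.

```python
import re
import xml.etree.ElementTree as ET


def attr_value(xml_fragment, name="value"):
    """Return an attribute exactly as an XML parser would decode it."""
    return ET.fromstring(xml_fragment).get(name)


def has_leftover_entity(url):
    """True if the decoded URL still contains a literal '&amp;'."""
    return "&amp;" in url


# Same URL prefix as in the process, with the ampersand escaped twice and once.
escaped_twice = (
    '<parameter key="url" value="http://www.altusinsite.com/index_en.php'
    '?page=searchengine&amp;amp;attri_40_1641=4230"/>'
)
escaped_once = (
    '<parameter key="url" value="http://www.altusinsite.com/index_en.php'
    '?page=searchengine&amp;attri_40_1641=4230"/>'
)

# The two crawling rules from the process, compiled unchanged.
STORE_RULE = re.compile(r".+suiteid.+")
FOLLOW_RULE = re.compile(r".+pagenum.+|.+suiteid.+")

# Hypothetical links, only to exercise the rules; not taken from the real site.
SAMPLE_LINKS = [
    "http://example.invalid/index_en.php?page=results&pagenum=2&x=1",
    "http://example.invalid/index_en.php?page=suite&suiteid=123&x=1",
]

if __name__ == "__main__":
    for label, fragment in (("twice", escaped_twice), ("once", escaped_once)):
        url = attr_value(fragment)
        print(label, "-> leftover '&amp;':", has_leftover_entity(url))
    for link in SAMPLE_LINKS:
        print(
            link,
            "follow:", bool(FOLLOW_RULE.fullmatch(link)),
            "store:", bool(STORE_RULE.fullmatch(link)),
        )
```

If the decoded URL still contains a literal `&amp;`, the site would receive parameter names starting with `amp;`, which might explain a crawl that finishes without storing anything.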
There is also another issue, this one with Java:

Dec 26, 2012 1:07:05 PM WARNING: Operator recommendations unavailable: Failed to access the WSDL at: http://recommender.rapid-i.com:80/OperatorRecommenderService/RecommenderService?wsdl. It failed with:
Network is unreachable: connect.