"Text mining operators are not visible in RM5"

vyronad Member Posts: 12 Contributor II
edited June 2019 in Help
Hi All,

I installed RM 5 on my home PC and also at my workplace. The workplace installation doesn't show the full set of text mining operators, whereas the home installation has all of them. Do I need to change anything in the application configuration to display them?

Kindly advise me on this.

Thanks in advance and regards,
Veeranna Ronad.

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Do you have the Text Processing Extension installed on both installations? It can be added using the integrated update mechanism in the Help menu.

    Greetings,
      Sebastian
  • vyronad Member Posts: 12 Contributor II
    Thanks Sebastian.

    I have updated the software from the Help menu, and the text and web crawl operators are now visible.

    But when I wrote a web crawl process, I did not get any results. The XML is given below. I want the operator to search for the text 'Sachin' at the URL 'http://www.indiatimes.com'.

    XML:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="100" width="145">
          <operator activated="true" class="web:crawl_web" expanded="true" height="60" name="Crawl Web" width="90" x="45" y="30">
            <parameter key="url" value="http://www.indiatimes.com/"/>
            <list key="crawling_rules">
              <parameter key="1" value="Sachin"/>
            </list>
            <parameter key="output_dir" value="C:\Documents and Settings\36533\Desktop\x.txt"/>
          </operator>
          <connect from_op="Crawl Web" from_port="Example Set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Probably the site forbids crawling. This might be done in its robots.txt.

    Greetings,
      Sebastian
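
One way to test the robots.txt suggestion outside RapidMiner is a small Python sketch using the standard library's `urllib.robotparser`. The helper name `is_crawl_allowed` and the sample robots.txt text are assumptions made for illustration. They are not taken from the site in question, and this says nothing about how the Crawl Web operator itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # Parse the rules from text that is already in memory (no network access).
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical robots.txt content that blocks every crawler from every path.
    sample = "User-agent: *\nDisallow: /\n"
    print(is_crawl_allowed(sample, "http://www.indiatimes.com/"))
```

To check a live site instead, `RobotFileParser` also provides `set_url()` and `read()` for downloading the real robots.txt before calling `can_fetch()`.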