
"Set a parameter in Web Crawler"

Juju147 (Member, Posts: 5, Contributor II)
edited June 2019 in Help
Hi everyone,

I have a question about the Web Crawler operator. Is it possible to use the result of a Read Excel operator to set a parameter of the Web Crawler?

For example, is it possible to loop over each address listed in one column of an Excel file and run a web crawl on it?

Sincerely,

Ju

Answers

    MariusHelf (RapidMiner Certified Expert, Member, Posts: 1,869, Unicorn)
    Hey Ju,

    you can use the Loop Values operator to iterate over the values of an attribute in an example set, e.g. the column that contains the links. Inside the loop, the operators can access the current value through the iteration macro. By default it is called loop_value, and you can reference it with the syntax %{loop_value} in any parameter of any operator.
    See the attached process for an example. In your case, you would load the Excel sheet with the Read Excel operator instead of using Generate Nominal Data.

    Best regards,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_nominal_data" compatibility="5.3.013" expanded="true" height="60" name="Generate Nominal Data" width="90" x="112" y="30"/>
          <operator activated="true" class="loop_values" compatibility="5.3.013" expanded="true" height="60" name="Loop Values" width="90" x="246" y="30">
            <parameter key="attribute" value="att1"/>
            <process expanded="true">
              <operator activated="true" class="print_to_console" compatibility="5.3.013" expanded="true" height="76" name="Print to Console" width="90" x="246" y="30">
                <parameter key="log_value" value="Current Value: %{loop_value}"/>
              </operator>
              <connect from_port="example set" to_op="Print to Console" to_port="through 1"/>
              <connect from_op="Print to Console" from_port="through 1" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Loop Values" to_port="example set"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
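
    A minimal sketch of the same process adapted to the original question, assuming the Web Mining extension is installed: Read Excel takes the place of Generate Nominal Data, and Crawl Web takes the place of Print to Console, with its url parameter set to %{loop_value}. The operator class web:crawl_web, the parameter keys excel_file and url, the output port name Example Set, the file path C:\data\urls.xlsx and the column name url are assumptions for illustration only. The safest way to get the exact names for your version is to build the process in the GUI and compare its XML. You will probably also want to set further Crawl Web options (for example crawling rules or a maximum depth) for your use case.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <!-- Hypothetical file path; the sheet is assumed to have a column named "url" -->
          <operator activated="true" class="read_excel" compatibility="5.3.013" expanded="true" height="60" name="Read Excel" width="90" x="112" y="30">
            <parameter key="excel_file" value="C:\data\urls.xlsx"/>
          </operator>
          <!-- Iterate over every distinct value of the assumed "url" attribute -->
          <operator activated="true" class="loop_values" compatibility="5.3.013" expanded="true" height="60" name="Loop Values" width="90" x="246" y="30">
            <parameter key="attribute" value="url"/>
            <process expanded="true">
              <!-- Assumed Web Mining extension operator; %{loop_value} is replaced by the current URL -->
              <operator activated="true" class="web:crawl_web" expanded="true" height="60" name="Crawl Web" width="90" x="246" y="30">
                <parameter key="url" value="%{loop_value}"/>
              </operator>
              <connect from_op="Crawl Web" from_port="Example Set" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    The crawl results of all iterations leave Loop Values as a collection, which is connected to the process result here so you can inspect each crawl separately.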