"[SOLVED] Use an Excel file as source for reading RSS Feeds"

HSG_Miner Member Posts: 3 Contributor I
edited June 2019 in Help
Hi there,

I recently started using RapidMiner 5, and I want to cluster the content of around 100 blogs.

I have set up a process that clusters the content of one homepage, generating the data with the Read RSS Feed operator. My question: is it possible to use an Excel list containing the URLs of all the blogs whose data I want to store, instead of typing every single URL into the Read RSS Feed operator?

Thanks a lot for your help,

HSG_Miner

Answers

  • Rene Member Posts: 24 Contributor II
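    Try using macros: read the URLs into an example set, iterate over them with Loop Values, and pass each one to Read RSS Feed via the %{loop_value} macro, e.g.: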
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="false" class="read_excel" compatibility="5.3.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
            <parameter key="excel_file" value="urls.xls"/>
            <parameter key="imported_cell_range" value="A1:A2"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="locale" value="German"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="urls.true.attribute_value.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="read_csv" compatibility="5.3.008" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
            <parameter key="csv_file" value="http://pastebin.com/raw.php?i=dchJNtxe"/>
            <parameter key="trim_lines" value="true"/>
            <parameter key="use_quotes" value="false"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="locale" value="German"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="urls.true.nominal.regular"/>
            </list>
          </operator>
          <operator activated="true" class="loop_values" compatibility="5.3.008" expanded="true" height="76" name="Loop Values" width="90" x="179" y="30">
            <parameter key="attribute" value="urls"/>
            <process expanded="true">
              <operator activated="true" class="web:read_rss" compatibility="5.3.001" expanded="true" height="60" name="Read RSS Feed" width="90" x="179" y="75">
                <parameter key="url" value="%{loop_value}"/>
                <parameter key="random_user_agent" value="true"/>
              </operator>
              <connect from_op="Read RSS Feed" from_port="output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="5.3.008" expanded="true" height="76" name="Append" width="90" x="313" y="30"/>
          <connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
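    Since the question asks about an Excel list, one possible variant of the process above is to activate the Read Excel operator and wire it into Loop Values in place of Read CSV. The following is only a minimal sketch, assuming a file urls.xls with one feed URL per row in column A and no header row. The cell range A1:A100 is an assumption sized to the roughly 100 blogs mentioned in the question, and the position values are illustrative. All operators and parameter keys are taken from the process above.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <!-- Read Excel as in the process above, but activated. -->
          <!-- Assumption: urls.xls holds the feed URLs in cells A1 to A100. -->
          <operator activated="true" class="read_excel" compatibility="5.3.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="urls.xls"/>
            <parameter key="imported_cell_range" value="A1:A100"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="locale" value="German"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="urls.true.attribute_value.attribute"/>
            </list>
          </operator>
          <!-- Loop Values runs its inner process once per value of the "urls" attribute. -->
          <operator activated="true" class="loop_values" compatibility="5.3.008" expanded="true" height="76" name="Loop Values" width="90" x="179" y="30">
            <parameter key="attribute" value="urls"/>
            <process expanded="true">
              <!-- The current URL is available inside the loop as the macro loop_value. -->
              <operator activated="true" class="web:read_rss" compatibility="5.3.001" expanded="true" height="60" name="Read RSS Feed" width="90" x="179" y="75">
                <parameter key="url" value="%{loop_value}"/>
                <parameter key="random_user_agent" value="true"/>
              </operator>
              <connect from_op="Read RSS Feed" from_port="output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <!-- Append merges the per-feed example sets into one. -->
          <operator activated="true" class="append" compatibility="5.3.008" expanded="true" height="76" name="Append" width="90" x="313" y="30"/>
          <connect from_op="Read Excel" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    If Loop Values does not accept the urls column as read from Excel, it may help to set the value type in the meta data entry to nominal, as the Read CSV operator above does.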
  • HSG_Miner Member Posts: 3 Contributor I
    Thanks for your help, Rene. It worked out perfectly!