
"Read CSV then read RSS feed using each row in the csv file"

montaqi Member Posts: 10 Contributor II
edited May 2019 in Help
I am working on a project in which I want to read RSS feeds from a list of RSS URLs. I built the following process, but it fails with an error and does not complete. Please help me; I think it should be a simple problem, but I just can't figure it out.

The CSV file contains only five rows:
News
http://feeds.bbci.co.uk/news/rss.xml
http://feeds.bbci.co.uk/news/world/rss.xml
http://feeds.bbci.co.uk/news/uk/rss.xml
http://feeds.bbci.co.uk/news/business/rss.xml

My process XML is below:
<process version="5.1.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <process expanded="true" height="449" width="614">
      <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="120">
        <parameter key="csv_file" value="C:\Documents and Settings\TU001YU\Desktop\RSSLoop.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="att1.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="loop_values" compatibility="5.1.006" expanded="true" height="94" name="Loop Values" width="90" x="246" y="120">
        <parameter key="attribute" value="att1"/>
        <process expanded="true" height="524" width="806">
          <operator activated="true" class="web:read_rss" compatibility="5.1.000" expanded="true" height="60" name="Read RSS Feed" width="90" x="120" y="32">
            <parameter key="url" value="%{loop_value}"/>
          </operator>
          <connect from_op="Read RSS Feed" from_port="output" to_port="out 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <portSpacing port="sink_out 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
      <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
      <connect from_op="Loop Values" from_port="out 2" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Answers

    colo Member Posts: 236 Maven
    Hi montaqi,

    You have to make sure that the example set produced by "Read CSV" contains only valid URLs. In your case, the column title (News) is read as an ordinary data row, because first_row_as_names is set to false in your process. If you use the import wizard of the "Read CSV" operator, you can mark this first row as the one containing the attribute names.
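    A minimal sketch of the changed "Read CSV" operator, assuming the first_row_as_names flag from the original process behaves as its name suggests. It is also assumed that the meta data entry follows the same name.selected.type.role pattern as the original "att1.true.polynominal.attribute" line. Everything else is copied unchanged from the question's process.

    ```xml
    <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="120">
      <parameter key="csv_file" value="C:\Documents and Settings\TU001YU\Desktop\RSSLoop.csv"/>
      <parameter key="column_separators" value=","/>
      <parameter key="skip_comments" value="true"/>
      <!-- changed from false: the header row "News" becomes the attribute name, not a data row -->
      <parameter key="first_row_as_names" value="true"/>
      <list key="annotations"/>
      <list key="data_set_meta_data_information">
        <!-- assumption: the attribute is now named after the header cell, "News" -->
        <parameter key="0" value="News.true.polynominal.attribute"/>
      </list>
    </operator>
    ```

    With this change the "Loop Values" operator would also need `<parameter key="attribute" value="News"/>` instead of `att1`. This only addresses the header row. The "Read RSS Feed" error reported next is a separate problem.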
    But even after this change, the process did not run for me either. I have never used the "Read RSS Feed" operator before, but it does not seem to work. Even a process containing only a single operator of this type fails with the following error:
    Jun 14, 2011 9:12:47 AM SEVERE: Process failed: Could not initialize class com.sun.syndication.feed.synd.SyndFeedImpl

    Regards
    Matthias