"[SOLVED] problem merging current Loop Values iteration with rss feed results"

jim21gms Member Posts: 2 Contributor I
edited June 2019 in Help
Hi,
I’m trying to build a very simple process that reads RSS URLs from a database, loops through them, and writes the results to a table. That part works fine. However, in addition to the RSS feed results, I also need to save the original RSS URL I used to retrieve the feed, along with the Published, Author, Title, Link, etc. data from the feed. I’ve tried the Extract Macro operator, hoping it would give me the value of the current iteration, but instead it contains all the rss_url values from the database read, not just the current one as I expected. Additionally, even if I had the current rss_url value, how would I merge it with the RSS feed results?

Thanks in advance for your help. The process so far is below.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.017">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
   <process expanded="true" height="522" width="747">
     <operator activated="true" class="read_database" compatibility="5.1.017" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
       <parameter key="connection" value="localDB"/>
       <parameter key="query" value="SELECT rss_url&#10;FROM `rss_url`"/>
       <enumeration key="parameters"/>
     </operator>
     <operator activated="true" class="loop_values" compatibility="5.1.017" expanded="true" height="112" name="Loop Values" width="90" x="380" y="30">
       <parameter key="attribute" value="rss_url"/>
       <process expanded="true" height="540" width="765">
         <operator activated="true" class="web:read_rss" compatibility="5.1.004" expanded="true" height="60" name="Read RSS Feed" width="90" x="112" y="75">
           <parameter key="url" value="%{loop_value}"/>
         </operator>
         <operator activated="true" breakpoints="before" class="write_database" compatibility="5.1.017" expanded="true" height="60" name="Write Database" width="90" x="298" y="76">
           <parameter key="connection" value="localDB"/>
           <parameter key="table_name" value="rss_output"/>
           <parameter key="overwrite_mode" value="overwrite first, append then"/>
         </operator>
         <operator activated="true" breakpoints="after" class="extract_macro" compatibility="5.1.017" expanded="true" height="60" name="Extract Macro" width="90" x="112" y="210">
           <parameter key="macro" value="rss_url"/>
           <parameter key="macro_type" value="data_value"/>
           <parameter key="attribute_name" value="rss_url"/>
           <parameter key="example_index" value="%{a}"/>
         </operator>
         <connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
         <connect from_op="Read RSS Feed" from_port="output" to_op="Write Database" to_port="input"/>
         <connect from_op="Write Database" from_port="through" to_port="out 2"/>
         <connect from_op="Extract Macro" from_port="example set" to_port="out 3"/>
         <portSpacing port="source_example set" spacing="216"/>
         <portSpacing port="sink_out 1" spacing="0"/>
         <portSpacing port="sink_out 2" spacing="0"/>
         <portSpacing port="sink_out 3" spacing="0"/>
         <portSpacing port="sink_out 4" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Read Database" from_port="output" to_op="Loop Values" to_port="example set"/>
     <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    The Extract Macro operator sets the value of a macro, but it does not modify the example set. Try something like the process below. It may need some adaptation; I didn't test it since I don't have your data. The Join operator should join the original table with the RSS contents based on the URL of the feed. You will have to re-enable the Write Database operator and connect it at a suitable place in the process.

    All the best,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="522" width="748">
          <operator activated="true" class="read_database" compatibility="5.2.003" expanded="true" height="60" name="Read Database" width="90" x="45" y="210">
            <parameter key="connection" value="localDB"/>
            <parameter key="query" value="SELECT rss_url&#10;FROM `rss_url`"/>
            <enumeration key="parameters"/>
          </operator>
          <operator activated="false" breakpoints="before" class="write_database" compatibility="5.2.003" expanded="true" height="60" name="Write Database" width="90" x="581" y="210">
            <parameter key="connection" value="localDB"/>
            <parameter key="table_name" value="rss_output"/>
            <parameter key="overwrite_mode" value="overwrite first, append then"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.2.003" expanded="true" height="94" name="Multiply" width="90" x="179" y="30"/>
          <operator activated="true" class="loop_values" compatibility="5.2.003" expanded="true" height="94" name="Loop Values" width="90" x="313" y="120">
            <parameter key="attribute" value="rss_url"/>
            <process expanded="true" height="540" width="765">
              <operator activated="true" class="web:read_rss" compatibility="5.1.004" expanded="true" height="60" name="Read RSS Feed" width="90" x="45" y="30">
                <parameter key="url" value="%{loop_value}"/>
              </operator>
              <connect from_op="Read RSS Feed" from_port="output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="54"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="5.2.003" expanded="true" height="76" name="Append" width="90" x="447" y="120"/>
          <operator activated="true" class="join" compatibility="5.2.003" expanded="true" height="76" name="Join" width="90" x="581" y="30">
            <parameter key="join_type" value="left"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="rss_url" value="Link"/>
            </list>
          </operator>
          <connect from_op="Read Database" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Join" to_port="left"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
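
    Another option, sketched here under assumptions and not tested against this data, is to add the URL as a column inside the loop so that no join is needed. The fragment below assumes that the Generate Attributes operator (class generate_attributes, with its function_descriptions list) is available in this RapidMiner version and that %{loop_value} is substituted with the current URL before the expression is evaluated. The new attribute name rss_url is only illustrative.

    ```xml
    <operator activated="true" class="loop_values" compatibility="5.2.003" expanded="true" name="Loop Values">
      <parameter key="attribute" value="rss_url"/>
      <process expanded="true">
        <!-- Fetch the feed for the URL of the current iteration -->
        <operator activated="true" class="web:read_rss" compatibility="5.1.004" expanded="true" name="Read RSS Feed">
          <parameter key="url" value="%{loop_value}"/>
        </operator>
        <!-- Assumption: adds a constant nominal column holding the current loop value -->
        <operator activated="true" class="generate_attributes" compatibility="5.2.003" expanded="true" name="Generate Attributes">
          <list key="function_descriptions">
            <parameter key="rss_url" value="&quot;%{loop_value}&quot;"/>
          </list>
        </operator>
        <connect from_op="Read RSS Feed" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
        <connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
      </process>
    </operator>
    ```

    With this in place, the Append operator after the loop would stack feed rows that already carry their source URL. A URL containing a double quote would break the quoted expression, so the sketch only suits well-formed URLs.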
  • jim21gms Member Posts: 2 Contributor I
    Marius,
    It works perfectly when I join on the IDs. Thanks for the quick turnaround and the great solution.

    Best regards,
    Jim.
