"need some help with loop operators"

kayman Member Posts: 662 Unicorn
edited June 2019 in Help
Maybe it's my old-school view on the topic, but I can't get my head around the loop concept, let alone get it working.

In short, this is what I want to do:

- I have a repository containing JSON snippets that I crawled from the web.
- I want to loop through this repository and, for every row, use the JSON to XML operator so I can merge the result with other data from the crawled pages.

My approach was as follows:

(1) Retrieve the dataset
(2) Loop Examples
(3) For every example -> Data to Documents
(4) JSON to XML
(5) Store XML
(6) Merge data and do some magic

But it only seems to work in theory. Can someone advise me on the best way to tackle this? If I can get (1) to (3) working, I can figure out the rest myself, but I'm stuck and it's driving me nuts...

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering
    Hi,

    Have a look at the process below. It iterates over a repository folder with "Loop Repository" (taking only the IOObjects whose names match the specified pattern), and then a "Loop Examples" operator steps through each row of the example set. Inside that loop, "Extract Document" converts the content of a predefined attribute of the current row into a document. After that, you can do whatever you like with those documents.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000-SNAPSHOT">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.4.000-SNAPSHOT" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="loop_repository" compatibility="6.4.000-SNAPSHOT" expanded="true" height="76" name="Loop Repository" width="90" x="45" y="30">
            <parameter key="repository_folder" value="//Samples/data/"/>
            <parameter key="entry_type" value="IOObject"/>
            <parameter key="filter" value="Deals.*"/>
            <process expanded="true">
              <operator activated="true" class="loop_examples" compatibility="6.4.000-SNAPSHOT" expanded="true" height="94" name="Loop Examples" width="90" x="45" y="30">
                <process expanded="true">
                  <operator activated="true" class="text:extract_document" compatibility="6.4.000-SNAPSHOT" expanded="true" height="76" name="Extract Document" width="90" x="45" y="30">
                    <parameter key="attribute_name" value="Age"/>
                    <parameter key="example_index" value="%{example}"/>
                  </operator>
                  <connect from_port="example set" to_op="Extract Document" to_port="example set"/>
                  <connect from_op="Extract Document" from_port="document" to_port="output 1"/>
                  <portSpacing port="source_example set" spacing="0"/>
                  <portSpacing port="sink_example set" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="repository object" to_op="Loop Examples" to_port="example set"/>
              <connect from_op="Loop Examples" from_port="output 1" to_port="out 1"/>
              <portSpacing port="source_repository object" spacing="0"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Loop Repository" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
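
For readers who find loops easier to follow in code, here is a minimal Python sketch of the same row-by-row idea. It is not RapidMiner and not part of the process above. The `rows` list, the `json` attribute name, and the `json_to_xml` and `loop_examples` helpers are all hypothetical stand-ins, and the JSON-to-XML mapping is just one simple convention among many.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, tag="root"):
    """Convert a parsed JSON value into an XML element.

    Assumption: object keys are valid XML element names; list
    entries are wrapped in <item> elements.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(json_to_xml(child, key))
    elif isinstance(value, list):
        for child in value:
            elem.append(json_to_xml(child, "item"))
    elif value is not None:
        elem.text = str(value)
    return elem


def loop_examples(rows, attribute):
    """Visit every row and turn the chosen attribute into an XML string."""
    documents = []
    for row in rows:
        # The loop only visits rows; the steps inside it do the real work.
        parsed = json.loads(row[attribute])
        documents.append(ET.tostring(json_to_xml(parsed), encoding="unicode"))
    return documents


# Hypothetical stand-in for the retrieved example set.
rows = [
    {"id": 1, "json": '{"title": "first", "tags": ["a", "b"]}'},
    {"id": 2, "json": '{"title": "second", "tags": []}'},
]

xml_docs = loop_examples(rows, "json")
for doc in xml_docs:
    print(doc)
```

As in the process, the loop itself does nothing beyond visiting each row; the conversion has to be placed inside it.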
    Regards,
    Marco
  • kayman Member Posts: 662 Unicorn
    Thanks Marco,

    I simply overlooked the fact that the loop operator allows you to loop, but doesn't do anything unless you tell it to (or something like that...).
    Once I realized that, it was fairly straightforward :)