Loop data sets and dynamically generated file path

Serek91 Member Posts: 22 Contributor II
edited August 2019 in Help
Hi,

I have a subprocess with a Write CSV operator. It is duplicated ~70 times. The output file has a path like "{category_id}/{set_id}/filename.csv", so I want the path to be generated dynamically. Is there a way to do that? For example, by passing two custom variables to the subprocess and then using them in the file path?

EDIT:
I'm using the Loop Datasets operator. But in each iteration I somehow have to obtain the index of the current iteration and generate the file path from it...



Process added as attachment.

Answers

  • kayman Member Posts: 464   Unicorn
    It seems like you need two nested Loop Values operators.
    The outer one loops through the category_id values, the inner one loops through the set_id values, and inside the inner loop you apply your logic. You can then save the result using the stored macro values for both the category and the set ID (%{cid} and %{sid} in the file path), as in the attached simplified example:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.3.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="category_id,set_id,something&#10;1,1,x&#10;1,1,y&#10;1,2,z&#10;2,1,a&#10;2,2,b&#10;2,2,c&#10;"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="true"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="concurrency:loop_values" compatibility="9.3.001" expanded="true" height="82" name="Loop Values" width="90" x="246" y="34">
            <parameter key="attribute" value="category_id"/>
            <parameter key="iteration_macro" value="cid"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="9.3.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34">
                <parameter key="parameter_expression" value=""/>
                <parameter key="condition_class" value="custom_filters"/>
                <parameter key="invert_filter" value="false"/>
                <list key="filters_list">
                  <parameter key="filters_entry_key" value="category_id.equals.%{cid}"/>
                </list>
                <parameter key="filters_logic_and" value="true"/>
                <parameter key="filters_check_metadata" value="true"/>
              </operator>
              <operator activated="true" class="concurrency:loop_values" compatibility="9.3.001" expanded="true" height="82" name="Loop Values (2)" width="90" x="179" y="34">
                <parameter key="attribute" value="set_id"/>
                <parameter key="iteration_macro" value="sid"/>
                <parameter key="reuse_results" value="false"/>
                <parameter key="enable_parallel_execution" value="false"/>
                <process expanded="true">
                  <operator activated="true" class="filter_examples" compatibility="9.3.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="45" y="34">
                    <parameter key="parameter_expression" value=""/>
                    <parameter key="condition_class" value="custom_filters"/>
                    <parameter key="invert_filter" value="false"/>
                    <list key="filters_list">
                      <parameter key="filters_entry_key" value="set_id.equals.%{sid}"/>
                    </list>
                    <parameter key="filters_logic_and" value="true"/>
                    <parameter key="filters_check_metadata" value="true"/>
                  </operator>
                  <operator activated="true" breakpoints="before" class="write_csv" compatibility="9.3.001" expanded="true" height="82" name="Write CSV" width="90" x="179" y="34">
                    <parameter key="csv_file" value="mypath/%{cid}/%{sid}/filename.csv"/>
                    <parameter key="column_separator" value=";"/>
                    <parameter key="write_attribute_names" value="true"/>
                    <parameter key="quote_nominal_values" value="true"/>
                    <parameter key="format_date_attributes" value="true"/>
                    <parameter key="append_to_file" value="false"/>
                    <parameter key="encoding" value="UTF-8"/>
                  </operator>
                  <connect from_port="input 1" to_op="Filter Examples (2)" to_port="example set input"/>
                  <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Write CSV" to_port="input"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Loop Values (2)" to_port="input 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Loop Values" to_port="input 1"/>
          <connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
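
For readers who find code easier to follow than operator XML, here is a rough Python sketch of what the nested loops above do: split the rows by category_id, then by set_id, and write each group to a path built from both values. This is not RapidMiner code. The names `write_partitions`, `out_root`, and `SAMPLE_ROWS` are made up for illustration, and the sketch creates missing folders itself, which may or may not match how Write CSV behaves in your setup.

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_partitions(rows, out_root):
    """Group rows by (category_id, set_id) and write one CSV per group.

    Hypothetical helper: the first key plays the role of the %{cid} macro,
    the second key the role of the %{sid} macro in the process above.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["category_id"], row["set_id"])].append(row)

    written = []
    for (cid, sid), group in sorted(groups.items()):
        # Same layout as "mypath/%{cid}/%{sid}/filename.csv"
        path = Path(out_root) / cid / sid / "filename.csv"
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="", encoding="utf-8") as fh:
            # ';' mirrors the column_separator set on Write CSV
            writer = csv.DictWriter(fh, fieldnames=list(group[0].keys()), delimiter=";")
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written


# Same sample data as the Create ExampleSet operator in the process above.
SAMPLE_ROWS = [
    {"category_id": "1", "set_id": "1", "something": "x"},
    {"category_id": "1", "set_id": "1", "something": "y"},
    {"category_id": "1", "set_id": "2", "something": "z"},
    {"category_id": "2", "set_id": "1", "something": "a"},
    {"category_id": "2", "set_id": "2", "something": "b"},
    {"category_id": "2", "set_id": "2", "something": "c"},
]
```

Calling `write_partitions(SAMPLE_ROWS, "mypath")` would create one file per category/set combination that actually occurs in the data, which is also why the process filters the example set before each loop level.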
    


  • Serek91 Member Posts: 22 Contributor II
    edited August 2019
    Hi, I modified my previous post.

    The custom values added to the path of the CSV file are not obtained from the example set. It is just an index. I mean something like:


    $index = 0;
    $exampleSets = ['A', 'B', 'C', 'D'];
    foreach ($exampleSets as $exampleSet) {
        ++$index;
        // Path depends only on the iteration index, e.g. "1/example.csv"
        $path = $index . '/example.csv';
    }
  • Serek91 Member Posts: 22 Contributor II
    edited August 2019
    Thanks! One last request: can you check my process now (and sorry for the Polish descriptions above the operators)? I hope it is OK now...




  • Marco_Boeck Team Lead Software Engineering Administrator, Moderator, Employee, Member, University Professor Posts: 1,928   RM Engineering
    Hi,

    If you put the input CSV files into one folder, you could use Loop Files with a single Read CSV operator instead of multiple Read CSV operators. Other than that, the macro approach looks fine.

    Regards,
    Marco
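
As a rough illustration of the "one folder, one reader" idea, here is a small Python sketch. It is not RapidMiner code and says nothing about how Loop Files is configured. The function name `read_all_csv`, the `*.csv` pattern, and the `;` delimiter are assumptions made for the example, since the attached process is not shown in this thread.

```python
import csv
from pathlib import Path


def read_all_csv(folder, pattern="*.csv", delimiter=";"):
    """Read every matching CSV file in one folder with a single reader routine.

    Hypothetical helper: returns a dict mapping each file's stem
    (name without extension) to its rows as a list of dicts.
    """
    results = {}
    # Sorted so the iteration order is stable from run to run
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as fh:
            results[path.stem] = list(csv.DictReader(fh, delimiter=delimiter))
    return results
```

The point is the same as in the post above: the reading logic is written once and the loop supplies the file, so adding another input file means dropping it into the folder rather than adding another reader.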