"Loop Files over Example Sets using Retrieve"

dragoljub Member Posts: 241 Contributor II
edited May 2019 in Help
Hi Guys,

I'm running into an annoying issue, and the Retrieve operator seems to be the culprit. It only accepts repository paths, not file system paths. For example, //Repository/ExampleSet works but C:/Repository/ExampleSet.ioo does not.

Currently I have 25 example sets I want to append by looping through the .ioo files, but the Retrieve operator will not take an actual file path. Worse yet, the %{file_name} macro returns the name with the .ioo extension, so that is no help either.

How can I specify an absolute path in the Retrieve operator? (This should be a checkbox option in the Retrieve operator.)

or

How can I strip the extension from my macro to trick the Retrieve operator?

Thanks,
-Gagi

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    actually you can't: the Retrieve operator is designed to work only on repositories. With RapidAnalytics in the background you don't have any files you can access...

    So what you can do is use the Read operator to read .ioo files from disk.

    Greetings,
      Sebastian
  • dragoljub Member Posts: 241 Contributor II
    Thanks Sebastian,

    I currently have my example sets stored as .ioo files (a binary format, I believe). I was excited when I read your response; however, the Read operator seems to be failing.

    Process failed: Could not read file 'C:\ExampleSet.ioo': java.io.IOException: Cannot read from XML stream, wrong format:  : only whitespace content allowed before start tag and not * (position: START_DOCUMENT seen *... @1:1) .

    I tried various types, but everything gave me the same error. Can this operator actually read example sets saved to the repository?

    -Gagi
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    sorry, but example sets stored in the internal repository can only be read if they are accessed through the internal repository. We just checked that.

    I guess the problem preventing you from doing so is that you can't loop over the entries of your repository?

    Greetings,
      Sebastian
  • dragoljub Member Posts: 241 Contributor II
    Right,

    I have, say, 25 repository example sets of ~300 MB each, created by concatenating many CSV files. I would like to process these entries in a loop rather than selecting them one by one. Ideally, I could loop through a folder in the repository and select the NAME of each example set without the '.ioo' extension, so I can trick the Retrieve operator into reading the data. No luck so far.  :-\

    I'll try to strip the extension by extracting macros to text...

    -Gagi
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Gagi,
    that's a possibility until we add a Loop Repository operator:

    Loop over the files, use the Generate Macro operator to cut away the .ioo using substring or replace, and use this macro to access the repository.

    And: never forget to post these cool processes on myExperiment :)

    Greetings,
      Sebastian
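
The string step described above (cut away the .ioo, then use the result as the repository entry name) can be sketched outside RapidMiner. This is a plain Python illustration, not RapidMiner expression syntax; the helper names `strip_ioo` and `repository_entry` and the folder argument are hypothetical.

```python
import re


def strip_ioo(file_name):
    """Return file_name without a trailing '.ioo' extension."""
    # Escape the dot and anchor at the end of the string, so only a real
    # '.ioo' extension is removed and nothing in the middle of the name.
    return re.sub(r"\.ioo$", "", file_name)


def repository_entry(folder, file_name):
    """Build a repository-style entry name (hypothetical folder layout)."""
    return folder + "/" + strip_ioo(file_name)
```

Escaping the dot matters if the replacement is done with a regular expression: an unescaped `.ioo` also matches any character followed by `ioo`, so a name such as `bioo.ioo` would lose more than its extension.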
  • TK Member Posts: 14 Contributor II
    Is there any documentation on how to write macros? Especially about the syntax: is it Java code or something else? I have the same "Retrieve and loop" problem and don't know how to replace the substring .ioo inside "Generate Macro".

    Thanks!
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    if you have defined a macro named "something", then you can write
    %{something}
    anywhere in a parameter, and it will be replaced by the value of the macro, provided the macro is defined at the point in time when the parameter is read.

    Greetings,
      Sebastian
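
The substitution described above can be pictured as a simple text replacement on the parameter value. The following is a minimal Python sketch of that idea, not RapidMiner's actual implementation; `expand_macros` is a hypothetical helper, and leaving an undefined macro untouched is an assumption of this sketch (RapidMiner itself may behave differently).

```python
import re


def expand_macros(parameter, macros):
    """Replace every %{name} in parameter with the value stored in macros."""
    def substitute(match):
        name = match.group(1)
        # Assumption of this sketch: an undefined macro is left as written.
        return macros.get(name, match.group(0))

    return re.sub(r"%\{([^}]+)\}", substitute, parameter)
```

For example, with a macro `name` set to `ExampleSet`, the parameter value `Data/%{name}` expands to `Data/ExampleSet`.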
  • dragoljub Member Posts: 241 Contributor II
    Here is how I did this using a RapidMiner flow. I also posted the flow on myExperiment.org. You can find it by searching for 'loop repository'. Hope this helps someone. I'm sure the RM team will implement a Loop Repository operator soon.

    -Gagi  ;)

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
        <process expanded="true" height="430" width="1820">
          <operator activated="true" class="loop_files" compatibility="5.0.11" expanded="true" height="60" name="Loop Repository" width="90" x="45" y="30">
            <parameter key="directory" value="C:\Users\Repository"/>
            <parameter key="filter" value=".*\.ioo$"/>
            <process expanded="true" height="449" width="1962">
              <operator activated="true" class="provide_macro_as_log_value" compatibility="5.0.11" expanded="true" height="76" name="Get File Name" width="90" x="45" y="30">
                <parameter key="macro_name" value="file_name"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.0.11" expanded="true" height="76" name="Log File Name" width="90" x="179" y="30">
                <list key="log">
                  <parameter key="name" value="operator.Get File Name.value.macro_value"/>
                </list>
              </operator>
              <operator activated="true" class="log_to_data" compatibility="5.0.11" expanded="true" height="94" name="Log to Data" width="90" x="313" y="30">
                <parameter key="log_name" value="Log File Name"/>
              </operator>
              <operator activated="true" class="replace" compatibility="5.0.11" expanded="true" height="76" name="Strip Extension" width="90" x="447" y="30">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="name"/>
                <parameter key="replace_what" value="\.ioo$"/>
              </operator>
              <operator activated="true" class="extract_macro" compatibility="5.0.11" expanded="true" height="60" name="Extract Macro" width="90" x="581" y="30">
                <parameter key="macro" value="name"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="name"/>
                <parameter key="example_index" value="%{a}"/>
              </operator>
              <operator activated="true" class="retrieve" compatibility="5.0.11" expanded="true" height="60" name="Retrieve" width="90" x="45" y="210">
                <parameter key="repository_entry" value="Data/%{name}"/>
              </operator>
              <operator activated="true" class="extract_macro" compatibility="5.0.11" expanded="true" height="60" name="Attributes" width="90" x="179" y="210">
                <parameter key="macro" value="attributes"/>
                <parameter key="macro_type" value="number_of_attributes"/>
              </operator>
              <operator activated="true" class="extract_macro" compatibility="5.0.11" expanded="true" height="60" name="Examples" width="90" x="313" y="210">
                <parameter key="macro" value="examples"/>
                <parameter key="statistics" value="unknown"/>
              </operator>
              <operator activated="true" class="generate_macro" compatibility="5.0.11" expanded="true" height="76" name="Select" width="90" x="451" y="210">
                <list key="function_descriptions">
                  <parameter key="select" value="if (%{iteration}==1, 1, 2)"/>
                </list>
              </operator>
              <operator activated="true" class="provide_macro_as_log_value" compatibility="5.0.11" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="586" y="210">
                <parameter key="macro_name" value="select"/>
              </operator>
              <operator activated="true" class="select_subprocess" compatibility="5.0.11" expanded="true" height="76" name="Combine Data" width="90" x="715" y="210">
                <parameter key="select_which" value="%{select}"/>
                <process expanded="true" height="448" width="909">
                  <operator activated="true" class="remember" compatibility="5.0.11" expanded="true" height="60" name="Save Data Part" width="90" x="45" y="30">
                    <parameter key="name" value="Data"/>
                    <parameter key="io_object" value="ExampleSet"/>
                  </operator>
                  <connect from_port="input 1" to_op="Save Data Part" to_port="store"/>
                  <connect from_op="Save Data Part" from_port="stored" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
                <process expanded="true" height="448" width="909">
                  <operator activated="true" class="recall" compatibility="5.0.11" expanded="true" height="60" name="Load Previous Part" width="90" x="45" y="30">
                    <parameter key="name" value="Data"/>
                    <parameter key="io_object" value="ExampleSet"/>
                  </operator>
                  <operator activated="true" class="append" compatibility="5.0.11" expanded="true" height="94" name="Append" width="90" x="180" y="30"/>
                  <operator activated="true" class="remember" compatibility="5.0.11" expanded="true" height="60" name="Save Appended Data" width="90" x="313" y="30">
                    <parameter key="name" value="Data"/>
                    <parameter key="io_object" value="ExampleSet"/>
                  </operator>
                  <connect from_port="input 1" to_op="Append" to_port="example set 2"/>
                  <connect from_op="Load Previous Part" from_port="result" to_op="Append" to_port="example set 1"/>
                  <connect from_op="Append" from_port="merged set" to_op="Save Appended Data" to_port="store"/>
                  <connect from_op="Save Appended Data" from_port="stored" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
              </operator>
              <connect from_op="Get File Name" from_port="through 1" to_op="Log File Name" to_port="through 1"/>
              <connect from_op="Log File Name" from_port="through 1" to_op="Log to Data" to_port="through 1"/>
              <connect from_op="Log to Data" from_port="exampleSet" to_op="Strip Extension" to_port="example set input"/>
              <connect from_op="Strip Extension" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
              <connect from_op="Retrieve" from_port="output" to_op="Attributes" to_port="example set"/>
              <connect from_op="Attributes" from_port="example set" to_op="Examples" to_port="example set"/>
              <connect from_op="Examples" from_port="example set" to_op="Select" to_port="through 1"/>
              <connect from_op="Select" from_port="through 1" to_op="Provide Macro as Log Value" to_port="through 1"/>
              <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Combine Data" to_port="input 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="recall" compatibility="5.0.11" expanded="true" height="60" name="Final Appended Data" width="90" x="1720" y="30">
            <parameter key="name" value="Data"/>
            <parameter key="io_object" value="ExampleSet"/>
          </operator>
          <connect from_op="Final Appended Data" from_port="result" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
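
Read from the XML above, the "Combine Data" step stores the first retrieved example set under the name "Data" and, on every later iteration, recalls it, appends the new set, and stores the result again. Below is a rough Python sketch of that control flow, with lists standing in for example sets; `append_all` and the `store` dictionary are illustrative names only, not RapidMiner API.

```python
def append_all(parts):
    """Accumulate parts the way the Remember / Recall / Append branches do."""
    store = {}  # stands in for the object store used by Remember and Recall
    for iteration, part in enumerate(parts, start=1):
        if iteration == 1:
            # First branch: remember the first part as "Data".
            store["Data"] = list(part)
        else:
            # Second branch: recall "Data", append the new part, remember again.
            store["Data"] = store["Data"] + list(part)
    # Final Recall after the loop; raises KeyError if there were no parts.
    return store["Data"]
```

Each iteration copies the accumulated data once more, which is worth keeping in mind with 25 sets of roughly 300 MB each.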
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    you could post a feature request in the bug tracker for that.

    Greetings,
    Sebastian