Filename of output according to input filename

MuehliMan Member Posts: 85 Guru
edited November 2018 in Help
Hello again,

Within my workflows I often run into the problem that I want to preprocess my files and then write the results to a separate output file.

Is it possible to take the input filename from the Read module (for example data_jun_2010.xls) and automatically generate the output filename for the Write module at the end of the workflow (for example data_jun_2010_out.xls)?

Best regards,
Markus

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,529 Unicorn
    Hi,
    you can use macros for this: store the file name in a macro, then use %{macro_name}.xls for reading and %{macro_name}_out.xls for writing the result.
    With Macro Construction you can even modify your macros in more sophisticated ways, for example cutting off the .xls and appending it again...

    Greetings,
      Sebastian
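
To make the naming scheme concrete, here is a minimal Python sketch of what the %{macro_name} substitution amounts to. It is only an illustration of the logic outside RapidMiner: the helpers `expand_macros` and `strip_extension` and the macro name `base_name` are hypothetical and not part of any RapidMiner API.

```python
import re

def expand_macros(template: str, macros: dict) -> str:
    """Replace every %{name} placeholder with its value from `macros`."""
    def lookup(match):
        return macros[match.group(1)]
    return re.sub(r"%\{(\w+)\}", lookup, template)

def strip_extension(filename: str, extension: str) -> str:
    """Cut a known extension off the end of a file name, if present."""
    if filename.endswith(extension):
        return filename[: -len(extension)]
    return filename

# Store the bare file name once (hypothetical macro name: base_name) ...
macros = {"base_name": strip_extension("data_jun_2010.xls", ".xls")}

# ... and reuse it for both the input and the output file name.
input_name = expand_macros("%{base_name}.xls", macros)
output_name = expand_macros("%{base_name}_out.xls", macros)
print(input_name)
print(output_name)
```

Changing the single stored value is then enough to rename both files consistently.
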
  • MuehliMan Member Posts: 85 Guru
    Thank you for your answer.

    Unfortunately I could not find an example process where the usage of a macro like this is described. Could you give me an example workflow doing something like this:

    Read something and Write it as [date]_originalname.csv

    Thanks a lot in advance,
    Markus
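
As a rough illustration of the requested naming pattern, the following Python sketch builds a [date]_originalname.csv name from an input file name. It is not RapidMiner code; the function name `dated_csv_name` and the ISO date format are assumptions made for the example.

```python
from datetime import date
from pathlib import PurePath

def dated_csv_name(original: str, day: date) -> str:
    """Build '<YYYY-MM-DD>_<original name without extension>.csv'."""
    stem = PurePath(original).stem  # drops the last extension only
    return f"{day.isoformat()}_{stem}.csv"

# The date is passed in explicitly so the result is reproducible.
example = dated_csv_name("data_jun_2010.xls", date(2010, 6, 30))
print(example)
```
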

  • haddock Member Posts: 849 Guru
    Hi there,

    Seb beat me to it again, drats! This logs your files..
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.0" expanded="true" name="Process">
        <process expanded="true" height="632" width="1044">
          <operator activated="true" class="set_macro" compatibility="5.0.8" expanded="true" height="76" name="Set Macro" width="90" x="31" y="138">
            <parameter key="macro" value="prefix"/>
            <parameter key="value" value="XXXX"/>
          </operator>
          <operator activated="true" class="loop_files" compatibility="5.0.0" expanded="true" height="76" name="Loop Files" width="90" x="246" y="120">
            <parameter key="directory" value="C:\Documents and Settings\Alien\My Documents\rm_workspace"/>
            <parameter key="filter" value=".*"/>
            <parameter key="iterate_over_subdirs" value="true"/>
            <process expanded="true" height="296" width="705">
              <operator activated="true" class="set_macro" compatibility="5.0.8" expanded="true" height="76" name="Path+Pre+File" width="90" x="179" y="30">
                <parameter key="macro" value="nu"/>
                <parameter key="value" value="%{parent_path}\%{prefix}\%{file_name}"/>
              </operator>
              <operator activated="true" class="provide_macro_as_log_value" compatibility="5.0.8" expanded="true" height="76" name="Provide Macro as Log Value (2)" width="90" x="447" y="30">
                <parameter key="macro_name" value="nu"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.0.8" expanded="true" height="76" name="Log" width="90" x="585" y="30">
                <list key="log">
                  <parameter key="name" value="operator.Path+Pre+File.value.macro_value"/>
                </list>
              </operator>
              <connect from_port="in 1" to_op="Path+Pre+File" to_port="through 1"/>
              <connect from_op="Path+Pre+File" from_port="through 1" to_op="Provide Macro as Log Value (2)" to_port="through 1"/>
              <connect from_op="Provide Macro as Log Value (2)" from_port="through 1" to_op="Log" to_port="through 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Set Macro" from_port="through 1" to_op="Loop Files" to_port="in 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
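
For readers without RapidMiner at hand, here is a rough Python approximation of what the process above appears to log. It assumes that Loop Files visits every file below the directory and that %{parent_path} holds the folder containing the current file; the function `prefixed_paths` is hypothetical. Note that, as the macro value is written, the prefix ends up as a path segment between the folder and the file name rather than being glued onto the file name.

```python
import os
import tempfile

def prefixed_paths(directory: str, prefix: str) -> list:
    """Walk `directory` recursively and build parent_path\\prefix\\file_name for each file."""
    result = []
    for parent_path, _subdirs, file_names in os.walk(directory):
        for file_name in sorted(file_names):
            # Same layout as the macro value %{parent_path}\%{prefix}\%{file_name}
            result.append(f"{parent_path}\\{prefix}\\{file_name}")
    return result

# Demo on a throwaway folder with two empty files.
with tempfile.TemporaryDirectory() as workspace:
    for name in ("a.xls", "b.xls"):
        open(os.path.join(workspace, name), "w").close()
    logged = prefixed_paths(workspace, "XXXX")

for line in logged:
    print(line)
```
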
  • MuehliMan Member Posts: 85 Guru
    Thank you haddock!

    I read the conversation between Ingo and you in the last thread and I can say that I tried to find a solution myself searching through the forum and going through the tutorial files.
    I really appreciate your help.

    Markus
  • haddock Member Posts: 849 Guru
    Hola Markus,

    Cool, I realised that an answer was not available, and was glad to help - don't forget I learnt something too!

    Happy mining!

  • MuehliMan Member Posts: 85 Guru
    Is there another way to get the filename and file path into a variable, other than Loop Files?
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,745 RM Founder
    Hmm, you could first define the corresponding macros with the Set Macro(s) operator(s) and use those macros for both the input files and the output files. This of course is only practical if you have only a single file (or only a few).

    Cheers,
    Ingo
  • MuehliMan Member Posts: 85 Guru
    In fact it is only a small number of files, but I just want to change one macro and have the names for output, weights, log and model change automatically. If I want to apply the macro to the input operator as well, I would need to name my input files in a systematic way, which should be possible.

    Is there a way to export the variable for path and filename out of the loop?

    Markus
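
The "change one macro, rename everything" idea can be sketched in Python as follows. This only illustrates the naming logic and is not RapidMiner code; the function `artifact_names` and the `.wgt`, `.log` and `.mod` extensions are assumptions chosen for the example.

```python
def artifact_names(base_name: str) -> dict:
    """Derive every file name of one run from a single base name."""
    return {
        "input": f"{base_name}.xls",
        "output": f"{base_name}_out.xls",
        "weights": f"{base_name}.wgt",  # assumed extension
        "log": f"{base_name}.log",      # assumed extension
        "model": f"{base_name}.mod",    # assumed extension
    }

names = artifact_names("data_jun_2010")
for role, file_name in names.items():
    print(role, file_name)
```

This only works if the input files follow a systematic naming scheme, as noted in the post above.
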
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,745 RM Founder
    Hi,

    "Is there a way to export the variable for path and filename out of the loop?"
    Maybe I got you wrong, but which one should be exported? It would be a different one in each iteration.

    However, if you are asking whether it is possible to collect all paths and filenames within a loop and make them available to the process outside of the loop: yes, that's possible by logging the macros and creating a data table from the process log table after the loop. But that's probably not what you are after, since you would get the same result with a simple "dir" / "ls" command ;)

    Cheers,
    Ingo
  • haddock Member Posts: 849 Guru
    Evening All,

    I used to do 1 to 20 day forecasts on 1000+ things, and used parameter looping to impose order on those pesky files (parameters, mainly). I've wrapped my previous offering in a parameter iteration to show the point...
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="-20" width="-50">
          <operator activated="true" class="loop_parameters" compatibility="5.0.8" expanded="true" height="76" name="Loop Parameters" width="90" x="126" y="39">
            <list key="parameters">
              <parameter key="Set Macro.value" value="XXXX,YYYY"/>
            </list>
            <process expanded="true" height="300" width="891">
              <operator activated="true" class="set_macro" compatibility="5.0.8" expanded="true" height="76" name="Set Macro" width="90" x="112" y="75">
                <parameter key="macro" value="prefix"/>
                <parameter key="value" value="YYYY"/>
              </operator>
              <operator activated="true" class="loop_files" compatibility="5.0.8" expanded="true" height="76" name="Loop Files" width="90" x="313" y="75">
                <parameter key="directory" value="C:\Documents and Settings\Alien\My Documents\rm_workspace"/>
                <parameter key="filter" value=".*"/>
                <parameter key="iterate_over_subdirs" value="true"/>
                <process expanded="true" height="300" width="891">
                  <operator activated="true" class="set_macro" compatibility="5.0.8" expanded="true" height="76" name="Path+Pre+File" width="90" x="112" y="30">
                    <parameter key="macro" value="nu"/>
                    <parameter key="value" value="%{parent_path}\%{prefix}\%{file_name}"/>
                  </operator>
                  <operator activated="true" class="provide_macro_as_log_value" compatibility="5.0.8" expanded="true" height="76" name="Provide Macro as Log Value (2)" width="90" x="313" y="30">
                    <parameter key="macro_name" value="nu"/>
                  </operator>
                  <operator activated="true" class="log" compatibility="5.0.8" expanded="true" height="76" name="Log" width="90" x="514" y="30">
                    <list key="log">
                      <parameter key="name" value="operator.Path+Pre+File.value.macro_value"/>
                    </list>
                  </operator>
                  <connect from_port="in 1" to_op="Path+Pre+File" to_port="through 1"/>
                  <connect from_op="Path+Pre+File" from_port="through 1" to_op="Provide Macro as Log Value (2)" to_port="through 1"/>
                  <connect from_op="Provide Macro as Log Value (2)" from_port="through 1" to_op="Log" to_port="through 1"/>
                  <portSpacing port="source_in 1" spacing="0"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Set Macro" to_port="through 1"/>
              <connect from_op="Set Macro" from_port="through 1" to_op="Loop Files" to_port="in 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Loop Parameters" from_port="result 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Any use?
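
Conceptually, wrapping the file loop in a parameter iteration behaves like a nested loop: every prefix value is combined with every file. A minimal Python sketch of that idea follows, assuming Loop Parameters sets the Set Macro value to XXXX and then YYYY in turn; the folder and file names below are hypothetical.

```python
from itertools import product

prefixes = ["XXXX", "YYYY"]        # values iterated by the outer loop
file_names = ["a.xls", "b.xls"]    # hypothetical files found by the inner loop
parent_path = r"C:\rm_workspace"   # hypothetical folder

# Outer loop over prefixes, inner loop over files, same layout as
# the macro value %{parent_path}\%{prefix}\%{file_name}.
logged = [
    f"{parent_path}\\{prefix}\\{name}"
    for prefix, name in product(prefixes, file_names)
]

for line in logged:
    print(line)
```
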

