[SOLVED] Write results in different files automatically?

T-Unit Member Posts: 12 Contributor II
edited November 2018 in Help
Hi everyone,

I'm doing some clustering and want to generate several cluster models (k-medoids) with different model parameters (number of clusters, max runs, max optimization steps, ...). To do so I used the "Optimize Parameters" operator to generate cluster models for different parameter combinations. I need the clustered data for further analysis (it doesn't matter whether the parameter combinations are optimal), so I use the "Write Excel" operator to export the generated data to an Excel sheet. But this way only the clustered data of the first run (e.g. when k was 2) ends up in the final Excel file. In the "Optimize Parameters" operator I tell the process to vary (for example) the number of clusters from k=2 to k=20.

My Question:
Is it possible to change the name of the output file automatically while the process is running?

I mean it this way:
choose k=2 --> do the clustering --> save the results to file named "results_k_2.xls"
choose k=3 --> do the clustering --> save the results to file named "results_k_3.xls"
choose k=20 --> do the clustering --> save the results to file named "results_k_20.xls"

Thanks for help.



  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn

    You can use macros to do this. If you have a macro containing k, you can create a second macro from it that holds the file name you want, and use that as the file name parameter of the Write Excel operator.


  • T-Unit Member Posts: 12 Contributor II
    Hello Andrew,

    first of all thanks for your fast reply.

    Your idea sounds logical to me but, to be honest, I don't have a clue how to work with macros in RapidMiner. I know neither how and where to define them nor how to use them in the process. Maybe you can recommend a website where working with macros in RapidMiner is explained in detail? Your blog post from September 15th gives a short look at what macros can be used for, but I can't apply it to my process.

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
  • Skirzynski Member Posts: 164 Maven
    Macros are a kind of named variable that you can set and use anywhere in the process. There are two ways to set a macro:
    • In the context tab of your process
    • With the macro operators in Utility/Macros (see the help tab for usage)
    To use a macro you write %{name_of_the_macro}, e.g. results_k_%{k}.xls. Don't forget to define the macro k with a macro operator before you use it.
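
    As a rough sketch of how this could look in the process XML: a "Set Macro" operator defines k, and a "Write Excel" operator later uses %{k} in its file name. The fixed value 2, the relative file name and the compatibility version are placeholders I made up for illustration, and the connections between the operators are left out.

    ```xml
    <process expanded="true">
      <!-- Defines the macro k; the value 2 is only a placeholder -->
      <operator activated="true" class="set_macro" compatibility="5.2.009" expanded="true" name="Set Macro">
        <parameter key="macro" value="k"/>
        <parameter key="value" value="2"/>
      </operator>
      <!-- %{k} is replaced by the current macro value when the operator runs -->
      <operator activated="true" class="write_excel" compatibility="5.2.009" expanded="true" name="Write Excel">
        <parameter key="excel_file" value="results_k_%{k}.xls"/>
      </operator>
    </process>
    ```

    With k set to 2 as above, the file would be written as results_k_2.xls.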
  • T-Unit Member Posts: 12 Contributor II
    Hello Marcin,

    I implemented a macro called "k" using the "Set Macro" operator. How can I give this macro the current number of clusters of the "Clustering" operator (the number of clusters is set by the "Optimize Parameters" operator and changes from 2 to 20)? I tried "operator.Clustering.parameter.k", but this didn't work: instead of separate files like "results_k_2.xls", "results_k_3.xls", ... I got only one file named "results_k_operator.Clustering.parameter.k.xls". Maybe it's impossible to access the value of a model's parameters directly?

  • Skirzynski Member Posts: 164 Maven
    I thought there was a predefined macro for this, but I was wrong. So unfortunately there is no easy way to do this, only a hack: you can log the parameter of an operator, transform the log into an example set, and extract a macro from the last example (index -1) of that example set. Here is an example process.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.009">
      <operator activated="true" class="process" compatibility="5.2.009" expanded="true" name="Process">
        <process expanded="true" height="520" width="643">
          <operator activated="true" class="retrieve" compatibility="5.2.009" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="optimize_parameters_grid" compatibility="5.2.009" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="246" y="75">
            <list key="parameters">
              <parameter key="Clustering.k" value="[2.0;20;19;linear]"/>
            </list>
            <process expanded="true" height="538" width="643">
              <operator activated="true" class="k_means" compatibility="5.2.009" expanded="true" height="76" name="Clustering" width="90" x="45" y="30">
                <parameter key="k" value="20"/>
              </operator>
              <operator activated="true" class="apply_model" compatibility="5.2.009" expanded="true" height="76" name="Apply Model" width="90" x="179" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.2.009" expanded="true" height="76" name="Log" width="90" x="45" y="210">
                <list key="log">
                  <parameter key="k" value="operator.Clustering.parameter.k"/>
                </list>
              </operator>
              <operator activated="true" class="log_to_data" compatibility="5.2.009" expanded="true" height="94" name="Log to Data" width="90" x="179" y="210"/>
              <operator activated="true" class="write_csv" compatibility="5.2.009" expanded="true" height="76" name="Write CSV" width="90" x="380" y="300">
                <parameter key="csv_file" value="/home/marcin/temp/result_k_%{k}.csv"/>
              </operator>
              <operator activated="true" class="extract_macro" compatibility="5.2.009" expanded="true" height="60" name="Extract Macro" width="90" x="313" y="165">
                <parameter key="macro" value="k"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="k"/>
                <parameter key="example_index" value="-1"/>
                <list key="additional_macros"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.2.009" expanded="true" height="76" name="Performance" width="90" x="514" y="300"/>
              <connect from_port="input 1" to_op="Clustering" to_port="example set"/>
              <connect from_op="Clustering" from_port="cluster model" to_op="Apply Model" to_port="model"/>
              <connect from_op="Clustering" from_port="clustered set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_op="Log to Data" to_port="through 1"/>
              <connect from_op="Log to Data" from_port="exampleSet" to_op="Extract Macro" to_port="example set"/>
              <connect from_op="Log to Data" from_port="through 1" to_op="Write CSV" to_port="input"/>
              <connect from_op="Write CSV" from_port="through" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>

    Please note that the "Extract Macro" operator has to be executed before you use the macro (click on the blue double-arrow with the question mark to check and alter the execution order).
  • T-Unit Member Posts: 12 Contributor II
    Thanks for your fast reply and your suggestion, Marcin!

    I searched around the forum and found a thread that helped me solve my problem (the hint from Sebastian Land did it):

    So here is my adaption:
    I put a "Clone Parameters" operator after the clustering operator. The "Clone Parameters" operator is connected to the "Set Macro" operator. In the "Clone Parameters" operator I entered the following:
    source: Clustering.k
    target: Set Macro.value

    So the changing value of k is copied to the value for the macro and the macro is later used to generate the different filenames (results_k_2.xls, results_k_3.xls, and so on).
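
    For anyone who prefers to see it as process XML, here is a rough sketch of the three operators involved. Treat it as an illustration and not as an export of my actual process: the list key "name_map", the "excel_file" parameter, the compatibility version and the placeholder value 2 are written from memory or assumed, and the connections are left out.

    ```xml
    <process expanded="true">
      <!-- Copies the current value of Clustering.k into the "value" parameter of Set Macro.
           The list key "name_map" is an assumption; check it against your own export. -->
      <operator activated="true" class="clone_parameters" compatibility="5.2.009" expanded="true" name="Clone Parameters">
        <list key="name_map">
          <parameter key="Clustering.k" value="Set Macro.value"/>
        </list>
      </operator>
      <!-- The value 2 is only a placeholder; Clone Parameters overwrites it in every iteration -->
      <operator activated="true" class="set_macro" compatibility="5.2.009" expanded="true" name="Set Macro">
        <parameter key="macro" value="k"/>
        <parameter key="value" value="2"/>
      </operator>
      <!-- The macro is resolved in the file name when the operator runs -->
      <operator activated="true" class="write_excel" compatibility="5.2.009" expanded="true" name="Write Excel">
        <parameter key="excel_file" value="results_k_%{k}.xls"/>
      </operator>
    </process>
    ```

    The three operators have to run in this order inside the "Optimize Parameters" subprocess, after the clustering operator, so that the macro is already set when the file name is resolved.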

    The solution is kind of simple, but I'm sure I would never have solved this problem by myself (or even expected that the "Clone Parameters" operator would do it). I hope this helps other users with the same problem.

    Regards and thanks to all who tried to help me,