RapidMiner

How to save macros in csv or excel format?

SOLVED
Contributor II


Hi experts,

I want to save the performance values for every validation fold, which are stored as macros. I need to save them in CSV format for further analysis. Please help.

 

Thank you,

Archana

2 ACCEPTED SOLUTIONS

RMStaff
Solution
Accepted by topic author archu92
01-31-2017 01:09 PM

Re: How to save macros in csv or excel format?


Hi @archu92, it is normal to have such doubts. It is not recommended to use the error estimate from one single fold ("What a coincidence! 100% accuracy!"). Usually we take the average MSE (mean squared error) or the average accuracy over the 10 cross-validated models. That is exactly what you see in the results view for the 'Performance' output of a Cross Validation operator.

For example, you can have output for different performance criteria in the performance vector view:

accuracy: 66.9048% +/- 7.2695% (mikro: 66.8269%) shows the average accuracy with its standard deviation

AUC (optimistic): 0.810101 +/- 0.078353 (mikro: 0.810101) (positive class: Mine) shows the average Area Under the ROC Curve with its standard deviation

...
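To make explicit how those two numbers arise (a sketch of the usual definitions; whether RapidMiner divides by k or by k-1 in the standard deviation is not stated in this thread), for k-fold cross validation with per-fold accuracies a_1, ..., a_k the reported line is the mean plus or minus the standard deviation:

\bar{a} = \frac{1}{k}\sum_{i=1}^{k} a_i,
\qquad
s = \sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(a_i - \bar{a}\right)^2}

To my understanding, the "mikro" value is the micro average, i.e. the same criterion computed once over all pooled test examples instead of averaged fold by fold, which is why it can differ slightly from the fold average shown first.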

 

An insightful post from Ingo can also help you understand why we need cross validation and how to interpret it.

Also check out his latest "Learn the RIGHT Way to Validate Models" blog post series.

PS: When we talk about model error, we should only care about the error on the test set (the test error), not the training error.

10 REPLIES
RMStaff

Re: How to save macros in csv or excel format?

Hi Archana,

 

I usually use Generate Data by User Specification and build the desired example set myself.
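A minimal sketch of that approach (the operator class and parameter keys are given to the best of my knowledge; %{a} is the predefined apply count macro, while %{my_accuracy} is a hypothetical macro holding the fold's performance value that you would have to set earlier in the process):

<operator activated="true" class="generate_data_user_specification" compatibility="7.3.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
  <!-- each entry becomes one attribute of a one-example data set -->
  <list key="attribute_values">
    <parameter key="fold" value="%{a}"/>
    <parameter key="accuracy" value="%{my_accuracy}"/>
  </list>
  <list key="set_additional_roles"/>
</operator>

One such one-row example set per fold can then be collected with an Append operator and written out once with Write CSV.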

 

~Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Contributor II

Re: How to save macros in csv or excel format?

Hi Martin,

Previously you shared a process which stores the performance of all folds:

http://community.rapidminer.com/t5/RapidMiner-Studio/How-to-store-performance-metrics-from-each-10-f...

 

I need to save all the per-fold performance results to CSV or Excel. Is there any way?

 

Thank you

PS: I have used the Write as Text operator, but it stores only the last result, not all of them.

RMStaff

Re: How to save macros in csv or excel format?


Hi Again,

 

Have a look at Performance to Data. Just use Performance to Data and Write CSV instead of Store. You can use the %{a} macro in the file name to write a different file per fold. In case you want to do this on the already stored performances, you can use Loop Repository.
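A sketch of what that Write CSV looks like inside the testing subprocess, after Performance to Data (the directory is a placeholder; %{a} is the predefined apply count macro, which increases with every fold):

<operator activated="true" class="write_csv" compatibility="7.3.000" expanded="true" height="82" name="Write CSV" width="90" x="447" y="34">
  <!-- %{a} expands to 1, 2, 3, ... so each fold writes its own file -->
  <parameter key="csv_file" value="C:/results/performance_fold_%{a}.csv"/>
</operator>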

 

~Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Contributor II

Re: How to save macros in csv or excel format?

Hi,



You wrote: "Just use Performance to Data and Write CSV instead of Store. You can use the %{a} macro in the file name to write a different file per fold."

I tried to save it in CSV format after using the Performance to Data operator, but the file is empty. Where do I need to define %{a}?

You also wrote: "In case you want to do this on the already stored performances, you can use Loop Repository."

As I am new to RapidMiner, please help me with how to integrate the Loop operator with the stored performances.

Thank you

Elite II
Solution
Accepted by topic author archu92
01-31-2017 01:09 PM

Re: How to save macros in csv or excel format?

Hi,

this is actually much easier:
With RapidMiner 7.3 the Cross Validation operator has a new port called test results. If you put a "Performance to Data" operator on the test side and connect its output to this port, you will get a table of all the per-fold performances on the outer "test" port. You can then simply write it out as CSV or Excel. Here is a process that does this with the Sonar dataset. I also added a Generate Attributes operator to identify the folds, using Martin's %{a} apply count macro suggestion:

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve Sonar" width="90" x="112" y="34">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="7.3.000" expanded="true" height="145" name="Cross Validation" width="90" x="246" y="34">
        <process expanded="true">
          <operator activated="true" class="naive_bayes" compatibility="7.3.000" expanded="true" height="82" name="Naive Bayes" width="90" x="45" y="34"/>
          <connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.3.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="7.3.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="performance_to_data" compatibility="7.3.000" expanded="true" height="82" name="Performance to Data" width="90" x="313" y="34"/>
          <operator activated="true" class="generate_attributes" compatibility="7.3.000" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
            <list key="function_descriptions">
              <parameter key="Fold" value="eval(%{a})"/>
            </list>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_op="Performance to Data" to_port="performance vector"/>
          <connect from_op="Performance to Data" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Performance to Data" from_port="performance vector" to_port="performance 1"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Sonar" from_port="output" to_op="Cross Validation" to_port="example set"/>
      <connect from_op="Cross Validation" from_port="test result set" to_port="result 1"/>
      <connect from_op="Cross Validation" from_port="performance 1" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
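If you want the fold table written to disk rather than only shown in the results view, a Write CSV operator can be appended on the outer level (a sketch; the file path is a placeholder, and its connections replace the direct connect from the test result set to result 1 above):

<operator activated="true" class="write_csv" compatibility="7.3.000" expanded="true" height="82" name="Write CSV" width="90" x="380" y="34">
  <parameter key="csv_file" value="C:/results/fold_performances.csv"/>
</operator>
<connect from_op="Cross Validation" from_port="test result set" to_op="Write CSV" to_port="input"/>
<connect from_op="Write CSV" from_port="through" to_port="result 1"/>

A Write Excel operator (with its excel file parameter) can be dropped into the same place if you prefer xlsx.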

Greetings,

 Sebastian

 

Old World Computing - Establishing the Future

Professional consulting for your Data Science problems

Contributor II

Re: How to save macros in csv or excel format?

Criterion    Value    Standard Deviation    Variance    Fold
accuracy     .0                                         10.0

Hi Sebastian,

I ran the process you shared, but the table above shows only one result. I am not able to obtain all the 10 fold results.

 

Thank you.

 

Elite II

Re: How to save macros in csv or excel format?

Hi,

if I simply copy the XML into the XML panel of RapidMiner 7.3, press the green button to load the XML as a process, and execute it, it shows ten lines as expected. Sorry, I simply cannot reproduce your problem, and I assume you changed something in the process before executing it. Did you perhaps insert a breakpoint somewhere?

 

Greetings, 

Sebastian

Old World Computing - Establishing the Future

Professional consulting for your Data Science problems

Elite III

Re: How to save macros in csv or excel format?

I confirmed that the example process that @land supplied works as intended and provides a table of performance output for each of the folds. Make sure the "test" output of the Cross Validation operator is connected, and you should see it.
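In the process XML above, this is the relevant connection on the outer level:

<connect from_op="Cross Validation" from_port="test result set" to_port="result 1"/>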

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
Contributor II

Re: How to save macros in csv or excel format?


Hi,
Thanks a lot. Yes, I had split the Generate Attributes operator's output to get the results into Excel instead of connecting it to the test port of the validation.
One more doubt: should I consider the 10th fold's result for the model error calculation?