RapidMiner

Urgent: Cannot export/generate Report for Statistics from Results view

SOLVED
Regular Contributor


Why is it so complicated to export basic summary statistics from the Results view in RapidMiner? I know there is a Reporting extension, but it exports everything except the Statistics tab, because it is an old extension that nobody has bothered to update. The Statistics tab was called "Metadata View" in much older versions of RapidMiner, and that is the name the extension looks for, which it obviously cannot find. Here is my process. What am I doing wrong?

 

Basically, I need to pass the values from the HTML report produced by this process to a Neo4j database query (with a Python program sitting in between) that will extract data based on certain numbers from the summary statistics for each level of some of the categorical attributes.

6 REPLIES
Community Manager

Re: Export/Generate Report for Statistics from Results view

Can't you use an Aggregate operator and use those summary statistics? This way you can skip the report generator, which is old and outdated. 

Regards,
Thomas - Community Manager
LinkedIn: Thomas Ott
Elite III

Re: Export/Generate Report for Statistics from Results view

I don't believe there is any way to directly export figures or data from the Statistics view other than by printing or taking a screenshot.  As @Thomas_Ott suggests, the Aggregate operator will be your best option here, unless you need the graphs, in which case the Reporting extension is your only real option to have the process do the export automatically.

 

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
Regular Contributor

Re: Export/Generate Report for Statistics from Results view

Thank you Thomas, but the Aggregate operator does not give a count statistic (of missing values) for numeric attributes. This is understandable, as we generate point estimates rather than counts for numeric variables. However, there is no other operator that achieves this, short of running Python or R scripts inside RapidMiner. And with the Execute R operator, I found that unless you end your code by returning the results via as.data.frame, RapidMiner does not understand what that operator is passing as output. I'm not an R guru, but as I understand it, data frames contain data of one type only, so I'd have to break my ExampleSet into multiple data frames (around 56 columns of mixed data types) and loop through each one. Just to count the number of missing values in a column, that's overkill. At the moment, only the Statistics view in the Results tab gives missing value counts per attribute irrespective of the attribute type, but it seems that result cannot be exported from RapidMiner, which is a really annoying inconvenience, as even Microsoft Excel has this functionality.

 

In the end I chose a simple open-source command-line tool called csvkit.
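For readers who want the same per-column missing count without an extra tool, here is a minimal sketch using only Python's standard library. It assumes the ExampleSet has already been written out as CSV text with a header row, and that missing cells appear as an empty string or one of the tokens in `MISSING_TOKENS`. The function name `count_missing`, the token set, and the sample data are illustrative assumptions, not anything taken from the thread.

```python
import csv
import io

# Assumption: these tokens mark a missing value in the exported file.
MISSING_TOKENS = {"", "?", "NA", "NaN"}


def count_missing(csv_text, missing_tokens=MISSING_TOKENS, delimiter=","):
    """Return {column_name: number_of_missing_cells} for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            value = row.get(name)
            # Short rows yield None for trailing columns; count those as missing too.
            if value is None or value.strip() in missing_tokens:
                counts[name] += 1
    return counts


# Made-up sample with mixed numeric and nominal columns.
sample = (
    "duration,wage,vacation\n"
    "1,5.0,average\n"
    "2,,?\n"
    ",4.5,generous\n"
    "3,,below average\n"
)
result = count_missing(sample)
print(result)
```

Because every cell is read as text, the count works the same way for numeric and nominal columns, which is the behaviour the Statistics view provides.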

Regular Contributor

Re: Export/Generate Report for Statistics from Results view

The Reporting extension hasn't been updated in ages. It's still at version 5.3 or something like that. In those versions, the Statistics tab was called Metadata View, and the Reporting extension searches for that name. That is why, when you run a process that uses the Reporting extension to export the Statistics tab, it outputs a blank report.

Elite II

Re: Export/Generate Report for Statistics from Results view

Hi,

Actually, that's not entirely true. You can very well use the Aggregate operator to count missing values. I will post a mockup process for you below.

 

Perhaps a good time to offer our professional support for these kinds of questions? ;-)

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve Labor-Negotiations" width="90" x="179" y="85">
        <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="7.3.000" expanded="true" height="68" name="Extract Macro" width="90" x="313" y="85">
        <parameter key="macro" value="numberOfExamples"/>
        <list key="additional_macros"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="7.3.000" expanded="true" height="82" name="Aggregate (2)" width="90" x="447" y="85">
        <parameter key="use_default_aggregation" value="true"/>
        <parameter key="default_aggregation_function" value="count (ignoring missings)"/>
        <list key="aggregation_attributes"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="7.3.000" expanded="true" height="82" name="Transpose" width="90" x="581" y="85"/>
      <operator activated="true" class="generate_attributes" compatibility="7.3.000" expanded="true" height="82" name="Generate Attributes" width="90" x="715" y="85">
        <list key="function_descriptions">
          <parameter key="Missings" value="parse(%{numberOfExamples}) - att_1"/>
        </list>
      </operator>
      <connect from_op="Retrieve Labor-Negotiations" from_port="output" to_op="Extract Macro" to_port="example set"/>
      <connect from_op="Extract Macro" from_port="example set" to_op="Aggregate (2)" to_port="example set input"/>
      <connect from_op="Aggregate (2)" from_port="example set output" to_op="Transpose" to_port="example set input"/>
      <connect from_op="Transpose" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
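The arithmetic behind this process can be sketched in plain Python: take the total number of examples (Extract Macro), count the non-missing values per attribute (Aggregate with "count (ignoring missings)"), turn that into one row per attribute (Transpose), and subtract (Generate Attributes). This is only an illustration of the logic, not RapidMiner code. The function name `missing_by_subtraction` and the sample rows are hypothetical, `None` stands in for a missing value, and the `id` values are simplified to bare attribute names.

```python
def missing_by_subtraction(examples):
    """Mirror the process: total rows minus non-missing count, per attribute."""
    # Extract Macro: total number of examples.
    number_of_examples = len(examples)
    attributes = list(examples[0]) if examples else []
    # Aggregate with "count (ignoring missings)": non-missing values per attribute.
    non_missing = {
        name: sum(1 for example in examples if example[name] is not None)
        for name in attributes
    }
    # Transpose + Generate Attributes: one row per attribute with the difference.
    return [
        {
            "id": name,
            "att_1": non_missing[name],
            "Missings": number_of_examples - non_missing[name],
        }
        for name in attributes
    ]


# Made-up rows; None marks a missing value.
rows = [
    {"duration": 1, "wage": 5.0, "vacation": "average"},
    {"duration": 2, "wage": None, "vacation": None},
    {"duration": None, "wage": 4.5, "vacation": "generous"},
]
for line in missing_by_subtraction(rows):
    print(line)
```

The subtraction works for any attribute type because counting non-missing values does not depend on whether the column is numeric or nominal.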

 

 

 

Old World Computing - Establishing the Future

Professional consulting for your Data Science problems

Regular Contributor

Re: Export/Generate Report for Statistics from Results view

Thank you very much. It's not exactly the obvious solution, but it's a solution, and I will happily accept it if it keeps my workflow within the RM ecosystem.