Urgent: Cannot export/generate Report for Statistics from Results view

batstache611 Member Posts: 45 Guru
edited November 2018 in Help

Why is it so complicated to export basic summary statistics from the Results view in RapidMiner? I know there is a Reporting extension, but it exports everything except Statistics, because it's an old extension that nobody bothered to update. The Statistics tab used to be called "Metadata View" in much older versions of RapidMiner, and that's what this extension tries to look for, which it obviously cannot find! Here is my process. What am I doing wrong?

 

Basically, I need the values from the HTML report from this process to pass to a Neo4j database query (with a Python program sitting in between) that will extract data based on certain numbers from the summary statistics for each level of some of the categorical attributes.

Best Answer

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Solution Accepted

    Hi,

    actually that's not entirely true. You can very well use the Aggregate operator to count missing values. I will post you a mockup process below.

     

    Perhaps a good time to offer our professional support for these kinds of questions? ;-)

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve Labor-Negotiations" width="90" x="179" y="85">
            <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="7.3.000" expanded="true" height="68" name="Extract Macro" width="90" x="313" y="85">
            <parameter key="macro" value="numberOfExamples"/>
            <list key="additional_macros"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="7.3.000" expanded="true" height="82" name="Aggregate (2)" width="90" x="447" y="85">
            <parameter key="use_default_aggregation" value="true"/>
            <parameter key="default_aggregation_function" value="count (ignoring missings)"/>
            <list key="aggregation_attributes"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="7.3.000" expanded="true" height="82" name="Transpose" width="90" x="581" y="85"/>
          <operator activated="true" class="generate_attributes" compatibility="7.3.000" expanded="true" height="82" name="Generate Attributes" width="90" x="715" y="85">
            <list key="function_descriptions">
              <parameter key="Missings" value="parse(%{numberOfExamples}) - att_1"/>
            </list>
          </operator>
          <connect from_op="Retrieve Labor-Negotiations" from_port="output" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Aggregate (2)" to_port="example set input"/>
          <connect from_op="Aggregate (2)" from_port="example set output" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
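
For readers following along outside RapidMiner, the arithmetic in this process (total number of examples minus the count of non-missing values per attribute) can be sketched in plain Python. This is only an illustration: the function name `count_missings`, the use of `None` for a missing value, and the sample rows are assumptions made for the sketch and do not come from the process or the Labor-Negotiations data.

```python
def count_missings(rows):
    """Return {attribute: number of missing values} for a list of dict rows."""
    # Mirrors Extract Macro: remember the total number of examples.
    number_of_examples = len(rows)
    attributes = list(rows[0].keys()) if rows else []
    missings = {}
    for name in attributes:
        # Mirrors Aggregate with "count (ignoring missings)".
        non_missing = sum(1 for row in rows if row.get(name) is not None)
        # Mirrors Generate Attributes: total minus non-missing count.
        missings[name] = number_of_examples - non_missing
    return missings


# Hypothetical sample data; None stands in for a missing value.
rows = [
    {"duration": 1, "vacation": "average"},
    {"duration": None, "vacation": "generous"},
    {"duration": 3, "vacation": None},
    {"duration": None, "vacation": "average"},
]
print(count_missings(rows))
```

Because the subtraction never looks at the values themselves, the same approach should work for numeric and nominal attributes alike.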

     

     

     

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Can't you use an Aggregate operator and use those summary statistics? This way you can skip the report generator, which is old and outdated. 

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    I don't believe there is any way to directly export figures or data from the Statistics view other than by printing or taking a screenshot.  As @Thomas_Ott suggests, the Aggregate operator will be your best option here, unless you need the graphs, in which case the Reporting extension is your only real option to have the process do the export automatically.

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • batstache611 Member Posts: 45 Guru

    Thank you, Thomas, but the Aggregate operator does not give a count statistic (of missing values) for numeric attributes. This is understandable, as we generate point estimates rather than count statistics for numeric variables. However, there is no other operator to achieve this other than running Python or R scripts inside RapidMiner. And within the Execute R operator, I found that unless your code ends by printing the results with as.data.frame, RapidMiner does not understand what is being passed as output by that operator. I'm not an R guru, but as I understand it data frames contain data of one type only, so I'd have to break my ExampleSet into multiple data frames (around 56 columns of mixed data types) and loop through each one. I think that's overkill just to count the number of missing values in a column. At the moment, only the Statistics view in the Results tab gives missing value counts per attribute irrespective of the attribute type, but it seems that result cannot be exported outside of RapidMiner, which is a really annoying inconvenience, as even Microsoft Excel has this functionality.

     

    In the end I chose a simple open-source command-line tool called csvkit.
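
Since a Python program already sits between the export and the database query in this workflow, a per-column missing-value count could also be taken directly from an exported CSV using only the standard library. The following is a minimal sketch under stated assumptions, not a description of what csvkit does: the function name `missing_counts_from_csv`, the `MISSING_TOKENS` set, and the sample columns are all hypothetical and would need adjusting to the actual export.

```python
import csv
import io

# Assumption: these tokens mark a missing value in the exported file.
MISSING_TOKENS = {"", "?", "NA", "NaN"}


def missing_counts_from_csv(text, delimiter=","):
    """Count missing cells per column, regardless of the column's type."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            value = row.get(name)
            # A short row yields None; otherwise compare the trimmed cell.
            if value is None or value.strip() in MISSING_TOKENS:
                counts[name] += 1
    return counts


# Hypothetical sample with one missing value in each column.
sample = "age,wage,pension\n30,4.5,none\n,3.0,?\n41,,empl_contr\n"
print(missing_counts_from_csv(sample))
```

Every cell is compared as text, so numeric and categorical columns are handled the same way; the resulting dictionary can then be passed on to whatever builds the database query.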

  • batstache611 Member Posts: 45 Guru

    The Reporting extension hasn't been updated in ages. It's still at version 5.3 or something like that. In those versions, the Statistics tab used to be called Metadata View, and the Reporting extension searches for that name, which is why a process that uses the Reporting extension to export the Statistics tab outputs a blank report.

  • batstache611 Member Posts: 45 Guru

    Thank you very much. It's not exactly the obvious solution, but it's A solution, and I will happily accept it if it keeps my workflow within the RM ecosystem.
