Output from "parse(att1)" not numerical?

MacPhotoBiker
Hi,

I'm trying to create a process that shows, for every customer/article combination, the month of the first purchase. In SQL terms:
Select Customer, Article, Min(Month)
Group By Customer, Article
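The SQL-style query above can be sketched in plain Python to make the intended result concrete. This is a minimal illustration, not RapidMiner code: the sample rows and the `first_purchase_month` helper are hypothetical, and each month is assumed to be encoded as a year-month integer such as 201301.

```python
# Hypothetical sample rows: (Customer, Article, year-month as an integer like 201301)
rows = [
    ("C1", "A1", 201303),
    ("C1", "A1", 201301),
    ("C1", "A2", 201212),
    ("C2", "A1", 201305),
]

def first_purchase_month(rows):
    """Group by (Customer, Article) and keep the smallest year-month per group,
    mirroring: SELECT Customer, Article, MIN(Month) ... GROUP BY Customer, Article."""
    result = {}
    for customer, article, year_month in rows:
        key = (customer, article)
        # Keep the value only if it is the first one seen or earlier than the current minimum.
        if key not in result or year_month < result[key]:
            result[key] = year_month
    return result

print(first_purchase_month(rows))
# {('C1', 'A1'): 201301, ('C1', 'A2'): 201212, ('C2', 'A1'): 201305}
```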

The month actually consists of two attributes, year and month (e.g. "2013-01"). It seems that RapidMiner can only calculate a minimum of numbers, not of text, so I generate a number via parse():

parse(concat(Year,Month)), which correctly results in "201301".
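Concatenating and then parsing only yields correctly ordered numbers if the month part is always two digits, as in the "2013-01" example. A minimal Python sketch of that requirement; the `year_month_key` helper is hypothetical and is not part of RapidMiner's expression language:

```python
def year_month_key(year: str, month: str) -> int:
    """Mimic parse(concat(Year, Month)): join the two strings and read them as a number.
    The month is zero-padded to two digits first, so "1" becomes "01"."""
    return int(year + month.zfill(2))

# Zero-padded: chronological order is preserved.
print(year_month_key("2013", "01"))  # 201301

# Without padding, "2013" + "1" would give 20131, which is smaller than 201212
# (December 2012), so a minimum would pick the wrong month.
print(int("2013" + "1") < int("2012" + "12"))  # True
```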

In the next step, I use the Aggregate operator, grouping by customer and article and aggregating the minimum of the parsed number. Running the process produces the correct results, and the meta data view shows "real" as the data type. However, the Aggregate operator shows the little yellow warning triangle with this message:
The value type of the attribute Year_Month_Numeric is not compatible with the aggregation function minimum. It requires an attribute of type numeric or date_time but is nominal.


Even worse, when I try to rename the resulting minimum field (MinMonth), it does not even show up in the Rename operator.

To me this looks like a bug, but then again I'm certainly a newbie, so I hope somebody can help me out.

Not sure if that helps, but this is the process code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
   <parameter key="encoding" value="UTF-8"/>
   <process expanded="true">
     <operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
       <parameter key="connection" value="MyConnection"/>
       <parameter key="define_query" value="table name"/>
       <parameter key="table_name" value="30_00_00_Sales"/>
       <enumeration key="parameters"/>
     </operator>
     <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes" width="90" x="112" y="210">
       <list key="function_descriptions">
         <parameter key="Year_Month_Numeric" value="parse(concat(Year,Month))"/>
       </list>
       <parameter key="use_standard_constants" value="false"/>
     </operator>
     <operator activated="true" class="aggregate" compatibility="5.3.008" expanded="true" height="76" name="Aggregate" width="90" x="313" y="210">
       <list key="aggregation_attributes">
         <parameter key="Year_Month_Numeric" value="minimum"/>
       </list>
       <parameter key="group_by_attributes" value="|Customer|Article"/>
     </operator>
     <operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename" width="90" x="447" y="210">
       <list key="rename_additional_attributes"/>
     </operator>
     <connect from_op="Read Database" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
     <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
     <connect from_op="Aggregate" from_port="example set output" to_op="Rename" to_port="example set input"/>
     <connect from_op="Rename" from_port="example set output" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
Any help would be greatly appreciated.

Answers

  • Marco_Boeck
    Hi,

    As you have noticed, your process runs fine, so the warnings in the Design perspective can be ignored in this case. They are a prediction of what the result will be, but sometimes the actual result cannot be known without running the process, so the operator has to guess (and that guess may turn out to be wrong). The Generate Attributes operator in particular is notorious for being poor at predicting its result; improving this is on our list.

    Regards,
    Marco
  • MacPhotoBiker
    Hi Marco,

    thanks for your answer. It is correct that the process runs and the operator provides correct results. However, as I stated, when I add a Rename operator, the newly calculated field "minimum(xxxx)" does not show up at all, so I can't rename it and therefore can't use it. I found a workaround for my particular problem (minimizing dates rather than "year-month" values), but that's still not a very satisfactory solution.

    Also, I believe the "minimum" function should accept not only numeric values but also text (polynominal) values. All it would need to do is sort the input values lexicographically and return the first one (the minimum of "a" and "aa" is "a"), but that's just my personal opinion.

    Anyway, thank you for your reply.

    MacPhotoBiker
  • Marco_Boeck
    Hi,

    Indeed, the attribute will not show up in the Rename operator's selection list, but you can still type its name manually and ignore the error saying it is wrong. As long as the process does not fail at runtime, it is fine ;)

    Regards,
    Marco
  • MacPhotoBiker
    Hi Marco,

    thanks for your answer. It's true that the operator provides the correct result, but since the result is not available for further processing, it's basically useless for my purpose. Anyway, I found a workaround (as described above).

    Thanks again for looking into it.
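MacPhotoBiker's suggestion that a minimum could also be defined on text works well for this data in particular: zero-padded "YYYY-MM" strings sort lexicographically in the same order as chronologically, so the earliest month is simply the lexicographically smallest string. A short illustration in plain Python (this is not RapidMiner behavior, and the sample values are made up), including a date-based variant in the spirit of the "minimize dates" workaround:

```python
from datetime import date

# Lexicographic minimum, as suggested in the thread: "a" sorts before "aa".
print(min(["aa", "a"]))  # a

# Zero-padded year-month strings: lexicographic order equals chronological order,
# so min() returns the earliest month without converting to a number first.
months = ["2013-03", "2012-11", "2013-01"]
print(min(months))  # 2012-11

# Date-based variant: map each "YYYY-MM" string to the first day of that month,
# take the minimum date, and format it back to "YYYY-MM".
dates = [date(int(m[:4]), int(m[5:]), 1) for m in months]
print(min(dates).strftime("%Y-%m"))  # 2012-11
```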