Output from "parse(att1)" not numerical?
MacPhotoBiker
Hi,
I'm trying to create a process that shows, for every customer/article combination, the month of the first purchase. In SQL terms:
Select Customer, Article, Min(Month)
Group By Customer, Article
The month actually consists of two attributes, year and month (e.g. "2013-01"). It seems that RapidMiner can only calculate a minimum from numbers, not from text, so I generate a number via parse():
parse(concat(Year,Month)), which correctly results in 201301.
In the next step, I use the "aggregate" operator, grouping by customer and article, aggregating the minimum of the parsed number. Running the process provides the correct results, and the meta view shows "real" as datatype. However, the aggregate operator shows that little yellow warning triangle with this message:
The value type of the attribute Year_Month_Numeric is not compatible with the aggregation function minimum. It requires an attribute of type numeric or date_time but is nominal.
Even worse, when I try to rename the MinMonth field, it does not even show up in the Rename operator.
To me this actually looks like a bug, but then I'm certainly a newbie and I'd hope that somebody can help me out.
Not sure if that helps, but the process code is below. Any help would be greatly appreciated.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
        <parameter key="connection" value="MyConnection"/>
        <parameter key="define_query" value="table name"/>
        <parameter key="table_name" value="30_00_00_Sales"/>
        <enumeration key="parameters"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes" width="90" x="112" y="210">
        <list key="function_descriptions">
          <parameter key="Year_Month_Numeric" value="parse(concat(Year,Month))"/>
        </list>
        <parameter key="use_standard_constants" value="false"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="5.3.008" expanded="true" height="76" name="Aggregate" width="90" x="313" y="210">
        <list key="aggregation_attributes">
          <parameter key="Year_Month_Numeric" value="minimum"/>
        </list>
        <parameter key="group_by_attributes" value="|Customer|Article"/>
      </operator>
      <operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename" width="90" x="447" y="210">
        <list key="rename_additional_attributes"/>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Answers
As you have noticed, your process runs fine, so the warnings in the design perspective can be ignored in this case. They are a prediction of what the result will be, but sometimes the result cannot be known without actually running the process, so the check has to guess, and that guess may turn out to be wrong. The Generate Attributes operator in particular is notorious for not being very good at guessing what its result will be; it's on our list.
Regards,
Marco
Thanks for your answer. It is correct that the process runs and the operator provides correct results. However, as I stated, when I add a "Rename" operator, the newly calculated field "minimum(xxxx)" does not show up at all, so I can't rename it and hence can't use it. I found a workaround for my particular problem (taking the minimum of dates rather than of "year-month"), but that's still not a very satisfactory solution.
Also, I believe the "minimum" function should accept not only numeric values but also text/polynominal values. All it needs to do is sort the input values in lexicographical order and return the first one (the minimum of "a" and "aa" is "a"), but that's just my personal opinion.
Anyway, thank you for your reply.
MacPhotoBiker
Indeed, the attribute will not show up for selection in the Rename operator, but you can still type in its name and ignore the error telling you it is wrong. As long as the process does not fail at runtime, it is fine.
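For illustration, here is a rough sketch of how the Rename operator from the process above might look with the name typed in by hand. It assumes the aggregated attribute ends up being called minimum(Year_Month_Numeric), following the "minimum(xxxx)" naming mentioned above, and MinMonth is just an example target name; adjust both to what your result actually shows.

```xml
<operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename" width="90" x="447" y="210">
  <!-- Typed in manually; assumed name of the attribute created by Aggregate -->
  <parameter key="old_name" value="minimum(Year_Month_Numeric)"/>
  <!-- Example target name, pick whatever suits your process -->
  <parameter key="new_name" value="MinMonth"/>
  <list key="rename_additional_attributes"/>
</operator>
```

The design-time warning on the operator will probably stay, since the meta data still does not know the attribute, but it should not matter if the name matches at runtime.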
Regards,
Marco
Thanks for your answer. It's true that the operator provides the correct result, but since it's not available for further processing, it's basically useless for my purpose. Anyway, I found a workaround (as described).
Thanks again for looking into it.