Macros - Generate Attributes

lghansse Member Posts: 18 Contributor I
edited June 17 in Help

Hi, 

 

I've shared my process XML below to illustrate my problem, since I find it hard to explain clearly. I'm trying to generate a new attribute based on another attribute whose name is built from a macro (loop_value) and then renamed. However, in Generate Attributes I cannot use the expression previous_eval(%{loop_value}) (where "previous_" is the prefix added by the renaming step). Does anybody know how to work around this? Sorry if I'm being vague, but the XML should make it clearer.

 

Thanks! 

 

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
<parameter key="csv_file" value="C:\Users\lise.hanssens\Downloads\test_macro.csv"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="concurrency:loop_values" compatibility="8.1.000" expanded="true" height="82" name="Loop Values" width="90" x="246" y="34">
<parameter key="attribute" value="activity"/>
<process expanded="true">
<operator activated="true" class="multiply" compatibility="8.1.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="85"/>
<operator activated="true" breakpoints="after" class="rename_by_replacing" compatibility="8.1.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="313" y="34">
<parameter key="replace_what" value="(.+)"/>
<parameter key="replace_by" value="previous_$1"/>
</operator>
<operator activated="true" class="concurrency:join" compatibility="8.1.000" expanded="true" height="82" name="Join" width="90" x="447" y="85">
<parameter key="join_type" value="right"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="previous_id" value="id"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="8.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="581" y="85">
<list key="function_descriptions">
<parameter key="previous_activity" value="if(missing(eval(%{loop_value})), previous_eval(%{loop_value}), eval(%{loop_value}))"/>
</list>
</operator>
<connect from_port="input 1" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Rename by Replacing" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
<connect from_op="Rename by Replacing" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="input 1"/>
<connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 731   Unicorn

    Hi @lghansse,

     

    Correct me if I'm wrong: you want to replace missing values with the last known value (the previous valid value)?

     

    Regards,

     

    Lionel

  • lghansse Member Posts: 18 Contributor I

    Hi lionelderkrikor

     

    Yes, in a way that is what I want to do. However, it may help to know that the example is just an illustration of the problem I have. In reality, this subprocess is part of a much larger and more complex script (which I cannot share for privacy reasons).

     

    Lise

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 731   Unicorn

    Hi again @lghansse

     

    I think that the Replace Missing Values (Series) operator from the Series Extension 7.4.0 (to be installed from the Marketplace) will do the job.

    It replaces missing values with the previous valid value:

    Rename_Macro.png
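
To make "replace missing values with the previous valid value" concrete, here is a minimal Python sketch of that behaviour on a plain list. It is only an illustration of the idea, not the operator's actual implementation; the function name `fill_with_previous` and the sample values are made up for this example.

```python
def fill_with_previous(values):
    """Replace each missing entry (None) with the last non-missing value seen.

    Leading missing entries stay None because there is no earlier value.
    """
    filled = []
    last_valid = None
    for value in values:
        if value is None:
            # Missing: reuse the most recent valid value (may still be None).
            filled.append(last_valid)
        else:
            filled.append(value)
            last_valid = value
    return filled


# Hypothetical 'activity' column with gaps.
activity = ["read", None, None, "listen", None]
filled_activity = fill_with_previous(activity)
```

Note that a gap at the very start of the series cannot be filled this way, since there is no previous value to copy.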

     

    Does this process meet your need?

    If not, can you explain in more detail, giving an example of your initial dataset and of the dataset you want to obtain?

     

    Regards,

     

    Lionel

     

    NB: the process:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.000-BETA" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" breakpoints="after" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Rename_Macro\test_macro.csv"/>
    <parameter key="skip_comments" value="true"/>
    <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="id.true.integer.attribute"/>
    <parameter key="1" value="activity.true.polynominal.attribute"/>
    </list>
    <parameter key="read_not_matching_values_as_missings" value="false"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.0.000-BETA" expanded="true" height="103" name="Multiply" width="90" x="179" y="34"/>
    <operator activated="true" class="series:replace_missing_series_values" compatibility="7.4.000" expanded="true" height="82" name="Replace Missing Values (Series)" width="90" x="313" y="34">
    <parameter key="attribute_name" value="activity"/>
    </operator>
    <operator activated="true" class="rename_by_replacing" compatibility="9.0.000-BETA" expanded="true" height="82" name="Rename by Replacing" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="activity"/>
    <parameter key="replace_what" value="(.+)"/>
    <parameter key="replace_by" value="corrected_$1"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="9.0.000-BETA" expanded="true" height="82" name="Join" width="90" x="581" y="85">
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="id" value="id"/>
    </list>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Replace Missing Values (Series)" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
    <connect from_op="Replace Missing Values (Series)" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
    <connect from_op="Rename by Replacing" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

  • lghansse Member Posts: 18 Contributor I

    Hi  lionelderkrikor

     

    Thanks for helping, but this doesn't really solve my problem, because I can't omit the Loop Values operator, and the Replace Missing Values (Series) operator does not recognise previous_%{loop_value} as an attribute either. Basically, I start with a dataset containing:

    - IDs for persons

    - time IDs, generated by the following expression: ((parse(year)-1)*12)+parse(month_num). These are necessary because the data needs to be imported into Qlik later on, and we want to be able to compare different timeframes.

    - the status of an activity.
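
For readers who want to check the time ID arithmetic, the expression above can be sketched in Python as follows. This is a minimal sketch: the function name `time_id` is made up here, and `int(...)` stands in for RapidMiner's `parse(...)` on the assumption that year and month arrive as text.

```python
def time_id(year, month_num):
    """Month counter equivalent to ((parse(year)-1)*12)+parse(month_num).

    Accepts strings or integers; consecutive months always differ by 1,
    also across a year boundary.
    """
    return (int(year) - 1) * 12 + int(month_num)


# Hypothetical inputs, given as text the way a CSV import would deliver them.
december_2010 = time_id("2010", "12")
january_2011 = time_id("2011", "1")
```

Because December and the following January differ by exactly one, subtracting two time IDs gives the number of months between them, which is what makes timeframes comparable.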

     

    For each month a new .csv file is written in which the status of each contact_id is registered. The process works out how many times this step has to be repeated (equal to the number of months present in the dataset). So, for example, if the first date in the file is 1/09/2010 and the last 1/08/2011, there will be 13 iterations. In each subsequent step I read the latest .csv file to get the status for the previous month, rename the attributes to previous_..., and then merge them. Because this process needs to be done for each 'activity status' separately, I loop over the values of the attribute 'activity' and then filter on activity = %{loop_value}. Later on I generate an attribute named after this macro with the value 'true', because the goal is to obtain a separate dataset for each activity with:

     

    - contact_id

    - time_id

    - activity {true, false}

     

    So the macro value at that point is the same as the name of the generated attribute, which serves as input for the previous_... attribute. And because each of these operators is located inside the Loop Values operator, I figured I needed to use the macro for the script to work. E.g., if I type 'read' as the attribute in the 'missing value' operator, it won't work when the process loops over the value 'listen'.
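
To show what I am after in a language-neutral way, here is a minimal Python sketch of the logic: the attribute name comes from the loop value, the "previous" attribute name is built by putting the prefix in front of it as a string, and the previous value is used only when the current one is missing. The names `resolve_value` and `rows` and the sample data are made up for this illustration. In RapidMiner's expression syntax the analogous idea would presumably be to build the name as a string inside eval, e.g. eval("previous_" + %{loop_value}), but that is an untested assumption on my part.

```python
def resolve_value(row, loop_value):
    """Return row[loop_value], falling back to row['previous_' + loop_value].

    The 'previous_' column name is assembled from the loop value as a string,
    so the same code works for every value the loop iterates over.
    """
    previous_name = "previous_" + loop_value
    current = row.get(loop_value)
    if current is None:
        # Current status is missing: take last month's status instead.
        return row.get(previous_name)
    return current


# Hypothetical joined rows: current month plus renamed previous month.
rows = [
    {"id": 1, "read": None, "previous_read": True,
     "listen": False, "previous_listen": True},
    {"id": 2, "read": False, "previous_read": True,
     "listen": None, "previous_listen": None},
]

# One pass per loop value, mirroring the loop over the 'activity' values.
resolved = {
    loop_value: [resolve_value(row, loop_value) for row in rows]
    for loop_value in ("read", "listen")
}
```

The point is that nothing is hard-coded to 'read': the same function handles 'listen' because both column names are derived from the loop value.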

     

    I don't know if this makes sense to you, but thanks for helping anyhow! 

     

    Lise

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 731   Unicorn

    @lghansse,

     

    This is getting more and more complicated...

    Could you share your input dataset(s) with fictitious data, together with the final dataset you want to obtain?

    That would help me a lot to understand the problem and help you.

    Regards,

     

    Lionel
