
Difficulties with macro value in scientific notation

lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 934   Unicorn
edited June 2019 in Help
Hi all,

RapidMiner raises an error when a macro value is in scientific notation: 


The macro value is recorded correctly:


...and is used in an operator:


Is there a solution or workaround for this?

Thanks for your explanations...

Regards,

Lionel

NB: The process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.2.000-SNAPSHOT">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.000-SNAPSHOT" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
        <parameter key="generator_type" value="date_series"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="true"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)">
          <parameter key="date" value="2019-01-01 00:00:00.1.hour"/>
        </list>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <operator activated="true" class="generate_data" compatibility="9.2.000-SNAPSHOT" expanded="true" height="68" name="Generate Data" width="90" x="45" y="136">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="100"/>
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="attributes_lower_bound" value="-10.0"/>
        <parameter key="attributes_upper_bound" value="10.0"/>
        <parameter key="gaussian_standard_deviation" value="10.0"/>
        <parameter key="largest_radius" value="10.0"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Generate ID (2)" width="90" x="179" y="136">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <operator activated="true" breakpoints="after" class="concurrency:join" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Join" width="90" x="313" y="85">
        <parameter key="remove_double_attributes" value="true"/>
        <parameter key="join_type" value="inner"/>
        <parameter key="use_id_attribute_as_key" value="true"/>
        <list key="key_attributes"/>
        <parameter key="keep_both_join_attributes" value="false"/>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="9.2.000-SNAPSHOT" expanded="true" height="68" name="Extract Macro" width="90" x="447" y="85">
        <parameter key="macro" value="dateOffset"/>
        <parameter key="macro_type" value="data_value"/>
        <parameter key="statistics" value="average"/>
        <parameter key="attribute_name" value="date"/>
        <parameter key="example_index" value="1"/>
        <list key="additional_macros"/>
      </operator>
      <operator activated="true" class="generate_macro" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Generate Macro" width="90" x="581" y="85">
        <list key="function_descriptions">
          <parameter key="dateDiffOffset" value="date_diff(date_parse(&quot;01/01/1970&quot;),date_parse(%{dateOffset}))"/>
        </list>
      </operator>
      <operator activated="true" breakpoints="after" class="date_to_numerical" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Date to Numerical" width="90" x="715" y="85">
        <parameter key="attribute_name" value="date"/>
        <parameter key="time_unit" value="day"/>
        <parameter key="millisecond_relative_to" value="second"/>
        <parameter key="second_relative_to" value="minute"/>
        <parameter key="minute_relative_to" value="hour"/>
        <parameter key="hour_relative_to" value="day"/>
        <parameter key="day_relative_to" value="month"/>
        <parameter key="week_relative_to" value="year"/>
        <parameter key="month_relative_to" value="year"/>
        <parameter key="quarter_relative_to" value="year"/>
        <parameter key="half_year_relative_to" value="year"/>
        <parameter key="year_relative_to" value="era"/>
        <parameter key="keep_old_attribute" value="false"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Aggregate" width="90" x="849" y="34">
        <parameter key="use_default_aggregation" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="default_aggregation_function" value="average"/>
        <list key="aggregation_attributes">
          <parameter key="att1" value="average"/>
        </list>
        <parameter key="group_by_attributes" value="date"/>
        <parameter key="count_all_combinations" value="false"/>
        <parameter key="only_distinct" value="false"/>
        <parameter key="ignore_missings" value="true"/>
      </operator>
      <operator activated="true" class="numerical_to_date" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Numerical to Date" width="90" x="1117" y="34">
        <parameter key="attribute_name" value="date"/>
        <parameter key="keep_old_attribute" value="false"/>
        <parameter key="time_offset" value="%{dateDiffOffset}"/>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Join (2)" width="90" x="1275" y="85">
        <parameter key="remove_double_attributes" value="true"/>
        <parameter key="join_type" value="inner"/>
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="date" value="date"/>
        </list>
        <parameter key="keep_both_join_attributes" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Set Role" width="90" x="1448" y="85">
        <parameter key="attribute_name" value="date"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Generate Data" from_port="output" to_op="Generate ID (2)" to_port="example set input"/>
      <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Join" from_port="join" to_op="Extract Macro" to_port="example set"/>
      <connect from_op="Extract Macro" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
      <connect from_op="Generate Macro" from_port="through 1" to_op="Date to Numerical" to_port="example set input"/>
      <connect from_op="Date to Numerical" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Date to Numerical" from_port="original" to_op="Join (2)" to_port="left"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Numerical to Date" to_port="example set input"/>
      <connect from_op="Numerical to Date" from_port="example set output" to_op="Join (2)" to_port="right"/>
      <connect from_op="Join (2)" from_port="join" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>


Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,224  RM Data Scientist
    edited January 2019
    This is a more or less known issue, although it usually appears in a different context. The point is that Extract Macro always uses the default "toString" conversion for attribute values. Usually you end up with some formatting issues when working with dates, but occasionally (as in your case) also with numerical values. Our currently proposed fix would be to have a format option in Extract Macro that specifies how the value is translated to a string. Would this fix your problem?


    Best,
    Martin



    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
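
The default string conversion described above can be illustrated outside RapidMiner. This is a minimal Java sketch, assuming the conversion behaves like Java's `Double.toString` (RapidMiner is Java-based, but its actual code path is not shown in this thread). The class name `ToStringThreshold` and the sample values are hypothetical. `Double.toString` switches to scientific notation for magnitudes of 10^7 and above, which would explain why a large millisecond value ends up with an exponent.

```java
// Sketch only: shows Java's default double-to-string conversion,
// assumed (not confirmed) to resemble what Extract Macro does.
class ToStringThreshold {
    static String defaultString(double value) {
        return Double.toString(value);
    }

    public static void main(String[] args) {
        // Below 10^7 the plain decimal form is used.
        System.out.println(defaultString(1234567.0));       // prints 1234567.0
        // Hypothetical epoch value in milliseconds (2019-01-01 00:00:00 UTC).
        System.out.println(defaultString(1546300800000.0)); // prints 1.5463008E12
    }
}
```

The second value is the kind of string that would then be substituted verbatim wherever the macro is referenced.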
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,293   Unicorn
    @mschmitz I've run into this issue before as well. One question: would your proposed solution require all values to be formatted the same way? I believe it is possible for some values to be in standard notation and others in scientific notation, since RapidMiner has a built-in threshold that determines when it switches. In that case, would your solution work, or would it still create errors?
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,224  RM Data Scientist
    Each Extract Macro operator would have one date format and one number format. If you want different formats, you would need to use two or more operators. Is that fine for you?
    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,293   Unicorn
    @mschmitz It's certainly an improvement over the current setup! I think there might still be cases where this would run into trouble, but your proposed solution will probably cover the vast majority of use cases.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,224  RM Data Scientist
    @Telcontar120 I am open to other ideas, if you have some! :)
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,293   Unicorn
    @mschmitz I think the only better solution would be to somehow allow Extract Macro to operate natively on numeric fields, in which case specifying formats for numerics would not be necessary. I suspect that is not trivial, though, since the underlying macro architecture is built around treating everything as a string.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,224  RM Data Scientist
    @Telcontar120 Exactly. There is a workaround: use the hex representation of the number as a string and add a function to Generate Attributes to translate it back. That could be one of the formatting options for numericals.
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
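
The hex workaround mentioned above can be sketched as a round trip through a string with no decimal exponent. This is only an illustration in plain Java, assuming Java's hexadecimal floating-point form; the thread does not say which hex representation is meant, and the class name `HexRoundTrip` and the sample value are hypothetical.

```java
// Sketch only: round-trips a double through a hexadecimal string.
// Java's hex floating-point form is an assumption; the thread does not
// specify which hex representation the workaround uses.
class HexRoundTrip {
    static String toHex(double value) {
        return Double.toHexString(value);
    }

    static double fromHex(String hex) {
        // Double.parseDouble also accepts hex floating-point strings.
        return Double.parseDouble(hex);
    }

    public static void main(String[] args) {
        double original = 1546300800000.0; // hypothetical sample value
        String hex = toHex(original);
        // The round trip is exact, so the comparison prints true.
        System.out.println(hex + " -> " + (fromHex(hex) == original));
    }
}
```

Because the hex form encodes the exact bits of the value, nothing is lost in the conversion. The cost is that the macro is no longer human-readable.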
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 934   Unicorn
    Hi,

    @Telcontar120, @mschmitz
    Thank you for sharing your ideas.

    @mschmitz,
    "Our currently proposed fix would be to have a format option in Extract Macro to specify how to do the translation to string. Would this fix your problem?"

    OK, but the problematic macro value is generated by the Generate Macro operator...
    To be more precise, I'm extracting a macro value via the Extract Macro operator:



    Then I use the Generate Macro operator to create the final (problematic) macro value:



    Does your fix cover this case?

    Regards,

    Lionel
     


  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,224  RM Data Scientist
    edited January 2019
    You need to use the right date_parse function, namely:
    date_parse_custom(%{dateOffset},"dd/MM/yyyy hh:mm:ss a z")

    Then it works.
    Edit: Never mind, this is not the case. My fix would not cover this issue, but I think what you need is
         str(numeric, FORMAT)
    in the parser, right?

    BR,
    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 934   Unicorn
    Hi @mschmitz,

    str(numeric, FORMAT) doesn't fix the problem...
    To be more complete, when I set the value in scientific notation directly in the operator parameters (without a macro):

    ...I get the same error.

    Regards,

    Lionel
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,224  RM Data Scientist
    With the formatting option you could enforce a 12-digit integer instead of scientific notation. Wouldn't that fix it?

    BR,
    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
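
The idea of enforcing a plain integer instead of scientific notation can be sketched in Java with `java.text.DecimalFormat`. This is an illustration under assumptions, not the proposed RapidMiner option itself: the class and method names are hypothetical, the sample value is invented, and `Long.parseLong` is only a stand-in for a parameter that expects a whole number.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class PlainIntegerFormat {
    // Formats a double as a plain integer string: no exponent,
    // no grouping separators and no fraction digits.
    static String toPlainInteger(double value) {
        DecimalFormat format =
            new DecimalFormat("0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(value);
    }

    // True if the text can be read as a whole number by Long.parseLong,
    // used here as a stand-in for a parameter that expects an integer.
    static boolean parsesAsLong(String text) {
        try {
            Long.parseLong(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        double offset = 1546300800000.0; // hypothetical sample value
        System.out.println(parsesAsLong(Double.toString(offset))); // prints false
        System.out.println(toPlainInteger(offset));                // prints 1546300800000
        System.out.println(parsesAsLong(toPlainInteger(offset)));  // prints true
    }
}
```

The default string form is rejected by the integer parser, while the explicitly formatted string is accepted.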
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 934   Unicorn
    Hi @mschmitz,

    I can't pass a second argument to the str() function:


    Any ideas ?

    Regards,

    Lionel
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 934   Unicorn
    @mschmitz,

    Ah ah..  :) ..OK, I misunderstood. OK for this solution.

    Thanks Martin,

    Regards,

    Lionel
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 934   Unicorn
    Hi @mschmitz,

    Thank you for the heads-up. I updated my process with this new operator, and it did indeed help a lot.

    Have a nice day,

    Regards,

    Lionel