Scientific Notation for very small numbers 1E-12

dragoljub Member Posts: 241 Contributor II
edited November 2018 in Help
I have imported some data from a CSV file using the AML operator. Some columns contain very small values, on the order of 1E-12.

I noticed that in the results view all very small numbers are displayed as zeros. Even in the meta data view, the statistics are all zero. However, when you copy and paste an entry, you can see that the correct E-12 number is stored there.

Does RapidMiner use these numbers (in the 1E-10 to 1E-12 range) correctly, or does it treat them as zero in the processing operators? I suppose I could scale them up by some constant, but is that necessary?

Also, is there any way to show scientific notation in the results view?



  • dragoljub Member Posts: 241 Contributor II
    I have also noticed that this could be problematic when using the 'Remove Useless' operator. For very small numbers, the statistics do not seem to be calculated correctly, since the values always appear to be interpreted as zero rather than as normalized values.

  • haddock Member Posts: 849 Maven
    Hi there,

    In RapidMiner, reals really are reals; they are only rounded for display, according to the 'fractiondigits.number' preference setting. As for imposing scientific notation or other formats, here is an example process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="206" width="681">
          <!-- Generate example data with extremely small values -->
          <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="111" y="67">
            <parameter key="attributes_lower_bound" value="-1.0E-100"/>
            <parameter key="attributes_upper_bound" value="1.0E-100"/>
          </operator>
          <!-- Render the numbers in scientific notation using a format pattern -->
          <operator activated="true" class="format_numbers" expanded="true" height="76" name="Format Numbers" width="90" x="313" y="75">
            <parameter key="format_type" value="pattern"/>
            <parameter key="pattern" value="0.###E0"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Format Numbers" to_port="example set input"/>
          <connect from_op="Format Numbers" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
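
The difference between the stored value and the displayed value can be illustrated outside RapidMiner. The following is a minimal Python sketch, not RapidMiner code: a plain Python float stands in for a real-valued attribute, the three fraction digits are an assumed display setting, and Python's `.3e` format is only a rough analogue of the `0.###E0` pattern used in the process.

```python
# Illustrative only: a Python float stands in for a real-valued attribute.
value = 1.234e-12  # a value on the order of 1E-12, as in the question

# Fixed-point display with 3 fraction digits (assumed setting) hides the value...
fixed = f"{value:.3f}"
# ...while scientific notation makes it visible.
scientific = f"{value:.3e}"

print(fixed)       # 0.000
print(scientific)  # 1.234e-12

# The stored number itself is not zero and keeps its magnitude in arithmetic.
print(value != 0.0)   # True
print(value * 1e12)   # approximately 1.234
```

In this sketch only the string representation changes; the underlying double is untouched, which matches the observation that copying an entry shows the full E-12 value.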

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    In addition to what haddock said: the Remove Useless operator uses the standard deviation of an attribute's values to decide whether the attribute is useless. If your numbers are very small, their standard deviation is very small too, so you will have to lower the threshold accordingly.
    I think it would be smarter to use some mean-weighted threshold. In any case, the Remove Useless operator should, if possible, be avoided for attributes whose values vary at all. A learner-based attribute selection is far preferable.
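
To make the threshold point concrete, here is a small Python sketch of the idea. It is not the Remove Useless implementation: the function names, the threshold values, and the use of the population standard deviation are all assumptions for illustration, and the relative check is just one possible reading of a "mean-weighted threshold".

```python
import statistics


def is_useless_absolute(values, min_deviation):
    """Flag a column whose standard deviation is at or below a fixed threshold."""
    return statistics.pstdev(values) <= min_deviation


def is_useless_relative(values, min_relative_deviation):
    """Flag a column using the deviation relative to the mean magnitude."""
    mean_magnitude = statistics.fmean([abs(v) for v in values])
    if mean_magnitude == 0.0:
        return True  # every value is zero, so the column carries no information
    return statistics.pstdev(values) / mean_magnitude <= min_relative_deviation


# Hypothetical column with values on the order of 1E-12.
tiny = [1.0e-12, 2.0e-12, 3.0e-12, 4.0e-12]
constant = [5.0e-12] * 4

# A fixed threshold sized for "ordinary" data wrongly flags the tiny column.
print(is_useless_absolute(tiny, 1e-6))    # True
# Lowering the threshold to match the scale of the data fixes that.
print(is_useless_absolute(tiny, 1e-13))   # False
# A relative threshold does not depend on the scale of the data.
print(is_useless_relative(tiny, 0.01))      # False
print(is_useless_relative(constant, 0.01))  # True
```

The relative variant gives the same answer whether or not the column is first multiplied by a constant, which is why rescaling the data becomes unnecessary under that kind of criterion.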
