
Real numbers get cut off after E17?

Tunguska991 Member Posts: 1 Newbie
Hello there dear community,
this is my first post in any computer science / data science related forum, which is really exciting, and I hope that I didn't miss an existing post that discusses my issue. I also hope that my question is not common knowledge for everyone here.
I have basically worked around my problem already, but I am really curious why this happens and whether it is normal.

This morning I was trying to work with large numbers that are 18 digits long. I originally read the data from an Excel file as "real", before I changed the import of these numbers to "polynominal". Something interesting happens to those numbers in the Results tab when they are imported as real:

The example number in my Results tab is displayed as: 123456789112345670
If you select the cell and focus on it, the number may be something like this: 123456789112345668
If you copy the cell without focusing on it, it pastes as: 1.2345678911234567E17
If you save it as a CSV or an Excel file, the digits are also rounded to 1.2345678911234567E17
If you use the operator Numerical to Polynominal, the number is rounded to 123456789112345670 instead of being written as 123456789112345668

Is this intentional? Does this have something to do with the number of bits used to store the real number? Shouldn't the max real number be somewhere around +/-E32 or something?


Thanks to the community and everyone working at RapidMiner for this exciting environment full of learning and passion about the topic of data science!


Best regards!






Answers

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    This is definitely a precision issue with big numbers in RapidMiner. You can use a nominal (string) attribute to represent long integers of this kind.
  • MichaelKnopf Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 31 RM Data Scientist
    RapidMiner uses so-called IEEE 754 64-bit floating point numbers ("doubles") to represent real attributes.

    A double can represent integers exactly only up to 53 bits (2^53); beyond that you are likely to run into rounding errors. This Stack Overflow answer has some more details:

    https://stackoverflow.com/a/1848762

    2^53 is a 16-digit number, so your 18-digit numbers are likely to be rounded. Take for example this conversion from a 64-bit integer ("long") to a "double" and back to a "long":

    // the L at the end denotes a 64-bit integer (long)
    jshell> var x = 123456789112345668L
    x ==> 123456789112345668
    
    // convert x to a double and then back to a long
    jshell> (long) (double) x
    $2 ==> 123456789112345664
    

    I would not call the differences you are seeing intended; they are simply an artifact of similar rounding issues, which can surface in different ways depending on the order of the applied conversions.

    Please note that the code displaying the value might do some conversion as well (which does not modify the underlying data).
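
To make the points in this thread concrete, here is a minimal Java sketch that separates the value a double actually stores from the strings that may be shown for it. The class name DoublePrecisionDemo and its helper methods are made up for illustration and are not part of RapidMiner. Whether RapidMiner formats values exactly this way is an assumption; the sketch only shows that plain Java conversions produce the same strings reported in the question.

```java
import java.math.BigDecimal;

// Hypothetical demo class, not RapidMiner code.
class DoublePrecisionDemo {

    // Every integer in [-2^53, 2^53] is exactly representable as a double.
    static final long MAX_EXACT = 1L << 53; // 9007199254740992

    // Send a long through a double and back, as happens when it is stored as "real".
    static long roundTrip(long value) {
        return (long) (double) value;
    }

    // Exact decimal expansion of the value the double really holds.
    static String exactValue(double d) {
        return new BigDecimal(d).toPlainString();
    }

    // Shortest decimal string that still identifies the double uniquely.
    static String shortestDisplay(double d) {
        return Double.toString(d);
    }

    // The shortest display string written out without an exponent.
    static String shortestDisplayPlain(double d) {
        return new BigDecimal(Double.toString(d)).toPlainString();
    }

    public static void main(String[] args) {
        long original = 123456789112345668L;
        double stored = (double) original;

        System.out.println(roundTrip(MAX_EXACT));         // 9007199254740992
        System.out.println(roundTrip(MAX_EXACT + 1));     // 9007199254740992 (rounded)
        System.out.println(Math.ulp(stored));             // 16.0, the gap between neighbouring doubles here
        System.out.println(exactValue(stored));           // 123456789112345664
        System.out.println(shortestDisplay(stored));      // 1.2345678911234567E17
        System.out.println(shortestDisplayPlain(stored)); // 123456789112345670

        // Keeping the identifier as text (the "nominal" approach) preserves every digit.
        String asText = Long.toString(original);
        System.out.println(asText);                       // 123456789112345668
    }
}
```

At this magnitude neighbouring doubles are 16 apart, so the stored value is 123456789112345664, and the strings ending in ...67E17 and ...670 are just shorter ways of naming that same double. If the digits themselves matter (IDs, account numbers), keeping the column as text or as a 64-bit integer avoids the rounding entirely.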