Real numbers get cut off after E17?

Tunguska991 · August 2022

Hello there dear community,

this is my first post in any computer science/data science related forum, which is really exciting, and I hope that I didn't miss an existing post that discusses my issue. I also hope that my question is not common knowledge for everyone here.

I basically worked around my problem already, but I am really really curious why this happens and if this is normal.

This morning I was trying to work with large numbers that are 18 digits long. At first I read the data from an Excel file as "real", before I changed the import of these numbers to "polynominal". Something interesting happens to those numbers in the Results tab when importing them as real:

The example number in my Results tab is displayed as: 123456789112345670
If you select the cell and focus on it, the number may be something like this: 123456789112345668

If you copy the cell without focussing on it, it pastes as: 1.2345678911234567E17

If you save it as a CSV or as an Excel file, it also rounds the digits to 1.2345678911234567E17

If you use the operator Numerical to Polynominal, the number will be rounded to 123456789112345670 instead of being written as 123456789112345668

Is this intentional? Does this have something to do with the number of bits used to store the real number? Shouldn't the max real number here be somewhere around +/-E32 or something?

Thanks to the community and everyone working at RapidMiner for this exciting environment full of learning and passion about the topic of data science!

Best regards!

yyhuang · August 2022

It is definitely an overflow issue with big numbers in RapidMiner. You can use a nominal string to represent this type of long integer.
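To illustrate why keeping such values as text helps, here is a minimal plain-Java sketch (not RapidMiner code; the class and method names are made up for illustration). It compares the two 18-digit values from the question once as strings and once as doubles:

```java
final class IdAsText {
    // Compare two identifiers as text: every digit counts.
    static boolean sameAsText(String a, String b) {
        return a.equals(b);
    }

    // Compare the same identifiers after parsing them as doubles:
    // digits beyond the double's precision are lost before the comparison.
    static boolean sameAsDouble(String a, String b) {
        return Double.parseDouble(a) == Double.parseDouble(b);
    }

    public static void main(String[] args) {
        String first = "123456789112345668";
        String second = "123456789112345670";
        System.out.println(sameAsText(first, second));   // false: the strings differ
        System.out.println(sameAsDouble(first, second)); // true: both parse to the same double
    }
}
```

If the values are identifiers rather than quantities you calculate with, treating them as nominal avoids the collision shown in the second comparison.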

MichaelKnopf · August 2022

RapidMiner is using so-called IEEE 754 64-bit floating point numbers ("doubles") for representing real attributes.

A double can only represent integers exactly up to 53 bits; beyond that, you are likely to run into rounding errors. This Stack Overflow answer has some more details:

https://stackoverflow.com/a/1848762
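As a rough sketch of that limit in plain Java (the class and method names are only for illustration): the range of a double is huge, about 1.8E308, but the precision is what matters here. Every integer up to 2^53 is stored exactly, and above that gaps start to appear.

```java
import java.math.BigDecimal;

final class DoublePrecisionLimit {
    // 2^53: up to this magnitude every integer has an exact double representation.
    static final long MAX_EXACT = 1L << 53;

    // True if converting the long to a double does not change its value.
    // BigDecimal is used so that the comparison itself is exact.
    static boolean isExactAsDouble(long value) {
        return new BigDecimal((double) value).compareTo(BigDecimal.valueOf(value)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(MAX_EXACT);                               // 9007199254740992
        System.out.println(isExactAsDouble(MAX_EXACT));              // true
        System.out.println(isExactAsDouble(MAX_EXACT + 1));          // false
        System.out.println(isExactAsDouble(123456789112345668L));    // false
        System.out.println(Double.MAX_VALUE);                        // 1.7976931348623157E308
    }
}
```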

2^53 is a 16-digit number. Thus, your 18-digit numbers are likely to be rounded. Take for example this conversion from a 64-bit integer ("long") to a "double" and back to a "long":

// the L at the end denotes a 64-bit integer (long)
jshell> var x = 123456789112345668L
x ==> 123456789112345668

// convert x to a double and then back to a long
jshell> (long) (double) x
$2 ==> 123456789112345664

Now, I would not call the differences you are seeing intended. They are simply an artifact of similar rounding issues, which can surface in different ways depending on the order in which conversions are applied.

Please take note that the code displaying the value might do some conversion as well (but that does not modify the underlying data).
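To make that concrete, here is a small plain-Java sketch (not taken from the RapidMiner code base; the class and method names are illustrative). It parses the three representations mentioned in the question and prints the exact value each resulting double holds:

```java
import java.math.BigDecimal;

final class DisplayVsStored {
    // Exact decimal expansion of the value a double actually holds.
    static String exactValue(double d) {
        return new BigDecimal(d).toPlainString();
    }

    public static void main(String[] args) {
        // The three forms the same cell showed in the question.
        String[] shown = {
            "123456789112345670",
            "123456789112345668",
            "1.2345678911234567E17"
        };
        for (String s : shown) {
            double d = Double.parseDouble(s);
            // Each line ends in 123456789112345664.
            System.out.println(s + " -> " + exactValue(d));
        }
    }
}
```

All three strings map to the same stored double, so which one you see depends only on how the displaying or exporting code formats the value.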
