# "Integer vs float performance"

Assume that you have a lossless way to convert your data from floats to integers.

Would this speed up your RapidMiner process?

And what about memory usage?

If so, which algorithms would benefit most from doing all calculations on integers?

I found this table on the internet:

Comparison of Pentium Floating Point and Integer Speeds

| Operation | Floating-point clocks | Integer clocks |
|---|---|---|
| add | 1-3 | 1-3 |
| multiply | 1-3 | 10-11 |
| division | 39-42 | 22-46 |
| convert | 6 (double to long) | 3 (long to double) |

Is this always true?

## Answers

RM Founder

I am afraid I cannot say much about runtime. Judging by the table you provided, some calculations could indeed run faster on integers. But I would expect many of the internal calculations to be performed on doubles anyway, so this probably would not help much. If we calculate a linear regression, for example, the data is transformed into a double matrix, which is then inverted, so there would be no runtime improvement.
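To illustrate the linear regression point, here is a minimal Java sketch of a least-squares fit on integer input. It is not RapidMiner's actual implementation; the class name `IntegerRegressionSketch` and its methods are hypothetical. It only shows that the integer columns are widened to double before any matrix arithmetic, because the determinant and inverse are fractional in general.

```java
class IntegerRegressionSketch {
    // Widen an integer column to double, as a matrix-based learner would
    // have to do before any algebra (hypothetical helper).
    static double[] toDouble(int[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i];
        }
        return out;
    }

    // Ordinary least squares for y = intercept + slope * x, solved by
    // inverting the 2x2 normal-equation matrix [[n, sx], [sx, sxx]].
    static double[] fit(int[] xInt, int[] yInt) {
        double[] x = toDouble(xInt);
        double[] y = toDouble(yInt);
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        // The division below is where integer arithmetic stops being enough.
        double det = n * sxx - sx * sx;
        double intercept = (sxx * sy - sx * sxy) / det;
        double slope = (n * sxy - sx * sy) / det;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4};
        int[] y = {3, 5, 7, 9}; // generated from y = 1 + 2x
        double[] coefficients = fit(x, y);
        System.out.println("intercept=" + coefficients[0] + " slope=" + coefficients[1]);
    }
}
```

Whatever type the input is stored in, the inner loop and the inversion run on doubles here, so storing the data as integers would not change the runtime of this step.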

What is true is that memory usage should be roughly halved when you change the data management to integer instead of double. The same holds for float instead of double, since both integer and float use only 4 bytes per value instead of the 8 bytes for double. We actually had one RapidMiner version (4.0 or 4.1, if I remember correctly) where the default data management was set to float. But it turned out that for many applications the precision was not high enough, especially for larger numbers, so we changed the default back to double.
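The 4-versus-8-byte trade-off and the precision problem with larger numbers can be checked with a small Java sketch. The class `DataManagementSketch`, its methods, and the table size are made up for illustration, and the byte counts cover only the raw values, not any per-object or per-array overhead.

```java
class DataManagementSketch {
    // Payload bytes for a rows x cols table, ignoring object and array headers.
    static long tableBytes(long rows, long cols, int bytesPerValue) {
        return rows * cols * bytesPerValue;
    }

    // True if the integer is unchanged after being stored as a 4-byte float.
    static boolean survivesFloat(int value) {
        return (long) (float) value == value;
    }

    // True if the integer is unchanged after being stored as an 8-byte double.
    static boolean survivesDouble(int value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        long rows = 1_000_000, cols = 100; // hypothetical table size
        System.out.println("double: " + tableBytes(rows, cols, Double.BYTES) + " bytes");
        System.out.println("float:  " + tableBytes(rows, cols, Float.BYTES) + " bytes");
        System.out.println("int:    " + tableBytes(rows, cols, Integer.BYTES) + " bytes");

        int big = 16_777_217; // 2^24 + 1, the smallest positive integer a float cannot hold
        System.out.println(big + " survives float:  " + survivesFloat(big));
        System.out.println(big + " survives double: " + survivesDouble(big));
    }
}
```

A float has a 24-bit significand, so integers above 2^24 start to lose their low-order digits, while a double represents every 32-bit integer exactly. That matches the observation that float precision breaks down for larger numbers.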

Cheers,

Ingo

Maven

I remember that there are some operators where we can set data management to integer or float, but I cannot find those parameters in the current release. I looked for it in Read CSV, for example. How can I set data management in this case?

Thanks, Zoltan

RM Founder

You are right. The parameter is still there for several input operators, but for some reason it is now missing for CSV, Excel, Database, and Arff. I have opened a bug report at

http://bugs.rapid-i.com/show_bug.cgi?id=446

Cheers and thanks for pointing this out,

Ingo