"Integer vs float performance"

wessel Member Posts: 537 Maven
edited May 2019 in Help
Assume that you have a lossless way to convert your data from floats to integers.

Would this speed up your RapidMiner process?
And what about memory usage?

If so, which algorithms would benefit most from doing all calculations on integers?

I found this table on the internet:
Comparison of Pentium Floating Point and Integer Speeds

Operation   Floating point clocks   Integer clocks
add         1-3                     1-3
multiply    1-3                     10-11
division    39-42                   22-46
convert     6 (double to long)      3 (long to double)

Is this always true?
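The question assumes a lossless conversion exists. As a minimal sketch of how that assumption could be checked for a column of values, the Java snippet below tests whether every double survives a round trip through `int`. The class and method names (`LosslessIntCheck`, `isLosslessAsInt`) are hypothetical and not part of RapidMiner.

```java
final class LosslessIntCheck {

    // Returns true if every value can be cast to int and back without change.
    // NaN, infinities, fractional values and values outside the int range
    // all fail the round trip. Note: -0.0 passes, but comes back as 0.0.
    static boolean isLosslessAsInt(double[] values) {
        for (double v : values) {
            if ((double) (int) v != v) {
                return false;
            }
        }
        return true;
    }

    // Converts the column; only call this after isLosslessAsInt returned true.
    static int[] toInts(double[] values) {
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = (int) values[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] counts = {1.0, 42.0, -7.0}; // whole numbers, within int range
        double[] prices = {1.5, 2.0};        // contains a fractional value
        System.out.println(isLosslessAsInt(counts));
        System.out.println(isLosslessAsInt(prices));
    }
}
```

A check like this only says whether the stored values would be preserved; it says nothing about whether later calculations run faster on integers.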

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Wessel,

    I am afraid I cannot say much about runtime. Looking at the table you provided, some calculations could indeed be faster. But I would expect many of the internal calculations to be performed on doubles anyway, so this probably would not really help. If we calculate a linear regression, for example, the data is transformed into a double matrix, which is then inverted, so there will be no runtime improvement in that case.

    What is true is that memory usage should be roughly halved when you change the data management to integer instead of double. The same would be true for float instead of double, since integer and float both use only 4 bytes per value instead of the 8 bytes for double. We actually had one RapidMiner version (4.0 or 4.1 if I remember correctly) where the default data management was set to float. But it turned out that for many applications the precision was not high enough, especially for larger numbers, and for that reason we changed the default back to double.

    Cheers,
    Ingo
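To make the memory and precision points in the reply above concrete, here is a minimal Java sketch. It counts only raw value storage (4 bytes for int and float, 8 for double) and is not a model of RapidMiner's actual internal data layout. The class and method names (`StorageTradeoff`, `bytesFor`, `survivesFloat`) and the table size are hypothetical, chosen for illustration.

```java
final class StorageTradeoff {

    // Raw storage for a table with the given number of cells.
    static long bytesFor(long cells, int bytesPerValue) {
        return cells * bytesPerValue;
    }

    // True if the value is unchanged after being stored as a float.
    static boolean survivesFloat(double v) {
        return (double) (float) v == v;
    }

    public static void main(String[] args) {
        long cells = 1_000_000L * 100; // assumed: 1,000,000 rows x 100 columns

        System.out.println("double: " + bytesFor(cells, Double.BYTES) + " bytes");
        System.out.println("float:  " + bytesFor(cells, Float.BYTES) + " bytes");
        System.out.println("int:    " + bytesFor(cells, Integer.BYTES) + " bytes");

        // A float has a 24-bit significand, so whole numbers above 2^24
        // can no longer all be represented exactly.
        System.out.println(survivesFloat(16_777_216.0)); // 2^24
        System.out.println(survivesFloat(16_777_217.0)); // 2^24 + 1
    }
}
```

The second check illustrates why float storage can be too imprecise for larger numbers: 2^24 + 1 is rounded to a neighboring representable value when stored as a float, whereas a double holds it exactly.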
  • Preko Member Posts: 21 Contributor II
    Hi,

    I remember that there are some operators where we can set data management to integer or float, but I cannot find those parameters in the current release. I was looking for them in, e.g., Read CSV. How can I set data management in this case?

    Thanks, Zoltan
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Zoltan,

    You are right. The parameter is still there for several input operators, but for some reason it is now missing for CSV, Excel, Database, and Arff. I have opened a bug report at

    http://bugs.rapid-i.com/show_bug.cgi?id=446

    Cheers and thanks for pointing this out,
    Ingo