"Integer vs float performance"
Assume that you have a lossless way to convert your data from floats to integers.
Would this speed up your RapidMiner process?
And what about memory usage?
If so, which algorithms would benefit most from doing all calculations on integers?
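The question assumes a lossless float-to-integer conversion. A minimal sketch of one way to check that assumption, assuming the data has a fixed number of decimal places so it can be stored as scaled integers (the class `LosslessCheck`, its methods and the `scale` parameter are hypothetical, not part of RapidMiner):

```java
import java.util.Arrays;

/** Hypothetical helper: checks whether values can be stored as scaled longs without loss. */
public final class LosslessCheck {
    private LosslessCheck() {}

    /**
     * Returns true if every value survives the round trip
     * value -> Math.round(value * scale) -> (double) encoded / scale unchanged.
     */
    public static boolean isLosslesslyConvertible(double[] values, long scale) {
        for (double v : values) {
            long encoded = Math.round(v * scale);       // integer representation
            double decoded = (double) encoded / scale;  // back to floating point
            if (decoded != v) {
                return false;                           // precision would be lost (also catches NaN/Infinity)
            }
        }
        return true;
    }

    /** Encodes values as scaled longs; only meaningful after isLosslesslyConvertible returned true. */
    public static long[] encode(double[] values, long scale) {
        return Arrays.stream(values).mapToLong(v -> Math.round(v * scale)).toArray();
    }

    public static void main(String[] args) {
        double[] prices = {1.5, 2.25, 3.0};
        System.out.println(isLosslesslyConvertible(prices, 100));
        System.out.println(Arrays.toString(encode(prices, 100)));
    }
}
```

Values such as 1/3 fail the check for any decimal scale, which is the case where such a conversion would not be lossless.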
I found this table on the internet:
Comparison of Pentium Floating Point and Integer Speeds
Operation   Floating-point clocks   Integer clocks
add         1-3                     1-3
multiply    1-3                     10-11
division    39-42                   22-46
convert     6 (double to long)      3 (long to double)
Is this always true?
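Clock counts like these depend on the specific processor generation, so one way to check them on your own machine is to measure. A naive timing sketch, assuming a dependent chain of multiply-adds is a fair proxy (it measures latency, not throughput); the class and method names are hypothetical, and JIT warm-up and optimizations make such hand-rolled loops unreliable, so a dedicated harness such as JMH is needed for trustworthy numbers:

```java
/** Naive timing sketch, not a rigorous benchmark: long vs double multiply-add chains. */
public final class MulTiming {
    private MulTiming() {}

    /** Dependent chain of long multiply-adds; overflow simply wraps around in Java. */
    static long multiplyLongs(int n) {
        long acc = 1;
        for (int i = 1; i <= n; i++) {
            acc = acc * 31 + i;
        }
        return acc; // returned so the JIT cannot drop the loop as dead code
    }

    /** Same chain in double arithmetic; the factor 0.5 keeps the value bounded. */
    static double multiplyDoubles(int n) {
        double acc = 1.0;
        for (int i = 1; i <= n; i++) {
            acc = acc * 0.5 + i;
        }
        return acc;
    }

    public static void main(String[] args) {
        final int n = 100_000_000;
        // Warm-up so the JIT has a chance to compile both methods before timing.
        for (int r = 0; r < 5; r++) {
            multiplyLongs(n / 10);
            multiplyDoubles(n / 10);
        }
        long t0 = System.nanoTime();
        long l = multiplyLongs(n);
        long t1 = System.nanoTime();
        double d = multiplyDoubles(n);
        long t2 = System.nanoTime();
        System.out.printf("long:   %d ms (result %d)%n", (t1 - t0) / 1_000_000, l);
        System.out.printf("double: %d ms (result %.1f)%n", (t2 - t1) / 1_000_000, d);
    }
}
```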
Answers
I am afraid I cannot say much about runtime. Looking at the table you provided, it could indeed be that some calculations are performed more quickly. But I would expect that many of the internal calculations are performed on doubles anyway, so this probably would not really help. If we calculate a linear regression, for example, the data is transformed into a double matrix, which is then inverted, so there would be no runtime improvement.
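To illustrate why integer storage does not make such a fit integer-only: as soon as means, sums of squares and divisions are involved, the integer inputs are widened to double. A minimal sketch, using simple one-variable least squares instead of the matrix inversion mentioned above for brevity (the class `IntRegression` is hypothetical and not the RapidMiner implementation):

```java
import java.util.Arrays;

/** Sketch: least-squares line fit on int columns; all arithmetic happens in double. */
public final class IntRegression {
    private IntRegression() {}

    /** Returns {slope, intercept} of the least-squares line y = slope * x + intercept. */
    static double[] fit(int[] x, int[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired points");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];   // int widened to double here
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;   // int - double -> double
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX};
    }

    public static void main(String[] args) {
        double[] line = fit(new int[] {1, 2, 3, 4}, new int[] {3, 5, 7, 9});
        System.out.println(Arrays.toString(line)); // prints [2.0, 1.0]
    }
}
```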
What is true is that the amount of memory used should be reduced by roughly half when you change the data management to integer instead of double. The same holds for float instead of double, since integer and float both use only 4 bytes per value instead of the 8 bytes needed for a double. We actually had one RapidMiner version (4.0 or 4.1, if I remember correctly) where the default data management was set to float. But it turned out that for many applications the precision was not high enough, especially for larger numbers, and for that reason we changed the default back to double.
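The memory estimate and the precision problem can both be made concrete. A minimal sketch, assuming a dense table of primitive cells and ignoring JVM object headers and other overhead (the class `StorageEstimate` and its methods are hypothetical); the second part shows that a float, with its 24-bit significand, cannot represent every integer above 2^24:

```java
/** Sketch: raw payload of 4-byte vs 8-byte cells, and float's limited precision. */
public final class StorageEstimate {
    private StorageEstimate() {}

    /** Raw payload bytes for rows x columns cells of the given width (no JVM overhead). */
    static long payloadBytes(long rows, long columns, int bytesPerCell) {
        return rows * columns * bytesPerCell;
    }

    /** True if the int survives a round trip through float unchanged. */
    static boolean roundTripsThroughFloat(int value) {
        return (int) (float) value == value;
    }

    public static void main(String[] args) {
        long rows = 1_000_000, cols = 50;
        System.out.println("double: " + payloadBytes(rows, cols, Double.BYTES) + " bytes");
        System.out.println("float:  " + payloadBytes(rows, cols, Float.BYTES) + " bytes");
        System.out.println("int:    " + payloadBytes(rows, cols, Integer.BYTES) + " bytes");

        int big = 16_777_217;      // 2^24 + 1
        float asFloat = big;       // int -> float conversion rounds to 16777216
        double asDouble = big;     // int -> double is exact
        System.out.println(big + " as float:  " + (long) asFloat);
        System.out.println(big + " as double: " + (long) asDouble);
    }
}
```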
Cheers,
Ingo
I remember that there are some operators where we can set the data management to integer or float, but I cannot find those parameters in the current release. I was looking for them in, e.g., Read CSV. How can I set the data management in this case?
Thanks, Zoltan
You are right. The parameter is still there for several input operators, but for some reason it is now missing for CSV, Excel, Database, and Arff. I have opened a bug report at
http://bugs.rapid-i.com/show_bug.cgi?id=446
Cheers and thanks for pointing this out,
Ingo