Malcolm McLean wrote, On 28/10/07 22:13:
It *is* reading and writing a lot of data. For example, the 2GB file
size limit became a problem for us over 7 years ago, and the application
works with a lot of files. Running a report can easily involve reading a
million records, and during peak periods a significant number of reports
can be running against data that is being changed at the same time.
> Even in the very worst case you have to be incredibly unlucky to slow
> down the program by more than 50%. (I was going to say "can't", but
> that's not strictly true - you can construct a scenario in which you
> thrash the cache by doubling int size). That might sound like a lot to
> an electrical engineer, but in software terms it really is on the
> boundary of a worthwhile optimisation - if we're not at least doubling
> speed, we aren't really speeding things up.
Most customers would complain loudly about a 10% slow down. However,
that represents the server at peak capacity. Due to the usage pattern,
the time when it is running at peak is *also* the time when performance
is most important. Outside peak times most users would not notice a 50%
slow down (a few users on each server would), but at peak times almost
all users would notice a 5% slow down.
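
For what it's worth, here is a minimal sketch of the bandwidth side of
that "worst case" (arbitrary sizes, nothing from our code): summing the
same number of values as 32-bit and then as 64-bit integers. The 64-bit
pass has to move twice as many bytes through the cache, so a
memory-bound loop pays for the wider type even before any thrashing.

/* Minimal sketch, arbitrary sizes: the 64-bit pass reads twice the
 * bytes of the 32-bit pass for the same number of elements, so a
 * memory-bound loop sees roughly double the cache/bus traffic. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdint.h>

#define N 20000000  /* 20 million elements */

int main(void)
{
    int32_t *a32 = malloc(N * sizeof *a32);  /*  80MB */
    int64_t *a64 = malloc(N * sizeof *a64);  /* 160MB */
    int64_t sum = 0;
    clock_t t;
    long i;

    if (!a32 || !a64)
        return EXIT_FAILURE;
    for (i = 0; i < N; i++) {
        a32[i] = (int32_t)i;
        a64[i] = i;
    }

    t = clock();
    for (i = 0; i < N; i++)
        sum += a32[i];
    printf("32-bit pass: %.2fs (sum=%lld)\n",
           (double)(clock() - t) / CLOCKS_PER_SEC, (long long)sum);

    sum = 0;
    t = clock();
    for (i = 0; i < N; i++)
        sum += a64[i];
    printf("64-bit pass: %.2fs (sum=%lld)\n",
           (double)(clock() - t) / CLOCKS_PER_SEC, (long long)sum);

    free(a32);
    free(a64);
    return 0;
}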
> But if it has got hundreds of simultaneous users, how often is the
> system running at peak capacity?
Every month end. An even bigger peak every year end. Other peaks at
project start-ups and close-downs.
> Doubling int sizes would doubtless have some impact on performance.
> But not the sort that is being suggested.
Well, the slow-down does not have to be much before it has a major
impact in some areas. There are always people running close to the
limit. There is also the impact on storage requirements, and the time to
convert the data currently stored on disk (tens of GB of data for small
customers, hundreds of GB or terabytes for larger customers, so just the
time spent reading and then rewriting it all would be significant).
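
On the storage point, a toy example (the field names are made up, not
our actual schema) of how an int-heavy record roughly doubles in size
when int goes from 32 to 64 bits - and with it the amount of data that
has to be converted and shipped to and from disk:

/* Purely illustrative record layout -- not the real schema.  If int
 * went from 32 to 64 bits, an int-heavy record would roughly double
 * in size on disk, and so would the data needing conversion. */
#include <stdio.h>

struct record32 { int account; int quantity; int unit_price; int flags; };
struct record64 { long long account; long long quantity;
                  long long unit_price; long long flags; };

int main(void)
{
    printf("record with 32-bit ints: %lu bytes\n",
           (unsigned long)sizeof(struct record32));
    printf("record with 64-bit ints: %lu bytes\n",
           (unsigned long)sizeof(struct record64));
    return 0;
}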
> Then I don't believe that most of the calculations are of amounts of
> money, anyway.
Have you seen the code? No? Well, I have, and there is not much
indexing, since the data that could be stored in arrays is actually
stored in the database on disk. We have a lot of manipulation of
quantities other than money, but they are also quantities we have to
keep an exact tally of, and so we use integers or fixed-point numbers.
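
As a trivial illustration of why (just the principle, nothing like our
actual code): accumulate a quantity a million times as binary floating
point and as an integer count of pence, and only the integer tally
stays exact.

/* Hypothetical sketch of why exact tallies use integers: pence held
 * as a long rather than a double.  0.10 has no exact binary
 * representation, so the floating-point total drifts. */
#include <stdio.h>

int main(void)
{
    double d = 0.0;
    long pence = 0;
    int i;

    for (i = 0; i < 1000000; i++) {
        d += 0.10;   /* accumulates rounding error */
        pence += 10; /* exact */
    }
    printf("double total:  %.6f\n", d);  /* not exactly 100000 */
    printf("integer total: %ld.%02ld\n", pence / 100, pence % 100);
    return 0;
}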
> I think the system spends most of its time copying data from one
> location to another.
Well, the larger the data, the longer the copying takes, so assuming
you are correct (and if you count copying between memory and disk, you
are for the application I am referring to) then you have just given a
reason why increasing the size of an int is a bad idea.
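
A quick sketch of that, with sizes picked out of the air rather than
from the application: copying twice the bytes takes roughly twice the
time once you are memory-bound, which is exactly what doubled record
sizes mean.

/* Minimal sketch with arbitrary sizes (nothing to do with our real
 * buffers): time memcpy over 50MB and over 100MB. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MB (1024u * 1024u)

static double time_copy(void *dst, const void *src, size_t n, int reps)
{
    clock_t t = clock();
    int i;
    for (i = 0; i < reps; i++)
        memcpy(dst, src, n);
    return (double)(clock() - t) / CLOCKS_PER_SEC;
}

int main(void)
{
    size_t small = 50 * MB, big = 100 * MB;
    char *src = malloc(big), *dst = malloc(big);

    if (!src || !dst)
        return EXIT_FAILURE;
    memset(src, 1, big);

    printf("copy  50MB x20: %.2fs\n", time_copy(dst, src, small, 20));
    printf("copy 100MB x20: %.2fs\n", time_copy(dst, src, big, 20));
    free(src);
    free(dst);
    return 0;
}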
> It could be farming out memory to disk, however. In which case
> doubling the memory taken would have a significant impact.
Yes, it would MASSIVELY increase the cost of the servers. I am not
talking about desktop machines here, I am talking about the nice
high-spec servers currently being used, where customers at a minimum
buy as much RAM as they can before the cost of RAM becomes ridiculous,
plus lots of disk in a RAID set-up to maximise bandwidth.
> IO is still largely latency bound, but the number of writes would
> double.
Increasing the number of reads and writes by even 10% would be
unpopular unless it brought some other massive benefit, and your
proposed change would not provide my customers with any benefit, only
costs.