Martin v. Löwis said:
Yes. Python uses a C int for storing the size. This is a real reason, and
changing it is not trivial. ....
Changing it to int64 would be wrong (the size type should track the
platform's pointer width, not be fixed at 64 bits). Changing it to size_t
would be better, except that the size must be signed, so it should really
be ssize_t. But then, ssize_t is not available on all platforms. And so on.
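(For concreteness, the kind of conditional typedef being described might
look like the sketch below; HAVE_SSIZE_T and the SIZEOF_* macros are
assumed autoconf-style defines, and the typedef name is made up.)

    /* Sketch: prefer the platform's ssize_t where it exists, otherwise
       fall back to a signed integer type the same width as size_t. */
    #ifdef HAVE_SSIZE_T
    typedef ssize_t py_size_type;
    #elif SIZEOF_SIZE_T == SIZEOF_LONG
    typedef long py_size_type;
    #else
    typedef int py_size_type;
    #endif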
I didn't mean this literally. Rather, at a slightly more abstract level, one
could imagine replacing whichever types in the Python source map to 32-bit
integers with corresponding types that map to 64-bit integers (on a 64-bit
platform like alpha or amd64). Thinking about it naively, this ought to just
work, at the expense of a larger memory footprint, and would give 10GB
strings, etc., straightaway. But perhaps there is some subtle reason why
things are more complicated than that?
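(A sketch of where the limit actually lives, simplified from the
Include/object.h and Include/stringobject.h of this era:)

    /* Every variable-size object stores its length in a C int, so even on
       an LP64 platform (64-bit longs and pointers) a string cannot exceed
       INT_MAX bytes.  Widening ob_size is the easy part; auditing every
       plain "int" in the source that quietly holds a size, index, or
       offset is what makes the change non-trivial. */
    #define PyObject_VAR_HEAD \
        PyObject_HEAD \
        int ob_size;            /* number of items in variable part */

    typedef struct {
        PyObject_VAR_HEAD
        long ob_shash;          /* cached hash of the string, or -1 */
        int ob_sstate;          /* interning state */
        char ob_sval[1];        /* string bytes, allocated inline */
    } PyStringObject;           /* simplified */

So the naive type substitution is roughly right in spirit; the work is in
finding every place an int is assumed to be big enough for a size.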
Martin v. Löwis said:
Out of curiosity: how much memory do you have in the machine where you
want to store 10GB strings? What microprocessor is that?
Well, at work we've had a Tru64 alpha box with 8GB of RAM for a couple of
years. We do bioinformatics, so mmap'ing genome files (which can be
significantly larger than 4GB) and making them visible as Python strings
would be quite handy. These files keep growing as more sequence becomes
known; I just picked 10GB out of the air as a proxy for "as big as my RAM
and definitely bigger than 4GB".
To put it a little more simply, I'd like to be able to assume that I can do a
read() or mmap() without having to think about any limits other than virtual
memory, working set, and available RAM.
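At the C level this already works on an LP64 box, where off_t and size_t
are 64-bit; a rough sketch (genome.fa stands in for a real path):

    /* Map a file read-only, however large: on LP64 platforms st_size and
       the mapping length are 64-bit, so nothing special is needed past
       4GB.  The limit is Python's int-sized object header, not the OS. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("genome.fa", O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0)
            return 1;
        char *seq = mmap(NULL, (size_t)st.st_size, PROT_READ,
                         MAP_SHARED, fd, 0);
        if (seq == MAP_FAILED)
            return 1;
        printf("%.60s\n", seq);   /* first 60 bytes, e.g. a FASTA header */
        munmap(seq, (size_t)st.st_size);
        close(fd);
        return 0;
    }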
I suspect that within a year or two everyone will want this, as RAM gets
cheaper and everyone gets an amd64 (or compatible) CPU.
Mike