Bartc
Tony said: I'm thinking that the overhead of such a design is unnecessary. Consider: .....
Now consider the overhead of the above while handling a 1 MB text file as
one string per line with an average line length of 80 bytes:
1000000 b/(80 b/line) = 12500 lines
On a 32-bit platform:
string16: 12500 lines*(8 bytes overhead/line) = 100 KB overhead/MB = 10%
string_t: 12500 lines*(12 bytes overhead/line) = 150 KB overhead/MB = 15%
string_null: 12500 lines*(8 bytes overhead/line) = 100 KB overhead/MB = 10%
If the file is split into one line per string, then that's already 4 bytes
per line or 50KB/5% overhead.
Adding a length makes that 8 bytes per line or 100KB/10% overhead. But it is
only 5% /extra/. And a total size of 1.1MB instead of 1MB.
The buff_sz, as you say, can be optional, especially as it could double the
overhead (from 8 to 16 bytes). When the 1MB text has been loaded into a
single memory block, each line is not individually allocated anyway.
(And even if it were, there may be other ways of deriving the capacity of a
string of length N without storing that value alongside every string.)
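One such scheme, as a sketch: if the allocator rounds every request up to the next power of two (an assumption about the allocator, not something stated above), then capacity is a pure function of length and never needs to be stored:

```c
#include <stddef.h>

/* Derive the capacity of a string of length n, assuming the
   allocator rounds all requests up to the next power of two
   with a 16-byte minimum block. */
static size_t cap_for_len(size_t n) {
    size_t cap = 16;          /* assumed minimum allocation */
    while (cap < n)
        cap *= 2;
    return cap;
}
/* e.g. a 100-byte string is known to occupy a 128-byte block,
   so 28 bytes can be appended before reallocating. */
```

Real allocators use assorted size classes rather than strict powers of two, but the same trick applies to any size-class scheme the string code knows about.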
(My efforts at this use a 16-byte descriptor, but this is part of a dynamic
type system (for a separate language). The spare capacity of a string
allocation is not stored but handled by the memory allocator, but then my
strings tend to be immutable when it comes to extending them.)
On a 64-bit platform:
string16: 12500 lines*(16 bytes overhead/line) = 200 KB overhead/MB = 20%
string_t: 12500 lines*(24 bytes overhead/line) = 300 KB overhead/MB = 30%
string_null: 12500 lines*(16 bytes overhead/line) = 200 KB overhead/MB = 20%
64-bit is always going to have this problem, effectively doubling the memory
and bandwidth requirements for the (likely) majority of programs that fit
happily into 32 bits.