jacob navia
As everybody knows, C represents strings as unbounded pointers
to zero-terminated character sequences.
This is extremely inefficient: each time the length of a string
is queried, the computer starts an unbounded memory scan,
searching for the zero that ends the string.
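The scan can be sketched as the loop below. `scan_length` is a hypothetical name. Real `strlen` implementations are usually more optimized (for example, examining several bytes per step), but their cost still grows with the length of the string.

```c
#include <stddef.h>

/* A simplified picture of a zero-terminated length query: walk the
   bytes one by one until the terminating '\0' is found. The cost is
   proportional to the length, and a missing terminator makes the loop
   read past the end of the buffer (undefined behavior). */
size_t scan_length(const char *p)
{
    const char *start = p;
    while (*p != '\0')
        p++;                      /* one step per character */
    return (size_t)(p - start);
}
```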
A more efficient representation is:
struct string {
    size_t length;
    char data[];
};
Getting the length becomes a single memory read.
This could speed up programs considerably. The basic
idea is to use a string type that is length-prefixed and
allows run-time checking against UB (undefined
behavior).
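Here is a minimal sketch of the inline (flexible array member) representation above, assuming C99 or later. `string_new` and `string_length` are hypothetical names. Note that this sketch does not store a terminating zero, since the length field makes it unnecessary.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Length-prefixed string with the characters stored inline,
   using a C99 flexible array member. No '\0' is stored. */
struct string {
    size_t length;
    char data[];
};

/* Hypothetical constructor: one allocation holds both the length and
   the characters. Returns NULL on size overflow or if malloc fails. */
struct string *string_new(const char *src, size_t len)
{
    if (len > SIZE_MAX - sizeof(struct string))
        return NULL;
    struct string *s = malloc(sizeof *s + len);
    if (s == NULL)
        return NULL;
    s->length = len;
    if (len > 0)
        memcpy(s->data, src, len);
    return s;
}

/* The length query is a single memory read: no scan for '\0'. */
size_t string_length(const struct string *s)
{
    return s->length;
}
```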
Comparing strings is also faster: when testing for
equality, comparing the lengths first often tells the
whole story with just a couple of memory reads, since
strings of different lengths cannot be equal.
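A sketch of that length-first equality test, assuming the pointer-based layout (`length` plus `char *data`) discussed below. The same check works with the inline layout. `string_equal` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct string {
    size_t length;
    char *data;
};

/* Equality with the length checked first: strings of different
   lengths cannot be equal, so that case is settled by reading the
   two length fields. Only equal-length strings need memcmp. */
bool string_equal(const struct string *a, const struct string *b)
{
    if (a->length != b->length)
        return false;
    if (a->length == 0)
        return true;              /* two empty strings; data may be NULL */
    return memcmp(a->data, b->data, a->length) == 0;
}
```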
A string like the one described above cannot be resized
in place. The block where it was allocated is bordered by
other blocks in memory, so to grow it the memory allocator
may be forced to move it elsewhere, and any pointers to
the string would then cease to be valid.
A pointer (an indirect representation) costs an extra sizeof(void *)
bytes, but allows strings to be resized without invalidating the
pointers to them.
struct string {
    size_t length;
    char *data;
};
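A sketch of why the indirect form survives resizing: `realloc` may move the character buffer, but the `struct string` header stays where it is, so a pointer to the string is still valid afterwards. `string_append` and its error convention are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct string {
    size_t length;
    char *data;
};

/* Hypothetical append for the indirect representation. realloc may
   move the character buffer, but the struct string itself stays
   where it is, so pointers to it remain valid.
   src must not point into s->data (realloc could free it).
   Returns 0 on success, -1 on overflow or allocation failure
   (in which case the string is left unchanged). */
int string_append(struct string *s, const char *src, size_t len)
{
    if (len == 0)
        return 0;
    if (len > SIZE_MAX - s->length)
        return -1;
    char *p = realloc(s->data, s->length + len);
    if (p == NULL)
        return -1;                /* old buffer is still valid */
    memcpy(p + s->length, src, len);
    s->data = p;
    s->length += len;
    return 0;
}
```

Starting from `{ 0, NULL }` works because `realloc(NULL, n)` behaves like `malloc(n)`.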
There is no compelling reason to choose one over the other;
it depends on the application. In any case, the standard
library could be complemented by functions such as Strcmp,
Strcpy, etc., all using length-prefixed strings.
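Only the names Strcmp and Strcpy come from the proposal. The signatures and behavior below are assumptions, sketched for the pointer-based layout: Strcmp orders like `strcmp` (common prefix first, then the shorter string sorts first), and Strcpy resizes the destination buffer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct string {
    size_t length;
    char *data;
};

/* Three-way comparison in the spirit of strcmp: compare the common
   prefix with memcmp; if it is equal, the shorter string sorts first. */
int Strcmp(const struct string *a, const struct string *b)
{
    size_t n = a->length < b->length ? a->length : b->length;
    int r = n > 0 ? memcmp(a->data, b->data, n) : 0;
    if (r != 0)
        return r;
    if (a->length < b->length)
        return -1;
    if (a->length > b->length)
        return 1;
    return 0;
}

/* Copies src into dst, resizing dst's buffer to src->length bytes.
   dst and src must not share a buffer unless dst == src.
   Returns 0 on success, -1 if realloc fails (dst unchanged). */
int Strcpy(struct string *dst, const struct string *src)
{
    if (dst == src)
        return 0;
    if (src->length == 0) {
        dst->length = 0;          /* keep the old buffer for reuse */
        return 0;
    }
    char *p = realloc(dst->data, src->length);
    if (p == NULL)
        return -1;
    memcpy(p, src->data, src->length);
    dst->data = p;
    dst->length = src->length;
    return 0;
}
```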
Syntactic sugar.
I have added some sugar to this coffee. I always liked coffee
with a bit of sugar. I feel it is too acidic without it.
Current strings are accessed with the [ ] notation. Shouldn't
these strings have the same privilege?
The language extension I propose is to give users the right to
define the [ ] operation for any data type they wish.
This is not a big deal for today's compilers.
Length-checked strings can then use:
String s;
....
s[2] = 'a';
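Standard C has no user-defined [ ], so today the checked access has to be spelled as a function call. The sketch below shows what `s[2] = 'a'` might stand for. `string_set` and `string_get` are hypothetical names, `String` is assumed to be the pointer-based `struct string`, and reporting errors through a return code (rather than, say, aborting) is an assumption.

```c
#include <stddef.h>

struct string {
    size_t length;
    char *data;
};

/* What a user-defined [] could do for s[i] = c: compare the index
   with the stored length before writing.
   Returns 0 on success, -1 if i is out of range (nothing is written). */
int string_set(struct string *s, size_t i, char c)
{
    if (i >= s->length)
        return -1;
    s->data[i] = c;
    return 0;
}

/* Checked read, the counterpart for expressions such as c = s[i].
   Stores the character in *out and returns 0, or returns -1. */
int string_get(const struct string *s, size_t i, char *out)
{
    if (i >= s->length)
        return -1;
    *out = s->data[i];
    return 0;
}
```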
I think I am proposing the obvious.
Do you agree?
jacob