Generating a unique string without normal character sets

Jon Gómez

Checked the math, just to be sure :).

Only concern: Would it not be possible for a Java implementation to
create Strings that can be accessed at larger indexes than integers
allow, for example, using a non-array implementation? I didn't find any
explicit mention of a length bound for Strings or String literals in JLS
3ed and VM 2ed. Of course, if Strings are implemented using char
arrays, we have the limit you set, because arrays have to be indexed by
integers.
I will do that.

But it may take some time to send.

You should receive it in a few billion years so stay tuned ....

:)

Arne

Everyone yet to be born? Perhaps one day man will have invented the
organic or cybernetic enhancements to allow individuals to communicate
efficiently immense strings of characters, only for it to be wasted,
lolspeak having come to reign in true Orwellian style, and thence the
world will come to such a pass that some knave will give their child a
name like ('lol' x (2**34359738368)). Okay, that was quite a stretch... :).

Jon.
 
Joshua Cranmer

Jon said:
Only concern: Would it not be possible for a Java implementation to
create Strings that can be accessed at larger indexes than integers
allow, for example, using a non-array implementation? I didn't find any
explicit mention of a length bound for Strings or String literals in JLS
3ed and VM 2ed. Of course, if Strings are implemented using char
arrays, we have the limit you set, because arrays have to be indexed by
integers.

String literals have a bound in the JVM:
<http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#7963>

CONSTANT_Utf8_info {
u1 tag;
u2 length;
u1 bytes[length];
}

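This implies that literals cannot be longer than 65,535 bytes in their
modified UTF-8 encoding, so at most 65,535 characters and fewer if any
character needs more than one byte. It's not exactly an explicit bound,
though.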
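
To make the u2 length field concrete, here is a minimal sketch that counts how many bytes a string would need in the class file's modified UTF-8 encoding and checks the result against 65,535. The class and method names (ModifiedUtf8Length, encodedLength, fitsInConstantPool, repeat) are made up for illustration; only the encoding rules come from the JVM specification.

```java
import java.util.Arrays;

final class ModifiedUtf8Length {
    /** Largest value a u2 length field can hold. */
    static final int MAX_CONSTANT_BYTES = 0xFFFF;

    /** Bytes needed for s in the class-file "modified UTF-8" encoding. */
    static long encodedLength(CharSequence s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;            // plain ASCII, except NUL
            } else if (c <= 0x07FF) {
                bytes += 2;            // NUL (U+0000) and U+0080..U+07FF
            } else {
                bytes += 3;            // everything else, surrogates included
            }
        }
        return bytes;
    }

    /** True if s could be stored in a single CONSTANT_Utf8_info entry. */
    static boolean fitsInConstantPool(CharSequence s) {
        return encodedLength(s) <= MAX_CONSTANT_BYTES;
    }

    /** Hypothetical helper: a string of n copies of c. */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(fitsInConstantPool(repeat('a', 65535)));          // true
        System.out.println(fitsInConstantPool(repeat('a', 65536)));          // false
        System.out.println(fitsInConstantPool(repeat((char) 0xE9, 32768)));  // false
    }
}
```

The last case shows why the bound is on bytes rather than characters: 32,768 two-byte characters already need 65,536 bytes.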
 
Arne Vajhøj

Lew said:
This one already has trouble. Suppose the string contained more than
Integer.MAX_VALUE/2 characters with values > 0xff each.

Yep.

1 gigachar seems to be the safe upper limit (ignoring chars that are more
than 2 bytes in UTF-8).
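
As a rough check of that arithmetic, the sketch below multiplies a character count by a worst-case bytes-per-char figure in long and compares the product with Integer.MAX_VALUE. It assumes Integer.MAX_VALUE is the byte array length limit (real JVMs may cap arrays slightly lower), and the names Utf8WorstCase, worstCaseBytes and fitsInByteArray are invented for the example.

```java
final class Utf8WorstCase {
    /** Worst-case encoded size, computed in long so the product cannot overflow. */
    static long worstCaseBytes(int charCount, int bytesPerChar) {
        return (long) charCount * bytesPerChar;
    }

    /** True if the worst-case encoding still fits in an int-indexed byte[]. */
    static boolean fitsInByteArray(int charCount, int bytesPerChar) {
        return worstCaseBytes(charCount, bytesPerChar) <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int half = Integer.MAX_VALUE / 2;   // 1,073,741,823, just under 2^30

        // Two bytes per char: the limit sits right at Integer.MAX_VALUE / 2.
        System.out.println(fitsInByteArray(half, 2));       // true
        System.out.println(fitsInByteArray(half + 1, 2));   // false

        // Three bytes per char lowers the safe count to about 715 million.
        int third = Integer.MAX_VALUE / 3;  // 715,827,882
        System.out.println(fitsInByteArray(third, 3));      // true
        System.out.println(fitsInByteArray(third + 1, 3));  // false
    }
}
```

So "1 gigachar" is safe only in the sense of one char short of 2^30, and only while no char needs a third byte.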

Arne
 
Eric Sosman

Jon said:
Checked the math, just to be sure :).

Only concern: Would it not be possible for a Java implementation to
create Strings that can be accessed at larger indexes than integers
allow, for example, using a non-array implementation? I didn't find any
explicit mention of a length bound for Strings or String literals in JLS
3ed and VM 2ed. Of course, if Strings are implemented using char
arrays, we have the limit you set, because arrays have to be indexed by
integers.

Java *could* implement a String that wasn't array-based,
but that by itself wouldn't get rid of the length limitation.
You'd need to re-fit the length() and indexOf() and ... methods
to use something other than `int' to describe features of the
String -- simply changing to `long' wouldn't do, because of all
the existing code that would break. I suppose you could add new
methods like longLength() and longSubstring() and so on, but then
you'd probably have to start throwing exceptions if plain old
length() et al. are applied to a String that's too long (see
DataOutput#writeUTF() for precedent). Don't forget to re-fit
Matcher and CharSequence and ... while you're at it.
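
A minimal sketch of that idea follows, assuming a purely hypothetical class RepeatedText that represents a unit string repeated many times without storing it. The methods longLength() and longCharAt() are invented names, not part of any real Java API, and the sketch lets Math.toIntExact throw ArithmeticException where the int-based CharSequence methods cannot cope (writeUTF itself throws UTFDataFormatException).

```java
final class RepeatedText implements CharSequence {
    private final String unit;
    private final long totalLength;

    RepeatedText(String unit, long count) {
        if (unit.isEmpty() || count < 0) {
            throw new IllegalArgumentException("need a non-empty unit and count >= 0");
        }
        this.unit = unit;
        // multiplyExact throws instead of silently overflowing a long.
        this.totalLength = Math.multiplyExact((long) unit.length(), count);
    }

    /** Hypothetical long-based counterpart of length(). */
    long longLength() {
        return totalLength;
    }

    /** Hypothetical long-based counterpart of charAt(int). */
    char longCharAt(long index) {
        if (index < 0 || index >= totalLength) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return unit.charAt((int) (index % unit.length()));
    }

    /** Throws ArithmeticException when the text is too long for an int. */
    @Override
    public int length() {
        return Math.toIntExact(totalLength);
    }

    @Override
    public char charAt(int index) {
        return longCharAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end < start || end > totalLength) {
            throw new IndexOutOfBoundsException("range " + start + ".." + end);
        }
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(longCharAt(i));
        }
        return sb.toString();
    }

    /** Only works when the whole text fits in an ordinary String. */
    @Override
    public String toString() {
        return subSequence(0, length()).toString();
    }

    public static void main(String[] args) {
        RepeatedText big = new RepeatedText("lol", 1L << 40);
        System.out.println(big.longLength());            // 3298534883328
        System.out.println(big.longCharAt(1L << 40));    // o
        try {
            big.length();
        } catch (ArithmeticException e) {
            System.out.println("length() refused: " + e.getMessage());
        }
    }
}
```

Even in this toy the cost is visible: every existing caller of length(), charAt(int) or toString() has to be prepared for an exception it never had to handle before.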
 
Lew

Arne Vajhøj wrote:
<http://java.sun.com/javase/6/docs/api/java/lang/String.html#getBytes()>
1 gigachar seems to be the safe upper limit
(ignoring chars that are more than 2 bytes in UTF-8).
^^^^^^^^
which, of course, we can't really do because that's how these minor
inconsistencies introduced themselves in the first place: "ignoring a char
that is more than one byte in UTF-8". Doesn't matter; computers are finite
devices and we'll always have use cases that exceed available capacity and the
unpredictability of String byte length. A method might as well return 'int'.

The moral of the story is that if the difference between a little under one
GiB and a little over two makes a difference to you, one may wish to consider
a type other than 'String', perhaps a 'Collection<StringBuilder>'.
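
One way the 'Collection<StringBuilder>' suggestion could look is sketched below, under the assumption that a List<StringBuilder> of fixed-size chunks with a long running length is acceptable. ChunkedText, its chunkSize parameter and its methods are all hypothetical names chosen for the example; the list is still int-indexed, so the ceiling moves to roughly Integer.MAX_VALUE chunks rather than disappearing.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkedText {
    private final int chunkSize;
    private final List<StringBuilder> chunks = new ArrayList<>();
    private long length;

    ChunkedText(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Appends text, starting a new chunk whenever the last one is full. */
    void append(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (length % chunkSize == 0) {
                chunks.add(new StringBuilder());
            }
            chunks.get(chunks.size() - 1).append(text.charAt(i));
            length++;
        }
    }

    /** Total number of chars, tracked as a long. */
    long longLength() {
        return length;
    }

    /** Looks up a char by long index: which chunk, then where inside it. */
    char charAt(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        StringBuilder chunk = chunks.get((int) (index / chunkSize));
        return chunk.charAt((int) (index % chunkSize));
    }

    int chunkCount() {
        return chunks.size();
    }

    public static void main(String[] args) {
        ChunkedText text = new ChunkedText(4);   // tiny chunks, for demonstration
        text.append("hello world");
        System.out.println(text.longLength());   // 11
        System.out.println(text.chunkCount());   // 3
        System.out.println(text.charAt(10L));    // d
    }
}
```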
 
Arne Vajhøj

Lew said:
Arne Vajhøj wrote:
<http://java.sun.com/javase/6/docs/api/java/lang/String.html#getBytes()>
1 gigachar seems to be the safe upper limit
(ignoring chars that are more than 2 bytes in UTF-8).
^^^^^^^^
which, of course, we can't really do because that's how these minor
inconsistencies introduced themselves in the first place: "ignoring a
char that is more than one byte in UTF-8". Doesn't matter; computers
are finite devices and we'll always have use cases that exceed available
capacity and the unpredictability of String byte length. A method might
as well return 'int'.

The moral of the story is that if the difference between a little under
one GiB and a little over two makes a difference to you, one may wish to
consider a type other than 'String', perhaps a 'Collection<StringBuilder>'.

I think we can live with the String limitation itself.

But the array limitation will hurt us badly in the next decade.

Arne
 
