Using Enumerated Types as Array Indexes

Robert Klemme

[ SNIP ]

Fair cop, I am semi-dyslexic I guess. I don't think it's a
particularly valid point - arguing that someone might screw up and
not know the valid range for bytes or shorts, for example, while
not arguing that this numerically-challenged person will also have
problems with ints, is somewhat disingenuous.

Or is it just that there's so much more room with int that a sloppy
person is much less likely to have problems? That approach doesn't
fill me with confidence.

Furthermore, once this hypothetical coder starts getting negative
array indexes the odds are pretty decent that they may figure out
what the issue is.
He also failed to answer my question: What is the advantage of
thinking imprecisely in this matter?

Why did you ask the question? I wasn't thinking imprecisely. I know
that byte/short/char values will get widened when used as array
indexes. Personally I'll also ensure that these values are valid
bytes/shorts/chars...just like I would do for int.

Where's the imprecise thinking? My original reply didn't mention
anything whatsoever about array indexes _not_ being ints: in fact I
started talking about widening conversions right away. If your
original byte/short/char value is valid, then you've got no problems
at all. If you want to suppose that people are using invalid values,
well, hell, that's a bigger problem anyhow.

I tried to reconstruct how Lew and you got into this discussion but it's
hard to track it down. I think things got a bit off track when Lew wrote
One gets problems with byte->int and short->int with array indexes
the way one gets into trouble with those widening conversions
generally. The usual suspect is the lack of unsigned versions, so
widening a (byte)0xA0, for example, would result in a negative
index.

The fact that you have to use a cast demonstrates that 0xA0 is not a
literal which can be used as a byte:

byte b = 0xA0; // won't compile

0xA0 is an int literal and the cast to byte already introduces the
negative sign (the value is -96 decimal). So in this case the error
is already in assuming that

byte b = (byte) 0xA0;

leaves a positive value in b. So it's not the widening conversion which
causes trouble but the cast from int to byte, which loses information by
cutting off bits 8 through 31, leaving only bits 0 through 7.

http://java.sun.com/docs/books/jls/third_edition/html/conversions.html#25363

The lowest int literal directly assignable to byte is -0x80 (-128
decimal) and the highest is 0x7f (127 decimal). 0xA0 is outside this range.
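
To make the trap concrete, here is a small sketch (the lookup table and
its size are made up purely for illustration):

int[] table = new int[256];

byte b = (byte) 0xA0;    // the cast keeps only bits 0..7: b == -96
int index = b;           // widening byte -> int preserves the value: -96
// table[index]          // would throw ArrayIndexOutOfBoundsException: -96
int v = table[b & 0xFF]; // masking first recovers the unsigned value 160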

It seems nobody exactly stated that you could index arrays with bytes.
Arne said
Java is a simpler language than Ada. In Java array indexes are
int and you have byte/short/int/long types and that is it.

and Mark said
OK, you mean for indexes? char works too, and long really doesn't --
it has to be cast to an int.

before Lew responded with the quote above. Of course a char "works" as
an array index, but the char is converted to int before indexing. However,
this conversion does not change the value of the char.

It's probably worthwhile to note that char is unsigned so in a way it's
better suited to index arrays than byte and short because all values are
legal array indexes. :)
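
For example (again just a sketch with a made-up table):

int[] table = new int[256];

char c = 0xA0;    // compiles: 160 fits into char's range 0..0xFFFF
int v = table[c]; // the char widens to int 160, a perfectly valid index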

Kind regards

robert
 
Andreas Leitgeb

Robert Klemme said:
All solutions proposed so far that turn the remainder into a true modulo
while keeping the wrap-around share a common disadvantage: if the
number range is not a multiple of the divisor, the value distribution
will not be uniform.

The solution I used to fix this employed a do while loop and
compareAndSet() to reset the value to the beginning of the range (0 in
this case) in a thread safe manner.
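
In code the idea looks roughly like this (only a sketch; the names and
details are made up):

final AtomicInteger counter = new AtomicInteger(); // java.util.concurrent.atomic

int nextIndex(int limit) {
    int current, next;
    do {
        current = counter.get();
        // wrap back to 0 once the end of the range is reached
        next = (current + 1 < limit) ? current + 1 : 0;
    } while (!counter.compareAndSet(current, next));
    return current;
}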

There's a much simpler "meta-way" to fix it: just stick to powers of
2 for the array length - then the overflow *does* always happen at a
multiple of the divisor and the value distribution stays uniform :)
 
Andreas Leitgeb

Robert Klemme said:
byte b = 0xA0; // won't compile

Actually, it *does* compile due to some special rule in
the JLS w.r.t. initializers.

That, however, doesn't impact the point you made.
Had you just written it thusly:
byte b; b = 0xA0; // won't compile
it would have been correct altogether. :)
 
Arved Sandstrom

Actually, it *does* compile due to some special rule in
the JLS w.r.t. initializers.

That, however, doesn't impact the point you made.
Had you just written it thusly:
byte b; b = 0xA0; // won't compile
it would have been correct altogether. :)
Not with JDK 1.6 it doesn't compile. I usually test these things if I'm
talking about them, mainly to avoid looking like a complete dilettante
in the case that I am overlooking something basic. This scenario, and
many permutations, I tried.

"possible loss of precision" errors are reported if you try to compile
with outside range integer literals assigned to byte, short or char
variables, without casts. It doesn't matter if you try

byte b = 128;

or

byte b; b = 0x80;

either will error with a "possible loss of precision" message.

Since it's routine and OK to create values of type byte, short, int and
long from int literals (and longs from long literals also), and a cast
is redundant for such an integer literal assignment, _and_ because a
cast masks the error, I would suggest that assignments with casts such as

byte b = (byte)0xA0;

or

byte b = (byte)intVariable;

be strenuously avoided.

AHS
 
Robert Klemme

There's a much simpler "meta-way" to fix it: just stick to powers of
2 for the array length - then the overflow *does* always happen at a
multiple of the divisor and the value distribution stays uniform :)

I reckon you do not mean that suggestion seriously. Just in case:
limiting to powers of 2 was not an option here. And if it was, I would
not have used modulo or AtomicInteger's features but just used binary &
to cut off leading bits.
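
For example, assuming an AtomicInteger counter and a power-of-2 length N
(just a sketch):

int index = counter.getAndIncrement() & (N - 1);
// stays uniform even across int overflow, because 2^32 is a multiple of N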

Kind regards

robert
 
Robert Klemme

Andreas, which rule and which compiler?
Not with JDK 1.6 it doesn't compile. I usually test these things if I'm
talking about them, mainly to avoid looking like a complete dilettante
in the case that I am overlooking something basic. This scenario, and
many permutations, I tried.

Same here: Eclipse's compiler refuses both with "Type mismatch: cannot
convert from int to byte".
"possible loss of precision" errors are reported if you try to compile
with outside range integer literals assigned to byte, short or char
variables, without casts. It doesn't matter if you try

byte b = 128;

or

byte b; b = 0x80;

either will error with a "possible loss of precision" message.

Also, I cannot see any reason why these two situations should be treated
differently: the loss of precision is there regardless.
Since it's routine and OK to create values of type byte, short, int and
long from int literals (and longs from long literals also), and a cast
is redundant for such an integer literal assignment, _and_ because a
cast masks the error, I would suggest that assignments with casts such as

byte b = (byte)0xA0;

or

byte b = (byte)intVariable;

be strenuously avoided.

That's certainly a good rule of thumb which helps keep one out of
trouble. Although it sometimes seems reasonable to use values in the
range [0x80, 0xFF], e.g. when defining bit masks.
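
For example (just a sketch of one way to do it; the names are made up):
keep the mask as an int and widen the byte "unsigned" before testing,
instead of casting the mask down to byte:

static final int MASK = 0xA0; // plain int literal, no cast, no sign surprises

static boolean anyMaskBitSet(byte b) {
    return (b & 0xFF & MASK) != 0; // b & 0xFF yields the unsigned value 0..255
}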

Kind regards

robert
 
Andreas Leitgeb

Robert Klemme said:
Andreas, which rule and which compiler?

The four letter rule "oups", as it seems.

The rule itself was that you can assign an in-range literal (e.g. 42)
to a byte variable, although you cannot feed it to a method that takes
bytes.
But I got both the rule and the context wrong in my head, so ... Duh, me!

And now let's forget this subthread, shall we? ;)
 
Andreas Leitgeb

Robert Klemme said:
I reckon you do not mean that suggestion seriously. Just in case:
limiting to powers of 2 was not an option here. And if it was, I would
not have used modulo or AtomicInteger's features but just used binary &
to cut off leading bits.

Of course I don't know the wider context, but to me it seemed as if
a power-of-2 sized array would be such a natural choice, that you'd
surely use one, anyway, already. Under these circumstances, (n % N)
would have seemed to be the same as (n & (N-1)), yet shorter and
clearer - except, as then learned, for negative n ...

May I ask, what other entity imposes the array length on you, and why
choosing the next larger power of 2 will not do?
 
Robert Klemme

Of course I don't know the wider context, but to me it seemed as if
a power-of-2 sized array would be such a natural choice, that you'd

That never occurred to me. Limiting to powers of 2 does have some
advantages in some situations but it's certainly nothing I would consider
"natural" generally.
surely use one, anyway, already. Under these circumstances, (n % N)
would have seemed to be the same as (n & (N-1)), yet shorter and
clearer - except, as then learned, for negative n ...

May I ask, what other entity imposes the array length on you, and why
choosing the next larger power of 2 will not do?

The value is user configurable and limiting to powers of 2 as valid
values would reduce the range of possible values unnecessarily
especially since the value is resource intensive - so the difference
between 6 and 8 could be significant. That's also why silently
adjusting was not an option.

Kind regards

robert
 
Andreas Leitgeb

Robert Klemme said:
That never occurred to me. Limiting to powers of 2 does have some
advantages in some situation but it's certainly nothing I would consider
"natural" generally.

Not generally, but in a context where you've got an integer uniformly
cycling through a power-of-2 sized range, and quickly (development-time-wise)
need a uniform cycle over a smaller range.
The value is user configurable and limiting to powers of 2 as valid
values would reduce the range of possible values unnecessarily
especially since the value is resource intensive - so the difference
between 6 and 8 could be significant.

Since I can't believe the array's length itself is at stake here, I guess
it's the objects that get stored in it, and only freed when overwritten.

If that's the case, I'd do this:
M being the configured value for the "size"
N being the smallest power-of-2 that is >= M

create the array with a size of N, and before each write for a new object
into position n, you write a null at position (n - M) & (N-1).
(modulo eventual off-by-one-errors)
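
In code, roughly (an untested sketch, with M and N as above):

Object[] ring = new Object[N];

void put(int n, Object item) {
    ring[(n - M) & (N - 1)] = null; // drop the entry that fell out of the window of M
    ring[n & (N - 1)] = item;       // store the new one
}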

But at this point it is possible that your compareAndSet approach
is actually clearer/easier to understand :)
 
Robert Klemme

Not generally, but in a context where you've got an integer uniformly
cycling through a power-of-2 sized range, and quickly (development-time-wise)
need a uniform cycle over a smaller range.

??? If the power of two is given then of course limiting sizes to power
of two is "natural" - whatever that means in case of a tautology.
Since I can't believe the array's length itself is at stake here, I guess
it's the objects that get stored in it, and only freed when overwritten.

Roughly. Items are not overwritten, dunno where you got that from.
Andreas, you are on the Holzweg. :)
If that's the case, I'd do this:
M being the configured value for the "size"
N being the smallest power-of-2 that is >= M

create the array with a size of N, and before each write for a new object
into position n, you write a null at position (n - M) & (N-1).
(modulo eventual off-by-one-errors)

Number M is given; it's the number of items to pick (in a thread safe
manner also, but that is irrelevant for the math). If M is not a power
of two then N > M and N mod M != 0, and no matter how you store your M
items in an array of length N, iterating through the complete array of N
elements will never produce a uniform distribution of picks per item. If
you only use the first M slots of the array and iterate over them, you
simply waste N - M slots of the array and the power-of-two magic is
completely worthless.
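
A toy example of the skew (the numbers are made up): with M = 6 items
spread over N = 8 slots via i % M, two items get picked twice per pass
and the other four only once:

int M = 6, N = 8;
int[] picks = new int[M];
for (int i = 0; i < N; i++) {
    picks[i % M]++;
}
System.out.println(java.util.Arrays.toString(picks)); // prints [2, 2, 1, 1, 1, 1]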
But at this point it is possible that your compareAndSet approach
is actually clearer/easier to understand :)

Certainly.

Cheers

robert
 
Andreas Leitgeb

Robert Klemme said:
??? If the power of two is given then of course limiting sizes to power
of two is "natural" - whatever that means in case of a tautology.

It was merely a semi-tautology: a big (2^32) range on the one hand means
that a (smaller) power-of-2 range on the other hand is more easily handled.
A triviality, of course, but not exactly a tautology. An irrelevancy,
too, as it turned out, as you have no free choice on the smaller range.
Roughly. Items are not overwritten, dunno where you got that from.

That was a weak (and obviously inaccurate) extrapolation from some
vague patterns that I thought I recognized.
Andreas, you are on the Holzweg. :)

I expected the possibility of it happening :)
 
