strong/weak typing and pointers

Steven Bethard

Diez B. Roggisch said:
I'd second that - writing apps in PHP can lead to great surprises about what
actually happens - take this for example:

$foo = "abc";
$foo[0] = 65;

The result is

"6bc"

If I learned nothing else from this thread, I learned that I *never* want to
screw around with PHP. ;)
And don't forget: If you don't like the way someone overloaded some
operator, you can alter that behaviour according to your own design
philosophies.

Python has the nice property that you're not allowed to modify builtins, so no
one can ever make your Python code do anything other than:
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
TypeError: object does not support item assignment
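
A small check of that behaviour, as a sketch rather than anything posted in the thread: the helper names below are made up for illustration, and since the exact error text differs between Python versions, only the exception type is reported. The second helper assumes CPython, where built-in types reject attribute assignment.

```python
def try_item_assignment(value):
    """Attempt value[0] = 65 and report what happened."""
    try:
        value[0] = 65
    except TypeError as exc:
        return type(exc).__name__
    return value


def try_patching_str():
    """Attempt to replace a method on the built-in str type (CPython)."""
    try:
        str.upper = lambda self: "patched"
    except TypeError as exc:
        return type(exc).__name__
    return "patched"


print(try_item_assignment("abc"))      # strings reject item assignment
print(try_item_assignment([1, 2, 3]))  # lists are mutable, so this succeeds
print(try_patching_str())              # the built-in type cannot be altered
```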

I wonder what people think about Ruby, which, I understand, does allow you to
modify builtins. Can anyone tell me if you could make Ruby strings do the
horrible coercion that PHP strings do?

Steve
 
Diez B. Roggisch

I'm sorry, I guess I still don't understand this example. It sounds like
you're just going between untyped (array of bytes) and typed (functions,
data structures, etc.). I'm not seeing how you're, for example, casting a
function to a data structure, or a data structure of one type to a data
structure of another type.

Anything can (and sometimes should) be treated just as an array of bytes,
in the same way that, in an OO language with a base class Object, it is
sometimes appropriate to treat a given instance of a class as an Object
rather than as its particular subclass of Object. I wouldn't consider
treating functions, data structures, etc. as arrays of bytes (or vice versa)
as something that takes advantage of "weak-typing", but rather something that
takes advantage of the ability to treat data as untyped.

Hopefully this clarifies my confusion and, when you get a chance, you can
try to explain it to me again? Thanks,


I think what Alex means is that if you allow for re-interpreting data as
simply byte arrays and the other way round, you end up with exactly the
weak typing you defined before: The arbitrary reinterpretation of memory
portions.

Consider this simple example:

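#include <stdio.h>

int main(void) {
    float f = 13345.0f;
    void *mem = (void*)&f;   /* untyped view of the float's storage */
    int *i = (int*)mem;      /* the same bytes reinterpreted as an int */
    printf("%f, %p, %d\n", f, mem, *i);
    return 0;
}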

For read-only cases (read-only with respect to the RAM) one might be able to
allow access to the memory without opening the weak-typing loophole. But if
you want to read bytes into memory, you'll end up with an array of bytes
in the first place, and whatever interpretation you impose on it by casting,
there is no way of forbidding a wrong cast.
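
For comparison, a minimal Python sketch of the same reinterpretation. It assumes IEEE 754 single precision and goes through an explicit raw-bytes copy with the struct module instead of a pointer cast; the function names are illustrative only.

```python
import struct


def float_bits(value):
    """Return the 32 bits of a single-precision float as an unsigned int."""
    raw = struct.pack("<f", value)      # typed value -> 4 raw bytes
    (bits,) = struct.unpack("<I", raw)  # same 4 bytes read as an integer
    return bits


def bits_to_float(bits):
    """Inverse of float_bits: read a 32-bit pattern back as a float."""
    raw = struct.pack("<I", bits)
    (value,) = struct.unpack("<f", raw)
    return value


print(hex(float_bits(13345.0)))
print(bits_to_float(float_bits(13345.0)))
```

Nothing here stops a caller from unpacking with the wrong format string, which is the same "wrong cast" problem in a different guise.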
 
Diez B. Roggisch

I wonder what people think about Ruby, which, I understand, does allow you
to modify builtins. Can anyone tell me if you could make Ruby strings do the
horrible coercion that PHP strings do?

I've no idea of how well ruby developers deal with the possibility to
redefine their builtins - but it sure scares the hell out of _me_, so I'm
glad the BDFL chose not to allow us to shoot large holes into various
bodyparts by making them unmodifiable...
 
Steven Bethard

Diez B. Roggisch said:
I think what Alex means is that if you allow for re-interpreting data as
simply byte arrays and the other way round, you end up with exactly the
weak typing you defined before: The arbitrary reinterpretation of memory
portions.

Consider this simple example:

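#include <stdio.h>

int main(void) {
    float f = 13345.0f;
    void *mem = (void*)&f;   /* untyped view of the float's storage */
    int *i = (int*)mem;      /* the same bytes reinterpreted as an int */
    printf("%f, %p, %d\n", f, mem, *i);
    return 0;
}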

Ahh, ok, I understand where he was going now, thanks.

However, this doesn't really address my concern. Clearly, allowing you to treat
things as untyped *allows* you to cast one type of structure to another, but it
definitely doesn't *require* you to do so. In your example, yes, casting to
(void*) lets you cast a piece of memory back and forth between float and int,
but I still don't know when I would actually want to do that...

Does this make my concern any clearer? What I'm asking for is an example like
yours above that not only shows that you can treat, say, the bits of a float as
an integer, but also shows why this would be useful.

Thanks again,

Steve
 
Diez B. Roggisch

Does this make my concern any clearer? What I'm asking for is an example
like yours above that not only shows that you can treat, say, the bits of
a float as an integer, but also shows why this would be useful.

The question is not so much if there is an actual usecase, but more that if
things _can_ be done, they inevitably _will_ be done. People have done so
all the time. I can remember abusing 32-bit pointers on 68k processors by
altering the most-significant byte. It took advantage of the fact that the
old 68k had only 24 address lines, thus ignoring the most-significant byte.
That allowed passing 2 parameters in one pointer. And was for a short period
of time considered a clever trick....

Today, with lots of memory and faster processors, certain optimization
techniques that required creative reinterpretation of bits might have come
somewhat out of fashion - but the more low-level you get (e.g. drivers or
embedded devices) the more appealing they might look.
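
The top-byte trick can be sketched with plain integers. This is an illustration under stated assumptions, not the original 68k code: the word is 32 bits wide, only the low 24 bits are used as the address, and the names `pack_tagged` and `unpack_tagged` are hypothetical.

```python
ADDRESS_MASK = 0x00FFFFFF  # low 24 bits: the part used as the address
TAG_SHIFT = 24             # the tag lives in the most-significant byte


def pack_tagged(address, tag):
    """Store an 8-bit tag in the unused top byte of a 32-bit pointer word."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit in 24 bits")
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag does not fit in 8 bits")
    return (tag << TAG_SHIFT) | address


def unpack_tagged(word):
    """Split a tagged word back into (address, tag)."""
    return word & ADDRESS_MASK, (word >> TAG_SHIFT) & 0xFF


word = pack_tagged(0x123456, 0xAB)
print(hex(word))
print(unpack_tagged(word))
```

The scheme only works while the top byte really is ignored; once all 32 bits carry address information, every tagged word becomes a wrong address.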
 
Steven Bethard

Diez B. Roggisch said:
The question is not so much if there is an actual usecase, but more that if
things _can_ be done, they inevitably _will_ be done.

No, the (my) question really was for an actual usecase. ;) People do a lot of
things in programming languages, not all of them particularly good or
appropriate. The recent example of intrinsics.replace to change the behavior of
Python's str class strikes me as one such example. =) I would be much more
convinced that weak-typing is useful if someone could actually use it for me. ;)
I can remember abusing 32-bit pointers on 68k processors by
altering the most-significant byte. It took advantage of the fact that the
old 68k had only 24 address lines, thus ignoring the most-significant byte.
That allowed passing 2 parameters in one pointer. And was for a short period
of time considered a clever trick....

I'm not sure I see how this is taking advantage of weak-typing, unless I
misunderstand what you were doing here. Did you ever interpret your
two-parameter structure as anything other than either a set of bits or the two
parameters? You didn't ever treat the set of bits as a float, for example,
right? You *could* have, of course, but I'm guessing your interpretation of the
bits was consistent...

My point here is that I think in most code, even when people do a bunch of
bit-twiddling, they have a single underlying structure in mind, and therefore
you see them treat the bits as one of two things: (1) The sequence of bits, i.e.
the untyped memory block, or (2) the intended structure. IMHO, an example of
taking advantage of weak-typing would be a case where you treat the bits as
three different things: the sequence of bits, and two (mutually exclusive)
intended structures.

I guess I'm drawing a thin line here, but I see a difference between using the
untyped (bit-based) representation of a structure and actually converting a
structure between two different types.

Steve
 
Alex Martelli

Steven Bethard said:
Does this make my concern any clearer? What I'm asking for is an example
like yours above that not only shows that you can treat, say, the bits of
a float as an integer, but also shows why this would be useful.

Given a float, extract the (so-called) "mantissa" (what a misnomer!) and
exponent. Can you see the usefulness of _that_? Can you see that
treating the bits that compose the float as an int and using masking and
shifting is the obvious way to perform this task?

Say I need to compute some unary float function, such as 'sin', with
high speed and precision. One reasonable approach: normalize the float
input to a standard range (say 0 to pi/4, remembering what kind of sign
inversions &c you need to perform at result time); get "mantissa" (pah!)
and exponent and use the latter, partly to select the right lookup table
and partly to shift the mantissa appropriately to make it an index into
said result table, while keeping track of the bits that shifted out;
read out the result base and the multiplier for interpolation, multiply
the latter by the bits that shifted out and add the result to the result
base; perform sign or other symmetry inversions as previously recorded.

There -- you have the function computed to whatever precision is
requested, typically an ULP. Depending on your CPU, you may have HW
that obviates the need for some or all of these manipulations - but you
won't have it for all transcendentals, and what you don't have in HW
you'll need to do in SW -- and being able to get at the bits of the
floating point representation is often the best approach for that task.
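
The masking-and-shifting step can be sketched as follows. This assumes the IEEE 754 double-precision layout (1 sign bit, 11 exponent bits, 52 fraction bits), which the post itself does not name, and the function names are illustrative.

```python
import struct

FRACTION_BITS = 52
EXPONENT_BITS = 11


def decompose(value):
    """Split a double into (sign, biased exponent, fraction) integer fields."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    sign = bits >> 63
    exponent = (bits >> FRACTION_BITS) & ((1 << EXPONENT_BITS) - 1)
    fraction = bits & ((1 << FRACTION_BITS) - 1)
    return sign, exponent, fraction


def recompose(sign, exponent, fraction):
    """Rebuild the double from its three integer fields."""
    bits = (sign << 63) | (exponent << FRACTION_BITS) | fraction
    (value,) = struct.unpack("<d", struct.pack("<Q", bits))
    return value


print(decompose(1.0))
print(decompose(-2.5))
print(recompose(*decompose(0.1)))
```

A table-driven approximation like the one described would take the top bits of `fraction` as the table index and keep the remaining low bits for interpolation.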

Although the details were more antiquated, that was elementary stuff
taught in electronic engineers' first-year computing courses back in the
'70s, just in case we ever needed to code our own transcendentals (in
Fortran, of course -- you weren't expected to master machine code unless
you took computing electives in 3rd and later years, much less exoterica
such as Pascal, Lisp or APL). Is it considered advanced or specialized
these days?!


Alex
 
Steven Bethard

[snip description of using integer parts of float representation]
Although the details were more antiquated, that was elementary stuff
taught in electronic engineers' first-year computing courses back in the
'70s, just in case we ever needed to code our own transcendentals (in
Fortran, of course -- you weren't expected to master machine code unless
you took computing electives in 3rd and later years, much less exoterica
such as Pascal, Lisp or APL). Is it considered advanced or specialized
these days?!

I'm obviously upsetting you, and I can see that we're still not quite
understanding each other. I have to assume that you're not the only one I'm
upsetting through these misunderstandings, so for the sake of the list, I'll
stop responding to this thread. Thanks everyone for a good discussion!

Steve

P.S. If anyone would like to know my response to the float representation
example, please contact me directly instead.
 
Alex Martelli

Steven Bethard said:
I'm obviously upsetting you, and I can see that we're still not quite
understanding each other. I have to assume that you're not the only one I'm
upsetting through these misunderstandings, so for the sake of the list, I'll
stop responding to this thread. Thanks everyone for a good discussion!

I apologize if I have given the impression of being upset. I am, in a
way, I guess -- astonished and nonplussed, as if somebody asked me to
justify the existence of bread -- not of some exotic food, mind you, but
of the most obvious, elementary, fundamental substance of earthly
sustenance (in my culture, and many others around it).
P.S. If anyone would like to know my response to the float representation
example, please contact me directly instead.

I promise not to ACT upset if you explain it here. So, we have an area
of 8 bytes in memory which we need to be able to treat as:
8 bytes, for I/O purposes, say;
a float, to feed it to some specialized register, say;
a bit indicating sign plus 15 for mantissa plus 48 for significand,
or the like, to perform masking and shifting thereof in SW -- a
structure of three odd-bit-sized integers juxtaposed;
and this is ONE example -- the specific one you had asked for.

Another example: we're going to send a controlblock of 64 bytes to some
HW peripheral, and get it back perhaps with some mods -- a typical
control/status arrangement. Depending on the top 2 (or in some cases 4)
bytes' value, the structure may need to be interpreted in several
possible ways, in terms of juxtaposition of characters, halfwords and
longwords. Again, the driver responsible for talking with this
peripheral needs to be able to superimpose on the 64 bytes any of
several possible C-level structs -- the cleanest way to do this would
appear to be pointer-casting, though unions would (as usual, of course)
be essentially equivalent. In Python, or another language that lets me
pack and unpack a struct to/from bytes in a controlled way (in Python's
case via the struct module) I can do that through a _copy_ -- I need to
go through a 'raw bytes' stage, cannot do the overlay directly; but
that's little more than a figleaf arrangement -- spending real CPU and
RAM operations because I can't be lowlevel/weakly-typed enough.
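
A sketch of the copy-based approach with the struct module. Everything specific here is an assumption made for illustration: the two layouts, the field names, little-endian byte order, and the reading of "top 2 bytes" as the first two bytes of the block.

```python
import struct

BLOCK_SIZE = 64

# Hypothetical layouts, keyed by the 16-bit code in the first two bytes.
LAYOUTS = {
    1: ("<HHI56s", ("kind", "status", "count", "payload")),
    2: ("<HH60s", ("kind", "error", "detail")),
}


def parse_block(block):
    """Pick a layout from the leading code, then unpack the whole block."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("control block must be exactly 64 bytes")
    (kind,) = struct.unpack_from("<H", block, 0)  # peek at the selector
    fmt, names = LAYOUTS[kind]                    # KeyError if unknown
    return dict(zip(names, struct.unpack(fmt, block)))


block = struct.pack("<HHI56s", 1, 0, 7, b"data")
print(parse_block(block)["count"])
```

Each call copies the fields out of the 64 bytes into new Python objects, which is the extra work the post refers to; a C struct overlay would read them in place.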


Alex
 
Alex Martelli

Steven Bethard said:
I wonder what people think about Ruby, which, I understand, does allow you to
modify builtins. Can anyone tell me if you could make Ruby strings do the
horrible coercion that PHP strings do?

Yes, you could. Reliable Ruby friends tell me that's not DONE in the
real world of Ruby, any more than pythonistas call their methods' first
argument 'foo' rather than 'self' or pepper their code with 'exec'
statements or code 200-chars nested-lambda oneliners. But though
culturally frowned on, it _is_ technically possible.

The one real example I saw, which was enough to turn me off my quest to
explore Ruby for production purposes, was making (builtin) string
comparisons case-insensitive -- apparently that _IS_ the kind of thing
_SOME_ perhaps-inexperienced Rubystas _DO_ perpetrate (breaking library
modules left, right, and center, of course). Maybe it's similar to
rather inexperienced Pythonistas dead keen on "exec myname+'='+value"; I
_have_ seen that horror perpetrated in real Python code (doesn't break
any library, but slows function execution down by 10 times w/o any real
advantage wrt dicts or bunch usage, and is a bug-prone piece too...).


Alex
 
Michael Hobbs

Steven Bethard said:
My point here is that I think in most code, even when people do a bunch of
bit-twiddling, they have a single underlying structure in mind, and therefore
you see them treat the bits as one of two things: (1) The sequence of bits, i.e.
the untyped memory block, or (2) the intended structure. IMHO, an example of
taking advantage of weak-typing would be a case where you treat the bits as
three different things: the sequence of bits, and two (mutually exclusive)
intended structures.

One word: union
 
Steven Bethard

Alex Martelli said:
I apologize if I have given the impression of being upset.

No problem -- my mistake for misinterpreting you. I'm just sensitive to these
kinds of things because I know I've previously miscommunicated, and
unintentionally got people upset before (you being one of them). ;)
I am, in a
way, I guess -- astonished and nonplussed, as if somebody asked me to
justify the existence of bread -- not of some exotic food, mind you, but
of the most obvious, elementary, fundamental substance of earthly
sustenance (in my culture, and many others around it).

Yeah, this goes to the heart of the misunderstanding. I'm not asking anyone to
justify the _existence_ of weak-typing. Weak-typing is a direct result of a
language's support for untyped (bit/byte) data. I agree 100% that this sort of
data is not only useful, but often essential in any low-level (e.g. OS, hardware
driver, etc.) code.
So, we have an area
of 8 bytes in memory which we need to be able to treat as:
8 bytes, for I/O purposes, say;
a float, to feed it to some specialized register, say;
a bit indicating sign plus 15 for mantissa plus 48 for significand,
or the like, to perform masking and shifting thereof in SW -- a
structure of three odd-bit-sized integers juxtaposed;

As a quick refresher, I quote myself in what I was looking for:
"taking advantage of weak-typing would be a case where you treat the bits as
three different things: the sequence of bits, and two (mutually exclusive)
intended structures."

My response to this example is that your two intended structures are not
mutually exclusive. Yes, you have to do some bit-twiddling, but only because
your float struct doesn't have get_sign, get_mantissa and get_significand
methods. ;) You're still dealing with the same representation, not converting
to a different type. You're just addressing a lower level part of the
representation.

I can see the point though: at least in most of the languages I'm familiar with,
float is declared as a type while there's no subtype of float that specifies the
sign, mantissa and significand.

(Oh, and by the way, in case you really were wondering, they still do teach
float representations, even in computer science (as opposed to computer
engineering), or at least they did through 1999.)
Another example: we're going to send a controlblock of 64 bytes to some
HW peripheral, and get it back perhaps with some mods -- a typical
control/status arrangement. Depending on the top 2 (or in some cases 4)
bytes' value, the structure may need to be interpreted in several
possible ways, in terms of juxtaposition of characters, halfwords and
longwords. Again, the driver responsible for talking with this
peripheral needs to be able to superimpose on the 64 bytes any of
several possible C-level structs -- the cleanest way to do this would
appear to be pointer-casting, though unions would (as usual, of course)
be essentially equivalent.

Is the interpretation of the controlblock uniquely defined by the top 2 or 4
bytes, or are there some values for the top 2 or 4 bytes for which I have to
apply two different interpretations (C-level structs) to the same sequence of
bits?

If the top 2 or 4 bytes uniquely define the structs, then I would just say
you're just going back and forth between a typed structure and its untyped
representation. If the top 2 or 4 bytes can specify multiple interpretations
for the same sequence of bits, then this is the example I was looking for. =)

Steve
 
Steven Bethard

Michael Hobbs said:
One word: union

Interestingly, unions can be well-defined even in a strongly-typed language,
e.g. OCaml:

# type int_or_list = Int of int | List of int list;;
type int_or_list = Int of int | List of int list
# Int 1;;
- : int_or_list = Int 1
# List [1; 2];;
- : int_or_list = List [1; 2]

The reason for this is that at any given time in OCaml, the sequence of bits is
only interpretable as *one* of the two types, never both. If you have a good
example of using a union (in C probably, since OCaml wouldn't let you do this I
don't think) where you want to treat a given sequence of bytes as both types *at
once*, that would be great!

Thanks,

Steve
 
Alex Martelli

Steven Bethard said:
Yeah, this goes to the heart of the misunderstanding. I'm not asking
anyone to justify the _existence_ of weak-typing. Weak-typing is a direct
result of a language's support for untyped (bit/byte) data. I agree 100%
that this sort of data is not only useful, but often essential in any
low-level (e.g. OS, hardware driver, etc.) code.

But so is the ability to get at the same bits/bytes in structured ways.
As a quick refresher, I quote myself in what I was looking for: "taking
advantage of weak-typing would be a case where you treat the bits as three
different things: the sequence of bits, and two (mutually exclusive)
intended structures."

My response to this example is that your two intended structures are not
mutually exclusive. Yes, you have to do some bit-twiddling, but only
because your float struct doesn't have get_sign, get_mantissa and
get_significand methods. ;) You're still dealing with the same
representation, not converting to a different type. You're just
addressing a lower level part of the representation.

What do you mean by "mutually exclusive"? "Never useful at the same
time"? You're asking for an example of things never useful at the same
time that are useful at the same time?!

The struct type with so many bits being signs, exponent, significands,
IS a distinct type from double-precision float -- it's the
representation of the latter according to some standard. To multiply by
0.1 I have to have a float, to 'get the N-bit integer that gives the
exponent shifted right by 3' I have to have that struct type. They're
totally distinct (not "mutually exclusive" because they ARE useful as
ways to look at the same bitbunch at the same time, of course) types,
ways to analyze or interpret the same bunch of bits (apart from the
untyped representation where I can do binary I/O with them, too).

I can see the point though: at least in most of the languages I'm familiar
with, float is declared as a type while there's no subtype of float that
specifies the sign, mantissa and significand.

Right. To get at the bitfields, you use weaktyping instead.

Is the interpretation of the controlblock uniquely defined by the top 2 or 4
bytes, or are there some values for the top 2 or 4 bytes for which I have to
apply two different interpretations (C-level structs) to the same sequence of
bits?

In the HW I was thinking of, the former is the case.
If the top 2 or 4 bytes uniquely define the structs, then I would just say
you're just going back and forth between a typed structure and its untyped
representation. If the top 2 or 4 bytes can specify multiple interpretations
for the same sequence of bits, then this is the example I was looking for. =)

I need to examine the top bytes of the block as the HW returned it, in
some cases, to know what struct type is most useful to interpret the
bunch of bits. There is typically only one type (besides 'just a bunch
of 64 bytes') that is useful at _one_ given time. But weak typing does
not require parallel processing without locks -- only if two independent
threads of control were looking at the same bits concurrently from two
separate processors would saying "at ONE time" make sense... true and
unfettered concurrent access...

As for two different interpretations of the same bits being useful (not
"at the same time"), consider a 16-bit field that can be seen as one
16-bit word or two 8-bit bytes. In the former case, '0' means the whole
operation concluded successfully, any non-0 means problems were
encountered. So, a piece of code that just needs a pass/nonpass filter
on the operation is best advised to treat that field as a 16-bit word,
so it can test it for == or != 0 atomically.

At a deeper level, one byte indicates possible problems of one kind (say
ones "intrinsic" to the procedure/operation in question), another
indicates possible problems of a different kind (say ones "extrinsic" to
the procedure per se, but caused by preemption, power failures, etc).
Unix return-status values aren't too far away from this. If you need
accurate diagnosis of what went wrong, seeing the same field as two
8-bit bytes is handier (assuming you can get some kind of lock in that
case, since you are then dealing with nonatomic testing).

You could see a test such as "if x->field16 == 0:" as a weird shorthand
for "if x->field8_a == 0 and x->field8_b == 0:", but depending on
considerations of atomicity it might not even be.
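
The two views of the 16-bit field can be sketched like this. The byte order and the meaning assigned to each byte are assumptions for illustration, and the atomicity argument does not carry over to Python; only the dual interpretation does.

```python
import struct


def as_word(field):
    """View the 2-byte field as one 16-bit word (little-endian assumed)."""
    (word,) = struct.unpack("<H", field)
    return word


def as_byte_pair(field):
    """View the same 2 bytes as two separate 8-bit codes."""
    return struct.unpack("<BB", field)


def succeeded(field):
    """Pass/fail filter: a single comparison on the 16-bit view."""
    return as_word(field) == 0


status = bytes([0x00, 0x03])  # hypothetical: first code clear, second code 3
print(succeeded(status))
print(as_byte_pair(status))
```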


Another example where the same sequence of bits may be usefully
interpreted in more ways at the same time: given a string of bytes which
encodes some unicode text in utf-8 it's clearly useful to consider it as
such, parsing it left to right byte by byte to find the unicode chars
being encoded and display the proper glyphs, etc. But I may also want
to walk the same area of memory as a sequence of 64-bit words to compute
a simple checksum to ensure data integrity (as well as the usual need
for 'untyped' bytescan for I/O). Or, say I don't know whether the
incoming data were utf-8 or utf-16; by walking over them in both 1-byte
(utf-8) and 2-byte units I may well be able to get strong heuristic
indications of which of the two encodings was in use. Similar
heuristics are sometimes very useful even in determining whether a bunch
of 4-byte words from a record are floats or ints -- as long, of course,
as you CAN walk them both ways and compare strangeness-indicators. If
you ever need to recover old data from datasets whose details were lost,
you'll find that out for yourself.
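
A rough sketch of the utf-8 versus utf-16 guess. The heuristic and its threshold are assumptions chosen for illustration (they suit mostly-Latin text and little-endian utf-16 only); real detectors use far more evidence.

```python
import struct


def guess_encoding(data):
    """Walk the same bytes in 1-byte and 2-byte units and compare oddities."""
    # View 1: byte by byte, as utf-8.
    try:
        data.decode("utf-8")
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

    # View 2: 16-bit little-endian units; mostly-Latin utf-16 text
    # consists largely of units below 0x100.
    small_units = 0.0
    if data and len(data) % 2 == 0:
        units = struct.unpack(f"<{len(data) // 2}H", data)
        small_units = sum(1 for u in units if u < 0x100) / len(units)

    if utf8_ok and b"\x00" not in data:
        return "utf-8"
    if small_units > 0.5:
        return "utf-16-le"
    return "unknown"


print(guess_encoding("hello world".encode("utf-8")))
print(guess_encoding("hello world".encode("utf-16-le")))
```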


Alex
 
Christophe Cavalaria

Michael said:
One word: union
Note that in the C standard, writing to part A of a union and reading from
part B is UB (undefined behavior), so it should *not* be used.
 
Steven Bethard

[snip example decomposing float representation into mantissa, etc.]
[snip example determining struct type from first few bytes]
[snip example decomposing 16 bit error code into two 8 bit error codes]
[snip example determining utf-8 or utf-16 by trying byte stream as both]

Thanks for the examples!

I'm not quite convinced by the decomposition examples or the struct type
example, but the UTF example is definitely convincing. I can imagine that you
could extend this type of example to any case where you didn't know the actual
type of a struct. Given this situation, you could try treating the bytes as
each of the possible struct types, and see (heuristically or perhaps with a
machine learning approach) which struct type is most appropriate.

This definitely meets my criterion of treating the same set of bytes as two
different structures, and it's even useful! =) Thanks!

Steve
 
Michael Hobbs

Steven Bethard said:
The reason for this is that at any given time in OCaml, the sequence of bits is
only interpretable as *one* of the two types, never both. If you have a good
example of using a union (in C probably, since OCaml wouldn't let you do this I
don't think) where you want to treat a given sequence of bytes as both types *at
once*, that would be great!

This example is a little weak, but may be sufficient. The in_addr
structure used for sockets usually uses a union to provide different
views to the underlying 32-bit address. You can access the address
as 4 8-bit values, 2 16-bit values, or 1 32-bit value. Most code
these days only uses the 4 8-bit representation, but the interface is
there.
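
The three views of an IPv4 address can be sketched with the standard socket and struct modules. The function name and the sample address are illustrative; network byte order is used for all three views.

```python
import socket
import struct


def address_views(dotted):
    """Return the same 4 address bytes read as 4x8, 2x16 and 1x32 bits."""
    raw = socket.inet_aton(dotted)  # 4 bytes in network byte order
    return {
        "bytes": struct.unpack("!4B", raw),
        "halfwords": struct.unpack("!2H", raw),
        "word": struct.unpack("!I", raw)[0],
    }


views = address_views("192.168.1.10")
print(views["bytes"])
print([hex(h) for h in views["halfwords"]])
print(hex(views["word"]))
```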

Another possible example comes from the Windows API. Some of the
functions take an arbitrary length structure. If you want to make a
simple call to the function, you pass a small structure. If you
want to make a more complex call to the function, you pass a larger
structure that has more fields tacked on to the end. Usually, the
first field in the structure is an int that specifies how large the
structure is. It is used as sort of a crude version of OO in C.
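
The size-prefixed convention can be sketched as below. The field names and both layouts are hypothetical and do not correspond to any actual Windows structure; the point is only that the leading size field tells the callee which layout it received.

```python
import struct

BASE_FMT = "<III"        # size, flags, timeout (hypothetical fields)
EXTENDED_FMT = "<IIIII"  # base fields plus retries, priority
BASE_SIZE = struct.calcsize(BASE_FMT)
EXTENDED_SIZE = struct.calcsize(EXTENDED_FMT)


def make_base(flags, timeout):
    """Build the small structure; its first field records its own size."""
    return struct.pack(BASE_FMT, BASE_SIZE, flags, timeout)


def make_extended(flags, timeout, retries, priority):
    """Build the larger structure with extra fields tacked on the end."""
    return struct.pack(EXTENDED_FMT, EXTENDED_SIZE, flags, timeout,
                       retries, priority)


def read_request(buf):
    """Use the leading size field to decide which layout applies."""
    (size,) = struct.unpack_from("<I", buf, 0)
    if size == BASE_SIZE:
        _, flags, timeout = struct.unpack_from(BASE_FMT, buf, 0)
        return {"flags": flags, "timeout": timeout,
                "retries": 0, "priority": 0}  # defaults for missing fields
    if size == EXTENDED_SIZE:
        _, flags, timeout, retries, priority = struct.unpack_from(
            EXTENDED_FMT, buf, 0)
        return {"flags": flags, "timeout": timeout,
                "retries": retries, "priority": priority}
    raise ValueError(f"unrecognised structure size: {size}")


print(read_request(make_base(1, 30)))
print(read_request(make_extended(1, 30, 3, 9)))
```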

I'm not sure if these are the kinds of examples you're looking for.
I don't know how anyone would be able to use a sequence of bytes as
two types of data at once. There is almost always some sort of
indicator that specifies how to interpret the bytes; otherwise, it
is just garbage.

-- Mike
 
Diez B. Roggisch

Steven said:
Michael Hobbs said:
One word: union

Interestingly, unions can be well-defined even in a strongly-typed
language, e.g. OCaml:

# type int_or_list = Int of int | List of int list;;
type int_or_list = Int of int | List of int list
# Int 1;;
- : int_or_list = Int 1
# List [1; 2];;
- : int_or_list = List [1; 2]

Unions in functional languages are also known as direct sums of types (as
opposed to products, which form tuples). And trying to access a union that
holds an int as a list will yield an error - runtime, most probably. So there
is no way of reinterpreting an int as a list, which still satisfies the
paradigms of a strongly typed language.
 
Greg Ewing

Diez said:
I can remember abusing 32-bit pointers on 68k processors by
altering the most-significant byte.

Apple did this in early versions of the Memory Manager
of classic MacOS, using the upper 8 bits of a Handle
for various flags. You weren't supposed to make any
assumptions about what the upper byte contained, but
of course some people did... and their applications
broke when 32-bit addressing came in...
 
Mike Meyer

Steven Bethard said:
I'm sure some people would argue that assembly language is untyped (not
statically or dynamically typed) and that the operations are defined on bits,
but this is definitely the best example I've seen. Thanks!

The previously mentioned BCPL has the exact same property. For that
matter, early versions of C used to allow it to a large degree. I've
actually compiled programs written as "char *main = { ... }".

To me, a dynamically typed language is one where objects - rather than
variables - have a type attached.

<mike
 
