strong/weak typing and pointers

Mike Meyer · Nov 3, 2004

Steven Bethard said:
Don't get me wrong -- I do understand your point. In every case I can think of,
there is no reason to want weak-typing (PL theory definition) in a
dynamically-typed language. On the other hand, I haven't really seen any good
cases for wanting weak-typing in a statically-typed language either.

First of all, let's mention a truly weakly-typed language: BCPL, one
of C's predecessors. Variables don't have types; they just hold
words. Operations treat those words as having whatever type the operation
expects: adding them as ints, adding them as floats, or dereferencing them
to arrive at another word. Subroutines were just variables that held a
pointer to code, and people would actually write code that looked like
(pseudo-C):

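func() {
init_func ;      /* one-time initialization */
func = foo ;     /* rebind func so it points at label foo */
foo: func_code ; /* the real body; later calls start here */
}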

The point was to run the initialization code only the first time the
function was invoked, and never again. Of course, there was an external
program (written in BCPL) that did type inference, and would warn
you when you used something as something other than what it really was.
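Python can express a similar run-once trick, since a function name is just a rebindable variable, although unlike BCPL every value still carries its own type. A minimal sketch; `init_once`, `setup_calls`, and `_body` are hypothetical names chosen for illustration:

```python
setup_calls = 0  # hypothetical counter showing the setup runs only once


def init_once():
    """First call: do setup, then rebind the global name to the real body."""
    global init_once, setup_calls
    setup_calls += 1  # stands in for the one-time initialization code

    def _body():
        return "running body"

    init_once = _body  # like `func = foo`: later calls go straight to _body
    return _body()


first = init_once()
second = init_once()
```

Both calls return the body's result, but the setup counter only ever reaches 1.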

And yes, BCPL saw real use. I've used a desktop DOS that was written
in BCPL. The rest of the system was written in C, which made life
*very* interesting.

Now, as to why one would *want* languages that let you treat things as
other than what they were.

It's much easier to write functions that convert 16, 32 and 64 bit
quantities from network order to host order (and vice versa) if you
can treat them as an array of bytes, even though you'll want to treat
them as longer hunks while dealing with them. When talking to
hardware, you can get some really *strange* things. You may have a
location that is an address most of the time, but part of the time is
a control register full of bits to toggle. When doing cryptography,
you very often want to treat the string of characters you're
encrypting as a string of words of some length, because that's the
size chunk that the algorithm encrypts. Marshalling has already been
mentioned on this thread. You may well want to marshal ints and floats
in binary form, meaning you'll need to treat that array of bytes as
being of that type.
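A rough Python sketch of the network-order and cipher-block cases, using the standard `struct` module: the same eight bytes are read as two big-endian ("network order") 32-bit words, then as two little-endian words, then packed back. The byte values are arbitrary:

```python
import struct

# Eight raw bytes, e.g. as received from the network or fed to a block cipher.
raw = bytes([0x00, 0x00, 0x01, 0x02, 0xDE, 0xAD, 0xBE, 0xEF])

# "!" selects network (big-endian) order; "I" is a 4-byte unsigned int.
net_words = struct.unpack("!2I", raw)

# Read the very same bytes as little-endian words instead.
le_words = struct.unpack("<2I", raw)

# Going the other way: pack the values back into network order.
round_trip = struct.pack("!2I", *net_words)
```

The bytes never change; only the interpretation chosen by the format string does.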

You can also look through the Python library for places where the struct
module is used - most of those will involve doing something where you want
to treat a string of bytes as something else.

Finally, I don't see that there's that much difference between the two
different definitions of 'weakly typed'. Both can be described as
treating an object as if it were of some type other than what it
really is. In one case, you abuse the raw bits, and in the other you
coerce the object to a different type. Both amount to the same thing:

a = "10"
b = 5
c = a + b

In a strongly typed language, I get an error. In a weakly typed
language, I get something else: either a pointer beyond the end of the
string a, or 15, depending on exactly how the object a is abused.
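For comparison, Python behaves as the strongly typed case here: `str + int` raises `TypeError`, and either of the other results has to be requested explicitly. A short sketch:

```python
a = "10"
b = 5

try:
    c = a + b  # str + int: Python refuses to guess
except TypeError:
    c = None

as_number = int(a) + b  # explicit conversion to int gives 15
as_text = a + str(b)    # explicit conversion to str gives "105"
```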

<mike

Steven Bethard · Nov 3, 2004

Mike Meyer said:
First of all, let's mention a truly weakly-typed language. BCPL, one
of C's predecessors. Variables don't have types, they just hold
words.

So BCPL had no compile-time checking? If this is true, BCPL is a good example
of a dynamically- and weakly-typed (PL theory definition) language...

Finally, I don't see that there's that much difference between the two
different definitions of 'weakly typed'. Both can be described as
treating an object as if it were of some type other than what it
really is. In one case, you abuse the raw bits, and in the other you
coerce the object to a different type.

Would you then classify BCPL as weakly- or strongly-typed? It seems like you
might call it "strongly-typed" since every variable just holds words, so every
use of a variable is thus just the use of a word, thus you would never be
"treating an object as if it were of some type other than what it really is".

Steve

Steven Bethard · Nov 3, 2004

Mike Meyer said:
Finally, I don't see that there's that much difference between the two
different definitions of 'weakly typed'. Both can be described as
treating an object as if it were of some type other than what it
really is. In one case, you abuse the raw bits, and in the other you
coerce the object to a different type.

One other thing: If you lump coercions with weak-typing, you allow the code
written in a language to adjust the degree of "weakness" of that language. Any
language (like Python) that allows you to override operators allows you to
create new coercions[1]. So if I don't like the strong/weak classification of
my language, I can always make it more "weakly-typed" by just adding more
nonstandard coercions.

IMHO, classification of a language as strongly- or weakly-typed should be
independent of the code written in that language -- it should be something associated
with the language definition itself. Lumping weak-typing with coercion makes
this impossible.
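Steve's point can be made concrete with operator overloading. `LooseStr` below is a purely hypothetical class, not anything in Python's library: it makes `+` silently convert a string to a number when the other operand is an `int`, so the program, not the language definition, has added a coercion:

```python
class LooseStr(str):
    """Hypothetical str subclass whose + silently coerces itself to int."""

    def __add__(self, other):
        if isinstance(other, int):
            return int(self) + other  # user-defined implicit coercion
        return str(self) + other      # otherwise, ordinary concatenation

    def __radd__(self, other):
        if isinstance(other, int):
            return other + int(self)  # covers 5 + LooseStr("10")
        return other + str(self)


plain_failed = False
try:
    "10" + 5                          # built-in str: no coercion
except TypeError:
    plain_failed = True

loose_sum = LooseStr("10") + 5        # 15
reflected_sum = 5 + LooseStr("10")    # 15
```

Plain `str` still rejects the mix; only values wrapped in the new class get the extra "weakness".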

Steve

[1]http://mail.python.org/pipermail/python-list/2004-November/249023.html

Mike Meyer · Nov 3, 2004

Steven Bethard said:
So BCPL had no compile time checking? If this is true, BCPL is a good example
of a dynamically- and weakly-typed (PL theory definition) language...

I wouldn't call BCPL dynamically typed. BCPL has no run-time type
checking either. That seems to be the defining feature of dynamically
typed languages.

Would you then classify BCPL as weakly- or strongly-typed? It seems like you
might call it "strongly-typed" since every variable just holds words, so every
use of a variable is thus just the use of a word, thus you would never be
"treating an object as if it were of some type other than what it really is".

A word is just a unit of storage, not a type. Words hold values with
types - integer, float, pointer, code, chars. Nothing in BCPL prevents
you from treating a word as any type at all. You can call a pointer to a
string, or do an integer add of a pair of floats. So it's weakly
typed.

<mike

JCM · Nov 3, 2004

ah, good example.

So, would it be valid to say:
the more coercion (or automatic conversion) rules a language has, the
weaker the typing?

If that's what your definition of weak typing is. The OP seemed to be
asking about re-interpreting the representation of a value of one type
as a different type.

Gabriel Zachmann · Nov 3, 2004

Just a little question:

Would some sort of summary of this thread be of any help?

Regards,
gab.

--
"There are works which wait, and which one does not understand for a long
time; [...] for the question often arrives a terribly long time after the
answer." (Oscar Wilde)
www.gabrielzachmann.org

JCM · Nov 3, 2004

If that's what your definition of weak typing is. The OP seemed to be
asking about re-interpreting the representation of a value of one type
as a different type.

Oops--I guess you are the OP.

When people talk about "weak typing" they generally mean either
implicit conversions (or operations on values of different types),
or reinterpreting representations of values as a different type.
The former, in my opinion, is not about weak typing.
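The two senses JCM separates can be told apart in Python. Adding an `int` to a `float` converts the value (1 becomes 1.0), whereas packing the double 1.0 and unpacking the same bytes as an unsigned 64-bit integer reinterprets its representation; `struct` packs `d` in IEEE 754 binary64 format:

```python
import struct

# Implicit conversion: the int's value is converted to float before adding.
converted = 1 + 2.5

# Reinterpretation: the 8 bytes that encode the double 1.0, read as an integer.
(bits,) = struct.unpack("<Q", struct.pack("<d", 1.0))
```

The first result is still numerically one plus 2.5; the second, 0x3FF0000000000000, has nothing to do with the number one except that it shares its bits.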

Steven Bethard · Nov 3, 2004

Gabriel said:
Would some sort of summary of this thread be of any help?

Here's a first stab at one:

In summary, there are basically three interpretations of "weak-typing" discussed
in this thread:

(1) A language is "weakly-typed" if it allows code to take a block of memory
that was originally defined as one type and reinterpret the bits of this block
as another type.

(2) A language is "weakly-typed" if it has a large number of implicit coercions.

(3) A language is "weakly-typed" if it often treats objects of one type as other
types.

Some points and problems addressed with each of these definitions:

Definition 1 is the definition most commonly used in Programming Languages
literature, and allows a language to be called "weakly-typed" based only on the
language definition. However, for all intents and purposes, it is only
applicable to statically typed languages; no one on the list could come up with
a dynamically typed language that allowed bit-reinterpretation.

Definition 2 seemed to be the definition most commonly used on the list, most
likely because it is actually applicable to a dynamically typed language like
Python. It has the problem that in a language that supports operator
overloading (like Python), programmers can make their language more
"weakly-typed" by simply providing additional coercions, thus whether or not a
language is called "weakly-typed" depends both on the language definition and
any code written in the language.

Definition 3 was an attempt to unify the first two definitions into a single
definition by describing both coercion and bit-reinterpretation as treating
"objects of one type as other types". This definition has the advantage of
better coverage, but has all the disadvantages of Definition 2. It is also
unclear how weak a "weakly-typed" language is if it both allows
bit-reinterpretation and has a large number of implicit coercions. (For
example, is a language that allows bit-reinterpretation and only a few implicit
coercions more or less "weakly-typed" than a language that doesn't allow
bit-reinterpretation, but has a large number of implicit coercions?)

I'll leave it to others to classify the various languages by these definitions.

Steve

Alex Martelli · Nov 3, 2004

JCM said:
When people talk about "weak typing" they generally mean either
implicit conversions (or operations on values of different types),
or reinterpreting representations of values as a different type.

I've seen people complain about "weak typing" mostly to mean an entirely
different issue: types being attached to objects and NOT to names.
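In Python terms, the situation Alex describes looks like this: a name can be rebound to objects of any type, and the type travels with the object. A trivial illustration:

```python
x = 42
first_type = type(x)    # int: the type belongs to the object 42, not to x
x = "forty-two"
second_type = type(x)   # str: same name, now bound to a different object
```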

Alex

Alex Martelli · Nov 3, 2004

Steven Bethard said:
On the other hand, I haven't really seen any good
cases for wanting weak-typing in a statically-typed language either.

How would an operating system's filesystems store arbitrary sequences of
bytes (which might be floats, int, whatever -- only the application
knows) into disk pages (blocks of, say, 4096 bytes each) otherwise? Or
are you saying that operating systems' kernels should all be implemented
in dynamically-typed languages, or that the structureless filesystem
concept that was the fortune of Unix (and is common today to other OSs,
too), is not "good"?

Even if you design a new OS based on a filesystem whose files are all
"strongly typed" (EEK, but that's another issue), how do you
interoperate with other boxes, with the whole internet, without the
ability to type-pun ("weak-typing") when necessary...?

Alex

Alex Martelli · Nov 3, 2004

Steven Bethard said:
Some programmers may actually want "a" + 10 == 10.

Some people may actually want to drink poisoned kool-aid and join the
great wise extraterrestrials on their comet in the skies. That doesn't
mean I will look with favour upon those who aid and abet such goals.

Alex

Jeff Shannon · Nov 3, 2004

Steven said:
Gabriel Zachmann writes:

Would some sort of summary of this thread be of any help?


Here's a first stab at one:

[...]
(2) A language is "weakly-typed" if it has a large number of implicit coercions.

[...]
Definition 2 seemed to be the definition most commonly used on the list, most
likely because it is actually applicable to a dynamically typed language like
Python. It has the problem that in a language that supports operator
overloading (like Python), programmers can make their language more
"weakly-typed" by simply providing additional coercions, thus whether or not a
language is called "weakly-typed" depends both on the language definition and
any code written in the language.

A case could be made that this "problem" isn't really valid if you look
at "implicit coercions" in the right way.

I'd argue that a programmer-overloaded operation providing coercion is
not _implicit_ in the same sense that language-default coercion is.
Admittedly, the coercion may not be immediately evident at the point of
use, but one can still find the explicitly-coercing code somewhere
inside the application (and/or included libraries). In contrast, the
coercions that happen in Perl, PHP, etc., are not explicitly stated
*anywhere* in the application. The difference between these two
scenarios is, at least in my mind, very distinct and (at least as far as
language philosophy) very profound -- it's one of *permitting*
semi-implicit coercions (if the programmer *really* wants them) versus
one of *mandating* implicit coercions whether the programmer wants them
or not.

In other words, definition 2 should read that a language can be
considered "weakly typed" if the *language definition* specifies a large
number of implicit coercions.

Jeff Shannon
Technician/Programmer
Credit International

JCM · Nov 4, 2004

I've seen people complain about "weak typing" mostly to mean an entirely
different issue: types being attached to objects and NOT to names.

Ah yep, there's that one too. But I hope most people would call that a
matter of static versus dynamic typing. At least I think I do. c.l.py is
bad for my mental dictionary.

JCM · Nov 4, 2004

Steven Bethard said:
(1) A language is "weakly-typed" if it allows code to take a block of memory ....
Definition 1 is the definition most commonly used in Programming
Languages literature, and allows a language to be called
"weakly-typed" based only on the language definition. However, for
all intents and purposes, it is only applicable to statically typed
languages; no one on the list could come up with a dynamically typed
language that allowed bit-reinterpretation.

Assembly language. The types of values are implied by what
instructions you use.

Steven Bethard · Nov 4, 2004

Alex Martelli said:
How would an operating system's filesystems store arbitrary sequences of
bytes (which might be floats, int, whatever -- only the application
knows) into disk pages (blocks of, say , 4096 bytes each) otherwise?

Valid point of course. But the OS doesn't really take advantage of weak-typing
here if it takes an arbitrary sequence of bytes and stores an arbitrary sequence
of bytes. I haven't written much OS code (just a prototype system back in
undergrad), but I never cast one type of struct to another -- to and from void*,
but never between types.

Of course, I'm sure there are a number of good reasons to do so -- my claim
was only that I hadn't seen them. I'd be grateful if you could point me to an
example. =)

Even if you design a new OS based on a filesystem whose files are all
"strongly typed"

You really do think I'm Satan, don't you?

Steve

Steven Bethard · Nov 4, 2004

JCM said:
Assembly language. The types of values are implied by what
instructions you use.

I'm sure some people would argue that assembly language is untyped (not
statically or dynamically typed) and that the operations are defined on bits,
but this is definitely the best example I've seen. Thanks!

Steve

Alex Martelli · Nov 4, 2004

Steven Bethard said:
Valid point of course. But the OS doesn't really take advantage of
weak-typing here if it takes an arbitrary sequence of bytes and stores an
arbitrary sequence of bytes. I haven't written much OS code (just a
prototype system back in undergrad), but I never cast one type of struct
to another -- to and from void*, but never between types.

Is the OS going to be able to read something from disk and *USE* it?

Of course, I'm sure there are a number of good reasons to do so -- my
claim was only that I hadn't seen them. I'd be grateful if you could
point me to an example. =)

Suppose for example that you would like your OS to be able to load
executable code from disk into memory and execute it. Suppose you would
like it to be able to read some configuration parameters from disk and
set its own internal data structures accordingly. At one level, as it
goes to disk or comes back from there, you have arrays of bytes. But in
memory, you want functions that can be called appropriately (a device
driver residing in a module) or data structures which, unlike an array
of bytes, DO have structure -- for example, a partition table for a disk,
telling specific filesystem drivers how each partition is to be treated.
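As a concrete, simplified instance of turning raw disk bytes into structured data, here is a sketch that decodes one 16-byte entry in the classic PC (MBR) partition-table layout with `struct`. The byte values are made up for illustration, `PartitionEntry` and `parse_entry` are hypothetical names, and the CHS address fields are left undecoded:

```python
import struct
from typing import NamedTuple


class PartitionEntry(NamedTuple):
    bootable: bool
    part_type: int    # partition type byte, e.g. 0x83 for Linux
    first_lba: int    # first sector as a logical block address
    num_sectors: int


# status(1) CHS-first(3) type(1) CHS-last(3) start LBA(4) sector count(4), little-endian
ENTRY_FORMAT = "<B3sB3sII"


def parse_entry(raw: bytes) -> PartitionEntry:
    """Turn 16 raw bytes from disk into a structured record."""
    status, _chs_first, ptype, _chs_last, lba, count = struct.unpack(ENTRY_FORMAT, raw)
    return PartitionEntry(status == 0x80, ptype, lba, count)


# Made-up entry: bootable, type 0x83, starting at sector 2048, 1,000,000 sectors.
sample = (bytes([0x80, 0, 0, 0, 0x83, 0, 0, 0])
          + (2048).to_bytes(4, "little")
          + (1_000_000).to_bytes(4, "little"))
entry = parse_entry(sample)
```

Note that Python copies the bytes into new objects here; a kernel written in C would typically overlay a struct on the buffer instead.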

I don't understand how you can have failed to see a zillion more
examples of operating systems actively _using_ data read from disk,
since it's such a widespread phenomenon nowadays.

You really do think I'm Satan, don't you?

I'm old enough to have fought my way through filesystems more strongly
typed than plain streams of bytes, sure -- Unix was already around but
not all-pervasive yet. I remember peripherals which wanted streams, but
streams of _SIX_-bit "bytes" -- so you had to have somewhere a pack-and-
unpack routine that could take (e.g.) a block of 48 8-bit bytes and
reinterpret it as a block of 64 6-bit bytes, or vice versa. I hope there
aren't any more of _those_ around -- but in exchange we have, for
example, pervasive issues of unicode vs byte streams and encodings.
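The pack-and-unpack routine Alex describes (48 8-bit bytes reinterpreted as 64 6-bit "bytes") amounts to regrouping a bit string. A small sketch, assuming big-endian bit order within each group and an input length that is a multiple of 3 bytes (24 bits, i.e. four 6-bit units); the function names are hypothetical:

```python
from typing import List


def to_sixbit(data: bytes) -> List[int]:
    """Regroup 8-bit bytes into 6-bit units; len(data) must be a multiple of 3."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3 bytes")
    n = int.from_bytes(data, "big")  # treat all the bits as one big number
    count = len(data) * 8 // 6
    # Peel off 6 bits at a time, most significant group first.
    return [(n >> (6 * (count - 1 - i))) & 0x3F for i in range(count)]


def from_sixbit(units: List[int]) -> bytes:
    """Inverse of to_sixbit: pack 6-bit units back into 8-bit bytes."""
    if len(units) % 4:
        raise ValueError("unit count must be a multiple of 4")
    n = 0
    for u in units:
        n = (n << 6) | (u & 0x3F)
    return n.to_bytes(len(units) * 6 // 8, "big")
```

This is the same 24-bit to four-6-bit regrouping that base64 performs before mapping each unit to a character.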

It's not just files, either. We have memory sliced up into pages, and a
page is, say, a well defined object of 4096 bytes. But we want to be
able to store all different kinds of stuff into those bytes - we HAVE
to, in fact, because that is all the memory we have... all pages...

If your point is that all you need is, not to "overlay" different
structures onto the same address, but "just" to overlay "look at this as
raw bytes" upon any structure and vice versa -- can't you see you're
doing exactly the same thing with just one conceptual extra step?
Instead of
struct foo* p = ...;
struct bar* q = (struct bar*) p;
you're thinking
struct foo* p = ...;
void *v = (void *) p;
struct bar* q = (struct bar*) v;
but it's just the same thing, and v can be optimized away.
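Python's `ctypes` module can mimic both versions of this cast, reinterpreting the same memory in place without copying. `Foo` and `Bar` are hypothetical stand-ins for `struct foo` and `struct bar`, both 4 bytes so the reinterpreted view stays in bounds:

```python
import ctypes


class Foo(ctypes.Structure):
    _fields_ = [("value", ctypes.c_uint32)]        # stand-in for struct foo


class Bar(ctypes.Structure):
    _fields_ = [("octets", ctypes.c_uint8 * 4)]    # stand-in for struct bar


foo = Foo(0x01020304)
p = ctypes.pointer(foo)

# Direct cast: struct bar* q = (struct bar*) p;
q_direct = ctypes.cast(p, ctypes.POINTER(Bar))

# Via void*: void *v = (void *) p; struct bar* q = (struct bar*) v;
v = ctypes.cast(p, ctypes.c_void_p)
q_via_void = ctypes.cast(v, ctypes.POINTER(Bar))

# Both views alias foo's memory, so a write through one shows up everywhere.
q_direct.contents.octets[0] = 0xFF
```

Which byte of `foo.value` changes depends on the machine's byte order, which is exactly the kind of detail such casts expose.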

Above the lowest levels, you can get away with (at least conceptually)
copying stuff in order to be able to reinterpret bits, as you can do in
Python with x = struct.unpack(f1, struct.pack(f2, y)) -- you can do
plenty of bit-level reinterpretation but not *in-place*, only via
copying. But you can't generally afford that luxury when you dig deep
enough -- copying bits around when you only need to reinterpret them is
paying a real cost in memory and CPU and bus bandwidth, after all. It
doesn't have to be OS-level: any virtual machine has similar issues.
So, look at the CPython interpreter sources, for example... what
performance price would it have to pay if it couldn't cast pointers but
rather had to copy bits around each time it now does a cast?
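The copying-versus-in-place distinction shows up even in pure Python. `struct.unpack` builds new objects from a copy of the bits, while `memoryview.cast` (Python 3.3+) gives a typed view onto the same buffer, so writes go straight through. The buffer size is arbitrary, and the `I` format in the cast uses the platform's native size (typically 4 bytes) and byte order:

```python
import struct
import sys

buf = bytearray(8)

# Copying route: unpack returns fresh int objects built from a copy of the bits,
# so changing them cannot affect buf.
copied = list(struct.unpack("=2I", buf))
copied[0] = 0x01020304
untouched = bytes(buf)

# In-place route: a typed view over the same memory, no copy made.
words = memoryview(buf).cast("I")   # native unsigned int
words[0] = 0x01020304               # writes straight into buf

# Read the first item's bytes back out of the original buffer.
first_word = int.from_bytes(buf[:words.itemsize], sys.byteorder)
```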

Alex

Diez B. Roggisch · Nov 4, 2004

A case could be made that this "problem" isn't really valid if you look
at "implicit coercions" in the right way.

I'd argue that a programmer-overloaded operation providing coercion is
not _implicit_ in the same sense that language-default coercion is.
Admittedly, the coercion may not be immediately evident at the point of
use, but one can still find the explicitly-coercing code somewhere
inside the application (and/or included libraries). In contrast, the
coercions that happen in Perl, PHP, etc., are not explicitly stated
*anywhere* in the application. The difference between these two
scenarios is, at least in my mind, very distinct and (at least as far as
language philosophy) very profound -- it's one of *permitting*
semi-implicit coercions (if the programmer *really* wants them) versus
one of *mandating* implicit coercions whether the programmer wants them
or not.

I'd second that - writing apps in PHP can lead to great surprises about what
actually happens - take this for example:

$foo = "abc";
$foo[0] = 65;

The result is

"6bc"

I have no idea what PHP actually _does_ here: perform a string conversion on
65, then take the most significant digit? There's all sorts of stuff like
that in PHP.

So while overloading allows for deliberate (and thus hopefully well-defined,
or at least more or less understood) coercions, built-in coercion doesn't.

And don't forget: If you don't like the way someone overloaded some
operator, you can alter that behaviour according to your own design
philosophies.
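For contrast, a sketch of how Python handles the PHP snippet: assigning into a `str` is rejected because strings are immutable, and a `bytearray` accepts the integer only as a byte value, so 65 becomes "A" rather than "6":

```python
foo = "abc"
rejected = False
try:
    foo[0] = 65           # str objects are immutable
except TypeError:
    rejected = True

buf = bytearray(b"abc")
buf[0] = 65               # 65 is stored as the byte value of "A"
```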

Piet van Oostrum · Nov 4, 2004

SB> (1) A language is "weakly-typed" if it allows code to take a block of
SB> memory that was originally defined as one type and reinterpret the bits
SB> of this block as another type.
[...]
SB> Definition 1 is the definition most commonly used in Programming
SB> Languages literature, and allows a language to be called "weakly-typed"
SB> based only on the language definition. However, for all intents and
SB> purposes, it is only applicable to statically typed languages; no one
SB> on the list could come up with a dynamically typed language that allowed
SB> bit-reinterpretation.

Not in the language, but through library modules like struct.
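A sketch of what this looks like, using the idiom Alex mentions elsewhere in the thread, `struct.unpack(f1, struct.pack(f2, y))`. The helper name `reinterpret` is hypothetical, and the formats assume little-endian standard sizes:

```python
import struct


def reinterpret(value, from_fmt, to_fmt):
    """Pack value with one struct format and unpack the same bytes with another.

    This is the struct.unpack(f1, struct.pack(f2, y)) idiom: the bits are
    copied into a bytes object, not reinterpreted in place.
    """
    packed = struct.pack(from_fmt, value)
    if struct.calcsize(to_fmt) != len(packed):
        raise ValueError("formats must describe the same number of bytes")
    (result,) = struct.unpack(to_fmt, packed)
    return result


one_bits = reinterpret(1.0, "<f", "<I")   # single-precision 1.0 as uint32
minus_one = reinterpret(-1, "<i", "<I")   # two's-complement -1 as uint32
```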

Steven Bethard · Nov 4, 2004

Alex Martelli said:
Suppose for example that you would like your OS to be able to load
executable code from disk into memory and execute it. Suppose you would
like it to be able to read some configuration parameters from disk and
set its own internal data structures accordingly. At one level, as it
goes to disk or comes back from there, you have arrays of bytes. But in
memory, you want functions that can be called appropriately (a device
driver residing in a module) or data structures which, unlike an array
of bytes, DO have structure -- for example, a partition table for a disk,
telling specific filesystem drivers how each partition is to be treated.

I'm sorry, I guess I still don't understand this example. It sounds like you're
just going between untyped (array of bytes) and typed (functions, data
structures, etc.) I'm not seeing how you're, for example, casting a function to
a data structure, or a data structure of one type to a data structure of another
type.

Anything can (and sometimes should) be treated just as an array of bytes,
analogous to how, in an OO language with a base class Object, it is sometimes
appropriate to treat a given instance of a class as an Object rather than as
its particular subclass of Object. I wouldn't consider treating functions, data
structures, etc. as arrays of bytes (or vice versa) as something that takes
advantage of "weak-typing", but rather something that takes advantage of the
ability to treat data as untyped.

Hopefully this makes my confusion clearer, and when you get a chance, you can
try to explain it to me again. Thanks,

Steve

Similar Threads

python philosophical question - strong vs duck typing	2	Jan 3, 2012
strongly typed	4	Oct 20, 2004
2 questions about scope	4	Oct 25, 2004
Usage statistics?	1	Nov 5, 2004
Elise Mooney reports on Channel 9 about Maths Worldwide and the fraud that it is	1	Apr 17, 2010
Dr. Dobb's Python-URL! - weekly Python news and links (Nov 10)	1	Nov 10, 2004
comp.lang.c Answers to Frequently Asked Questions (FAQ List)	15	Apr 1, 2006
Ruby Weekly News 5th - 11th June 2006	0	Jun 14, 2006
