Problem with asm

Fredrik Tolf

1. This is downright impossible in user code running on systems with
virtual memory: all the addresses in the program are interpreted as
virtual addresses.

Precisely my point. If you load an absolute pointer on such a system and
dereference it, it is interpreted as a virtual address, and I can't see
the problem (except the OS will likely raise a page fault or similar,
which is beside the problem). Care to enlighten me?

2. There are platforms where loading an arbitrary address in an address
register generates a fault. If the compiler decides to store the
pointer in an address register or if loading something into a pointer
involves storing the data first into an address register...

You have to forgive me, but I find this hard to believe. Surely, there
must be a way for malloc (and other related functions - mmap etc.) to
return the address to the newly allocated chunk, and surely, that must
involve storing the address of the chunk into an address register? Or
did I misunderstand something?

Fredrik Tolf
 
Dan Pop

Fredrik Tolf said:
Precisely my point. If you load an absolute pointer on such a system and
dereference it, it is interpreted as a virtual address, and I can't see
the problem (except the OS will likely raise a page fault or similar,
which is beside the problem). Care to enlighten me?

The problem is that you *cannot* load an absolute address into a pointer
in such an execution environment. ALL addresses are virtual, you have no
access to absolute addresses.
You have to forgive me, but I find this hard to believe.

Then, read the Rationale of the C standard.
Surely, there
must be a way for malloc (and other related functions - mmap etc.) to
return the address to the newly allocated chunk, and surely, that must
involve storing the address of the chunk into an address register? Or
did I misunderstand something?

Yup. They first allocate the chunk and only after that they store
its address into an address register. Can't see any problem...

Dan
 
Fredrik Tolf

The problem is that you *cannot* load an absolute address into a pointer
in such an execution environment. ALL addresses are virtual, you have no
access to absolute addresses.

Sorry, we seem to be out of sync. By absolute pointer I mean an
arbitrary address, not a physical address.
Then, read the Rationale of the C standard.

It's not that I doubt that you have a point, it's just that I'm
having a hard time digesting it.
Yup. They first allocate the chunk and only after that they store
its address into an address register. Can't see any problem...

I'm thinking that a malloc implementation must somehow "make up" an
address to allocate, and in that case it seems to me that it's a bit
arbitrary.

I guess my problem in seeing it is I don't see the difference between an
arbitrary pointer and a non-arbitrary one. That is, what's the
difference between these following two statements?

char *buf1 = NULL; /* Should be valid, right? */
char *buf2 = 0xdeadbeef; /* Generates an exception, if I follow you. */

If loading NULL, which is defined to zero AFAIK, is allowed, then what
about this?

char *buf = NULL;
buf += 0xdeadbeef;

For surely, you have to be allowed to add offsets to a pointer like
this?

for(p = buf; *p; p++) {...}

Or could it be that they (these platforms) have a concept of an
undefined pointer when storing NULL? In that case, what about this?

char *buf = (char *)malloc(1);
buf += (0xdeadbeef - (int)buf); /* Note incrementation, not storage */

I guess I'm just having a hard time understanding how such a platform
would work... would you mind providing an example of such a platform for
me to study? I guess it is at times like these that I really hate the
fact that archs other than i386 are so prohibitively expensive to get my
hands on... :-(

Fredrik Tolf
 
Stephen Sprunk

Fredrik Tolf said:
Precisely my point. If you load an absolute pointer on such a system and
dereference it, it is interpreted as a virtual address, and I can't see
the problem (except the OS will likely raise a page fault or similar,
which is beside the problem). Care to enlighten me?

If the address is not valid, some machines will fault when it is loaded into
a register -- not just when you try to dereference it. That your current
platform does not fault should not be taken as an indication such behavior
is portable.

Odds are that any arbitrary pointer you try to load will be invalid, though
there's a small chance you might get "lucky" and point into a valid area
allocated by some other means.
You have to forgive me, but I find this hard to believe. Surely, there
must be a way for malloc (and other related functions - mmap etc.) to
return the address to the newly allocated chunk, and surely, that must
involve storing the address of the chunk into an address register? Or
did I misunderstand something?

malloc() returns valid addresses, which such a machine will not fault on.
Presumably malloc() does some implementation-defined magic to tell the CPU
which addresses are valid and which aren't. You do not have that luxury if
you want your code to be portable.

There are some addresses on some ABIs that are defined to be invalid,
therefore by definition malloc() (or any similar function) _cannot_ return
them.

S
 
Joona I Palaste

Sorry, we seem to be out of sync. By absolute pointer I mean an
arbitrary address, not a physical address.

I.e. something like int *p = (int*)0xcafebabe; where the physical
address need not actually be 0xcafebabe?

(snip)
I'm thinking that a malloc implementation must somehow "make up" an
address to allocate, and in that case it seems to me that it's a bit
arbitrary.
I guess my problem in seeing it is I don't see the difference between an
arbitrary pointer and a non-arbitrary one. That is, what's the
difference between these following two statements?
char *buf1 = NULL; /* Should be valid, right? */
char *buf2 = 0xdeadbeef; /* Generates an exception, if I follow you. */
If loading NULL, which is defined to zero AFAIK, is allowed, then what
about this?

NULL, which is defined as zero, is a special case. The pointer constant
zero, which need not be the physical address zero, is guaranteed to be
an address which by itself is fully defined, but dereferencing it
causes undefined behaviour. This does not apply to any other absolute
pointer constant. ("Absolute" used in the Tolf meaning, not the Pop
meaning.)
char *buf = NULL;
buf += 0xdeadbeef;

AFAIK the second line causes undefined behaviour by computing the
non-zero absolute pointer value 0xdeadbeef.
For surely, you have to be allowed to add offsets to a pointer like
this?
for(p = buf; *p; p++) {...}

This is different. If buf points to allocated memory and all bytes from
buf up to and including the first zero byte are also in allocated
memory, this is fully defined and safe. The addresses that allocated
memory resides in are always guaranteed to be fully defined. However,
with the exception of zero, no other addresses are. The only way to
legally end up with allocated memory addresses is either to use the
*alloc() functions, or to take the address of a variable or a string
literal (for example char *p="foobar";).
This code, for example, causes undefined behaviour:
char *p = malloc(100);
if (p != NULL) {
p+200;
}
The reason is that the line "p+200;" computes an absolute address
which does not reside in allocated memory.
Or could it be that they (these platforms) have a concept of an
undefined pointer when storing NULL? In that case, what about this?
char *buf = (char *)malloc(1);
buf += (0xdeadbeef - (int)buf); /* Note incrementation, not storage */

This, I think, also causes undefined behaviour, simply because
computing the address 0xdeadbeef causes undefined behaviour, unless
it happens to be in allocated memory.
I guess I'm just having a hard time understanding how such a platform
would work... would you mind providing an example of such a platform for
me to study? I guess it is at times like these that I really hate the
fact that archs other than i386 are so prohibitively expensive to get my
hands on... :-(

Get a second-hand Amiga, Atari ST or old-fashioned Macintosh. You can get
an MC68000 environment, which is definitely non-i386, for less than
$50.
 
Keith Thompson

Fredrik Tolf said:
Correct me if I'm wrong, but I have to think that loading an absolute
address into a pointer cannot be wrong, no matter what code. The
impression I have of a pointer is that of a number which points out an
address. Just because you load an absolute address doesn't mean that
address has to be a hardware address. If the code is running under
memory management, segmentation, paging and what not, that absolute
address just points out an address in the process' address space, is
that not true?

It's best not to think of a C pointer as a number. Just think of it
as a pointer. It can point to an object, or it can be a null pointer,
or it can be invalid. You can perform some limited arithmetic and
comparison operations on pointers, but only what's defined by the
standard.

If pointers were numbers, it would make sense to add or multiply two
pointer values. It doesn't.

Pointers, or addresses, are of course implemented as integers on many
systems, but assuming that will get you into trouble when you try to
port your code to a system with a different pointer representation.

Suggested reading: C FAQ, section 4.

Further suggested reading: C FAQ, the whole thing.

<http://www.eskimo.com/~scs/C-faq/faq.html>
 
Dan Pop

Fredrik Tolf said:
Sorry, we seem to be out of sync. By absolute pointer I mean an
arbitrary address, not a physical address.

You have written "absolute address" in the text I was replying to. I was
not aware that "absolute" and "arbitrary" can be used interchangeably in
this context. But the issue is clarified now.
It's not that I doubt that you have a point, it's just that I'm
having a hard time digesting it.


I'm thinking that a malloc implementation must somehow "make up" an
address to allocate, and in that case it seems to me that it's a bit
arbitrary.

You're missing a couple of points:

1. malloc's implementation can use whatever works on the underlying
hardware. If the hardware doesn't support arbitrary addresses in
address registers, then this is not an option.

2. malloc seldom (if ever) has to manipulate arbitrary addresses. The
malloc arena is entirely manipulated using pointer arithmetic. Look
at the sample implementations provided by K&R2 and Plauger.
I guess my problem in seeing it is I don't see the difference between an
arbitrary pointer and a non-arbitrary one. That is, what's the
difference between these following two statements?

char *buf1 = NULL; /* Should be valid, right? */
Right.

char *buf2 = 0xdeadbeef; /* Generates an exception, if I follow you. */

This is not valid C code. Integers are not converted automatically to
pointers.

char *buf2 = (char *)0xdeadbeef; /* MAY generate an exception */
If loading NULL, which is defined to zero AFAIK, is allowed, then what
about this?

char *buf = NULL;
buf += 0xdeadbeef;

Undefined behaviour. You can perform pointer arithmetic only inside
objects (or one byte after). You cannot perform pointer arithmetic on
null pointers.
For surely, you have to be allowed to add offsets to a pointer like
this?

You may want to actually learn C *before* continuing this discussion...
for(p = buf; *p; p++) {...}

This is correct only as long as p stays within buf. And completely
irrelevant to our discussion.
Or could it be that they (these platforms) have a concept of an
undefined pointer when storing NULL? In that case, what about this?

They don't need such a concept, they merely have to ensure that storing
the value of a null pointer in an address register doesn't generate any
exception. Trivially achieved by allocating a reserved object at the
corresponding address.
char *buf = (char *)malloc(1);
buf += (0xdeadbeef - (int)buf); /* Note incrementation, not storage */

Nope, this is compound assignment, not incrementation. But both imply
a storage operation. Which may never happen, because your pointer
arithmetic invokes undefined behaviour, unless 0xdeadbeef - (int)buf
evaluates to 0 or 1.
I guess I'm just having a hard time understanding how such a platform
would work...

You're having a hard time understanding how pointer arithmetic is defined
in C. And this is NOT a guess!
would you mind providing an example of such a platform for me to study?

80286 in protected mode, according to some people. Load some garbage in
a segment register (instead of a proper selector value) and it traps.

Dan
 
Keith Thompson

Fredrik Tolf said:
I'm thinking that a malloc implementation must somehow "make up" an
address to allocate, and in that case it seems to me that it's a bit
arbitrary.

malloc() has to know what it's doing. The C code that implements
malloc() (assuming it's implemented in C; it doesn't have to be) very
likely invokes undefined behavior, but it's free to take advantage of
implementation-specific characteristics of the underlying system.

A pointer returned by malloc() isn't "arbitrary" in the sense that
we're using the term here. It's either a null pointer or a pointer to
a newly allocated object.
I guess my problem in seeing it is I don't see the difference between an
arbitrary pointer and a non-arbitrary one. That is, what's the
difference between these following two statements?

char *buf1 = NULL; /* Should be valid, right? */

Right. A NULL pointer is valid for some operations (assignment,
comparison, etc.) but invalid for others (dereferencing, pointer
arithmetic, etc.).
char *buf2 = 0xdeadbeef; /* Generates an exception, if I follow you. */

It invokes undefined behavior. That can mean generating an exception,
or it can mean storing a value in buf2 that happens to be valid, or it
can mean making demons fly out your nose.
If loading NULL, which is defined to zero AFAIK, is allowed, then what
about this?

char *buf = NULL;
buf += 0xdeadbeef;

For surely, you have to be allowed to add offsets to a pointer like
this?

Pointer arithmetic is valid as long as the resulting pointer points
into the same object as the original value. (It's actually slightly
more complex than that.) For example:

int arr[10];
int *ptr = arr; /* points to arr[0] */
ptr += 5; /* points to arr[5] */
ptr += 100; /* invalid */

If the original pointer doesn't point to an object, any arithmetic on
it will invoke undefined behavior. (If ptr==NULL, I'm not sure what
the standard says about ptr+0 -- but I'm not sure I care.)

[...]
char *buf = (char *)malloc(1);
buf += (0xdeadbeef - (int)buf); /* Note incrementation, not storage */

You can convert a pointer to an integer, but there's not much you can
portably do with the result. The statement above is very likely to
give you an invalid pointer, so don't do that.

[...]
 
Keith Thompson

Joona I Palaste said:
Fredrik Tolf scribbled the following: [...]
I guess I'm just having a hard time understanding how such a platform
would work... would you mind providing an example of such a platform for
me to study? I guess it is at times like these that I really hate the
fact that archs other than i386 are so prohibitively expensive to get my
hands on... :-(

Get a second-hand Amiga, Atari ST or old-fashioned Macintosh. You can get
an MC68000 environment, which is definitely non-i386, for less than
$50.

But a 68000 isn't particularly exotic. In fact, it's a much more
regular architecture than the x86. No segment registers, just a
simple linear address space (and a richer set of general-purpose data
and address registers).

A second-hand IBM AS/400 might be instructive, but I seriously doubt
that you could get a working one for $50.
 
Fredrik Tolf

I.e. something like int *p = (int*)0xcafebabe; where the physical
address need not actually be 0xcafebabe?

Yeah. An offset into the address space currently in use in the execution
environment, if you will.
[snip]
NULL, which is defined as zero, is a special case. The pointer constant
zero, which need not be the physical address zero, is guaranteed to be
an address which by itself is fully defined, but dereferencing it
causes undefined behaviour. This does not apply to any other absolute
pointer constant. ("Absolute" used in the Tolf meaning, not the Pop
meaning.)

I think I will be having trouble continuing this discussion if one thing
is not made clear.

I am (possibly incorrectly) under the impression that C is defined to
operate in an execution environment where you have exactly one linear
address space, and pointers are defined as offsets into this address
space.

If this is not so and pointers are defined in a more abstract meaning, I
fully understand why all these described actions cause undefined
behavior. Would anyone care to clarify this?

Fredrik Tolf
 
Fredrik Tolf

It's best not to think of a C pointer as a number. Just think of it
as a pointer. It can point to an object, or it can be a null pointer,
or it can be invalid. You can perform some limited arithmetic and
comparison operations on pointers, but only what's defined by the
standard.

Indeed, I know that is the best way to think of it. However, I'm in the
dark as to whether it's _possible_ to think of them as numbers. See, I'm
under the impression that C is defined to operate in a linear address
space, with pointers being defined as offsets into this address space.
Please tell me if this is not the case. (Also, if this is not the case,
can someone be as kind as to provide an example of an architecture where
it isn't?)

Fredrik Tolf
 
Keith Thompson

Fredrik Tolf said:
I think I will be having trouble continuing this discussion if one thing
is not made clear.

I am (possibly incorrectly) under the impression that C is defined to
operate in an execution environment where you have exactly one linear
address space, and pointers are defined as offsets into this address
space.

If this is not so and pointers are defined in a more abstract meaning, I
fully understand why all these described actions cause undefined
behavior. Would anyone care to clarify this?

Yes, that impression is incorrect. C does not guarantee a single
linear address space (though it certainly allows one). A valid C
implementation could have a distinct address space for each object
(each chunk of memory returned by malloc() or equivalent, and each
declared variable). Comparing pointers to different objects (other
than for equality) invokes undefined behavior, because there may not
be a defined greater-than/less-than relationship between them.
 
Fredrik Tolf

Yes, that impression is incorrect. C does not guarantee a single
linear address space (though it certainly allows one). A valid C
implementation could have a distinct address space for each object
(each chunk of memory returned by malloc() or equivalent, and each
declared variable). Comparing pointers to different objects (other
than for equality) invokes undefined behavior, because there may not
be a defined greater-than/less-than relationship between them.

That does explain a lot of things. Thanks for the clarification.

Fredrik Tolf
 
Chris Torek

Richard Pennington said:

The examples I gave (with the possible exception of Windows, I haven't
seen the source) have very little assembly language used in their
implementation.

Take a look at the NetBSD and FreeBSD <machine/bus.h> headers.

What look like C function calls actually expand to inline assembly.

Device drivers are chock full of assembly code, on Intel-based
systems at least.

Linux does the same sort of thing, but in different headers. None
of these three OSes can be compiled with anything but the GNU
compilers, for this reason.

The BSD code *is* specified well enough to allow (in most cases)
substituting some other compiler, but not easily. (For instance,
the semantics of the bus_read_* and bus_write_* "functions" are
such that you could write assembly-language subroutines to do the
same job.)
... I think that many people don't understand
how the real world works. We compiler and OS writers will wink and nod
at each other and continue to rely on undefined behavior.

There is nothing wrong with using undefined behavior to extend any
given compiler and then write code that depends on it -- but you
(the generic "you") should be *aware* of it when you do it, because
it ties you down. It often removes the freedom to change compilers,
as in the cases mentioned above. You must make sure the price you
pay buys you something worthwhile (and, if you are clever enough,
you can "keep the price relatively low" the way the BSD guys did).
 
Gordon Burditt

Correct me if I'm wrong, but I have to think that loading an absolute
address into a pointer cannot be wrong, no matter what code. [...]
Precisely my point. If you load an absolute pointer on such a system and
dereference it, it is interpreted as a virtual address, and I can't see
the problem (except the OS will likely raise a page fault or similar,
which is beside the problem). Care to enlighten me?

Raising a page fault or similar corresponds to what the C standard
calls "undefined behavior".

If you load what you think is a physical address pointer into a
register and then attempt to use it, on many systems it will be
interpreted as a virtual address. One of the worst things that can
happen here is that it appears to work, but you access the *WRONG
MEMORY*. Just because physical address 0x0B0000 or 0xB000:0000 is
video memory doesn't mean that any virtual address refers to video
memory, or if there IS a way to access video memory, that it looks
anything like 0xB0000.

If there is a way to map physical memory, it is likely to be done
by asking the OS nicely, and the return value from the request will
tell you where (in virtual memory) the OS put it. (Consider mmap(),
for example).
You have to forgive me, but I find this hard to believe. Surely, there
must be a way for malloc (and other related functions - mmap etc.) to
return the address to the newly allocated chunk, and surely, that must
involve storing the address of the chunk into an address register? Or
did I misunderstand something?

There is a way for malloc() to return the address of a newly allocated
chunk. Put some arbitrary bit pattern in there instead, and KABOOM!
Consider an address with a random segment number on an i386-architecture
in protected mode. The chances that the random segment number refers
to one actually in use are small.

Gordon L. Burditt
 
Gordon Burditt

Indeed, I know that is the best way to think of it. However, I'm in the
dark as to whether it's _possible_ to think of them as numbers.
See, I'm
under the impression that C is defined to operate in a linear address
space, with pointers being defined as offsets into this address space.

Stop thinking that right now!
C is defined to operate in a non-linear address space, where pointers
are *NOT* necessarily offsets into this address space.

It sorta accidentally happens that a linear address space works too.
Please tell me if this is not the case. (Also, if this is not the case,
can someone be as kind as to provide an example of an architecture where
it isn't?)

Take a hard look at the Intel [3456]86 architecture.

Consider large-model 16-bit real mode, where a pointer consists of
a segment register and an offset.

Consider large-model protected mode (16 or 32 bit) where loading a
segment register with an invalid segment number causes a trap.

Gordon L. Burditt
 
CBFalconer

Fredrik said:
.... snip ...


You have to forgive me, but I find this hard to believe. Surely,
there must be a way for malloc (and other related functions -
mmap etc.) to return the address to the newly allocated chunk,
and surely, that must involve storing the address of the chunk
into an address register? Or did I misunderstand something?

But malloc etc. are not returning an _/arbitrary/_ address. They
are returning the address of a chunk of memory that has been
specifically made available and usable to the caller. How it does
this is not specified, and obviously must be system specific.
 
Fredrik Tolf

Stop thinking that right now!
C is defined to operate in a non-linear address space, where pointers
are *NOT* necessarily offsets into this address space.

It sorta accidentally happens that a linear address space works too.

I already found out in another reply, but thanks for replying anyway.
Please tell me if this is not the case. (Also, if this is not the case,
can someone be as kind as to provide an example of an architecture where
it isn't?)

Take a hard look at the Intel [3456]86 architecture.

Consider large-model 16-bit real mode, where a pointer consists of
a segment register and an offset.

Consider large-model protected mode (16 or 32 bit) where loading a
segment register with an invalid segment number causes a trap.

Well, that was precisely the thing: The only real mode C implementation
I've seen (MSVC 1.0, long ago) uses a non-compliant extension called
"far pointers" that incorporate the segment, and all protected mode C
implementations I've seen don't deal with selectors at all. That is
what gave me that impression.

Fredrik Tolf
 
