Question about C

spinoza1111

Hi, I am not a student at IUT, but I do have two questions.

Can a C programmer by means of hacking get access to the machine code
of a function? I've never used the feature by means of which a
function may be passed as its address, but it sounds like one could
use this (in real programming, don't pass functions: write a small
interpreter).

Shouldn't C be redesigned so as to make the machine code of a program
a full object, one that can be addressed and used, possibly changed? I
mean, you can mung everything else in this language. Why not be Mung
the Merciless?

int a()
{
printf("%s\n", a);
}

Strangely, given .Net's higher level orientation than C, which claims
to be a high level low level language for high level guys who like to
get down, I can actually access, if not the machine code, the byte
code in .Net. cf http://spinoza1111.wordpress.com/20...t-tool-for-executing-source-code-at-run-time/.
 
jacob navia

spinoza1111 wrote:
Hi, I am not a student at IUT, but I do have two questions.

Can a C programmer by means of hacking get access to the machine code
of a function? I've never used the feature by means of which a
function may be passed as its address, but it sounds like one could
use this (in real programming, don't pass functions: write a small
interpreter).

Shouldn't C be redesigned so as to make the machine code of a program
a full object, one that can be addressed and used, possibly changed? I
mean, you can mung everything else in this language. Why not be Mung
the Merciless?

int a()
{
printf("%s\n", a);
}

Strangely, given .Net's higher level orientation than C, which claims
to be a high level low level language for high level guys who like to
get down, I can actually access, if not the machine code, the byte
code in .Net. cf http://spinoza1111.wordpress.com/20...t-tool-for-executing-source-code-at-run-time/.

#include <math.h>
#include <stdio.h>

static int fn(int s)
{
if (s > 0)
return sqrt(s)/pow(s,3);
else return pow(s,3);
}

int main(void)
{
int (*fptr)(int) = fn;
char *p = (char *)fptr;

printf("First 10 bytes of the opcodes of fn are:\n");
for (int i = 0; i<10;i++) {
printf("[%d] %d 0x%x\n",i,p[i],p[i]);
}
}
 
Malcolm McLean

Hi, I am not a student at IUT, but I do have two questions.

Can a C programmer by means of hacking get access to the machine code
of a function?
Depends on your machine.
On many architectures, casting a function pointer to an unsigned char
* will work. You can then read off the opcodes of the machine
instructions, sometimes even change them as the program runs. However,
more modern architectures will choke on this - it is asking for
someone to exploit the system with a virus. The protection is at quite
a low level; it has nothing to do with C as such.
 
sandeep

jacob said:
#include <math.h>
#include <stdio.h>

static int fn(int s)
{
if (s > 0)
return sqrt(s)/pow(s,3);
else return pow(s,3);
}

int main(void)
{
int (*fptr)(int) = fn;
char *p = (char *)fptr;

printf("First 10 bytes of the opcodes of fn are:\n"); for (int
i = 0; i<10;i++) {
printf("[%d] %d 0x%x\n",i,p[i],p[i]);
}
}


Hello jacob ~~

You probably know it, but that code will NOT work on a platform where C
is interpreted not compiled. Also a static function could be completely
removed by the compiler since it is not externally visible, then you
would get the first opcodes of main().

Regards ~~
 
Willem

sandeep wrote:
) Hello jacob ~~
)
) You probably know it, but that code will NOT work on a platform where C
) is interpreted not compiled.

Right.

) Also a static function could be completely
) removed by the compiler since it is not externally visible, then you
) would get the first opcodes of main().

Wrong.
Taking the address of a function forces the compiler to keep a copy of it.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Phil Carmody

Willem said:
sandeep wrote:
) Hello jacob ~~
)
) You probably know it, but that code will NOT work on a platform where C
) is interpreted not compiled.

Right.

) Also a static function could be completely
) removed by the compiler since it is not externally visible, then you
) would get the first opcodes of main().

Wrong.
Taking the address of a function forces the compiler to keep a copy of it.

Chapter and verse, please?

Surely if the address is never used, then the as-if rule permits a
compiler to have no copy of the function. No strictly conforming
program could tell if the function was really there or not.

Phil
 
jacob navia

sandeep wrote:
jacob said:
#include <math.h>
#include <stdio.h>

static int fn(int s)
{
if (s > 0)
return sqrt(s)/pow(s,3);
else return pow(s,3);
}

int main(void)
{
int (*fptr)(int) = fn;
char *p = (char *)fptr;

printf("First 10 bytes of the opcodes of fn are:\n"); for (int
i = 0; i<10;i++) {
printf("[%d] %d 0x%x\n",i,p[i],p[i]);
}
}


Hello jacob ~~

You probably know it, but that code will NOT work on a platform where C
is interpreted not compiled.


Well, it should get the "opcodes" of whatever form the interpreter uses
to store a function...

Also a static function could be completely
removed by the compiler since it is not externally visible, then you
would get the first opcodes of main().

That can't be completely removed since it's used: its address is taken.
Therefore the compiler can't remove it and if it does, the compiler has
a serious bug.
 
jacob navia

Phil Carmody wrote:
Chapter and verse, please?

Surely if the address is never used, then the as-if rule permits a
compiler to have no copy of the function. No strictly conforming
program could tell if the function was really there or not.

Phil

The address *IS* used. It is assigned to a char pointer that is used for
printf. THEN

(1) Apparently you can't read. Get better glasses.
(2) If the compiler removes it, it has a bug.
 
Malcolm McLean

Chapter and verse, please?

Surely if the address is never used, then the as-if rule permits a
compiler to have no copy of the function. No strictly conforming
program could tell if the function was really there or not.
In theory yes, it's undefined behaviour once the function pointer is
cast to an unsigned char *, so the compiler is free to inline the
function and report an error message when the cast is executed. In
fact it's probably too hard and also pointless for a compiler to make
this distinction - if a function's address is taken, it won't inline
it, or at least will make a non-inlined copy. As a compiler writer
Jacob knows what he is talking about.
 
spinoza1111

jacob said:
#include <math.h>
#include <stdio.h>
static int fn(int s)
{
         if (s > 0)
             return sqrt(s)/pow(s,3);
         else return pow(s,3);
}
int main(void)
{
         int (*fptr)(int) = fn;
         char *p = (char *)fptr;
         printf("First 10 bytes of the opcodes of fn are:\n"); for (int
         i = 0; i<10;i++) {
                 printf("[%d] %d 0x%x\n",i,p[i],p[i]);
         }
}


Hello jacob ~~

You probably know it, but that code will NOT work on a platform where C
is interpreted not compiled. Also a static function could be completely
removed by the compiler since it is not externally visible, then you
would get the first opcodes of main().

Regards ~~


How about a version of C in which you could get and change both source
and object code at all times?

You can get both in .Net, and you can call the compiler which means
you can change both. In principle (I haven't figured out myself how to
do it).

So much for C's "power".

Hmm...taking Herb Schildt's old C interpreter...porting it
to .Net...adding this ability...nyhah ha haaah
 
Malcolm McLean

You can get both in .Net, and you can call the compiler which means
you can change both. In principle (I haven't figured out myself how to
do it).

So much for C's "power".
You're kind of right. Real second generation languages (as opposed to
the jargony use of the term 4gl to refer to scripts with flashy user
interfaces) will provide mechanisms for altering and extending the
program as it runs. However we haven't figured out how to do this in a
useful way yet. I've heard that Lisp makes the attempt.
 
Nick Keighley

You're kind of right. Real second generation languages (as opposed to
the jargony use of the term 4gl to refer to scripts with flashy user
interfaces) will provide mechanisms for altering and extending the
program as it runs. However we haven't figured out how to do this in a
useful way yet. I've heard that Lisp makes the attempt.

well you can load and even write code at run time, make fixes to live
systems etc.
I believe Forth attempted this as well
 
Ben Bacarisse

jacob navia said:
Phil Carmody wrote:

Nit: taking the address happens all the time so I think you mean storing
the address somewhere. A simple call like f() involves "taking the
address" of f (at least by some reasonable senses of the term).
The address *IS* used. It is assigned to a char pointer that is used
for printf. THEN

(1) Apparently you can't read. Get better glasses.
(2) If the compiler removes it, it has a bug.

Can you explain why this is a bug? I've used a C compiler in which
(char *)f produces a pointer to an essentially random location unrelated
to f in any way. I can't see why a C compiler has to keep a copy of f
around unless something useful can be done with it.
 
jacob navia

Ben Bacarisse wrote:
Nit: taking the address happens all the time so I think you mean storing
the address somewhere. A simple call like f() involves "taking the
address" of f (at least by some reasonable senses of the term).

I mean "taking the address". If you take the address without storing it anywhere like in f(), the
function MUST be kept since it is called. If you take the address to store it somewhere, it is used
too, and must be kept.
Can you explain why this is a bug?

See above. If I take the address of f() and store it in a function table, to be part of one of my
VTables and used in my container library, the compiler MUST keep the function code.
I've used a C compiler in which
(char *)f produces a pointer to an essentially random location unrelated
to f in any way.

Great! In that machine nothing will work.

So what?

I would recommend you to keep that machine away from me.

:)
I can't see why a C compiler has to keep a copy of f
around unless something useful can be done with it.

Yes, but I would prefer that compilers stay what they are: compilers.

The day a compiler decides that my program is not useful because there is nothing interesting coming
out of it and decides to erase my source files I will stop programming. There is no point in
programming such compilers anyway. You just tell them:

Write my program compiler!

And they will do it without bothering anyone with programming tasks.

Until that day arrives, I will go on programming with these simple assumptions:

(1) I am the one writing the program.
(2) The compiler COMPILES my code to machine code
(3) The circuit board executes what the compiler compiled.

Yes, I know. In your (wonderful and mysterious) machine, making

(char *)f

produces a random pointer. In that machine too, the compiler decides
that functions whose address is taken and stored can be "optimized"
away.

OK. Be it. I never used such a machine, and I would rather prefer not having to worry about them.

jacob
 
Tom St Denis

jacob said:
#include <math.h>
#include <stdio.h>
static int fn(int s)
{
         if (s > 0)
             return sqrt(s)/pow(s,3);
         else return pow(s,3);
}
int main(void)
{
         int (*fptr)(int) = fn;
         char *p = (char *)fptr;
         printf("First 10 bytes of the opcodes of fn are:\n"); for (int
         i = 0; i<10;i++) {
                 printf("[%d] %d 0x%x\n",i,p[i],p[i]);
         }
}

Hello jacob ~~
You probably know it, but that code will NOT work on a platform where C
is interpreted not compiled. Also a static function could be completely
removed by the compiler since it is not externally visible, then you
would get the first opcodes of main().
Regards ~~

How about a version of C in which you could get and change both source
and object code at all times?

You can get both in .Net, and you can call the compiler which means
you can change both. In principle (I haven't figured out myself how to
do it).

So much for C's "power".

Hmm...taking Herb Schildt's old C interpreter...porting it
to .Net...adding this ability...nyhah ha haaah


Dunno about you but I wouldn't want to run programs where it's
standard that users can change its functionality (in terms of running
new code out of the control of the original developer).

As another pointed out you can always load a dynamic object if you
need to add flexibility at runtime...

Tom
 
Keith Thompson

sandeep said:
jacob said:
#include <math.h>
#include <stdio.h>

static int fn(int s)
{
if (s > 0)
return sqrt(s)/pow(s,3);
else return pow(s,3);
}

int main(void)
{
int (*fptr)(int) = fn;
char *p = (char *)fptr;

printf("First 10 bytes of the opcodes of fn are:\n");
for (int i = 0; i<10;i++) {
printf("[%d] %d 0x%x\n",i,p[i],p[i]);
}
}


[The code layout was messed up in sandeep's followup; I've tried to
correct it.]
You probably know it, but that code will NOT work on a platform where C
is interpreted not compiled.

The code is not guaranteed to "work" on any implementation (unless
the implementation makes guarantees beyond those provided by the
standard). The behavior of a conversion from type int (*)(int) to
type char* is undefined. If you disagree, please cite the wording
in the standard that defines the behavior.

There are non-interpreted implementations on which a function
pointer does not contain the machine address of the function's code.
On some, it points to a descriptor that contains more information
about the function. On others (AS/400?), a function pointer *is*
a descriptor, and is much larger than a char*.
Also a static function could be completely
removed by the compiler since it is not externally visible,

Yes. If the program took the function's address and then used that
address to call the function, the implementation could not remove the
function (unless it performed some additional optimization on the call,
such as inlining it). But in this case, since the program's behavior
is undefined, the compiler can do anything it likes.
then you
would get the first opcodes of main().

How did you reach that conclusion? I suppose that's one possibility,
but it's more likely you'd get the bytes from the location where the
code for fn() *would* have been stored. That, or a segmentation fault.

Even if p actually points to the code for fn(), trying to use the
converted pointer to modify that code is likely to fail.

Yes, jacob's program is likely to work as intended on many systems. But
it's not likely to be particularly useful. And using unsigned char
rather than char produces better output; with plain char I got:

First 10 bytes of the opcodes of fn are:
[0] 85 0x55
[1] -119 0xffffff89
[2] -27 0xffffffe5
[3] -125 0xffffff83
[4] -20 0xffffffec
[5] 56 0x38
[6] -125 0xffffff83
[7] 125 0x7d
[8] 8 0x8
[9] 0 0x0
 
Keith Thompson

jacob navia said:
Ben Bacarisse wrote:

I mean "taking the address". If you take the address without storing
it anywhere like in f(), the function MUST be kept since it is
called. If you take the address to store it somewhere, it is used too,
and must be kept.

Taking the address of a function does not call the function.

Consider:

#include <stdio.h>

static void func(void)
{
puts("This is never called");
}

int main(void)
{
void (*funcptr)(void) = func;
puts("Note that the value of funcptr is not used");
return 0;
}

The generated code for this program is not required to contain code that
prints the string "This is never called".
See above. If I take the address of f() and store it in a function
table, to be part of one of my VTables and used in my container
library, the compiler MUST keep the function code.

Correct, if the compiler is unable to prove that the address is never
used to call the function. But in both the code I wrote above, and in
the code that you previously posted, the compiler may be able to prove
that the function is never called, and therefore that its address is
never used.

Compilers are permitted to perform optimizations that do not change the
visible behavior of the program.

Note that a call isn't the only thing that can require the compiler to
keep the function's address. For example, an equality comparison to
another function's address must also work correctly. But the behavior
of a conversion to char* is undefined, so it doesn't prevent the
compiler from eliminating the function's code.

Of course the compiler isn't *required* to perform any such
optimizations; it's free to emit code for the function anyway.
Great! In that machine nothing will work.

I suspect that "unrelated to f in any way" was an exaggeration. It's
entirely plausible that a function pointer could point to a descriptor,
where it's the descriptor that contains the address of the function's
code and other information. Converting a function pointer to
char* might let you access the descriptor, but it wouldn't let you
(directly) access the function's code. The descriptor is certainly
related to the function, but the memory address of the descriptor
has no clear relation to the address of the function's code.

There have also been machines where code and data occupy separate
address spaces. C is carefully designed to be usable on such systems.

[...]

jacob, I understand that you're not interested in seeing factual
corrections from me, but perhaps others might benefit from this.
 
Ben Bacarisse

jacob navia said:
Ben Bacarisse wrote:

I mean "taking the address". If you take the address without storing
it anywhere like in f(), the function MUST be kept since it is
called. If you take the address to store it somewhere, it is used too,
and must be kept.

Why? Can't it be inlined even if this is not requested?
See above. If I take the address of f() and store it in a function
table, to be part of one of my VTables and used in my container
library, the compiler MUST keep the function code.

See above. I doubt any compiler can -- or even tries to -- inline a
function call once the address has been stored, but you seem to be
saying that it would be a compiler bug if it did. If the function has
internal linkage and all calls have been inlined I don't see any
requirement from the standard that the code must be there as a separate
addressable entity.

[The function could be removed even if it has external linkage if the
compiler or linker is very sophisticated, but why would anyone bother to
do that?]
Great! In that machine nothing will work.

Eh? It worked very well. Code injection attacks were almost impossible
but legitimate access to executable memory simply required a little help
from the OS.
So what?

I would recommend you to keep that machine away from me.

:)

You're safe. The last time I saw one was in a museum (literally).
Yes, but I would prefer that compilers stay what they are: compilers.

The day a compiler decides that my program is not useful because there
is nothing interesting coming out of it and decides to erase my source
files I will stop programming. There is no point in programming such
compilers anyway. You just tell them:

Write my program compiler!

And they will do it without bothering anyone with programming tasks.

Until that day arrives, I will go on programming with these simple assumptions:

(1) I am the one writing the program.
(2) The compiler COMPILES my code to machine code
(3) The circuit board executes what the compiler compiled.

That's beside the point. On a machine where data accesses can't
(normally) access executable code, (char *)f has no meaning. What would
you have the compiler do? C is defined in such a way that it can be
implemented on a machine like that. If you don't like that design,
don't buy that kind of hardware, but you can't say it's a bug for a
compiler to ignore C code that is meaningless on the target it is
compiling for.
Yes, I know. In your (wonderful and mysterious) machine, making

(char *)f

produces a random pointer. In that machine too, the compiler decides
that functions whose address is taken and stored can be "optimized"
away.

OK. Be it. I never used such a machine, and I would rather prefer not
having to worry about them.

That's fine; and anyone writing (char *)f is obviously doing the same.
I have no problem with that, but the question was about what would
constitute a bug in a C compiler. That's a more general question than
what constitutes a bug in a compiler that you'd like to use on Intel
hardware.
 
William Hughes

In theory yes, it's undefined behaviour once the function pointer is
cast to an unsigned char *, so the compiler is free to inline the
function and report an error message when the cast is executed. In
fact it's probably too hard and also pointless for a compiler to make
this distinction - if a function's address is taken, it won't inline
it, or at least will make a non-inlined copy

If the compiler first takes the "address" of the function, then
casts it to a (char *), I would agree that it is unlikely that
the implementation would take advantage of the fact that undefined
behaviour was invoked to inline the function. However, in this
case taking the address of the function and casting it to (char *)
may take place at the same time. Indeed, it is not until the cast
that the compiler knows it needs an "address" for the function.
If the compiler knows that the cast will not produce a predictable
result [1] it may simply use 0 for the value of the (char *) and
*not* mark the function as having the address taken. The compiler
is then free to only inline the function.

- William Hughes

[1] The cast invokes undefined behaviour, so the compiler can
use 0 no matter what. However, the compiler will probably not
make use of this license if there is at least a
semi-reasonable meaning for the cast.
 
Stargazer

Ben Bacarisse wrote:

I mean "taking the address". If you take the address without storing it anywhere like in f(), the
function MUST be kept since it is called. If you take the address to store it somewhere, it is used
too, and must be kept.

Actually, the case of your code is specifically mentioned in "portability
issues" (Annex K):

"K.5 Common extensions

[#1] The following extensions are widely used in many systems,
but are not portable to all implementations. The inclusion of
any extension that may cause a strictly conforming program to
become invalid renders an implementation nonconforming."

and further:

K.5.7 Function pointer casts

[#1] A pointer to an object or to void may be cast to a pointer
to a function, allowing data to be invoked as a function
(6.3.4).

[#2] A pointer to a function may be cast to a pointer to an
object or to void, allowing a function to be inspected or
modified (for example, by a debugger) (6.3.4).

In particular, your code will fail (read crash) in some common cases
if:

1) Instruction and data pointers are of different size
2) Instructions and data reside in separate segments / address spaces
3) Code section is non-readable (execute-only)
See above. If I take the address of f() and store it in a function table, to be part of one of my
VTables and used in my container library, the compiler MUST keep the function code.

The compiler need not even "keep the function code". You can
assign a pointer to an external function, of which the compiler knows
nothing.

BTW, you used the "static" storage-class specifier in your example, but I
couldn't find any precise requirements for "static" functions in the
standard (other than that the "static" specifier may be used). I know what
the "common handling" is, but can you (or anybody else) provide a
citation for static functions?
Great! In that machine nothing will work.

So what?

I would recommend you to keep that machine away from me.

:)

IIRC, at least on some AIX platforms, when you looked at a function's
pointer it pointed to a relocatable unconditional branch instruction. I
don't remember all the details, but apparently it was because the
architecture didn't allow indirect branches from the data section, so
relocation had to include patching the instruction.
Yes, but I would prefer that compilers stay what they are: compilers.

The day a compiler decides that my program is not useful because there is nothing interesting coming
out of it and decides to erase my source files I will stop programming. There is no point in
programming such compilers anyway. You just tell them:

Write my program compiler!

And they will do it without bothering anyone with programming tasks.

Until that day arrives, I will go on programming with these simple assumptions:

(1) I am the one writing the program.
(2) The compiler COMPILES my code to machine code

... according to the language's specifications.
(3) The circuit board executes what the compiler compiled.

If the compiler does not compile according to the C standard's requirements
(possibly with additional requirements of its own), then it's not a C
compiler. Even if you devise your own language, once you release
specifications you'll practically have to follow them from then on.

Daniel
 
