How to parse an executable in C and find out whether it contains a return (RET in assembly)

priyanka

Hi,

I was wondering if we could parse or otherwise inspect an executable
(whose source language was C). How can I use a scripting language
like Perl/Python to find out information about the executable? Is
it possible?

Also, how does the compiler add inlining to the program? I know that
whenever it sees "inline" in front of the procedure name, it inlines it.
But if we give the -finline option, does it inline all the procedures? How
does it do that? Does it parse? Is there any good book or article
that I can refer to? I need to build a small compiler. Can I build it?

Thanks for reading all of my questions :)

-priyanka
 
Keith Thompson

priyanka said:
I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?

Executable files have different formats on different systems; the C
standard says nothing about any of them. There's no way to do what
you want to do in standard C.

Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.
But if we give the -finline options, it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?

The "inline" keyword is new in C99; not all C compilers support it.
Even for those that do, it's not guaranteed to inline the function.
Quoting the standard:

Making a function an inline function suggests that calls to the
function be as fast as possible. The extent to which such
suggestions are effective is implementation-defined.

The "-finline" option is specific to some particular compiler; it's
not defined by the C standard.
 
Walter Roberson

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?

If you knew the exact format of the executable; formats of executables
are not specified by the C standard, and are subject to change
with different compiler options and different compiler patches and
different operating systems.

Could you explain why you want to look for return instructions in
the generated machine code? Everything in C is expressed in
terms of functions, and all functions must return. The only
exception is that if all execution paths in a routine provably
ended up at an exit() call, then the compiler could optimize out
the dead return; some compilers probably do actually bother,
but it is a sufficiently unusual case that surely you would
have phrased the question mentioning exit if you'd been thinking
of that situation...

Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.

inline is never more than a hint.

But if we give the -finline options

then you are dealing in compiler options that lie outside of the
C standards, and are matters of implementation.

it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ?

Although you aren't using SGI IRIX, you might find some useful
information about inlining at http://techpubs.sgi.com .
In particular, see SGI's manual page for "ipa" which briefly
describes the process.

There are probably also a number of good papers about inlining
available -- try google scholar .

I need to build a small compiler ?

Inlining... your own compiler... the first question.... I wonder
if what you are trying to figure out is not whether a routine
has *some* return instruction (which they almost all do), but
rather at which points in the logical flow that returns might
occur, so that you can try to inline it? If so, then you are
working at the wrong level: you should be working at the
intermediate representation level, after the parse tree is
generated but before code generation.

Can I build it ?

Small? And complete with inlining? Ummm, that's a non-trivial task
unless the language to be compiled is much much simpler than C,
and you aren't going to be trying to do extensive machine-language
level optimization.
 
Chris McDonald

Executable files have different formats on different systems; the C
standard says nothing about any of them. There's no way to do what
you want to do in standard C.


No way to parse an executable in standard C?
No way to find out if there is a return in generated code in standard C?

A little overzealous trying to protect the role of this newsgroup?
 
Walter Roberson

No way to parse an executable in standard C?
No way to find out if there is a return in generated code in standard C?

If the file format and OS and language restrictions are such
that it is possible to place into execution a section marked as
data, then figuring out whether there is a machine return instruction
or not is equivalent to solving The Halting Problem. I believe it
is generally agreed that The Halting Problem is not solvable in
standard C.
 
Keith Thompson

Chris McDonald said:
No way to parse an executable in standard C?
No way to find out if there is a return in generated code in standard C?

A little overzealous trying to protect the role of this newsgroup?

To be precise, there's no *portable* way to do what the OP wants in
standard C (or, probably, in any other language).

If we had a complete definition of the executable file format, it
would probably be possible to write a standard C program that could
parse it and search for RET instructions (assuming the target
instruction set has a RET instruction at all).
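
As a rough illustration of the first step, assuming the format is one of two
well-known ones: ELF files begin with the bytes 0x7F 'E' 'L' 'F', and DOS/PE
files begin with 'M' 'Z'. The sketch below uses only standard C to classify a
file by those leading bytes. The names guess_format and guess_stream_format
are made up for this example, and finding the code sections (let alone the
instructions in them) would need the rest of the format definition.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: classify a buffer by its leading "magic" bytes.
   Only two signatures are checked; anything else is reported as unknown. */
const char *guess_format(const unsigned char *buf, size_t len)
{
    static const unsigned char elf_magic[4] = { 0x7F, 'E', 'L', 'F' };

    if (len >= 4 && memcmp(buf, elf_magic, 4) == 0)
        return "ELF";
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "MZ (DOS/PE)";
    return "unknown";
}

/* Read the first bytes of a stream (opened in binary mode) and classify them. */
const char *guess_stream_format(FILE *fp)
{
    unsigned char head[4];
    size_t n = fread(head, 1, sizeof head, fp);

    return guess_format(head, n);
}
```

A caller would open the file with fopen(path, "rb"), pass the stream to
guess_stream_format, and fclose it afterwards.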
 
Barry Schwarz

Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language

Only in a very system-dependent way.

like perl/python to find out the information about the executable ? Is
it possible ?

Ask in newsgroups that deal with perl or python.

Also, how does the compiler add inling to the program ? I know that

I imagine it does it the same way it generates any other code, but the
real answer is implementation-dependent.

whenever it sees"inline" in front of the procedure name, it inlines it.
But if we give the -finline options, it inline all the procedures ? How

Options are compiler-specific.

does it do that ? does it parse ? Is there any good book or article

Ask in a newsgroup that deals with your compiler.

that I can refer to ? I need to build a small compiler ? Can I build it
?

With enough experience, I would think so.


Remove del for email
 
Richard Heathfield

priyanka said:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C).

Call fopen to open the file. If that worked, do whatever it is that you want
to do with or to the file, and then call fclose when you're done.

Also, how does the compiler add inling to the program ?

That depends on the compiler.
 
Gordon Burditt

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ?

Find out *WHAT* information about the executable?

C does not document any format for an executable; that's
machine-dependent. If you have an executable in a known format
(say, a CP/M executable for a Z80 processor), you can examine the
file by fopen()ing it in binary mode and reading it. If you write
your program portably then you can examine a CP/M Z80 executable on
any machine you choose to run the program on.

There is no guarantee that the machine you're running the program
on even HAS a "return instruction". However, for most machines
that have one and static-link executables, there will likely be
oodles of them in library code (used or not).

Determining things like the boundaries of machine instructions,
what is machine instructions and what is data, and whether certain
code is reachable is likely to be equivalent to the halting problem.
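
To make the difficulty concrete, here is a minimal sketch of the naive
approach, assuming an x86 target where the one-byte near return is encoded as
0xC3. The function names and the X86_NEAR_RET macro are invented for this
example. It only counts byte values; it has no idea whether a given 0xC3 is an
instruction, part of a longer instruction, or plain data.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumption: x86 encodes the one-byte near return as 0xC3. Other
   instruction sets use different encodings or have no such instruction. */
#define X86_NEAR_RET 0xC3

/* Count how many bytes in buf[0..len-1] are equal to value. */
size_t count_byte_in_buffer(const unsigned char *buf, size_t len,
                            unsigned char value)
{
    size_t i, count = 0;

    for (i = 0; i < len; i++)
        if (buf[i] == value)
            count++;
    return count;
}

/* Same count over a whole stream, which should be opened in binary mode. */
size_t count_byte_in_stream(FILE *fp, unsigned char value)
{
    unsigned char chunk[4096];
    size_t n, count = 0;

    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        count += count_byte_in_buffer(chunk, n, value);
    return count;
}
```

A caller would fopen the executable with mode "rb" and call
count_byte_in_stream(fp, X86_NEAR_RET). The number that comes back is not a
count of return instructions; telling code bytes from data bytes needs a
disassembler that understands the file format.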
Is
it possible ?
Also, how does the compiler add inling to the program ?

Isn't that a little bit like "how does the manufacturer add wings
to a school bus to make an AirBus"? Inlining is not something added
after code generation.
I know that
whenever it sees"inline" in front of the procedure name, it inlines it.

I believe a more accurate statement is:

In C89, if inline is NOT specified, the compiler may inline the function.
In C89, if inline IS specified, the compiler may inline the function but
this is a syntax error, so it's unlikely to produce any code at all.
In C99, if inline is NOT specified, the compiler must not inline the function.
In C99, if inline IS specified, the compiler may inline the function.
A compiler may always choose the option of not inlining.
But if we give the -finline options, it inline all the procedures ? How

ANSI C does not specify compiler options, and that one violates the
requirements of ANSI C.

does it do that ? does it parse ?

A compiler parses source code. It does not parse executables, even
if that's an appropriate word to describe interpreting the content
of an executable.
Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?

Gordon L. Burditt
 
Ian Collins

priyanka said:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?

It's possible, but not easy. You would probably have to disassemble the
executable to get the correct context for whatever machine code
represents 'RET'.

The language is irrelevant, the problem remains the same.
Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.

inline is a hint, nothing more.
But if we give the -finline options, it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?
These questions are too tool specific to answer here.
 
Keith Thompson

Gordon Burditt writes:

(Still snipping attribution lines. *Please* stop doing that.)

[...]
I believe a more accurate statement is:

In C89, if inline is NOT specified, the compiler may inline the function.

True.

In C89, if inline IS specified, the compiler may inline the function but
this is a syntax error, so it's unlikely to produce any code at all.

True, except that a C89 compiler is allowed to support inlining as an
extension. If it uses "inline" as a keyword, it has to have a way to
turn it off, since "inline" is a valid identifier in C89.

In C99, if inline is NOT specified, the compiler must not inline the
function.

False. As long as the inlined function behaves the same way as a
non-inlined function (including the ability to take its address),
inlining is a perfectly valid and common optimization.

If anything takes the address of the function, the compiler must
generate a callable body for it. If the compiler can prove that the
function's address is never taken (except implicitly in an ordinary
function call), it can inline all calls and not generate a body.
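
A small sketch of that distinction, with names (twice, callback, direct_call,
indirect_call) made up for the example. Whether any particular compiler
actually inlines the direct call is up to that compiler and its options; the
point is only which uses force a callable body to exist.

```c
/* A small function that is called directly and also has its address taken. */
static int twice(int x)
{
    return 2 * x;
}

/* The pointer has external linkage, so code in other translation units may
   call through it. The compiler therefore has to keep a real, callable
   body for twice(), even if it inlines every direct call. */
int (*callback)(int) = twice;

/* Direct call: the callee is known here, so the compiler is free to
   substitute the body of twice() at this point. */
int direct_call(int v)
{
    return twice(v);
}

/* Indirect call: goes through whatever function callback points at. */
int indirect_call(int v)
{
    return callback(v);
}
```

Either way the observable behaviour has to be the same: direct_call(21) and
indirect_call(21) both yield 42 as long as callback still points at twice.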

In C99, if inline IS specified, the compiler may inline the function.

True.

A compiler may always choose the option of not inlining.

True.
 
santosh

priyanka said:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?

Yes, however you need to understand the executable's format and machine
code. This varies from platform to platform and even from compiler to
compiler. Basically, you need to disassemble the executable. Post in a
more appropriate forum like alt.lang.asm.
Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.

The keyword inline was added to C with the 1999 revision of the
standard. The standard merely states that access to functions qualified
as inline should be made as fast as possible. How it is actually done
is up to the implementation.

Generally though, I would assume that the machine code generated for
the function is made continuous (i.e. embedded) with the surrounding
machine code. This results in the function's code being duplicated at
each place it's called, instead of having one copy of the function's
code and executing it via the CALL/RET mechanism.
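
Expressed at the source level, the effect is roughly the following. This is
only a sketch of the idea: the function names are invented, and a real
compiler does the substitution on its internal representation, not on C
source text.

```c
/* The function as the programmer wrote it. */
static int square(int x)
{
    return x * x;
}

/* As written: two calls, each using the normal call/return mechanism
   unless the compiler decides to inline them. */
int sum_of_squares_called(int a, int b)
{
    return square(a) + square(b);
}

/* Roughly what inlining produces: the body of square() is duplicated at
   each call site, so no call or return is needed for it. */
int sum_of_squares_inlined(int a, int b)
{
    return (a * a) + (b * b);
}
```

Both versions compute the same result; the inlined one trades a larger
function body for the absence of call overhead.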
But if we give the -finline options, it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?

Is this a college project? If so, it seems to be beyond your abilities.
Constructing a compiler, even a bare-bones one, is rather involved.
These links may get you started...

http://cs.wwc.edu/~aabyan/464/Book/
http://www.scifac.ru.ac.za/compilers/

Also post compiler related questions to comp.compilers.
Thanks for reading all of my questions :)

In the future, please remember to post only standard C related
questions to this group.
 
Haider

priyanka said:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C).

Use the _open function to open the file in binary mode, then use your own
logic to parse the executable.

How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?
python people will tell you.
Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.
it is not necessary and depends on compiler.
But if we give the -finline options, it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?

Yes, you can; it depends on your desire and effort.
 
santosh

Haider said:
use _open function and open the file in binary mode then use your logic
to parse the executable.

There is no standard C function called _open(). fopen() in binary mode
will accomplish the task.
 
ritesh

Walter said:
If the file format and OS and language restrictions are such
that it is possible to place into execution a section marked as
data, then figuring out whether there is a machine return instruction
or not is equivalent to solving The Halting Problem. I believe it
is generally agreed that The Halting Problem is not solvable in
standard C.

Hi Walter,

Could you elaborate on what is "The Halting Problem". I'm not able to
remember it, but I did hear of it some time back.

Thanks
Ritesh
 
Flash Gordon

Almost always a bad idea. Optimisers can do very strange things to code
so working out things about the source is not going to be easy.
use _open function and open the file in binary mode then use your logic
to parse the executable.

There is no function named _open in standard C and many common
implementations do not have it. On those few where it does exist I don't
see any benefit for this problem over the standard fopen function.

python people will tell you.

About perl as well? I think the OP will need to ask in groups for both
languages.
it is not necessary and depends on compiler.
Yes you can it depends on your desire and effort.

Strange, -finline is rejected by my compiler. Asking in a group
dedicated to the compiler will get more helpful replies.
 
Haider

santosh said:
There is no standard C function called _open(). fopen() in binary mode
will accomplish the task.

Sorry about _open; please use open, it is the standard one.
 
Chris Dollin

ritesh said:
Could you elaborate on what is "The Halting Problem". I'm not able to
remember it, but I did hear of it some time back.

<ot>Google is your friend.</>
 
Richard Bos

Haider said:
sorry for _open please use open it is standard one.

Before you make claims about a standard, you would be wise to know it.
There is no open() function in ISO C, nor is it in any way needed to
solve the OP's problem.

Richard
 
