Finding the source files in the binary

Subra

Hi,

How can I find out the list of source files that went into a binary?
Suppose I have used a command something like the one below:

gcc f1.o f2.o f4.o -o exe.o

Then what command will give me the constituents of exe.o? For example:

command_required exe.o
f1.o f2.o f4.o
 
Don Bruder

Subra said:
Hi,

How can I find out the list of source files gone into a binary ?
Suppose if I have used the command some thing like below :-

gcc f1.o f2.o f4.o -o exe.o

Then what command will give me the constituents of exe.o ?

f1.o f2.o f4.o

I'm not absolutely sure, but I strongly suspect that for anything but
the most trivial cases, there's no practical way to retrieve that
information.

What you're trying to achieve isn't too much different than handing
somebody a bowl of stew and asking them to tell you the exact
ingredients used to make it, how much of each was used, and in what
order each one was added to the pot.

If you don't have the source code, I'd say it's completely impossible.
 
Subra

I'm not absolutely sure, but I strongly suspect that for anything but
the most trivial cases, there's no practical way to retrieve that
information.

What you're trying to achieve isn't too much different than handing
somebody a bowl of stew and asking them to tell you the exact
ingredients used to make it, how much of each was used, and in what
order each one was added to the pot.

If you don't have the source code, I'd say it's completely impossible.

--
Don Bruder - (e-mail address removed) - If your "From:" address isn't on my whitelist,
or the subject of the message doesn't contain the exact text "PopperAndShadow"
somewhere, any message sent to this address will go in the garbage without my
ever knowing it arrived. Sorry... <http://www.sonic.net/~dakidd> for more info

I have the complete source code of the project.
I need to debug the project.
For that I have compiled the code with the -g option.
I think there should be a way to retrieve the required info.
 
Keith Thompson

Subra said:
How can I find out the list of source files gone into a binary ?
Suppose if I have used the command some thing like below :-

gcc f1.o f2.o f4.o -o exe.o

Then what command will give me the constituents of exe.o ?

f1.o f2.o f4.o

Your question is about gcc (or about the linker, or perhaps the OS),
not about C. Try gnu.gcc.help or perhaps comp.unix.programmer.
 
santosh

Subra said:
Hi,

How can I find out the list of source files gone into a binary ?
Suppose if I have used the command some thing like below :-

gcc f1.o f2.o f4.o -o exe.o

Then what command will give me the constituents of exe.o ?

command_required exe.o
f1.o f2.o f4.o

In the general case this is not easy. Why not wrap compilation in a shell
script that saves this sort of information to a file? This seems to me to
be a sensible and easy option, rather than probing binary files. I think
you can even coerce make into doing this for you.

For more details consult a GNU/GCC group or a UNIX group like
<news:comp.unix.programmer>
 
jacob navia

Subra said:
Hi,

How can I find out the list of source files gone into a binary ?
Suppose if I have used the command some thing like below :-

gcc f1.o f2.o f4.o -o exe.o

Then what command will give me the constituents of exe.o ?

f1.o f2.o f4.o

There are many possibilities:

1) Easy, since you have the source code. In each file add:

static char *myFileNameVariable = "This file is called @@@ foo.c@@@";

Then, a grep "@@@" *.o will give you the file names, or better
strings *.o | grep @@@

2) Read the debug information and find the file name. With stabs
format debug information the answer could be found with grep; with the
DWARF format it is probably more complicated. You would need a
stabs/DWARF reader. Several have already been developed, so use one of them.

Obviously the debugger DOES find that information so you can find it
also.
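
As a rough illustration of the reading side of option 1, here is a minimal sketch in C of what `strings *.o | grep @@@` is doing: scanning raw bytes for a pair of "@@@" markers and pulling out the text between them. The function name find_tag and the macro names are assumptions made for this sketch, not part of any existing tool, and it only reports the first marker pair it finds.

```c
#include <stddef.h>
#include <string.h>

/* Marker that brackets the embedded name, as in "@@@ foo.c@@@". */
#define TAG_MARK "@@@"
#define TAG_MARK_LEN 3

/*
 * Hypothetical helper: find the first "@@@ ... @@@" pair in buf[0..len)
 * and copy the text between the markers into out (NUL-terminated).
 * Returns the number of characters copied, or -1 if no complete pair
 * is found or out is too small. Works on raw bytes, so buf may contain
 * embedded NULs, as an object file would.
 */
long find_tag(const unsigned char *buf, size_t len, char *out, size_t outsz)
{
    size_t i, j;

    if (len < 2 * TAG_MARK_LEN)
        return -1;
    for (i = 0; i + TAG_MARK_LEN <= len; i++) {
        if (memcmp(buf + i, TAG_MARK, TAG_MARK_LEN) != 0)
            continue;
        /* Opening marker found at i; look for the closing one. */
        for (j = i + TAG_MARK_LEN; j + TAG_MARK_LEN <= len; j++) {
            if (memcmp(buf + j, TAG_MARK, TAG_MARK_LEN) == 0) {
                size_t n = j - (i + TAG_MARK_LEN);
                if (n + 1 > outsz)
                    return -1;
                memcpy(out, buf + i + TAG_MARK_LEN, n);
                out[n] = '\0';
                return (long)n;
            }
        }
        return -1; /* opening marker without a closing one */
    }
    return -1;
}
```

In practice one would read the object file or executable into a buffer and call this repeatedly from the position after each match. It only works if the marker string survived compilation and linking in plain form.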

Under Windows or other operating systems, the debug information has
other formats; there is no standard. Recently, I proposed using
an XML format to exchange this type of information in the comp.std.c
newsgroup, but the reception wasn't very good. Mostly because
nobody wants to do the work involved... But it would be nice if
we had a standard format that allowed a simple reader to read the
information, without developing a new reader for each compiler.

jacob
 
Jens Thoms Toerring

I have the complete source code of the project.
I need to debug the project.
For that I have compiled the code with -g option.
I think there should be a way to retrieve the required info.

This makes a lot of difference: a) your question has become
off-topic here ;-) since it's about the tools you use and not C,
and b) your problem most probably has some kind of solution,
which you will probably find when you read the documentation for
your debugger carefully (<OT> if it's by any chance gdb, try the
'info sources' command </OT>).
Regards, Jens
 
CBFalconer

Subra said:
How can I find out the list of source files gone into a binary ?
Suppose if I have used the command some thing like below :-

gcc f1.o f2.o f4.o -o exe.o

Then what command will give me the constituents of exe.o ?

Off-topic, but in general simply use a makefile. Then compile with
make.
 
Richard Bos

jacob navia said:
There are many possibilities:

There are, however, none that are reliable.
1) Easy, since you have the source code. In each file add:

static char *myFileNameVariable = "This file is called @@@ foo.c@@@";

And what if this string is optimised away? You will probably get away
with it by printing that string, but you have to do so for each source
file you link in, which probably involves uncoordinated masses of init
functions; and that still overlooks the problem with #included files.
2) Read the debug information and find the file name.

Which is only conditionally possible, and even _when_ it is, _how_ it
may be possible is off-topic here.
Under windows or other operating systems, the debug information has
other formats, there is no standard. Recently, I proposed to use
an XML format to exchange this type of information in comp.std.c
newsgroup but the reception wasn't very good. Mostly because
nobody wants to do the work involved...

No, mostly because it's a stupid idea (particularly the XML bit - those
three letters always make me shake my head in sadness) which will end in
a. tears and b. yet another semi-standard to be violated by Microsoft.

In the general case, there simply is no solution for this problem. In
the OP's specific case, there may be a solution; but even if so, it will
be on-topic in a newsgroup for the OP's implementation, not here.

Richard
 
jacob navia

Richard said:
There are, however, none that are reliable.


And what if this string is optimised away? You will probably get away
with it by printing that string, but you have to do so for each source
file you link in, which probably involves uncoordinated masses of init
functions; and that still overlooks the problem with #included files.

This method was used by the RCS source code control system, and by
MANY tools under Unix. Some compilers might optimize
that away, but since those tools worked perfectly under
Unix and under Windows, I suppose that wasn't the case.

Alternatively you can drop the "static" and figure out a
unique variable name.
Which is only conditionally possible, and even _when_ it is, _how_ it
may be possible is off-topic here.


Personally, if you think it is off topic I do not really care.
No, mostly because it's a stupid idea (particularly the XML bit - those
three letters always make me shake my head in sadness) which will end in
a. tears and b. yet another semi-standard to be violated by Microsoft.

If you do not like Microsoft I do not really care either.

Like many amateur Microsoft haters, you do not advance any arguments.

XML is an international standard used under linux / solaris/
Macintosh/ Windows and MANY other operating systems. You do NOT
specify why XML is a "semi standard" or what YOUR problem with
XML is. (As I said above NO ARGUMENTS).

You just advance your opinion as if it were a fact, then, because I speak
about XML, I *must* be a Microsoft slave (obvious isn't it?) and
I want to promote Microsoft.
In the general case, there simply is no solution for this problem. In
the OP's specific case, there may be a solution; but even if so, it will
be on-topic in a newsgroup for the OP's implementation, not here.

Again, if you feel this is off topic DO NOT REPLY, or use a killfile.
Put me in there and the problem is solved.
 
SM Ryan

# Hi,
#
# How can I find out the list of source files gone into a binary ?
# Suppose if I have used the command some thing like below :-

In general, you can't. Proprietary software depends on the fact
that it is expensive and difficult to extract source from object code.
Not impossible, but not cheap.

Some object code does have additional information pointing back
to the original code, the debugging versions of the code, but
this information is not necessary for execution and you cannot
rely on it being present.
 
Cedric Roux

Subra said:
I have the complete source code of the project.
I need to debug the project.
For that I have compiled the code with -g option.
I think there should be a way to retrieve the required info.

Try the nm utility:
nm -a <your program> | grep " a "
(with GNU nm, and with the program compiled with -g)
But sure, it's off-topic here.
 
Dik T. Winter

....
> This method was used by the rcs source code control system, and by
> MANY tools under Unix.

"source code control system" should tell you everything. It is not about
the strings being maintained in the binary. Moreover, for RCS, it should
work for *all* kinds of sources, not only C programs. So (if I remember
correctly) the RCS version checking was triggered by the string $Header$
somewhere in the source module intended to be maintained.
> Some compiler would optimize
> that away, but since those tools worked perfectly under
> Unix and under windows, I suppose that wasn't the case.

Those tools do not work on binaries. Tools that work for binaries have
to make assumptions on the compilers used, but they generally tend to
be part of a more integrated development environment.
 
Keith Thompson

Dik T. Winter said:
...
This method was used by the rcs source code control system, and by
MANY tools under Unix.

"source code control system" should tell you everything. It is not about
the strings being maintained in the binary. Moreover, for RCS, it should
work for *all* kind of sources, not only C programs. So (if I remember
correctly) the RCS version checking was triggered by the string $Header$
somewhere in the source module intended to be maintained.
[...]

<OT>
It's actually common practice to use a declaration similar to the above
to embed a specified string in the source file, the object file, and
the executable. There's no guaranteed way to ensure that the string
actually appears in the object file or executable, but there's often a
way to do so for a given system.

The RCS "ident" command finds strings with a certain format in one or
more files; it works on binary files. (BTW, it's not really used *by*
RCS; it's used *with* RCS.)
</OT>
 
Richard Bos

jacob navia said:
This method was used by the rcs source code control system, and by
MANY tools under Unix. Some compiler would optimize
that away, but since those tools worked perfectly under
Unix and under windows, I suppose that wasn't the case.

Actually, if a compiler would _not_ optimise such a string away if it
wasn't actually used, I'd doubt the quality of the optimiser. It might
work if you turn off optimisation during development, but that's never
going to be a general solution. You might have had a point if you'd
declared it volatile.
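
A minimal sketch of the volatile variant, assuming nothing beyond standard C: the tag is volatile-qualified and is actually read by a function, so the reads count as observable accesses. The names source_tag and copy_source_tag are made up for this illustration. Note the hedge: if nothing ever calls the function, a compiler or linker may still discard both it and the tag, and nothing in the C standard promises that the bytes appear contiguously in the executable.

```c
#include <stddef.h>

/*
 * The tag is volatile-qualified, so each read of it below is an
 * access the compiler has to perform. The name source_tag is
 * hypothetical; __FILE__ expands to the name of this source file.
 */
static const volatile char source_tag[] = "@@@ " __FILE__ " @@@";

/*
 * Copy the tag into dst (at most n bytes including the terminator).
 * Calling this from code that actually runs forces real reads of the tag.
 * Returns the number of characters copied, excluding the terminator.
 */
size_t copy_source_tag(char *dst, size_t n)
{
    size_t i = 0;

    if (n == 0)
        return 0;
    while (i + 1 < n && i + 1 < sizeof source_tag) {
        dst[i] = source_tag[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

The cost is the one already mentioned in this thread: every source file needs such a function, and something has to call each of them.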
Personally, if you think it is off topic I do not really care.

Yes, that's been obvious for years.
If you do not like Microsoft I do not really care either.

Like many amateur Microsoft haters, you do not advance any arguments.

Oh, I assure you that I'm a very professional Microsoft hater. So would
you be, if you'd had to manage a large number of MS Windows desktops,
but also some Apple, Novell and Unix machines to show you how a serious
operating system _can_ be written.
XML is an international standard used under linux / solaris/
Macintosh/ Windows and MANY other operating systems. You do NOT
specify why XML is a "semi standard" or what YOUR problem with
XML is. (As I said above NO ARGUMENTS).

I have many arguments, but they are off-topic here. You can find them
wherever I've had reason to comment on XML; do a search.

One important _on_-topic reason, though, is that the XML Standard - such
as it is - is completely separate from the C Standard, so adding it into
ISO C would create a needless dependency.
You just advance your opinion as it was a fact, then, because I speak
about XML, I *must* be a Microsoft slave (obvious isn't it?)

Not very obvious, and to an intelligent reader it is in fact clear that
that is not what I wrote. Your lack of common sense and Microsoft's
evilness are two separate regrettabilities, though in this case one
leads to the other.
Again, if you feel this is off topic DO NOT REPLY, or use a killfile.
Put me in there and the problem is solved.

You misunderstand the nature of Usenet; and you underestimate the
tragedy of the commons.

Richard
 
Army1987

Actually, if a compiler would _not_ optimise such a string away if it
wasn't actually used, I'd doubt the quality of the optimiser. It might
work if you turn off optimisation during development, but that's never
going to be a general solution. You might have had a point if you'd
declared it volatile.
Or drop the static. That's a .o file, and the compiler doesn't
know whether tomorrow I'm going to compile a file with
extern char *myFileNameVariable;
and link it with exe.o.
 
Richard Bos

Army1987 said:
Or drop the static. That's a .o file, and the compiler doesn't
know whether tomorrow I'm going to compile a file with
extern char *myFileNameVariable;
and link it with exe.o.

There is, however, nothing keeping optimising linkers from omitting it,
just as they can omit functions which aren't called in the rest of the
program.
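
One way to make the tags harder for a linker to discard is to reference every one of them from a central table that the program really uses. The following is only a sketch under that assumption: the names source_tag_f1, source_tags, source_tag_count and source_tag_at are hypothetical, and the three definitions are gathered in one place purely to keep the example self-contained.

```c
#include <stddef.h>

/*
 * In a real build each definition would live in its own source file
 * (f1.c, f2.c, f4.c) and could use __FILE__; the file names are written
 * out here only so the sketch is self-contained. All names are hypothetical.
 */
const char source_tag_f1[] = "@@@ f1.c @@@";
const char source_tag_f2[] = "@@@ f2.c @@@";
const char source_tag_f4[] = "@@@ f4.c @@@";

/* Central table: referencing every tag keeps each one reachable. */
const char *const source_tags[] = {
    source_tag_f1,
    source_tag_f2,
    source_tag_f4,
};

/* Number of entries in the table. */
size_t source_tag_count(void)
{
    return sizeof source_tags / sizeof source_tags[0];
}

/* Return tag i, or NULL if i is out of range. */
const char *source_tag_at(size_t i)
{
    return i < source_tag_count() ? source_tags[i] : NULL;
}
```

A program could print the table on request, which sidesteps probing the binary altogether. The table has to be maintained by hand or generated by the build, and it still says nothing about #included files.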

Richard
 
Kenny McCormack

Richard Bos said:
There is, however, nothing keeping optimising linkers from omitting it,
just as they can omit functions which aren't called in the rest of the
program.

For that matter, there's nothing to prevent an implementation from
encrypting the string such that even though it *is* used, it doesn't
appear in the binary (in the sense in which we are discussing here).

In fact, most modern installation programs for Windows products have
this attribute; they are
compressed/encrypted/whatever-you-want-to-call-it, with the result that
strings you know are in there, can't be found via the usual tools.
 
