Getting source file from the object file

P

pavunkumar

Dear Friend

Can we get the source file from the object ?
example :

file a.c
cc a.c
Now I have a.out
But I not have the a.c
how could I get the source file using a.out

thanks
 
N

Nate Eldredge

pavunkumar said:
Dear Friend

Can we get the source file from the object ?

Generally, no.
example :

file a.c
cc a.c
Now I have a.out
But I not have the a.c

That's unfortunate.
how could I get the source file using a.out

On most systems, it is impossible. Much information is lost in
compilation.
 
P

pavunkumar

Generally, no.



That's unfortunate.


On most systems, it is impossible. Much information is lost in
compilation.

Here , For example only I told , my source file is deleted, My doubt
whether we can get or not
 
G

Guest

Here ,  For example only I told , my source file is deleted, My doubt
whether we can get or not- Hide quoted text -

which bit of "on most systems, it is impossible. Much information is
lost in
compilation" didn't you understand? The short answer is "no you cannot
turn
object into source". Some very specialised systems do it but you
almost
certainly do not have one.
 
U

user923005

Dear Friend

                  Can we get the source file from the object ?
example :

file a.c
cc a.c
Now I have a.out
But I not have the a.c
how could I get the source file using a.out

In general, what you are trying to do is to turn the hamberger back
into the cow.

Now, it is probably impossible to decompile 100%, because it has been
shown to be equivalent to the "does the turing machine halt?" problem.

However, pragmatically speaking, it can sometimes be made to work (up
to a point -- e.g. the comments are lost for sure, and probably the
variable names but you might be able to collect the algorithm).

Do a web search for "decompiler".

A better approach is to use IdaPro disassembler. It will do a flow
chart for you and you can collect the algorithm from that.
 
R

Richard Tobin

user923005 said:
Now, it is probably impossible to decompile 100%, because it has been
shown to be equivalent to the "does the turing machine halt?" problem.

What?

Obviously decompilation is impossible in a sense, because different
source programs can produce the same object file.

Equally obviously, it is theoretically possible to find a source
program which produces that code by enumerating all the source
programs and compiling them until we find one which produces the
desired object file. (Assuming of course that the object file really
is the result of compiling a source file.)

So where does the halting problem come into it?

-- Richard
 
C

Chris Dollin

Richard said:
What?

Obviously decompilation is impossible in a sense, because different
source programs can produce the same object file.

Equally obviously, it is theoretically possible to find a source
program which produces that code by enumerating all the source
programs and compiling them until we find one which produces the
desired object file. (Assuming of course that the object file really
is the result of compiling a source file.)

Nitpick: this assumes that the object code depends /only/ on the source
code, and not (say) on the user that compiled it, or the time it was
compiled, or the IP address of the machine it was compiled on -- all of
which /might/ be left, as provenance, in the object code.

I agree that decompilation is not a halting-problem issue. On the other
hand, we might want a slightly faster algorithm than blind search ...
 
R

Richard Tobin

Chris Dollin said:
Nitpick: this assumes that the object code depends /only/ on the source
code, and not (say) on the user that compiled it, or the time it was
compiled, or the IP address of the machine it was compiled on -- all of
which /might/ be left, as provenance, in the object code.

All of these have only countably many values, so we can iterate over
them.

-- Richard
 
U

user923005

Nitpick: this assumes that the object code depends /only/ on the source
code, and not (say) on the user that compiled it, or the time it was
compiled, or the IP address of the machine it was compiled on -- all of
which /might/ be left, as provenance, in the object code.

I agree that decompilation is not a halting-problem issue. On the other
hand, we might want a slightly faster algorithm than blind search ...

It has been formally proven to be equivalent to the halting problem.
Some citations are found here:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.1272
 
U

user923005

It has been formally proven to be equivalent to the halting problem.
Some citations are found here:http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.1272


The actual proof is in here:
P.T.Breuer and J.P.Bowen, "Decompilation: The enumeration of types and
grammars", Tech. Rep. PRG-TR-11-92, Oxford University Computing
Laboratory, 11 Keble Road, Oxford OX1 3QD, 1992.
Found here:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.8799

Where:
"Proposition 1 There is no computable algorithm which will take an
arbitrary attribute grammar description and always produce a piece of
computer code which enumerates all the valid phases of the grammar,
and then terminates (if the grammar is finite)."
is proposed and then proven.
 
P

Phil Carmody

What?

Obviously decompilation is impossible in a sense, because different
source programs can produce the same object file.

Equally obviously, it is theoretically possible to find a source
program which produces that code by enumerating all the source
programs and compiling them until we find one which produces the
desired object file. (Assuming of course that the object file really
is the result of compiling a source file.)

So where does the halting problem come into it?


Because decompilation of arbitrary programs is equivalent to
solving the halting problem. Even detection of which part of
the program might be executed, and which part of the program
will not be executed is impossible to decide for arbitrary
programs. You claim to have an '.ac.uk' address - use it.

Phil
 
N

Nate Eldredge

Phil Carmody said:
Because decompilation of arbitrary programs is equivalent to
solving the halting problem. Even detection of which part of
the program might be executed, and which part of the program
will not be executed is impossible to decide for arbitrary
programs.

There's some confusion here. If we know that the given executable file
is the output of a particular compiler whose output is a deterministic
function of its input, then certainly a brute-force search will find, in
finite time, some source file which when compiled by that compiler
yields the desired output.

The more general problem, which is described in the papers cited,
appears to suppose that we are given an executable file and a compiler
for some high-level language, but the executable was not necessarily
produced by the compiler. We are asked to find a source file which,
when compiled, produces an executable which behaves equivalently to the
given one (let us say, it produces the same output as the given
executable for all possible inputs). Of course, the new executable need
not be byte-by-byte identical to the original one; the compiler may not
be capable of producing such a thing. I presume there are also some
restrictions needed on the language, so that the source really is a
high-level description of the program and not just a disassembly or
something. (Certainly we can disassemble the program, but we won't know
that the instructions in the disassembly are actually the instructions
that the machine executes. For instance, it may jump into the middle of
one of the "instructions" in the proposed disassembly.)

I don't doubt that the latter problem is reducible to the halting
problem. The former certainly is not, and it more accurately describes
the OP's question. Not that it's a practical solution, of course.
You claim to have an '.ac.uk' address - use it.

Was that necessary?
 
J

jameskuyper

Phil said:
Because decompilation of arbitrary programs is equivalent to
solving the halting problem.

That doesn't answer the question; that is just a restatement of the
subject of the question. How does the halting problem get connected to
decompilation?

I suppose a sufficiently sophisticated compiler might translate a C
program into machine code that carries out some arbitrarily
complicated set-up process before actually implementing the source
code. Figuring out what the original source code might have been
requires figuring out what the end result of that set-up process will
be, which in turn requires figuring out when it will stop. For
instance, the actual program might contain nothing more than a
decompression routine and a compressed version of the actual
executable. The first thing it does is decompress the executable, and
the second thing it does is start executing it. This could go through
several levels: a very sophisticated decompression routine might
itself be compressed, to be decompressed using a much simpler routine.
 
R

Richard Tobin

Equally obviously, it is theoretically possible to find a source
program which produces that code by enumerating all the source
programs and compiling them until we find one which produces the
desired object file. (Assuming of course that the object file really
is the result of compiling a source file.)

So where does the halting problem come into it?
[/QUOTE]
Because decompilation of arbitrary programs is equivalent to
solving the halting problem.

The paragraph above gives a terminating algorithm for decompiling
a program, given the compiler used to compile it. Taking an arbitrary
piece of object code and finding an equivalent program in some other
"source" language may well have the problem you mention, but that is
not the case we're considering. The OP has simply lost the source
to one of his programs.
Even detection of which part of
the program might be executed, and which part of the program
will not be executed is impossible to decide for arbitrary
programs.

We are not considering arbitrary programs, we are considering programs
generated by a known compiler. And we don't need to determine which
parts may be executed to produce a source program that compiles to
the given object code.

-- Richard
 
U

user923005

Phil Carmody said:
[email protected] (Richard Tobin) said:
There's some confusion here.  If we know that the given executable file
is the output of a particular compiler whose output is a deterministic
function of its input, then certainly a brute-force search will find, in
finite time, some source file which when compiled by that compiler
yields the desired output.

The more general problem, which is described in the papers cited,
appears to suppose that we are given an executable file and a compiler
for some high-level language, but the executable was not necessarily
produced by the compiler.  We are asked to find a source file which,
when compiled, produces an executable which behaves equivalently to the
given one (let us say, it produces the same output as the given
executable for all possible inputs).  Of course, the new executable need
not be byte-by-byte identical to the original one; the compiler may not
be capable of producing such a thing.  I presume there are also some
restrictions needed on the language, so that the source really is a
high-level description of the program and not just a disassembly or
something.  (Certainly we can disassemble the program, but we won't know
that the instructions in the disassembly are actually the instructions
that the machine executes.  For instance, it may jump into the middle of
one of the "instructions" in the proposed disassembly.)

I don't doubt that the latter problem is reducible to the halting
problem.  The former certainly is not, and it more accurately describes
the OP's question.  Not that it's a practical solution, of course.

I think that this approach assumes that the compiler vendor is writing
the decompiler.

For instance, a compiler vendor can substitute inline assembly for a
function call and do other operations that are not actually attainable
from the C language itself. It is entirely possible that some of the
inline source code cannot be reversed to C (we can easily imagine
system specific things like file operations).

Compilers have bugs. I found a bug generated in the assembly of the
Intel compiler, for instance. Now, what is the decompiler going to do
with this incorrect assembly language when trying to turn it back into
C?

I think that in general, the problem is a lot more difficult than most
people think. Not to say that useful decompilers cannot be created.
Just that these decompilers will exhibit lots and lots of problems,
especially when you try do do something complex with them.

I have also found that even the much more simplistic job of converting
an executable into its assembly language can sometimes go awry, even
with the best available disassemblers. If this task can be
problematic, I guess that the far more difficult job of translating
that assembly into code that could have made it will be a lot more
difficult.

I have tried a few executable decompilers. My bottom line assessment
is that none of them work (their net value is close to zero). Now,
pcode decompilers work great. DotNet code or Java can be turned back
into reliable source straight away. The new source code won't look
much like the original, but it can be compiled again most of the time
and the new binary will pretty much behave like the old one.

IMO-YMMV.
[snip]
 
D

Dik T. Winter

> It has been formally proven to be equivalent to the halting problem.
> Some citations are found here:
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=3D10.1.1.22.1272

In the abstract you point to there is nothing said about a formal proof
of equivalence, only that there is a link. If there is a proof, I think
it only shows that the *methodology* used makes it equivalent to the
halting problem.

If I may interprete this, it means that with the given methodology you do
not know whether for a given input the process will stop and produce output
that in turn produces when compiled the given input.

However, I think that it is possible to create a program that takes the
object code, translates it back to C, and when the C source is compiled
the resulting object code will do the same. Of course the recovered source
will be ugly. And moreover, I think that in many cases the recovered source
will show quite a bit of undefined behaviour at times. But I have once
done it for a program compiled by a DEC RSX-11 Fortran compiler, of which
the product was running on a VAX. The object was hacked such that it could
also run on Unix V6, this in turn was hacked to run under Unix V7, and that
ran in PDP-11 emulation mode on the VAX under BSD Unix. The program was
(of course) dungeon (aka zork). The decompiled object had many statements
like:
M046354 = M056132
but it was sufficient to recover all information needed to rewrite it for
my very personal Pascal compiler (and discover some bugs in the source).
If you look in the archives for comp.sources.games you will find bug-fixes
that were the result of this effort.
 
D

Dik T. Winter

> Where:
> "Proposition 1 There is no computable algorithm which will take an
> arbitrary attribute grammar description and always produce a piece of
> computer code which enumerates all the valid phases of the grammar,
> and then terminates (if the grammar is finite)."
> is proposed and then proven.

Aha. Here is the problem: "which will take an arbitrary attribite grammar
description...". Deompilation to a particular language does *not* use an
arbitrary grammar description. There can be algorithms that use a
*particular* grammar description.
 
D

Dik T. Winter

> Because decompilation of arbitrary programs is equivalent to
> solving the halting problem. Even detection of which part of
> the program might be executed, and which part of the program
> will not be executed is impossible to decide for arbitrary
> programs.

The last part is irrelevant. It does not matter whether the resulting
program contains parts that are not executed. When I did my decompiling
I started at the main entry, followed all jumps and so did decompile all
parts that possibly could be executed. This did not always work, but
when it did not work it was clear that the object was not based on source
in a HOL only (like encrypted object code that first had to be decrypted).

When a program is the result of a HOL compiler only a source can be found
that will result in an object that does exactly the same.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top