anyone interested in decompilation

CBFalconer

QuantumG said:
Really, the way you guys are hung up on symbols anyone would figure
you've never read source code written in French or German or
whichever natural language it is that you don't understand.
Recreating sensible symbols is not a problem people have trouble
solving (for foreign languages, or even asm code), so clearly it is
not a problem they will have solving for the output of a decompiler.

And for such translation of code written in other natural
languages, my ID2ID utility serves very nicely. It can process a
whole suite of source files, making compatible changes across the
whole set. See:

<http://cbfalconer.home.att.net/download/>
 
emmerik

Frederick said:
QuantumG posted:



The addition of two numbers yields 100.

What are the two numbers?

This is an extreme example, but it does illustrate a point. If all
you want to do is to find source code for a program that adds two
numbers to get the result 100 (let's assume decimal here :), then
there are obviously infinitely many such programs. But this is the
point: they are all just as good.

The program that adds 50 and 50 can be written infinitely many ways
too. Any program can be modified by changing only comments and
identifiers and not affect the result. Programs can be written with for
or while loops, you can use array indexes or pointers, there are lots
of variations at the source code level.
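
As a minimal sketch of that kind of source-level variation (the function
names here are made up for illustration and come from no real program),
the two functions below compute the same sum, one with a for loop and
array indexing, the other with a while loop and pointer arithmetic:

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: for loop with array indexing. */
int sum_indexed(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Variant 2: while loop with pointer arithmetic. */
int sum_pointer(const int *values, size_t count)
{
    const int *end = values + count;
    int total = 0;
    while (values != end)
        total += *values++;
    return total;
}

/* Hypothetical self-check: both variants agree on the same input. */
void check_sums(void)
{
    static const int data[] = { 50, 30, 20 };
    assert(sum_indexed(data, 3) == sum_pointer(data, 3));
}
```

A decompiler that emitted either form would have produced a correct
answer; which one the original author wrote is not recoverable from the
behaviour alone.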

This turns up a lot in machine code, of course, since operations like
addition are so overloaded. The function that adds a pointer and an
integer (let's assume the pointer points to something of size 1, such
as char, for simplicity) is the same as the function that adds two
integers.
Identical binary code. So to decompile that function in isolation,
there are three possibilities (ignoring the infinite variations with
identifier names and comments): type the parameters as pointer, int;
int, int; or int, pointer. In a real-world program however, it will be
obvious which of these will mesh with the rest of the program, since
there are type clues all over.
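
To make the three typings concrete, here is a rough sketch. The function
names are hypothetical, and it assumes a flat-address target where
intptr_t exists and is as wide as a pointer. On such a target a compiler
may well emit the same add instruction for all three, though that is not
guaranteed:

```c
#include <assert.h>
#include <stdint.h>

/* Typing 1: int, int. */
intptr_t add_int_int(intptr_t a, intptr_t b)
{
    return a + b;
}

/* Typing 2: pointer, int. char has size 1, so the offset is not scaled. */
char *add_ptr_int(char *p, intptr_t n)
{
    return p + n;
}

/* Typing 3: int, pointer. Same addition with the operands swapped. */
char *add_int_ptr(intptr_t n, char *p)
{
    return n + p;
}
```

In isolation nothing distinguishes these. A caller that dereferences the
result, or passes in the address of a buffer, is the kind of type clue
that settles which typing fits.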

Here is the important point: you can't get the original source code
back, but it doesn't matter. The important thing is to get source code
that encapsulates what the program is doing. In many cases, it is
important that the code can be recompiled. In some cases, it is
important that the code is also readable and maintainable. I can't
think of a situation where you absolutely need the original source
code, or even very close to it. The reason is that the program does
something; there is enough information there for the processor to
execute it. In principle, it should be possible to produce source code
that when compiled does the same thing. We've done it for small
programs, and even for larger programs with a lot of manual help.

Frederick Gotham said:
It also suffers from a lack of "reality".

Well, you are entitled to your opinion, of course. I believe that the
evidence so far shows otherwise.

- emmerik
 
QuantumG

CBFalconer said:
And for such translation of code written in other natural
languages, my ID2ID utility serves very nicely. It can process a
whole suite of source files, making compatible changes across the
whole set.

Neat.

Of course, this does bring to the fore an important question: for what
uses of a decompiler do you need to do this? If you're trying to
determine what malware does or look for security flaws you really don't
need maintainable source code; you just need enough understanding of
the parts that do what you are interested in.

On the other hand, if you've lost your source code and you want to
"recover" it from one of the binaries you have produced from it, you
definitely want maintainable source code. However, you hardly need to
reverse engineer the output of the decompiler, do you? After all, if
you're the original programmer, chances are you remember what most of
the identifiers were and can quickly replace the generic ones with
them. Your tool would certainly be better than using a text editor to
do it though!

QuantumG
 
Richard

QuantumG said:

Not really: you could use sed or a Windows equivalent with much more
confidence since there appears to be no source code or decent
documentation. Or use IDE refactoring tools.

But if you are willing to run an unknown exe good luck :-;
Of course, this does bring to the fore an important question: for what
uses of a decompiler do you need to do this? If you're trying to
determine what malware does or look for security flaws you really don't
need maintainable source code.. you just need enough understanding of
the parts that do what you are interested in.

On the other hand, if you've lost your source code and you want to
"recover" it from one of the binaries you have produced from it, you
definitely want maintainable source code. However, you hardly need to

How ironic - maybe this utility can be used after all to help produce
the original source for itself that is lost.
 
Keith Thompson

Richard said:
Not really: you could use sed or a Windows equivalent with much more
confidence since there appears to be no source code or decent
documentation. Or use IDE refactoring tools.

But if you are willing to run an unknown exe good luck :-;

What are you talking about? The zip file contains the complete
sources, along with a "readme.txt" file and a Windows executable, and
a makefile.
 
Richard Bos

emmerik said:
This is an extreme example, but it does illustrate a point. If all
you want to do is to find source code for a program that adds two
numbers to get the result 100 (let's assume decimal here :), then
there are obviously infinitely many such programs. But this is the
point: they are all just as good.

No, that _is_ the point: they're not. Suppose you have code that
decompiles to this:

return Var00042 + 100;

Now, what does the 100 stand for? Decompilers say that it doesn't
matter; 100 is 100 is 100, and you have working code, right? Maintenance
programmers say that it matters a lot. Was it:
- a literal constant 100?
- a #defined constant WEEKS_IN_TWO_WORKING_YEARS_MINUS_VACATION?
- two #defined constants WEEKS_IN_TWO_YEARS and VACATION_PER_YEAR, *2?
- three #defined constants NUMBER_OF_YEARS, WEEKS_IN_YEAR, WEEKS_OFF?
- 'd'?

For blindly recompiling the program, it does not matter. For maintaining
it, it matters quite a bit.
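
A small sketch of the same point in code. The macro names come from the
list above, but the values assigned to them (52, 2 and 2) are assumptions
chosen only so that the arithmetic comes to 100, and the function names
are hypothetical. The 'd' variant also assumes an ASCII execution
character set:

```c
#include <assert.h>

/* Assumed values, picked so that 2 * (52 - 2) == 100. */
#define NUMBER_OF_YEARS 2
#define WEEKS_IN_YEAR   52
#define WEEKS_OFF       2

/* What a decompiler would most likely show. */
int offset_literal(int x)
{
    return x + 100;
}

/* What the maintainer may have written: the intent is visible. */
int offset_named(int x)
{
    return x + NUMBER_OF_YEARS * (WEEKS_IN_YEAR - WEEKS_OFF);
}

/* Same value again, but here it means a character (ASCII assumed). */
int offset_char(int x)
{
    return x + 'd';
}
```

All three fold to the same constant at compile time, so the binary
cannot say which one it came from. Only the named version shows what to
change if, say, the vacation allowance changes.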

emmerik said:
Here is the important point: you can't get the original source code
back, but it doesn't matter. The important thing is to get source code
that encapsulates what the program is doing. In many cases, it is
important that the code can be recompiled. In some cases, it is
important that the code is also readable and maintainable. I can't
think of a situation where you absolutely need the original source
code, or even very close to it.

When the code must be readable and maintainable, it must also be obvious
_why_ the code does what it does, and that it can be modified and
customised in the same ways that the original could.

Richard
 
QuantumG

Richard said:
When the code must be readable and maintainable, it must also be obvious
_why_ the code does what it does, and that it can be modified and
customised in the same ways that the original could.

Absolutely. And it's the job of the re-engineer to turn the output of
a decompiler into maintainable code. It takes intelligence to do this
and depends on the kind of maintenance tasks you need to perform. It
is clearly not the duty of the decompiler.

QuantumG
 
