Normally people ask if you can convert machine language to C source. There
are two reasons for this. The first is that I have the binaries but lost the
source code (it happens even with backups). The second is that I have someone
else's binaries and I want to reverse engineer them.
Binary to Asm is often difficult, requiring many person-oriented passes
with a disassembler, making the judgements (often for each byte!):
Is that BYTE DATA or part of an INSTRUCTION?
When you decide it is an instruction, because of the flow, is the
referenced word a "long int", "float", pointer or some struct or array
base address?
This process can only be verified when the resultant Assy source is
understandable, assembled and then linked into the IDENTICAL core image the
original program had.
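
The "IDENTICAL core image" check can be sketched as a plain byte-for-byte comparison. This is only an illustration: the function name first_mismatch and the sample byte strings are made up here, and a real check would read the original and the rebuilt image from files.

```python
def first_mismatch(original: bytes, rebuilt: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return offset
    # All shared bytes match; a length difference is still a mismatch.
    if len(original) != len(rebuilt):
        return min(len(original), len(rebuilt))
    return None


# Made-up sample images standing in for the original and re-assembled binaries.
original = bytes([0x01, 0x2A, 0x02, 0x10, 0xFF])
rebuilt_ok = bytes([0x01, 0x2A, 0x02, 0x10, 0xFF])
rebuilt_bad = bytes([0x01, 0x2A, 0x02, 0x11, 0xFF])

print(first_mismatch(original, rebuilt_ok))   # None
print(first_mismatch(original, rebuilt_bad))  # 3
```

The offset of the first mismatch tells you which part of the disassembly to revisit.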
I have proposed an instruction interpreter which could mark each byte as it
is used by each instruction while running the code (but I've never seen
one!).
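
A minimal sketch of that idea, on a made-up toy machine rather than any real CPU: the four opcodes, the one-byte operands and the C/D/? marks are all assumptions for illustration. The interpreter runs the image and marks each byte it fetches as an instruction byte (C) and each byte an instruction reads as data (D).

```python
# Hypothetical toy ISA: opcode -> (name, total instruction length in bytes).
OPCODES = {0x00: ("HALT", 1), 0x01: ("LOAD", 2), 0x02: ("ADD", 2), 0x03: ("JMP", 2)}


def trace(image: bytes, entry: int = 0, max_steps: int = 1000):
    """Run the image and mark every byte as code (C), data (D) or unseen (?)."""
    marks = ["?"] * len(image)
    acc, pc = 0, entry
    for _ in range(max_steps):
        name, length = OPCODES[image[pc]]
        for i in range(pc, pc + length):
            marks[i] = "C"               # fetched as part of an instruction
        if name == "HALT":
            break
        operand = image[pc + 1]
        if name == "LOAD":
            acc = operand                # load immediate value
        elif name == "ADD":
            marks[operand] = "D"         # byte read as data by this instruction
            acc += image[operand]
        if name == "JMP":
            pc = operand                 # absolute jump
        else:
            pc += length
    return "".join(marks), acc


# LOAD 5; JMP 5; <data byte 7>; ADD [4]; HALT
image = bytes([0x01, 0x05, 0x03, 0x05, 0x07, 0x02, 0x04, 0x00])
print(trace(image))  # ('CCCCDCCC', 12)
```

Note that a trace like this only marks the bytes reached on the paths actually executed; anything still marked ? needs another run with different input, or a human judgement.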
Most disassemblers don't work (by themselves), especially with variable
length instruction formats (like x86)!
If you want to go from machine code to C source there are programs out
there that will do something. All are operating system specific and most
are compiler specific as well.
A person who knows what a compiler will generate for each statement can
de-compile the assembly source fairly easily. That person can also write a
program to do the same thing more rapidly.
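
As a rough sketch of such a program: assume a hypothetical naive compiler that always emits the same three-instruction sequence for "dst = a OP b;". The x86-like syntax, the idiom and the names below are invented for illustration; a real tool would need one pattern per compiler idiom.

```python
import re

# Hypothetical idiom: load a, add/sub b, store to dst.
IDIOM = re.compile(
    r"mov eax, \[(\w+)\]\n"
    r"(add|sub) eax, \[(\w+)\]\n"
    r"mov \[(\w+)\], eax\n?"
)
C_OPERATOR = {"add": "+", "sub": "-"}


def decompile(asm: str) -> str:
    """Replace each recognized instruction triple with one C statement."""
    def to_c(match):
        a, op, b, dst = match.groups()
        return f"{dst} = {a} {C_OPERATOR[op]} {b};\n"
    return IDIOM.sub(to_c, asm)


asm = (
    "mov eax, [x]\n"
    "add eax, [y]\n"
    "mov [z], eax\n"
    "mov eax, [z]\n"
    "sub eax, [x]\n"
    "mov [w], eax\n"
)
print(decompile(asm))
# z = x + y;
# w = z - x;
```

Anything the patterns do not recognize passes through unchanged, so the leftovers show where a person still has to look.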
Just do a search on "reverse engineer <your OS> <your compiler>" and you
might find something. The source code they
produce is difficult to read and next to impossible to maintain. It is
often easier to reverse engineer the requirements and write the program
from scratch.
This depends a lot on the AMOUNT of code you need to de-compile:
certainly Hundreds of lines, maybe a Thousand lines, and NOT MILLIONS of
lines.
If you have actual assembly source code and want to turn it into C source
code that might actually be harder.
Applied Conversion Technologies (www.actworld.com) was originally started
to exploit the technology I developed to translate 45MB of DG NOVA assembly
to C to move a CAM system to the PC/AT platform in the 80's. Much of the
Assy source contained comments that were useful in maintaining the
translated C source; some did not. The main features were similarities in
the programs that could be recognized and consistently translated.
Another project, involving CDC 469 (Phalanx Gun) computer Assy to Ada,
required discarding lots of comments relating to fixed point arithmetic
magnitude, which were irrelevant once variables were re-cast to floating
point.
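
To illustrate why such comments stop mattering, here is a small sketch. It is not the CDC 469 code; the Q8 format (8 fraction bits) and the helper names are assumptions. In fixed point the programmer has to track where the binary point sits after every operation, which is what the scaling comments recorded. In floating point that bookkeeping disappears.

```python
FRAC_BITS = 8  # assumed Q8 format: stored integer = real value * 2**8


def to_fixed(x: float) -> int:
    """Convert a real value to its Q8 integer representation."""
    return int(round(x * (1 << FRAC_BITS)))


def fixed_mul(a: int, b: int) -> int:
    # The product of two Q8 values is Q16; shift right to get back to Q8.
    return (a * b) >> FRAC_BITS


def to_float(a: int) -> float:
    """Convert a Q8 integer back to a real value."""
    return a / (1 << FRAC_BITS)


a, b = to_fixed(1.5), to_fixed(2.25)
print(to_float(fixed_mul(a, b)))  # fixed-point path: 3.375
print(1.5 * 2.25)                 # floating-point path, no scaling to track: 3.375
```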
The market for people who know C but have some assembly code is a lot
smaller than the market for people who want to reverse engineer binaries.
RIGHT!
It would also be specific to the assembler and the
operating system. Maybe the search for reverse engineering might find
something but the results will be about the same or worse than going from
binary to C source.
Absolutely NOT, there is information in the Assy source that shouldn't be
lost! However, each case will be different and custom for each
programmer/compiler and the effort expended to extract the design will be
a judgement of the business-persons involved.
If you cannot find an assembly language to C source
converter you can try getting an assembler, create a binary, then use
machine language to C source converters.
one step forward, TWO or more back!
Bottom line, it is usually more effort to maintain the resulting source
code than it would be to write the application from scratch.
UNLESS you factor in the NEWLY introduced bugs while writing from SCRATCH.
also: "better the bugs you know than the bugs you haven't met yet"
You must also factor in advances in interface design:
Does Visual XX replace all that code with a few mouse KLIKS and MegaBytes
of DLL?
Further, you must consider the goodness in moving forward from previous
designs accurately translated (warts and all), and not "re-inventing the
wheel".
Bob Sheff; PBgeek at att dot net
Independent Consultant:
Software(Pascal,PL/M,CHILL,FORTRAN,..assy) Conversion to C/C++