ByteCode vs MachineCode reverse engineering

Skybuck Flying · Dec 3, 2008

Hello,

It seems like ByteCode is more easily reverse engineered than MachineCode ?

For example:

Java has the JAD Decompiler which seems pretty good.

Delphi/C/C++ have no good decompiler (?).

Why this difference ?

Documentation does not seem to be the problem:

The byte code is probably well documented.
The machine code is probably well documented.

Maybe it's because the machine code is more complex/has more variations ?

Bye,
Skybuck.

Skybuck Flying · Dec 3, 2008

Ok,

Well, JAD is not that great anymore, it seems.

I decompiled some obfuscated jar and then tried compiling with javac 1.6.

Jad produced this code:

int bla()
{
return 0; // code snipped and replaced.
Exception exception;
exception; // javac chokes on this.
return -1;
}

I haven't programmed in Java in a long time... so I am not sure if this is
"new valid Java syntax".

Maybe it would compile in a newer java ?

Bye,
Skybuck.

Skybuck Flying · Dec 3, 2008

I also tried a "different" decompiler called: DJ.

It turns out this is probably just JAD with a GUI wrapped around it

Bunch of stealers !

Bye,
Skybuck.

Skybuck Flying · Dec 3, 2008

DJ crashes and hangs a lot... just three tries... all three crashed/hung
hehehehe... like open/copy/help bunch of crap !

And they want money for it too LOL.

Gonna delete it from my system !

Bye,
Skybuck.

Skybuck Flying · Dec 3, 2008

Best thing about the program was the deinstaller.

It removed quite nicely, it seems... even the folder.

Gonna leave the download in my download folder though... for eternal shame

Can fool me once... can't fool me twice LOL.

Bye,
Skybuck.

BigZero · Dec 3, 2008

Well, I'm not good at this, but here is my guess:

Java has the JAD Decompiler which seems pretty good.

It's because Java is platform independent.
That means when Java code is compiled to class files (bytecode),
it is not converted all the way to machine code, so it is simple to
decompile.

Delphi/C/C++ have no good decompiler (?).

Whereas C/C++ code is converted completely to machine code native to
the hardware, so it is very hard to decompile.

So all I think is that C/C++ is converted to native code, whereas Java code
is not converted to native code by the compiler; the JVM takes care of
converting it to native code.

Thanks
Vm

Skybuck Flying · Dec 3, 2008

One possible explanation could be:

The virtual machine/byte code has no cpu registers.

The real machine has cpu registers so the high level language which ignores
registers has to be converted to fit inside the registers.

So a real machine decompiler would have to understand how to convert the
"special register code" back to memory only code...

Bye,
Skybuck.

Tom Anderson · Dec 3, 2008

It seems like ByteCode is more easily reverse engineered than
MachineCode ?
Yes.

Maybe it's because the machine code is more complex/has more variations?

Exactly. Bytecode was designed to be simple and transparent, so that it
was easy to write parsers for it, and also to verify - a JVM has to verify
that the code it loads is well-formed before it runs it. Parsing and
analysing bytecode for verification, interpretation or compilation is
pretty similar to what you have to do for decompilation, at least for the
first phase.

tom

Tom Anderson · Dec 3, 2008

I don't think so. Certainly not in Java 5.

JAD is mostly pretty good, but it does occasionally just flip out like
this. I can understand it failing to decompile, or getting the code wrong,
but emitting code which is simply syntactically incorrect, as in the
choke-inducing line above, is quite odd.

I think JAD is a bit of a dancing bear - the amazing thing is not how well
it dances, but that it dances at all.

tom

Silvio Bierman · Dec 3, 2008

Skybuck said:
Hello,

It seems like ByteCode is more easily reverse engineered than MachineCode ?

For example:

Java has the JAD Decompiler which seems pretty good.

Delphi/C/C++ have no good decompiler (?).

Why this difference ?

Documentation does not seem to be the problem:

The byte code is probably well documented.
The machine code is probably well documented.

Maybe it's because the machine code is more complex/has more variations ?

Bye,
Skybuck.

Bytecode has a much higher abstraction level than assembler. "Add A to
B" translates to a direct bytecode instruction and is the only one that
will result from a += b in a Java program. In assembler it might be any
sequence of one to perhaps four or five instructions.

But don't worry, bytecode allows enough (reordering) freedom for
obfuscators to turn bytecode into something that cannot be
reverse engineered into compilable Java. Any piece of Java code longer
than, say, 5-10 lines, compiled and then obfuscated with yguard or an
obfuscator of equivalent quality, cannot be decompiled into anything
remotely usable by any of the current Java decompilers. I am not
guessing, I have tried this repeatedly and extensively.

Regards,

Silvio Bierman

Arne Vajhøj · Dec 3, 2008

Skybuck said:
It seems like ByteCode is more easily reverse engineered than MachineCode ?

That is correct.

For example:

Java has the JAD Decompiler which seems pretty good.

Delphi/C/C++ have no good decompiler (?).

Why this difference ?

Documentation does not seem to be the problem:

The byte code is probably well documented.
The machine code is probably well documented.

Maybe it's because the machine code is more complex/has more variations ?

Some reasons behind:

1) A native compiler does optimization when it compiles from
.c/.cpp/.pas to .obj/.exe but a Java compiler does not optimize
when compiling .java to .class - the optimization is done by the
JVM when the code is executed (JIT compilation)

2) Java byte code has much more info than native binary code - it has to
because in Java you compile against the byte code while in C/C++ you
compile against source (header files).

3) Java byte code is much more high level than real instruction sets.

Arne

robertwessel2 · Dec 3, 2008

Bytecode has a much higher abstraction level than assembler. "Add A to
B" translates to a direct bytecode instruction and is the only one that
will result from a += b in a Java program. In assembler it might be any
sequence of one to perhaps four or five instructions.

No, it doesn't. Java bytecode is a stack oriented load/store
architecture. So your sample code would translate into something
like: iload / iload / iadd / istore (assuming a and b are integers).

Java byte codes are easier to decompile because there are additional
requirements beyond the opcodes - namely that all control flows meet
certain patterns (single entry/exit, proper structured control
nesting, no self modifying code, etc.), and that the type system is
never violated. That way, the verifier can look at a routine when
it's being loaded, and by making sure the higher level requirements
are met, it can ensure that the routine in question will never cause a
safety problem. For example, if the verifier found the sequence
iload / fload / iadd / istore, it would refuse to load the module
because of the type violation (attempting to add an integer and a
float from the top of the stack). By requiring that only simple
control structures with simple nesting are allowed, it's easy for the
verifier to look through all possible code paths and verify what will
be on the top of the operand stack at all times, and thus verify that
the type system isn't violated.

That additional required structure in the code also makes it much
easier to parse and translate into another format (machine code for a
JVM, something vaguely like the original source for a decompiler).
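
The type check described above can be sketched in a few lines of Java. This is only a toy illustration, not the real JVM verifier: it handles just the four opcodes named in the post, ignores local variable slots and control flow, and the class and method names (ToyVerifier, verify) are made up for this example. It tracks the types on the operand stack instead of the values, which is the core idea.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy type checker for a four-opcode subset; NOT the real JVM verifier. */
final class ToyVerifier {

    /**
     * Simulates the operand stack using types only ('I' = int, 'F' = float).
     * Returns false as soon as an instruction finds an operand of the wrong type.
     */
    static boolean verify(List<String> code) {
        Deque<Character> stack = new ArrayDeque<>();
        for (String op : code) {
            switch (op) {
                case "iload":               // pushes an int
                    stack.push('I');
                    break;
                case "fload":               // pushes a float
                    stack.push('F');
                    break;
                case "iadd":                // needs two ints, pushes an int
                    if (!popIs(stack, 'I') || !popIs(stack, 'I')) return false;
                    stack.push('I');
                    break;
                case "istore":              // needs one int
                    if (!popIs(stack, 'I')) return false;
                    break;
                default:
                    return false;           // opcode outside this toy subset
            }
        }
        return true;
    }

    /** Pops one entry and reports whether it had the expected type. */
    private static boolean popIs(Deque<Character> stack, char expected) {
        return !stack.isEmpty() && stack.pop() == expected;
    }

    public static void main(String[] args) {
        System.out.println(verify(List.of("iload", "iload", "iadd", "istore"))); // true
        System.out.println(verify(List.of("iload", "fload", "iadd", "istore"))); // false
    }
}
```

A decompiler can run the same kind of stack simulation, but rebuild expressions instead of checking types.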

Skybuck Flying · Dec 4, 2008

Well, Intel's documents are not that clear.

For example take the instruction:

mov ebx, $5

Its real opcode is BB

However the documentation says: B8 + rd, whatever that means

So one can't rely on Intel's documentation and will have to trust
documentation of third parties which I will not name here, otherwise they
might get nuked by you know who

Bye,
Skybuck.

Skybuck Flying · Dec 4, 2008

Ok, got bitten again... still these documents suck... too much work to
figure it out.

AMD's was easiest to find:

Simply called:

"
2.5.2 Opcode Syntax

+rb, +rw, +rd, +rq—Specifies a register value that is added to the
hexadecimal byte on the left, forming a one-byte opcode. The result is an
instruction that operates on the register specified by the register code.
Valid register-code values are shown in Table 2-2.
"

Intel documented it too but I couldn't find it at first...:

"
3.1.1.1 Opcode Column in the Instruction Summary Table

+rb, +rw, +rd, +ro — A register code, from 0 through 7, added to the
hexadecimal byte given at the left of the plus sign to form a single opcode
byte. See Table 3-1 for the codes.
The +ro columns in the table are applicable only in 64-bit mode.
"
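
The "B8 + rd" rule quoted above can be worked through for the mov ebx, 5 case. The following is a small sketch, assuming the standard 32-bit register codes (eax=0, ecx=1, edx=2, ebx=3, esp=4, ebp=5, esi=6, edi=7) and a little-endian 32-bit immediate; the class and method names are made up for this example.

```java
/** Sketch of the "B8 + rd" opcode rule for MOV r32, imm32 (illustrative names). */
final class MovEncoding {

    // 32-bit general registers listed in register-code order, so index == code.
    private static final String[] REG32 =
            {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"};

    /** Returns the register code (0 through 7) for a 32-bit register name. */
    static int registerCode(String name) {
        for (int i = 0; i < REG32.length; i++) {
            if (REG32[i].equals(name)) return i;
        }
        throw new IllegalArgumentException("unknown register: " + name);
    }

    /** Encodes MOV reg, imm32: one opcode byte (B8 + code), then 4 immediate bytes. */
    static byte[] encodeMovImm32(String reg, int imm) {
        byte[] out = new byte[5];
        out[0] = (byte) (0xB8 + registerCode(reg)); // register code folded into the opcode
        out[1] = (byte) imm;                        // immediate, least significant byte first
        out[2] = (byte) (imm >>> 8);
        out[3] = (byte) (imm >>> 16);
        out[4] = (byte) (imm >>> 24);
        return out;
    }

    /** Formats bytes as upper-case hex pairs separated by spaces. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodeMovImm32("ebx", 5))); // BB 05 00 00 00
    }
}
```

So ebx has register code 3, and B8 + 3 = BB, which matches the opcode seen in the disassembly.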

Bye,
Skybuck.

Mike Schilling · Dec 4, 2008

Arne said:
That is correct.

Some reasons behind:

1) A native compiler does optimization when it compiles from
.c/.cpp/.pas to .obj/.exe but a Java compiler does not optimize
when compiling .java to .class - the optimization is done by the
JVM when the code is executed (JIT compilation)

That is, a Java compiler can emit bytecode instructions in pretty much the
same order that the corresponding lines appear in the source code. A native
compiler will often re-order them significantly for performance.

2) Java byte code has much more info than native binary code - it has
to because in Java you compile against the byte code while in
C/C++ you compile against source (header files).

3) Java byte code is much more high level than real instruction sets.

4: There is a limited set of Java compilers, especially now that Jikes has
gone away.

Given 1 and 4, a decompiler can make very sound guesses about what source
line might have generated a set of bytecode instructions.

Glen Herrmannsfeldt · Dec 4, 2008

Arne said:
Skybuck Flying wrote: (snip)
Some reasons behind:

1) A native compiler does optimization when it compiles from
.c/.cpp/.pas to .obj/.exe but a Java compiler does not optimize
when compiling .java to .class - the optimization is done by the
JVM when the code is executed (JIT compilation)

As I understand it, Java compilers aren't allowed to make some
optimizations expected by C or Fortran compilers. Moving invariant
operations out of loops, common subexpression elimination, and such.

Those can be very hard for a decompiler to figure out, not to mention
a person doing it by hand.

2) Java byte code has much more info than native binary code - it has to
because in Java you compile against the byte code while in C/C++ you
compile against source (header files).

3) Java byte code is much more high level than real instruction sets.

Is it really all that different? Well, there aren't so many stack
machines around anymore. (The B5500, the first computer I ever used...)

I believe it is more the exception model that limits how much code
moving the compiler can do. An example from many years ago...

DO 11 I=1,10
DO 12 J=1,10
9 IF(B(I).LT.0) GO TO 11
12 C(J)=SQRT(B(I))
11 CONTINUE

An optimizing Fortran compiler will move the SQRT out of
the J loop (keeping the intermediate in a register).
The IF is not moved, so that one might have a SQRT exception.
It would also be very difficult for a decompiler to figure out.

It may be that Fortran compilers shouldn't do that, but they do.
I don't believe that Java compilers are allowed to optimize
that way.
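
For readers who do not read Fortran, here is a rough Java rendering of the loop above, next to a hand-hoisted version of the kind an optimizing compiler might produce. It is a sketch only: the names are made up, the hoisting is applied by hand rather than taken from any compiler's output, and in Java Math.sqrt of a negative number returns NaN instead of raising an exception, so the early evaluation is harmless here.

```java
import java.util.Arrays;

/** Hand-applied loop-invariant hoisting; an illustration, not compiler output. */
final class HoistSketch {

    /** Direct rendering of the loop: the test guards every SQRT call. */
    static double[] naive(double[] b, int n) {
        double[] c = new double[n];
        for (int i = 0; i < b.length; i++) {
            for (int j = 0; j < n; j++) {
                if (b[i] < 0) break;          // Fortran: GO TO 11
                c[j] = Math.sqrt(b[i]);       // recomputed on every j iteration
            }
        }
        return c;
    }

    /** Same result, but the invariant SQRT is computed once per i, before the test. */
    static double[] hoisted(double[] b, int n) {
        double[] c = new double[n];
        for (int i = 0; i < b.length; i++) {
            double root = Math.sqrt(b[i]);    // evaluated even when b[i] < 0 (NaN in Java)
            for (int j = 0; j < n; j++) {
                if (b[i] < 0) break;          // the test itself is not moved
                c[j] = root;                  // intermediate kept in a local
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] b = {4.0, -1.0, 9.0};
        System.out.println(Arrays.equals(naive(b, 3), hoisted(b, 3))); // true
    }
}
```

A decompiler that sees only the second shape has no way to know the source was written in the first.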

-- glen

Lew · Dec 4, 2008

Glen said:
As I understand it, Java compilers aren't allowed to make some
optimizations expected by C or Fortran compilers. Moving invariant
operations out of loops, common subexpression elimination, and such.

Those can be very hard for a decompiler to figure out, not to mention
a person doing it by hand.

This is accurate in the context of decompiling bytecode, but let us not forget
that those optimizations can and do happen in the second compilation step,
from bytecode to machine code.

Glen said:
Is it really all that different? Well, there aren't so many stack
machines around anymore. (The B5500, the first computer I ever used...)

Not so much if the machine instruction set is object-oriented.

....

An optimizing Fortran compiler will move the SQRT out of
the J loop (keeping the intermediate in a register).
The IF is not moved, so that one might have a SQRT exception.
It would also be very difficult for a decompiler to figure out.

It may be that Fortran compilers shouldn't do that, but they do.
I don't believe that Java compilers are allowed to optimize
that way.

Java is allowed to do that optimization, though, at runtime.

-- glen

Consider adhering to the Usenet standard of setting off sigs with a single
line comprising only the characters "-- " (dash dash space).

Roedy Green · Dec 4, 2008

It seems like ByteCode is more easily reverse engineered than MachineCode ?

For example:

Java has the JAD Decompiler which seems pretty good.

Delphi/C/C++ have no good decompiler (?).

Why this difference ?

Documentation does not seem to be the problem:

The byte code is probably well documented.
The machine code is probably well documented.

Maybe it's because the machine code is more complex/has more variations ?

Some of the reasons include:

1. Intel architecture does not cleanly separate code from data. A
decompiler which looks at the code without running it, has to guess
which is which.

2. There is no optimisation on byte code. Thus it flips back into
reasonably comprehensible Java.

3. The JVM and Java are designed for each other. Intel Architecture
was not designed for ANY high level language. It was designed for the
assembler programmer. This means there is an almost one-for-one
translation between Java and the JVM.

4. Byte code contains far more information than an EXE file. It has
symbol names and types. The boundaries between methods are clearly
demarcated. It is more like a *.OBJ file. There are thus many more clues
useful for decompilation.
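
Point 4 is easy to see from inside a running program, because reflection reads this metadata from the loaded class. The sketch below is only an illustration with made-up names (MetadataSketch, addTo, describe); it shows that a method's name, return type and parameter types are all recoverable without any source code.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Shows that method names and types survive compilation (illustrative names). */
final class MetadataSketch {

    /** A sample method whose signature we look up at runtime. */
    static int addTo(int total, int amount) {
        return total + amount;
    }

    /** Returns "returnType name[paramTypes]" for the named method, or null if absent. */
    static String describe(String methodName) {
        for (Method m : MetadataSketch.class.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                return m.getReturnType().getSimpleName() + " " + m.getName()
                        + Arrays.toString(m.getParameterTypes());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(describe("addTo")); // int addTo[int, int]
    }
}
```

A stripped native executable typically keeps none of this for its internal functions, which is one reason a machine code decompiler has to guess where functions start and what their arguments are.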

If you play with a machine code disassembler, I think the problems
will become clearer. I used to use Sourcer, but it has been
discontinued. I don't even know of an Intel 32-bit machine code
disassembler any more.

--
Roedy Green Canadian Mind Products
http://mindprod.com
"Humanity is conducting an unintended, uncontrolled, globally pervasive experiment
whose ultimate consequences could be second only to global nuclear war."
~ Environment Canada (The Canadian equivalent of the EPA on global warming)

Arne Vajhøj · Dec 4, 2008

Mike said:
4: There is a limited set of Java compilers, especially now that Jikes has
gone away.

There exist a few:
http://schmidt.devlib.org/java/bytecode-compilers.html

But in general Java compilers are very uninteresting because
of the simple translation they do. So there are few opportunities
to build a smarter compiler.

Arne

Lew · Dec 4, 2008

Arne said:
There exist a few:
http://schmidt.devlib.org/java/bytecode-compilers.html

But in general Java compilers are very uninteresting because
of the simple translation they do. So there are few opportunities
to build a smarter compiler.

That site leaves out IBM's and Oracle's, to name two.
