code too large for machine-generated code

Oliver Wong

jmcgill said:
The compiler should know when it's writing an invalid classfile and should
abort with a fatal error.

The OP mentions that the compiler is throwing a "Code too large" error.

- Oliver
 
jmcgill

Oliver said:
The OP mentions that the compiler is throwing a "Code too large" error.

Ah, so the ball is in the translator's court.

I still want to know if the input to javac looks reasonable. Somehow I
doubt it.
 
Mike Schilling

Oliver Wong said:
From my understanding, the OP has some compiler which takes source code
written in some language (call it language A), and produces code written
in another language (language B). It just so happens that language B is a
superset of Java. I.e., it is exactly like Java, but without various size
limitations.

The limitations on the amount of byte code in a method can't reasonably be
called part of the Java language, since the translation from source to byte
code isn't part of the language definition. There are programs with methods
large enough to be near the limit that one compiler, say javac, can
compile, while another, say jikes, can't, and quite possibly vice versa.

To put it another way, the program:

class Big
{
    int i;

    void meth()
    {
        i = 12; // 10922 of these
        ...
    }
}

compiles successfully under javac 1.4.2_08. Make it 10923 copies of that
line, and it no longer compiles. Nothing in the language specification
will allow you to determine that 10922 is the magic number.
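
For what it's worth, the number can be reconstructed from the JVM side rather than the language side. The following is a rough sketch, assuming javac emits the usual aload_0 / bipush / putfield sequence for each `i = 12;` on an instance field, plus one trailing return; the class and method names are made up for illustration:

```java
public class CodeSizeEstimate {
    // Instruction sizes in bytes (opcode plus operands).
    // Assumption: this is the sequence javac emits for `i = 12;` on an instance field.
    static final int ALOAD_0 = 1;   // push `this`
    static final int BIPUSH = 2;    // push the constant 12
    static final int PUTFIELD = 3;  // store into field i (2-byte constant pool index)
    static final int RETURN = 1;    // implicit return at the end of a void method

    // Largest code array the class file format allows for one method.
    static final int MAX_CODE_LENGTH = 65535;

    // Estimated bytecode size of meth() with the given number of assignments.
    static int bytesFor(int copies) {
        return copies * (ALOAD_0 + BIPUSH + PUTFIELD) + RETURN;
    }

    static boolean fits(int copies) {
        return bytesFor(copies) <= MAX_CODE_LENGTH;
    }

    public static void main(String[] args) {
        for (int copies : new int[] {10922, 10923}) {
            System.out.println(copies + " copies -> " + bytesFor(copies)
                    + " bytes, fits: " + fits(copies));
        }
    }
}
```

Under those assumptions 10922 copies come to 65533 bytes and 10923 copies to 65539, which matches the observed cutoff. A compiler that chose a different instruction sequence would hit the limit at a different count, which is the point being made above.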
 
jmcgill

Mike said:
compiles successfully under javac 1.4.2_08. Make it 10923 copies of that
line, and it no longer compiles. Nothing in the language specification
will allow you to determine that 10922 is the magic number.

I have a feeling that's a version of the Halting Problem or something
related. You can't know until you've actually gone through the
compilation process, and by then it's too late.

However, in my mind I cannot separate the JVM spec from the language spec.

On the other hand, if you showed me a method that was ten thousand lines
long, I wouldn't agree that it was reasonable, even if machine
generated. I don't care if it won't compile, or won't run, because I
don't consider it reasonable to expect it to.

However, my participation in this thread is because I'm still curious
about what code situation led to the problem for the OP. He claims a
quite simple routine has generated an uncompilable java class. So I'm
still hoping to see the 20 lines of C or Cobol or Befunge or whatever,
that creates 20,000 lines of java, or (even more interesting) the 20, or
400, lines of java that choke the compiler.

I'm tired of speculating and I really want to see the code that caused
the failure.
 
EJP

jmcgill said:
But it's still not clear whether the problem is javac turning reasonable
java statements into invalid bytecode, or whether the problem is a code
translator that creates unreasonable (or invalid) java source.

It's perfectly clear that it's a translator outputting lots of Java
source from a short sequence in the source language. Javac and invalid
byte code have nothing to do with it except that javac reports 'method
too long'.
 
jmcgill

EJP said:
It's perfectly clear that it's a translator outputting lots of Java
source from a short sequence in the source language.

If you say so. I didn't find it all that clear.

EJP said:
Javac and invalid
byte code have nothing to do with it except that javac reports 'method
too long'.

That's everything to do with it. There's not a clear, well-defined
limit for the length of a method, and I assume that's at least partly
because this boundary cannot be determined easily, if at all.

Are we arguing that the language allows arbitrarily large methods while
a given language implementation does not?

Like I said, I'm tired of speculating on this, and I just wish the OP
would post the original snippet, the resulting java, I could rub my
beard in curiosity and then move on :)
 
Mike Schilling

jmcgill said:
I have a feeling that's a version of the Halting Problem or something
related. You can't know until you've actually gone through the
compilation process, and by then it's too late.

If you really wanted to, you could assign costs to various language
constructs, require compilers to generate bytecode sequences no longer than
the total of those costs, and define the maximum cost of a method. This
would be silly, of course, but it's possible.

jmcgill said:
and I cannot separate the JVM spec from the language spec.

That's a shame; you should. One describes a language, one describes an
implementation of that language. They are not the same thing.
 
jmcgill

Mike said:
That's a shame; you should. One describes a language, one describes an
implementation of that language. They are not the same thing.

In my narrow corner of the real world, there is only Sun. I'm not
particularly proud of that :)

The truth is I understand the distinction fully.
 
EJP

jmcgill said:
If you say so. I didn't find it all that clear.

I don't know why not. Nobody else talked about the compiler producing
invalid byte code, you made that up. It doesn't, it produces an error
message instead, which is what the OP is talking about.

jmcgill said:
There's not a clear, well-defined
limit for the length of a method, and I assume that's at least partly
because this boundary cannot be determined easily, if at all.

There is a clear well-defined limit for the length of a method of 65535
bytes specified in #4.10 of the Java Virtual Machine Specification.

The distinction between what the language permits and what the JVM
permits is meaningless. We are as always talking about a specific
compiler which has a specific target machine, in this case the JVM. The
same thing would happen with a C compiler targeting a 16-bit machine:
the code won't compile. It doesn't mean it's illegal C but it still
won't produce an object file. And the Java compilers which *don't*
target the JVM and therefore encounter this limit form a rather small
and IMHO not very useful set.
 
jmcgill

EJP said:
I don't know why not.

Because there were no code examples, or even enough specific information
about the code in question to get a reasonable picture of what is going
on. I had to speculate that the code he's trying to compile is some
insanely long method.

EJP said:
Nobody else talked about the compiler producing
invalid byte code, you made that up.

You're splitting hairs; it gives a message because its only alternative
is to write an invalid class file.

EJP said:
There is a clear well-defined limit for the length of a method of 65535
bytes specified in #4.10 of the Java Virtual Machine Specification.

Yes, I cited that myself a while back.

EJP said:
The distinction between what the language permits and what the JVM
permits is meaningless.

Matter of opinion, and I don't care about that. I just want to see the
code that caused this mess, or at least hear a better description of it.
 
John Gagon

Thomas said:
Hi folks,

is there some way how to persuade the javac compiler to accept very long
methods? No, don't worry, I'm not writing this kind of mess. The code in
question is the result of a meta-compilation from another language, and
it turned out that this compiler generated a pretty long java code from
a seemingly simple source. Unfortunately, the compilation of the
generated java code then fails with the infamous "code too large" error.

Is there any kind of tweaking that can be done to make this code
acceptable (besides fixing the compiler that generated the java code
in the first place, that is.)?

So long,
Thomas

VerifyError yes.
Check with the copy and paste detector and see how much code that will
save you. Then see if you can't get the simple source and then the
other language's compilation to deal with it. Used to get these all the
time with Jasper (tomcat) and tags that were in the service method and
not in their own module/method.

John Gagon
 
Mike Schilling

jmcgill said:
In my narrow corner of the real world, there is only Sun. I'm not
particularly proud of that :)

I'd go further than Sun vs. other JVMs, though. My claim is that Java is a
language with its own definition, and that JVM-related restrictions are not
part of that language definition per se.

Thought experiment: consider a Java environment that isn't JVM-based;
rather, it compiles Java to .class files, and allows you to link those files
into a native executable. If it can compile methods too large for javac,
would you consider this a violation of the language spec? Does it matter if
they can theoretically be presented in 65K of bytecode?
 
jmcgill

Mike said:
I'd go further than Sun vs. other JVMs, though. My claim is that Java is a
language with its own definition, and that JVM-related restrictions are not
part of that language definition per se.

Thought experiment: consider a Java environment that isn't JVM-based;
rather, it compiles Java to .class files, and allows you to link those files
into a native executable. If it can compile methods too large for javac,
would you consider this a violation of the language spec? Does it matter if
they can theoretically be presented in 65K of bytecode?

Mike, we're much closer to full agreement than you seem to think.

But I've got the mindset where, if something doesn't work in practice,
you miss the delivery of your iteration. All the theory in the world
won't put words in the email to your project manager that spins it like
a positive thing ;-)

I saw the OP's routine. It's in some vector-based language that I think
I haven't ever seen. It's obvious that, where the original language
declares things as sets, the translation to java has to implement
loops... and naturally, unrolls those loops. 8K iterations, some of them.

I have to admit that I don't understand the source language, and that even
if I did understand it, I wouldn't understand the physics or geometry behind
the original problem (Ising model/thermodynamic analysis; I got far
enough in physics to have heard of it, but not far enough to understand
what it's useful for.)

I'm still really curious though, because it's rare that I see a
programming language I don't recognize.

What language is this?

topology = torus in {bounded,torus};
width = 512 in {64..4096}; #defines a default and a range.
height = 512 in {64..4096};

or this:

bonds = ([0,1] << 0) + ([0,-1] << 1) + ([1,0] << 2) + ([-1,0] << 3);
 
Oliver Wong

jmcgill said:
I saw the OP's routine. It's in some vector-based language that I think I
haven't ever seen. [...]
I don't understand the physics or geometry behind the original problem
(Ising model/thermodynamic analysis; I got far enough in physics to have
heard of it, but not far enough to understand what it's useful for.)

I'm still really curious though, because it's rare that I see a
programming language I don't recognize.

What language is this?

topology = torus in {bounded,torus};
width = 512 in {64..4096}; #defines a default and a range.
height = 512 in {64..4096};

or this:

bonds = ([0,1] << 0) + ([0,-1] << 1) + ([1,0] << 2) + ([-1,0] << 3);

It's not unusual to invent a new domain-specific language to facilitate
implementing the solution to a domain-specific problem. Many times, these
languages are unnamed. We've got a few unnamed languages here at my company.

- Oliver
 
Mike Schilling

jmcgill said:
Mike, we're much closer to full agreement than you seem to think.

But I've got the mindset where, if something doesn't work in practice, you
miss the delivery of your iteration. All the theory in the world
won't put words in the email to your project manager that spins it like a
positive thing ;-)

We don't disagree there either. My suggestion (upthread, a bit) was to
write a postprocessor that splits the method into small enough pieces (that
is, private methods) that javac can deal with it.
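
A minimal sketch of what such a postprocessor could look like, assuming the generated method body is straight-line code whose statements touch only fields (no shared local variables, no control flow spanning chunks). `MethodSplitter`, `split`, and the `_partN` naming are hypothetical, not part of any existing tool:

```java
import java.util.Collections;
import java.util.List;

public class MethodSplitter {
    // Rewrites one long void method as a short driver that calls private
    // helper methods, each holding at most chunkSize of the original statements.
    // Assumption: statements are independent of local state, so moving them
    // into separate methods does not change behaviour.
    static String split(String methodName, List<String> statements, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        StringBuilder driver = new StringBuilder();
        StringBuilder helpers = new StringBuilder();
        driver.append("void ").append(methodName).append("() {\n");
        int part = 0;
        for (int start = 0; start < statements.size(); start += chunkSize) {
            String helperName = methodName + "_part" + part++;
            driver.append("    ").append(helperName).append("();\n");
            helpers.append("private void ").append(helperName).append("() {\n");
            int end = Math.min(start + chunkSize, statements.size());
            for (String statement : statements.subList(start, end)) {
                helpers.append("    ").append(statement).append('\n');
            }
            helpers.append("}\n");
        }
        driver.append("}\n");
        return driver.append(helpers).toString();
    }

    public static void main(String[] args) {
        // Five copies of the assignment, two per helper: three helpers result.
        List<String> body = Collections.nCopies(5, "i = 12;");
        System.out.print(split("meth", body, 2));
    }
}
```

Real generated code would need more care (locals would have to become fields or parameters, and loops or try blocks cannot be cut at arbitrary points), so this only illustrates the idea for the simplest case.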
 
James McGill

Oliver said:
It's not unusual to invent a new domain-specific language to
facilitate implementing the solution to a domain-specific problem. Many
times, these languages are unnamed. We've got a few unnamed languages
here at my company.

I'm sure it always seems like a good idea at the time....
 
Dale King

jmcgill said:
The classfile format is the only issue!

It is not necessarily the only issue. Branch statements have their own
limitation as well, being limited to 64k.
 
Mike Schilling

Dale King said:
It is not necessarily the only issue. Branch statements have their own
limitation as well, being limited to 64k.

But this doesn't limit the size of the method, given that the goto_w
instruction uses a 32-bit offset. Longer branches can be generated using
it, e.g.

if (condition)
{
    (more than 64K of code)
}
else
{
}

becomes

if (!condition)
    goto_w else_block
if_block:
    goto_w after_else_block
else_block:
after_else_block:
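
As a rough illustration of how a code generator might pick between the two unconditional jump forms: goto (opcode 0xa7) carries a signed 16-bit offset and goto_w (opcode 0xc8) a signed 32-bit one. The class and method names below are made up for this sketch:

```java
public class BranchEncoder {
    static final int GOTO = 0xa7;    // followed by a 2-byte signed offset
    static final int GOTO_W = 0xc8;  // followed by a 4-byte signed offset

    // Encodes an unconditional jump, using the short form when the
    // offset fits in a signed 16-bit value and the wide form otherwise.
    static byte[] encodeGoto(int offset) {
        if (offset >= Short.MIN_VALUE && offset <= Short.MAX_VALUE) {
            return new byte[] {
                (byte) GOTO, (byte) (offset >> 8), (byte) offset
            };
        }
        return new byte[] {
            (byte) GOTO_W,
            (byte) (offset >> 24), (byte) (offset >> 16),
            (byte) (offset >> 8), (byte) offset
        };
    }

    public static void main(String[] args) {
        System.out.println(encodeGoto(100).length);    // short form
        System.out.println(encodeGoto(70000).length);  // wide form
    }
}
```

The conditional branch instructions only exist in the 16-bit form, which is why the pseudocode above inverts the condition and hops over a goto_w rather than branching directly.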

Interestingly, I found a 1995 version of the JVM spec online at
http://sunsite.ee/java/vmspec/vmspec-1.html, and under limitations, it says:

The amount of code per method is limited to 65535 bytes by the sizes
of the indices in the code in the exception table, the line number
table, and the local variable table. This may be fixed for 1.0beta2.
 
Chris Uppal

Mike said:
Interestingly, I found a 1995 version of the JVM spec online at
http://sunsite.ee/java/vmspec/vmspec-1.html, and under limitations, it
says:

The amount of code per method is limited to 65535 bytes by the sizes
of the indices in the code in the exception table, the line number
table, and the local variable table. This may be fixed for 1.0beta2.

<chuckle/>

-- chris
 
