compile directly into a .zip/.jar file?

Andreas Leitgeb

Is it possible to compile directly into a .zip/.jar file?

This may look odd, as everyone uses ant (or something similar) to do the jarring,
but the problem I'm trying to solve is that the large number of .class files
in the target folder appears to slow things down to a crawl on AIX (jfs2).

To give an impression: merely running rm -rf on the folder containing the .class
files (about 100,000 of them) takes 13 minutes, and that is on a local filesystem.
Compiling takes 42 minutes on that AIX machine, with only 20 minutes accounted
for as CPU time. (On Linux, the rm takes a couple of seconds and compiling takes
18 minutes, almost all of it accounted for as CPU time.)

My hope now is that if I could specify a zip/jar file for the -d option
(doesn't work; already tested) or something similar, then adding files
to a zip might still be faster than the sluggish OS handling of large directories.

PS: the "obvious" split-it-into-separate-packages is plan Z.
Plan Y is compiling batches of 100, adding the results to a zip file,
removing the .class files, then going on to the next batch...
I do hope for some plans B-X first (where plan A was the status sluggish quo).

PPS: I'm not an expert on AIX, so if there are better filesystems than jfs2,
or mount options to improve performance with large directories, that
would also help a lot. (I know this is not an AIX group; my AIX
whining was mostly to explain why I would want the javac feature I asked for.)
 
Joerg Meier

Is it possible to compile directly into a .zip/.jar file?

Technically, that would be the same as compiling into a ramdisk and
zipping that. So why not try to do that? Ramdisks are pretty generic, so
you might have an easier time finding a solution to what is essentially the
same problem wrapped in Java confines.

Liebe Gruesse,
Joerg
 
Roedy Green

Is it possible to compile directly into a .zip/.jar file?

I have not seen anything. I use genjar, which is very quick. I don't
know if they have a version for your OS.

The big advantage is that you just give it the main class and your
resources and it finds all the classes it needs. You have to tell it
about anything you load via Class.forName.

see http://mindprod.com/jgloss/genjar.html
 
Roedy Green

Technically, that would be the same as compiling into a ramdisk, and
zipping that. So why not try to do that ? Ramdisks are pretty generic, so
you might have an easier time finding a solution to what is essentially the
same problem wrapped in Java confines.

SSDs are so cheap now I was able to replace my hard disk with one. I
just use the hard disk as an attic for rarely used files.

see http://mindprod.com/bgloss/ssd.html
 
Andreas Leitgeb

Joerg Meier said:
Technically, that would be the same as compiling into a ramdisk, and
zipping that.

Other timing experiments I've made in the meantime have shown that the
large directory itself doesn't seem to be the problem. (It would
have been more comforting to believe that the sluggish performance was
the result of some unusual use, like large directories, but unfortunately
sluggish performance seems to be the rule on that system.)

A ramdisk thus surely looks like a more promising approach than my previous
plans Y & Z (in my OP). I almost can't wait till Monday to try it.

Thanks!

PS: another approach might be trying the mount option "log=NULL" for some
partition with less-than-precious data. Btw., checking the system logs for
I/O problems on the log device is also on the TODO list.
 
Andreas Leitgeb

Roedy Green said:
I have not seen anything.

At least I haven't just missed anything obvious.

I use genjar, which is very quick. I don't
know if they have a version for your OS.

Although that doesn't address the problem I posted, it seems
to fit a separate problem that I've been putting off so far.

Thanks!

On your page, you say that to get genjar2 one needs Mercurial,
but in the source tab, when you click "Browse", you're also
offered a .zip download. There may of course be reasons to
stick with Mercurial, but it's good to know one's options.
 
Arne Vajhøj

Weird - I did not see the original post.

Yes. It is possible.

The javac command does not support it but Java does.

Java has supported a compiler API capable of compiling to a
byte array since 1.6, and has supported writing jar files since 1.2.

So all you need to do is to write your own compiler driver with
the command line switches you want and glue everything together.
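A minimal sketch of that glue, using the javax.tools compiler API and java.util.jar (the class names CompileToJar, MemSource and MemClass, and the output file name, are made up for illustration; this also assumes a full JDK is available, since on a bare JRE ToolProvider.getSystemJavaCompiler() returns null):

```java
import javax.tools.*;
import java.io.*;
import java.net.URI;
import java.util.*;
import java.util.jar.*;

public class CompileToJar {
    // An in-memory .java source file.
    static class MemSource extends SimpleJavaFileObject {
        final String code;
        MemSource(String className, String code) {
            super(URI.create("mem:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreErrors) { return code; }
    }

    // An in-memory .class file: javac writes its output into this byte array.
    static class MemClass extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        MemClass(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + ".class"), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    public static void main(String[] args) throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a plain JRE
        Map<String, MemClass> classes = new LinkedHashMap<>();

        // Redirect all class-file output to byte arrays instead of the disk.
        JavaFileManager fm = new ForwardingJavaFileManager<StandardJavaFileManager>(
                javac.getStandardFileManager(null, null, null)) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                    String className, JavaFileObject.Kind kind, FileObject sibling) {
                MemClass mc = new MemClass(className);
                classes.put(className, mc);
                return mc;
            }
        };

        List<JavaFileObject> units = Arrays.asList(new MemSource("Hello",
                "public class Hello {"
                + "  public static void main(String[] a) { System.out.println(\"hi\"); }"
                + "}"));
        if (!javac.getTask(null, fm, null, null, null, units).call())
            throw new IllegalStateException("compilation failed");
        fm.close();

        // Stream the compiled bytes straight into a jar; no .class files touch the disk.
        try (JarOutputStream jar = new JarOutputStream(new FileOutputStream("out.jar"))) {
            for (Map.Entry<String, MemClass> e : classes.entrySet()) {
                jar.putNextEntry(new JarEntry(e.getKey().replace('.', '/') + ".class"));
                jar.write(e.getValue().bytes.toByteArray());
                jar.closeEntry();
            }
        }
        System.out.println("wrote out.jar with " + classes.size() + " class(es)");
    }
}
```

A real driver would feed it the whole source tree instead of a hardcoded string, but the shape of the glue is the same.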

Arne
 
Arne Vajhøj

Technically, that would be the same as compiling into a ramdisk, and
zipping that.

"the same" is a rather elastic term.

It would avoid writing to a physical disk.

It would not avoid calling down through Java->CRT->OS, managing file system
info, etc.

If the goal is better performance, then whether using a RAM disk will
be more like using a byte[] or more like using a traditional rotating hard disk
may depend a lot on hardware, OS, file system and config. RAM access
is many, many orders of magnitude faster than access to data on a
rotating disk. But with today's OS/file-system caches and disk
caches, likely in write-back mode, writing those class files
to disk may not cause them to actually reach the rotating platters
until long after the process has completed, from the user's perspective.

Arne
 
Andreas Leitgeb

Arne Vajhøj said:
Weird - I did not see the original post.

shrug - just missed it, or not available on your server?
Yes. It is possible.
The javac command does not support it but Java does.
Java has supported a compiler API capable of compiling to a
byte array since 1.6 and supported writing jar files since 1.2.

do you mean:
http://docs.oracle.com/javase/7/docs/api/javax/tools/JavaCompiler.html
?

Thanks for bringing it up! I wasn't aware of that until your answer
prompted me to google for "java compiler api".

It's always good to know one's options, but I think this is a bit
too much of a detour for me at this point.
 
Arne Vajhøj

shrug - just missed it, or not available on your server?

Not on my server.

Yes.

And even though it is a rather convoluted API that is difficult to
use, compiling to a byte array is certainly doable.

Thanks for bringing it up! I wasn't aware of that until I googled
for "java compiler api" triggered by your answer.

It did get a little bit of mention back when 1.6 came out, but
it is a feature that only a small fraction of Java developers ever
need.

Arne
 
Andreas Leitgeb

Arne Vajhøj said:
If the goal is better performance,

It's not like I wanted to squeeze another microsecond out of the
compile time; it's more like trying to bypass some brake that the
system has slammed full on for no good reason.

But with todays OS/file system caches and disk
caches - likely in write back mode, then writing those class files
to disk may not cause them to be actually written to the rotating plates
before long after the process is completed from user perspective.

Unless, maybe, the system believes that a file write/delete isn't
done until it's physically on the disk, with some logs on another...
Kind of as if some sync flag were set. (Maybe it even is; hardcoded.)

Googling for the symptom, I came across suggestions to mount a separate
filesystem over the target folder, and when it's time to clean up,
just umount, mkfs, and mount the partition again. Those people must have
been quite unhappy with their filesystem performance, too, to suggest that.
 
Roedy Green

Other timing experiments, I've made meanwhile, have shown that the
large directory itself doesn't seem to be the problem.

I remember back in the days of DOS when a customer complained a
machine I had built for him was too slow. He put 12,000 files about
horse racing in one directory. Back in the day, directory searches
were linear.

I have been working with the HTMLValidator people to speed up their
program by tracking when files were last verified as error-free. They
were reluctant to use a HashMap, being C++ programmers. I suggested
using a tree of tiny disk files instead. They liked the simplicity of
the idea. However, I was surprised to see they implemented the tree by
converting \ in filenames to _ and storing the tree in one directory.
It is incredibly fast, simply because it avoids revalidating anything
it knows is already good. I wondered which would actually be faster,
the tree or their system. Has anyone ever experimented, or have thoughts
on the matter?

A tree would be more compact, and hence more likely to be cached and
more localised. I think modern hashed directories do not slow down
with size.
 
Roedy Green

Is it possible to compile directly into a .zip/.jar file?

Ant speeds things up incredibly because it loads the various
processors and keeps them in RAM, rather than loading them over and
over the way a make script would.
 
Arne Vajhøj

It's not like I wanted to squeeze another microsecond out of the
compile time, but more like trying to bypass some brake, that the
system has slammed full on for no good reason.


Unless, maybe, the system believes that a file write/delete isn't
done until it's physically on the disk and some logs on another...
Kind of like some sync-flag was set. (maybe it even is - hardcoded)

That is how it always was in the old days.

But today aggressive caching is quite common.

And it should be fine for a build server.

Arne
 
Arne Vajhøj

OK, though I think that may be true of all Unices since System V, unless
they were very old or BSD. AFAIK all Linuxes used the SysV init until
recently, when systemd rocked onto the scene. However, what I meant
was: what architecture does it have, e.g. a traditional monolithic kernel,
a microkernel, or a Mach-style message-passing design? From a web search
it appears that it's a traditional monolithic kernel architecture.

The widely used label for the modern traditional Unixes is "monolithic
with dynamically loadable modules".

Arne
 
Arne Vajhøj

AFAIK Unices have traditionally only automatically flushed buffers to
disk in three circumstances: (1) if they run out of space in RAM, (2)
when a process terminates all buffers get flushed and (3) output buffers
are flushed if they haven't been written to for some time and contain
unflushed data.

You are right. Unix never subscribed to the "get the data on the plates"
philosophy.
Apart from that you can, of course, always call fflush()
whenever the process logic requires, e.g. if a process periodically
records a restartable checkpoint, it would probably flush buffers for all
open files as part of recording the checkpoint.

AFAIK, fflush() only moves data from user space to kernel space.

To get the data onto the platters you need fsync().
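In Java terms, the rough analogue of that distinction (not the C calls themselves; the file name below is made up) is OutputStream.flush() versus FileDescriptor.sync():

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;

public class FlushVsSync {
    public static void main(String[] args) throws Exception {
        try (FileOutputStream fos = new FileOutputStream("checkpoint.dat");
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write("checkpoint\n".getBytes("UTF-8"));
            out.flush();        // like fflush(): user-space buffer -> kernel cache
            fos.getFD().sync(); // like fsync(): ask the kernel to push it to the device
        }
    }
}
```

Without the sync() call, the data may sit in the OS cache long after flush() returns.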

Other OSen may do things differently, e.g. OS/9, the real-time OS, writes
blocks as soon as they are complete and when a file is closed, updating
the file and disk management metadata at the same time. This is
deliberate: although it makes I/O a lot slower, it's very rare indeed
to lose any data or to see a corrupt file on an OS/9 system unless
you're using a dodgy disk.

Same with VMS.
Not doing much of the above is a main reason why FAT filing systems are
relatively fragile.

Strictly speaking, the multiple flavors of FAT are file system
formats more than file systems. But yes: MS code from around DOS 4
onwards has cached data.

Arne
 
Jeff Higgins

The manpages for fflush() and fsync() could be a little clearer, but I
read them as saying that both functions force a write to disk. The only
difference is the type of file reference argument they take: fflush()
takes a FILE* stream reference while fsync() takes an int file
descriptor.

That is pretty much the same as the difference between read() and fread(),
or write() and fwrite().

As an adjunct to the manpages:
"The Linux Programming Interface",
13.3 Controlling Kernel Buffering of File I/O
<http://books.google.com/books?id=Ps2SH727eCIC&pg=PA239>
 