Split number of requests

mike

Hi,

I will make a request to an external tool from my Java program. It
looks like:

command arg1....argn

The number of arguments can be quite large. The arguments are files
and directories. Let's say that we can have up to 10.000 files and/or
directories. Is it possible to issue the command with a specific
number of arguments at a time until all args have been used? The reason
for doing this is that I don't want to issue the command with 10.000 args.
So my question is how to split the command invocation depending on the
number of args.

All help is greatly appreciated.

br,

//mike


public ArrayList issueCmd(String command,String [] args){


}
 
Donkey Hottie

mike said:
Hi,

I will make a request to an external tool from my java
program.It looks like:

command arg1....argn

The number of arguments can be quite large. The arguments
are files and directories. Let's say that we can have up
to 10.000 files and/or directories. Is it possible to
issue the command with a specific number of arguments
until all args have been used. The reason for doing this
is that I dont want to issue the command with 10.000
args. So my issue is how to split the command issue
depending on the number of args.

All help is greatly appreciated.

A common way to handle that is a single argument of the form @filename.

The file contains the arguments, and when the application sees an
argument with the @ prefix, it reads the arguments from that file.
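A minimal sketch of how the receiving application might expand such an argument, assuming it is itself written in Java and that the file holds one argument per line in UTF-8. The class name ArgFileExpander is made up for illustration; whether the external tool in question supports anything like this is something only its documentation can tell you.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ArgFileExpander {

    // Replaces every argument of the form @filename with the non-blank
    // lines of that file; all other arguments pass through unchanged.
    // Assumption: one argument per line, file encoded as UTF-8.
    public static List<String> expand(String[] args) throws IOException {
        List<String> expanded = new ArrayList<>();
        for (String arg : args) {
            if (arg.startsWith("@") && arg.length() > 1) {
                List<String> lines =
                        Files.readAllLines(Paths.get(arg.substring(1)), StandardCharsets.UTF_8);
                for (String line : lines) {
                    if (!line.trim().isEmpty()) {
                        expanded.add(line);
                    }
                }
            } else {
                expanded.add(arg);
            }
        }
        return expanded;
    }

    public static void main(String[] args) throws IOException {
        // Print the effective argument list, one per line.
        for (String arg : expand(args)) {
            System.out.println(arg);
        }
    }
}
```

The caller then writes its 10.000 names to a temporary file and passes just one argument, "@" followed by the path of that file.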
 
Roedy Green

the command with 10.000 args

I would look at some way to put the data in a file, and get the child
to read it.

Originally the command line was limited to 255 chars. I don't know
what modern limits are.
--
Roedy Green Canadian Mind Products
http://mindprod.com

When you can’t find a bug, you are probably looking in the wrong place. When you can’t find your glasses, you don’t keep scanning the same spot because you are convinced that is where you left them.
~ Roedy
 
Arne Vajhøj

mike said:
I will make a request to an external tool from my java program.It
looks like:

command arg1....argn

The number of arguments can be quite large. The arguments are files
and directories. Let's say that we can have up to 10.000 files and/or
directories. Is it possible to issue the command with a specific
number of arguments until all args have been used. The reason for
doing this is that I dont want to issue the command with 10.000 args.
So my issue is how to split the command issue depending on the number
of args.

All help is greatly appreciated.
public ArrayList issueCmd(String command,String [] args){


}

I do not think Java itself has any limits on passing arguments
to an external program.

But your OS most likely has some, and Java will have the
same limit.

10000 is a lot!

The proposal to use a single argument with the name of a file containing
all the dir and filenames you want to process makes sense to me.

Arne
 
Tom Anderson

I will make a request to an external tool from my java program.It looks
like:

command arg1....argn

The number of arguments can be quite large. The arguments are files
and directories. Let's say that we can have up to 10.000 files and/or
directories. Is it possible to issue the command with a specific
number of arguments until all args have been used. The reason for
doing this is that I dont want to issue the command with 10.000 args.
So my issue is how to split the command issue depending on the number
of args.

All help is greatly appreciated.

This seems pretty easy to me.

I assume you know how to write the version of this that just sends all the
arguments at once:
public ArrayList issueCmd(String command,String [] args){


}

So then it's just:

private static final int BATCH_SIZE = 100;

public List issueCommandInBatches(String command, String[] args) {
    int argsProcessed = 0;
    List results = new ArrayList();
    while (argsProcessed < args.length) {
        int thisBatchSize = Math.min(args.length - argsProcessed, BATCH_SIZE);
        String[] batchArgs = new String[thisBatchSize];
        System.arraycopy(args, argsProcessed, batchArgs, 0, thisBatchSize);
        List batchResults = issueCmd(command, batchArgs);
        results.addAll(batchResults);
        argsProcessed += thisBatchSize;
    }
    return results;
}

tom
 
Tom Anderson

I would look at some way to put the data in a file, and get the child
to read it.

Originally the command line was limited to 255 chars. I don't know what
modern limits are.

On unix, it's called ARG_MAX, and you can find it out with getconf (or
with sysconf from C):

$ getconf ARG_MAX
262144

So, a quarter of a megabyte on stock OS X 10.4.

I have a feeling there are other limits somewhere in the chain, like in
the shell or something. This webpage is informative, but too dull for me
to actually read properly:

http://www.in-ulm.de/~mascheck/various/argmax/
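Since the limit is on total size rather than on the number of arguments, batching by length is an alternative to a fixed batch size. Here is a rough sketch, assuming a character budget that you choose yourself with plenty of headroom: the real limit is counted in bytes and, on unix, also has to cover the environment. LengthBatcher and maxChars are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class LengthBatcher {

    // Splits args into consecutive batches whose combined length
    // (each argument plus one separator character) stays within maxChars.
    // An argument that is longer than maxChars still gets a batch of its own.
    public static List<List<String>> split(List<String> args, int maxChars) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int used = 0;
        for (String arg : args) {
            int cost = arg.length() + 1;
            if (!current.isEmpty() && used + cost > maxChars) {
                // The next argument would overflow the budget: close this batch.
                batches.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(arg);
            used += cost;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each inner list can then be passed to whatever method issues the command once.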

tom
 
mike

I would look at some way to put the data in a file, and get the child
to read it.

But how can the command be issued in Java using Runtime.exec()? Am I
going to execute the command once for each arg in the file?

br,

//mike
 
Arved Sandstrom

mike said:
But how can the command issued in java using Runtime.exec() ? Am I
going to execute the command once for each arg in the file?
[ SNIP ]

I don't know what kind of OS you are using. On UNIX or UNIX-like
systems, one approach to this problem is using "xargs". When you obtain
your Process it may look something like this:

Process proc = Runtime.getRuntime().exec("xargs wc");

where you obviously replace "wc" with the actual command that you wish
to run on the files.

You already have your list of filenames (doesn't matter if it's an array
or a List). Obtain the output stream from the Process, wrap it in a
PrintWriter, and println each filename to the output in a loop. xargs
expects to get its list of filenames from stdin, after all. E.g.

OutputStream os = proc.getOutputStream();
PrintWriter pw = new PrintWriter(os, true);

for (int f = 0; f < files.length; f++) {
    pw.println(files[f].getPath());
}
pw.close();

Handle your process input and error streams as you normally would. For
this example printing out each line obtained from the Process input
stream displays the file by file "wc" output...and the "wc" summary.

AHS
 
mike

But how can the command issued in java using Runtime.exec() ? Am I
going to execute the command once for each arg in the file?

[ SNIP ]

I don't know what kind of OS you are using.

It is an Eclipse plug-in, so it is supposed to run on both Windows and Unix.

Is there an xargs on Windows (I only have Unix)?

On UNIX or UNIX-like
systems, one approach to this problem is using "xargs". When you obtain
your Process it may look something like this:

Process proc = Runtime.getRuntime().exec("xargs wc");

where you obviously substitute "wc" for the actual command that you wish
to run on each file.

You already have your list of filenames (doesn't matter if it's an array
or a List). Obtain the output stream from the Process, wrap it in a
PrintWriter, and println each filename to the output in a loop. xargs
expects to get its list of filenames from stdin, after all. E.g.

OutputStream os = proc.getOutputStream();
PrintWriter pw = new PrintWriter(os, true);

for (int f = 0; f < files.length; f++) {
     pw.println(files[f].getPath());}

pw.close();

The above idea sounds great if I could get it to work on both
platforms.
Handle your process input and error streams as you normally would. For
this example printing out each line obtained from the Process input
stream displays the file by file "wc" output...and the "wc" summary.

AHS

//mike
 
John B. Matthews

mike said:
mike wrote: [...]
Process proc = Runtime.getRuntime().exec("xargs wc");

Like AHS, I'd prefer the xargs approach; but you'd have to see about
Windows. If needed, Cygwin <http://www.cygwin.com/> can provide an
implementation.

[...]
You already have your list of filenames (doesn't matter if it's an array
or a List). Obtain the output stream from the Process, wrap it in a
PrintWriter, and println each filename to the output in a loop. xargs
expects to get its list of filenames from stdin, after all. E.g.

OutputStream os = proc.getOutputStream();
PrintWriter pw = new PrintWriter(os, true);

for (int f = 0; f < files.length; f++) {
     pw.println(files[f].getPath());}

pw.close();

The above idea sounds great if I could get it to work on both
platforms.

Alternatively, ProcessBuilder is designed to ease the burden of repeated
executions. In particular, "This constructor does not make a copy of the
command list," so you can modify it to include some sublist of your full
list before each call to start():

<http://java.sun.com/javase/6/docs/api/java/lang/ProcessBuilder.html>
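A sketch of that idea, assuming the first element of the command list is the program name and everything after it is replaced for each batch. BatchRunner, loadBatch and runInBatches are illustrative names, and batchSize is assumed to be positive. inheritIO() needs Java 7 or later; on Java 6 you would have to drain the process streams yourself.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BatchRunner {

    // Replaces everything after the first element (the program name) of
    // 'command' with the next batch of arguments taken from allArgs.
    public static void loadBatch(List<String> command, List<String> allArgs,
                                 int from, int batchSize) {
        command.subList(1, command.size()).clear();
        int to = Math.min(from + batchSize, allArgs.size());
        command.addAll(allArgs.subList(from, to));
    }

    // Starts 'program' once per batch and returns the exit codes in order.
    public static List<Integer> runInBatches(String program, List<String> allArgs,
                                             int batchSize)
            throws IOException, InterruptedException {
        List<String> command = new ArrayList<>();
        command.add(program);
        // The builder keeps a reference to 'command', not a copy, so later
        // changes to the list are seen by the next start().
        ProcessBuilder pb = new ProcessBuilder(command);
        // Let the child write to our stdout/stderr so it cannot block on a full pipe.
        pb.inheritIO();
        List<Integer> exitCodes = new ArrayList<>();
        for (int from = 0; from < allArgs.size(); from += batchSize) {
            loadBatch(command, allArgs, from, batchSize);
            exitCodes.add(pb.start().waitFor());
        }
        return exitCodes;
    }
}
```

This runs the batches one after another. Whether several invocations give the same result as a single one depends entirely on the command.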
 
Arved Sandstrom

Thomas said:
If the problem is in the size of the list of arguments, xargs shifts but
does not always solve the problem. What xargs does is that it reads the
arguments, and then invokes the command with the read arguments. If the
list of arguments is so huge that it does not fit in a single command
invocation (due to an operating system limit -- speaking of which, the
limit is quite low on Windows systems, something like 32 KB), then xargs
will split the list into several chunks, and invoke the command _several
times_.

Depending on what the invocation command does, then this is either
harmless or critically wrong. For instance, imagine calling a linker on
a bunch of object files. Trying to link with only some of the object
files will fail. (For this specific reason, the linker which comes with
Visual C++ can be given the list of object file names to link as
a text file.)

What xargs does, Java can: all you have to do is to perform the multiple
calls by yourself. xargs has the advantage of "knowing" the system limit
and of accessing the arguments in their system-dependent raw format
(from the Java point of view, they are String instances), but using
xargs is itself restrictive: it works only on Unix systems and systems
which try to look like a Unix system. A plain Windows has no xargs.
Also, xargs has trouble with file names and paths which include
whitespace or even, God forbid, embedded newline characters. The GNU
xargs has the non-standard '-0' option to use the NUL character (the
character of value 0) as unique separator, but relying on a specific
version of xargs is even more restrictive than relying on just any
xargs.

From a conceptual point of view, when you use xargs from Java, what
happens is that the Java program carefully encodes the arguments (a long
sequence of strings) into a single byte stream, just so that xargs
immediately performs the opposite transformation. I cannot help but
find this process somewhat suboptimal.

--Thomas Pornin

No real arguments from me about any of that, although I may interpret
them differently. As you point out, one assumption when using xargs (or
using find -exec, or using a loop) is that invoking a command on an
argument list gives you the same results as invoking the command
multiple times, once for each argument in the list. If that's not the
case then you need a different solution.

As for the other observations, I look at it like this: if you're using
Runtime.exec you're already acknowledging that Java can't do the job
nearly as well as the underlying environment can, and so you're
accessing a specific OS. I wouldn't expect to be able to do things the
same way on all OS's. And for this specific case, I wouldn't be
determining the monstrous list of filename arguments in Java either.

AHS
 
Martin Gregorie

No real arguments from me about any of that, although I may interpret
them differently. As you point out, one assumption when using xargs (or
using find -exec, or using a loop) is that invoking a command on an
argument list gives you the same results as invoking the command
multiple times, once for each argument in the list. If that's not the
case then you need a different solution.

As for the other observations, I look at it like this: if you're using
Runtime.exec you're already acknowledging that Java can't do the job
nearly as well as the underlying environment can, and so you're
accessing a specific OS. I wouldn't expect to be able to do things the
same way on all OS's. And for this specific case, I wouldn't be
determining the monstrous list of filename arguments in Java either.

If this is a case where you sometimes want a single invocation of a
program to consume a relatively short list of arguments from the command
line and sometimes to handle a ginormous list that is best placed in a
file, you could do a lot worse than implement an idea used by Microware's
OS-9 operating system in almost all its utilities that expect to handle a
list of arguments:

- if the argument list is in the command line, it is scanned and
the list of arguments is processed.

- if a -z option is present, the arguments are read, one per line,
from stdin and processed.

- if the -z option is present and takes a value, written as -z=name,
the file whose pathname is 'name' is opened and the arguments are read,
one per line, from it and processed.

This is not difficult to implement in Java and makes for a very flexible
interface. I believe this is compatible with most common operating
systems.
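The three cases above could be sketched in Java roughly as follows. This only follows the description in this post, not the actual OS-9 sources; the class name ZOption, the UTF-8 encoding and the skipping of empty lines are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ZOption {

    // Builds the list of items to process:
    //   plain arguments are taken as they are,
    //   -z       reads items, one per line, from stdin,
    //   -z=name  reads items, one per line, from the file 'name'.
    public static List<String> collect(String[] argv, InputStream stdin) throws IOException {
        List<String> items = new ArrayList<>();
        for (String arg : argv) {
            if (arg.equals("-z")) {
                // stdin is left open; closing it is the caller's business.
                readLines(new BufferedReader(
                        new InputStreamReader(stdin, StandardCharsets.UTF_8)), items);
            } else if (arg.startsWith("-z=")) {
                try (BufferedReader reader = Files.newBufferedReader(
                        Paths.get(arg.substring(3)), StandardCharsets.UTF_8)) {
                    readLines(reader, items);
                }
            } else {
                items.add(arg);
            }
        }
        return items;
    }

    // Appends every non-empty line of 'reader' to 'into'.
    private static void readLines(BufferedReader reader, List<String> into) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.isEmpty()) {
                into.add(line);
            }
        }
    }
}
```

A main method would call collect(args, System.in) and then process the returned list exactly as it would process ordinary command line arguments.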
 
