exec problem is JDK 1.7.0_21

Arved Sandstrom

On 23.04.2013 12:48, Steven Simpson wrote:

I haven't checked your code, but I have written the inverse of
CommandLineToArgv in Java. It's not that hard. Passing the result to
ProcessBuilder works fine, although it only works because of two silly
undocumented facts: (1) ProcessBuilder adds quotes if and only if the
argument doesn't start and end with quotes, and (2) ProcessBuilder
doesn't touch quotes and backslashes inside the arguments.

The problem with Java's ProcessBuilder is that people use it "willy
nilly", completely oblivious to the fact that it doesn't do what
might be best for them. (Recall the case "c:\\program files\\", "world"
where the two strings are basically merged to "c:\\program files\"
world". This is probably the scariest.)
Secondly, there is no way to pass the arguments correctly by
relying only on documented behavior.
[ SNIP ]

That's the point a number of us have been making, Sven. In the case of
ProcessBuilder and Process you jump to no conclusions, and test your
specific case. Operative word being "test", as in unit test, functional
test, integration test, system test, UAT.

AHS
 
Lew

Steven said:
Also, just because a class is final now, it might not be in the future.

Not applicable to 'java.lang.String'.

It will never be non-final.
It's harder to make an already published class final than to make a
final one non-final, because a currently final class won't have any
derivations that could be affected by its change of finality.

You cannot speak in such general terms about 'java.lang.String'.
<potentially-dodgy-generalization>Finality is not final; non-finality
is.</potentially-dodgy-generalization> That said, it's not likely to
happen to String, and maybe there would be other consequences that keep
it unlikely. Nevertheless, my code will remain neutral on this matter,
whatever happens.

You're solving a non-existent problem. The finality and immutability of 'String'
are wired into the promises of the Java language and the type's special treatment by
the compiler and runtime.
Furthermore, the class could be re-factored to use (say) CharSequence.
By sticking to the principle that is correct whether String or
CharSequence is used, I worry less about the specifics of the type in
use as I make the change.

Whatever floats your boat. But defensive programming against scenarios that
won't occur is not really a best practice.

Yes, you can predict that certain scenarios will not occur.
 
Steven Simpson

Steven said:
Also, just because a class is final now, it might not be in the future.
Not applicable to 'java.lang.String'.

It will never be non-final.
[...]

You're solving a non-existent problem.

I concede that my arguments given so far are not compelling.
The finality and immutability of 'String'
are wired into the promises of the Java language and the type's special treatment by
the compiler and runtime.

A particularly good point that I hadn't considered.

But defensive programming against scenarios that
won't occur is not really a best practice.

Perhaps I now have a more compelling argument. Consider:

import java.util.List;

class ExtendsTest {
    // Two methods with identical functionality
    static void foo1(List<? extends String> list) { }
    static void foo2(List<String> list) { }

    interface Bar<T> {
        void bar(List<? extends T> list);
    }

    static class Adapter implements Bar<String> {
        public void bar(List<? extends String> list) {
            foo1(list);
            foo2(list); // error
        }
    }
}

Suppose foo1 and foo2 may have been written with no generic interface in
mind, perhaps even before Bar was conceived. Bar comes along, and the
functionalities of foo1/foo2 are deemed suitable in meeting Bar.bar's
contract. How does one now adapt foo2 to Bar?
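
One possible way to bridge the gap, offered only as a sketch: `Collections.unmodifiableList` accepts a `List<? extends T>` and returns a `List<T>`, so it can re-type the list as a read-only view without copying it. The class name `ExtendsAdapterSketch` and the `int` return values are made up for illustration (the methods in the example above return void), and the approach assumes foo2 only reads from the list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ExtendsAdapterSketch {
    // Same shapes as foo1/foo2 above, but returning the size so the
    // result can be observed (an assumption made for this sketch).
    static int foo1(List<? extends String> list) { return list.size(); }
    static int foo2(List<String> list) { return list.size(); }

    interface Bar<T> {
        int bar(List<? extends T> list);
    }

    static class Adapter implements Bar<String> {
        public int bar(List<? extends String> list) {
            int a = foo1(list);
            // foo2(list) would not compile here. The unmodifiable view
            // re-types the list as List<String> without copying it.
            List<String> view = Collections.<String>unmodifiableList(list);
            return a + foo2(view);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Adapter().bar(Arrays.asList("a", "b")));
    }
}
```

The extra wrapper is the price of foo2's narrower signature; foo1 needs no adaptation at all.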
 
Steven Simpson

On 23.04.2013 21:56, Steven Simpson wrote:
$ java WindowsArgumentGenerator 'c:\program files\\\' 'world'
argv[0]=[c:\program files\\\]
argv[1]=[world]
"c:\program files"\\\ world
That doesn't look right. A correct escaping would be
"c:\program files\\\\\\" world

You have to double the number of backslashes if they precede a quote.

I took it to mean that such a quote is literal, and should be present in
the string provided by argv. If I were to pass that line to
CommandLineToArgvW, I would expect it either to fail because the first
(and only) argument is quoted, but does not have a closing quote, or
succeed by inferring the missing quote, yielding the following list of
one argument:

c:\program files\\\" world


Also, you have to add another backslash if the quote does not terminate
the argument.

Again, I took it to mean that the first two rules produce the same
result, so \\\" and \\" both produce \".

I haven't looked in detail at it, but do note your comment:
* How decoding works:
* 2n backslashes + quote => n backslashes + closing quote
* 2n+1 backslashes + quote => n backslashes + inner quote
* n backslashes not followed by a quote => n backslashes

From
* 2/n/ backslashes followed by a quotation mark produce /n/
backslashes followed by a quotation mark.
* (2/n/) + 1 backslashes followed by a quotation mark *again*
produce /n/ backslashes followed by a quotation mark.
* /n/ backslashes not followed by a quotation mark simply produce
/n/ backslashes.

Where are you getting the notion that the first two rules imply
different quotes? I interpret both as being literal (inner, right?).
 
Sven Köhler

On 24.04.2013 01:39, Steven Simpson wrote:
Again, I took it to mean that the first two rules produce the same
result, so \\\" and \\" both produce \".

The documentation of CommandLineToArgv is incomplete. \\" produces a
single backslash, and the quote is the closing quote.
\" and \\\" produce a non-closing quote and a backslash followed by a
non-closing quote, respectively.
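
Read in reverse, these rules suggest how an argument could be quoted so that it survives the decoding. The following is a minimal sketch under that assumption, not the code discussed in this thread: the class and method names are hypothetical, and it only targets parsers that follow the 2n / 2n+1 backslash rules.

```java
class WinArgQuoteSketch {
    /** Quotes one argument so that the 2n / 2n+1 rules decode it back. */
    static String quote(String arg) {
        // Only empty arguments and those with whitespace or quotes need quoting.
        boolean needsQuotes = arg.isEmpty();
        for (int i = 0; i < arg.length() && !needsQuotes; i++) {
            char c = arg.charAt(i);
            needsQuotes = c == ' ' || c == '\t' || c == '"';
        }
        if (!needsQuotes) {
            return arg;
        }
        StringBuilder out = new StringBuilder("\"");
        int backslashes = 0; // length of the pending run of backslashes
        for (int i = 0; i < arg.length(); i++) {
            char c = arg.charAt(i);
            if (c == '\\') {
                backslashes++;
                continue;
            }
            if (c == '"') {
                // 2n+1 backslashes + quote => n backslashes + literal quote
                appendBackslashes(out, 2 * backslashes + 1);
            } else {
                // Backslashes not followed by a quote stay as they are.
                appendBackslashes(out, backslashes);
            }
            out.append(c);
            backslashes = 0;
        }
        // 2n backslashes + quote => n backslashes + closing quote
        appendBackslashes(out, 2 * backslashes);
        return out.append('"').toString();
    }

    private static void appendBackslashes(StringBuilder out, int count) {
        for (int i = 0; i < count; i++) {
            out.append('\\');
        }
    }

    public static void main(String[] args) {
        System.out.println(quote("c:\\program files\\\\\\"));
    }
}
```

For the argument c:\program files\\\ this yields "c:\program files\\\\\\", the escaping given as correct earlier in the thread.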
I haven't looked in detail at it, but do note your comment:

From


Where are you getting the notion that the first two rules imply
different quotes? I interpret both as being literal (inner, right?).

Well, by testing. As you can test yourself on the command line, \\" will
result in a closing quote and \" and \\\" will result in non-closing quotes.

Also, there HAS to be a way to distinguish a closing vs. a non-closing
quote. Otherwise you wouldn't know whether the quotes in \" \\" and \\\"
were closing or not. That it is not documented is just a real shame!


Regards,
Sven
 
Sven Köhler

That's the point a number of us have been making, Sven. In the case of
ProcessBuilder and Process you jump to no conclusions, and test your
specific case.

The specific case is broken. Now what?
 
Sven Köhler

Actually, on Linux almost all the messing about with strings is done by
the shell that invokes the command if the strings are enclosed in
double quotes. Use single quotes instead and the strings arrive at the
program ProcessBuilder is running with no modification at all. Here are
the results of using (1) double quoted strings and (2) single quoted
strings.

I don't know what you mean. On Linux (or other UNIX-like operating
systems), you specify the command line parameters as an array of strings
when executing an external program. There is by definition no "messing
about" with strings. Whatever syntax the shell uses to escape quotes and
backslashes is irrelevant to ProcessBuilder, as it doesn't use the shell
to start programs, but rather invokes some flavor of the exec function
directly.
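
A small sketch of that point: the arguments are handed to ProcessBuilder as separate list elements, and on a UNIX-like system each element reaches the child as one argv entry, quotes and backslashes included. The program name `testprog` and the class name `ArgvSketch` are placeholders, and the example deliberately does not start a process.

```java
import java.util.Arrays;
import java.util.List;

class ArgvSketch {
    /** Builds a command whose arguments contain spaces, quotes and backslashes. */
    static ProcessBuilder build() {
        List<String> command = Arrays.asList(
                "testprog",                 // hypothetical program name
                "hello world",              // one argument despite the space
                "\"hello\" \\world\\");     // quotes and backslashes, no shell escaping
        return new ProcessBuilder(command);
    }

    public static void main(String[] args) {
        // Prints the list exactly as it was given; nothing has been re-quoted yet.
        for (String s : build().command()) {
            System.out.println("[" + s + "]");
        }
    }
}
```

No shell syntax appears anywhere in the list; the only escaping is that of Java string literals.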


Regards,
Sven
 
Steven Simpson

On 24.04.2013 01:39, Steven Simpson wrote:
The documentation of CommandLineToArgv is incomplete.
Agreed.

Well, by testing. As you can test yourself on the command line, \\" will
result in a closing quote and \" and \\\" will result in non-closing quotes.

That's not what I'm seeing. In fact, backslashes appear to have no effect!

Cross-compiled with mingw32:

#include <windows.h>
#include <shellapi.h>

#include <stdio.h>
#include <wchar.h>

static void test(const wchar_t *s)
{
    printf("\nInput line: %ls\n", s);
    int argc;
    LPWSTR *argv = CommandLineToArgvW(s, &argc);
    if (!argv) {
        printf(" unable to parse\n");
    } else {
        printf(" arg count: %d\n", argc);
        /* Print each parsed argument in brackets. */
        for (int i = 0; i < argc; i++)
            printf(" [%d]=[%ls]\n", i, argv[i]);
        LocalFree(argv);
    }
}

int main(void)
{
    test(L"foo");
    test(L"foo\"");
    test(L"foo\\\"");
    test(L"foo\\\\\"");
    test(L"foo\\\\\\\"");

    test(L"\"foo");
    test(L"\"foo\"");
    test(L"\"foo\\\"");
    test(L"\"foo\\\\\"");
    test(L"\"foo\\\\\\\"");

    test(L"foo bar");
    test(L"foo\"bar");
    test(L"foo\\\"bar");
    test(L"foo\\\\\"bar");
    test(L"foo\\\\\\\"bar");

    test(L"foo\\ bar");
    test(L"foo\\\\ bar");
    test(L"foo\\\\\\ bar");

    test(L"\"foo bar\"");
    test(L"\"foo \\\"bar\"");
    test(L"\"foo \\\\\"bar\"");
    test(L"\"foo \\\\\\\"bar\"");

    return 0;
}




Output:

Input line: foo
arg count: 1
[0]=[foo]

Input line: foo"
arg count: 1
[0]=[foo"]

Okay, so if the quote doesn't start the argument, it's literal.


Input line: foo\"
arg count: 1
[0]=[foo\"]

Input line: foo\\"
arg count: 1
[0]=[foo\\"]

Input line: foo\\\"
arg count: 1
[0]=[foo\\\"]

These backslashes aren't folded in any way.

Input line: "foo
arg count: 1
[0]=[foo]

Input line: "foo"
arg count: 1
[0]=[foo]

So the leading quote is stripped, and the closing quote is optional - no
error.


Input line: "foo\"
arg count: 1
[0]=[foo\]

Input line: "foo\\"
arg count: 1
[0]=[foo\\]

Input line: "foo\\\"
arg count: 1
[0]=[foo\\\]

Inside the quoted argument, backslashes still have no special meaning.
And I can't get a literal quote character.


Input line: foo bar
arg count: 2
[0]=[foo]
[1]=[bar]

So a space splits arguments.


Input line: foo"bar
arg count: 1
[0]=[foo"bar]

The quote inside an unquoted argument is taken as literal.


Input line: foo\"bar
arg count: 1
[0]=[foo\"bar]

Input line: foo\\"bar
arg count: 1
[0]=[foo\\"bar]

Input line: foo\\\"bar
arg count: 1
[0]=[foo\\\"bar]

The backslashes are literal in the middle of the unquoted argument.


Input line: foo\ bar
arg count: 2
[0]=[foo\]
[1]=[bar]

Input line: foo\\ bar
arg count: 2
[0]=[foo\\]
[1]=[bar]

Input line: foo\\\ bar
arg count: 2
[0]=[foo\\\]
[1]=[bar]

Just checking that there's no unspecified way to escape a space.


Input line: "foo bar"
arg count: 1
[0]=[foo bar]

The quotes cause the space to be taken literally.


Input line: "foo \"bar"
arg count: 2
[0]=[foo \]
[1]=[bar]

Input line: "foo \\"bar"
arg count: 2
[0]=[foo \\]
[1]=[bar]

Input line: "foo \\\"bar"
arg count: 2
[0]=[foo \\\]
[1]=[bar]

Backslashes are still literal, and the next quote closes the argument
regardless. Plus, the extra quote after [bar] is not literal.

There seems to be no way to escape a space other than enclosing the
entire argument in quotes. Once inside the quotes, there's no way to
escape anything else (like a literal quote), so there's no way to escape
an argument containing a space and a quote. Backslashes behave
literally everywhere.

Is the program using CommandLineToArgvW incorrectly? Are there any
other input strings to try?


Also, there HAS to be a way to distinguish a closing vs. a non-closing
quote. Otherwise you wouldn't know whether the quotes in \" \\" and \\\"
were closing or not. That it is not documented is just a real shame!

What I'm seeing now is that general escaping is impossible, not merely
documented badly. :-(
 
Sven Köhler

Input line: "foo \"bar"
arg count: 2
[0]=[foo \]
[1]=[bar]

Input line: "foo \\"bar"
arg count: 2
[0]=[foo \\]
[1]=[bar]

Input line: "foo \\\"bar"
arg count: 2
[0]=[foo \\\]
[1]=[bar]

Backslashes are still literal, and the next quote closes the argument
regardless. Plus, the extra quote after [bar] is not literal.

That's completely not what the documentation says. The documentation
clearly states that 2n or 2n+1 backslashes followed by a quote should
result in n backslashes. However, in your cases, not even the number of
backslashes matches the documentation.

Which operating system were you using? I will try to check tonight on
Windows 7 using mingw64. Are you sure you haven't mixed char and wchar
strings? Have you tried CommandLineToArgvA?


Regards,
Sven
 
Nigel Wade

On 23.04.2013 00:07, Martin Gregorie wrote:
My testprog works exactly the same when run from within my
TestProcessBuilder test class as it does when run stand-alone from the
command line:

$ java TestProcessBuilder testprog "hello world" "\"hello world\""
"\"hello\" \"world\"" "hello ""double quoted"" world"
argc=5 argv[0]=testprog argv[1]=hello world argv[2]="hello world"
argv[3]="hello" "world"

Please look at the results that Steven posted. If the String "hello\"
\"world" is passed to the ProcessBuilder, the result was:
argv[1]=[hello] argv[2]=[world]
Actually, on Linux almost all the messing about with strings is done by
the shell that invokes the command if the strings are enclosed in
double quotes.

Only if the command invokes a shell, otherwise there is no shell involved.

The shell is just another executable.
 
Steven Simpson

Which operating system were you using?

winver reports "Windows 7 Enterprise". On a VirtualBox VM.

I will try to check tonight on Windows 7 using mingw64. Are you sure
you haven't mixed char and wchar strings?

I don't think so. If I had, I wouldn't expect anything coherent.

Have you tried CommandLineToArgvA?

Does such a beast exist? Googling, it seems to be wishware.
 
Sven Köhler

What is doing the argument splitting here, and where is its spec ?

Here's the spec of what Microsoft implements:
[1] http://msdn.microsoft.com/en-us/library/a1y7w461.aspx

Point is that when you start an executable on Windows /you do not pass an array
of strings/ to the OS. You pass a single string (or rather two strings -- one
to name the executable and the other to be the /entire/ command line).

Exactly!

That means that the interpretation of that (single) string as an array of
sub-strings is /entirely/ at the target application's discretion. And hence
entirely arbitrary. Arbitrary in practise I mean, not just in theory. (I
have a micro-application in production that recognises quotes around its first
argument but not around its second ;-)

And hence the current ProcessBuilder behavior is simply broken.

/Some/ applications use the built-in CommandLineToArgvW() function in Windows,

I'm afraid that has turned out to be false information. Executables
generated by Microsoft compilers seem to use some other code that
follows the spec given in [1], but CommandLineToArgvW does not follow
this specification. In fact, CommandLineToArgvW doesn't do anything
useful. Let's forget I ever mentioned it.

but most (at least as far as my knowledge goes) do not. And anyway that
function's defined behaviour defies belief (clearly the documentation simply
enshrines the existing behaviour of some seriously stupid code).
ProcessBuilder (if we take the code rather than the documentation as the spec)
is a disgrace (if we just take the doc as the spec then it isn't even that).

Yes.

I have no idea what command-line parser mingw32 provides for (imposes on) the
code it compiles. It /may/ be the same as CommandLineToArgvW() (or even be
implemented using it), but my guess is that the mingw folk will have come up
with something closer to what /bin/sh does in *nix.

I would hope that the mingw32 and mingw64 toolchains have code that is
compatible with [1]. At least my experience with mingw and Visual Studio
executables is consistent with [1].


Regards,
Sven
 
Steven Simpson

What is doing the argument splitting here, and where is its spec ?

Um, the call to CommandLineToArgvW here:

static void test(const wchar_t *s)
{
    printf("\nInput line: %ls\n", s);
    int argc;
    LPWSTR *argv = CommandLineToArgvW(s, &argc);
    if (!argv) {
        printf(" unable to parse\n");
    } else {
        printf(" arg count: %d\n", argc);
        /* Print each parsed argument in brackets. */
        for (int i = 0; i < argc; i++)
            printf(" [%d]=[%ls]\n", i, argv[i]);
        LocalFree(argv);
    }
}


The program itself takes no arguments. All "command lines" are
hard-coded into the program to avoid any possibility of being mangled by
any hidden parser (whether it's in the invoking shell or in the program
before main() is invoked). Since the command lines are embedded in the
program, the only escaping we need to apply to them is as for C strings,
which is well understood. The program even prints out both the original
command line and the resulting argument list, so we can see both input
and output with even the C-string escaping removed.


The 'spec' for CommandLineToArgvW:

<http://msdn.microsoft.com/en-us/library/windows/desktop/bb776391(v=vs.85).aspx>
 
Sven Köhler

Then quite possibly you've hit the end of the road.

Which should imply that ProcessBuilder is currently broken on Windows.
(Or as you put it: on Windows, ProcessBuilder is a disgrace.)

I was just hoping for a comment from Arved, since I really didn't
understand the point of "let's test a specific case" while my point was
"the general idea behind ProcessBuilder is broken if the OS is
Windows".
Perhaps you can get around the limits of ProcessBuilder by (say) encoding the
desired command-line in base64 and using a helper application which will decode
it before passing it to Windows, but even with hacks like that, there's no
guarantee that you can pass arbitrary arguments to an arbitrary application.

Passing arbitrary arguments to arbitrary applications is not always
possible on Windows - which is a shame! However: assuming I know
everything about the application I am calling, ProcessBuilder is not a
big help in calling that application as it messes around with the
command line parameters. My point is: on Windows, it shouldn't do so.
Actually, on Windows, it should pass the second element of the String
list as-is - without messing with it - and throw an exception if the
String list contains more than 2 elements.

For people who want escaping compatible with Microsoft's default
tokenization, ProcessBuilder should provide some helper function.
ProcessBuilder should also provide some function to determine what kind
of OS you're dealing with, as documentation that contains
statements like "on some operating systems" is not helping _at all_.
Depending on how its command-line parser is written it might be simply
impossible to pass, say, a single double-quote as an argument no matter /what/
string you pass to Windows as the command-line.

That is true. But with ProcessBuilder it is in general impossible to
adapt to whatever command-line parser is implemented. That's just a
shame, considering that ProcessBuilder is not that old.


Regards,
Sven
 
Arved Sandstrom

Which should imply that ProcessBuilder is currently broken on Windows.
(Or as you put it: on Windows, ProcessBuilder is a disgrace.)

I was just hoping for a comment from Arved, since I really didn't
understand the point of "let's test a specific case" while my point was
"the general idea behind ProcessBuilder is broken if the OS is
Windows".
[ SNIP ]

My comment just meant what it said - identify the specific app on
Windows, the arguments it needs, and do what you need to do with
ProcessBuilder - in a way irrespective of official docs - to make it work.

You got me playing with executables and Windows and Java some, so I am
now stuck into it. :)

AHS
 
Sven Köhler

On 24.04.2013 12:26, Steven Simpson wrote:
There seems to be no way to escape a space other than enclosing the
entire argument in quotes. Once inside the quotes, there's no way to
escape anything else (like a literal quote), so there's no way to escape
an argument containing a space and a quote. Backslashes behave
literally everywhere.

Is the program using CommandLineToArgvW incorrectly? Are there any
other input strings to try?

I tested
test(L"foo.exe \"ab\\\"c\\\\\" def");
test(L"\"foo.exe\" \"ab\\\"c\\\\\" def");
and then it works. Apparently, the first token (which is the program
name, and can be assumed not to contain any backslashes and quotes) is
treated differently from all subsequent tokens.

So CommandLineToArgvW does allow for escaping quotes within the arguments.
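
To make the documented rules concrete, here is a rough Java sketch of a splitter that applies them to the tokens after the program name. It is an illustration of the 2n / 2n+1 rules only, not a reimplementation of CommandLineToArgvW: the class name is hypothetical, the special handling of the first token is left out, and any undocumented behaviour is ignored.

```java
import java.util.ArrayList;
import java.util.List;

class ArgSplitSketch {
    /** Splits an argument string using the 2n / 2n+1 backslash rules. */
    static List<String> split(String line) {
        List<String> args = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        boolean haveToken = false;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '\\') {
                int start = i;
                while (i < line.length() && line.charAt(i) == '\\') {
                    i++;
                }
                int count = i - start;
                if (i < line.length() && line.charAt(i) == '"') {
                    // Backslashes before a quote are halved.
                    for (int k = 0; k < count / 2; k++) {
                        cur.append('\\');
                    }
                    if (count % 2 == 1) {
                        cur.append('"'); // odd count: the quote is literal
                        i++;
                    }
                    // Even count: the quote is handled as a delimiter next round.
                } else {
                    for (int k = 0; k < count; k++) {
                        cur.append('\\');
                    }
                }
                haveToken = true;
            } else if (c == '"') {
                inQuotes = !inQuotes; // opening or closing quote
                haveToken = true;
                i++;
            } else if ((c == ' ' || c == '\t') && !inQuotes) {
                if (haveToken) {
                    args.add(cur.toString());
                    cur.setLength(0);
                    haveToken = false;
                }
                i++;
            } else {
                cur.append(c);
                haveToken = true;
                i++;
            }
        }
        if (haveToken) {
            args.add(cur.toString());
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(split("\"ab\\\"c\\\\\" def"));
    }
}
```

Given the argument part of the test line above, "ab\"c\\" def, the sketch produces the two arguments ab"c\ and def.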


Regards,
Sven
 
Steven Simpson

On 24.04.2013 12:26, Steven Simpson wrote:
I tested
test(L"foo.exe \"ab\\\"c\\\\\" def");
test(L"\"foo.exe\" \"ab\\\"c\\\\\" def");
and then it works.

Confirmed. I tried all my cases with a prefixed token, and they started
working too.

Apparently, the first token (which is the program
name, and can be assumed not to contain any backslashes and quotes) is
treated differently from all subsequent tokens.

Well, the first token can contain backslashes, but they are taken literally.

So CommandLineToArgvW does allow for escaping quotes within the arguments.

I take it all back.
 
