Regexes on the command line

Roedy Green

Has anyone ever tackled the problem of specifying regexes on the
command line?

It seems you might need three levels of quoting, for regex, for Java
strings and for magic command line characters.

Do you handle it by insisting the regex expression lives in a separate
file?
--
Roedy Green Canadian Mind Products
http://mindprod.com
"Man has been endowed with reason, with the power to create, so that he can add
to what he’s been given. But up to now he hasn’t been a creator, only a destroyer.
Forests keep disappearing, rivers dry up, wild lifes become extinct,
the climate’s ruined and the land grows poorer and uglier every day."
~ Anton Chekhov (born: 1860-01-29 died: 1904-07-15 at age: 44)
 
Tom McGlynn

Has anyone ever tackled the problem of specifying regexes on the
command line?

It seems you might need three levels of quoting, for regex, for Java
strings and for magic command line characters.

Do you handle it by insisting the regex expression lives in a separate
file?

I have done this kind of thing but mostly on Linux where I can just
use the bash shell's single quotes. I find the problem is not in
counting the backslashes, they do not proliferate any further than
usual, but in accommodating what is typically two levels of shell
escaping. E.g., typically, I want to be able to say

match 'regex' 'matching string'

where match is a little shell script

#!/bin/bash
java -cp ... somepackage.Match $*

When the java command inside the script executes it reanalyzes the
command string and since the single quotes are gone it misinterprets
the special characters. There are a number of ways to address this.
You can use aliases rather than scripts, but that's a bit limiting.
You could use Perl and do a multi-argument system call (I presume
there are equivalents in Python, Ruby and elsewhere) which doesn't
re-analyze the arguments. What I've tended to do is to analyze the input
arguments and regenerate the command line in the script requoting any
special characters.

Similar issues come up in a number of contexts, e.g., sending SQL or
mathematical expressions to commands or anything that may have spaces
in inputs.

Regards,
Tom McGlynn
 
Tom Anderson

I have done this kind of thing but mostly on Linux where I can just use
the bash shell's single quotes. I find the problem is not in counting
the backslashes, they do not proliferate any further than usual, but in
accommodating what is typically two levels of shell escaping. E.g.,
typically, I want to be able to say

match 'regex' 'matching string'

where match is a little shell script

#!/bin/bash
java -cp ... somepackage.Match $*

When the java command inside the script executes it reanalyzes the
command string and since the single quotes are gone it misinterprets
the special characters. There are a number of ways to address this.
You can use aliases rather than scripts, but that's a bit limiting.
You could use Perl and do a multi-argument system call (I presume
there are equivalents in Python, Ruby and elsewhere) which doesn't
re-analyze the arguments. What I've tended to do is to analyze the input
arguments and regenerate the command line in the script requoting any
special characters.

java Match '$*' doesn't do it for you? Or better yet, IMHO, '$@'.

tom
 
Tom McGlynn

java Match '$*' doesn't do it for you? Or better yet, IMHO, '$@'.

tom

Thanks... That's not quite right but I think you just misremembered
it.
It should be
java Match "$@"
Once you pointed me to the proper variable I was able to see it in the
bash documentation. I'm a little worried about where the expansion to
the argument list occurs though. It seems to be after variable
substitution, but I have checked for history and such and I haven't
had time to check if the documentation specifies the order of command
line substitutions.

In any case it's a much nicer way of doing it than the contrivances
I've used before!

Regards,
Tom
 
Tom McGlynn

Thanks... That's not quite right but I think you just misremembered
it.
It should be
java Match "$@"
Once you pointed me to the proper variable I was able to see it in the
bash documentation.

I should probably have phrased this a little more cautiously but I was
in a hurry. The other Tom clearly is more familiar than I with this,
but at least on my machine '$*' and '$@' are not expanded by bash.
Possibly they are in some of the other shells. "$*" expands as a
single argument, so it doesn't quite do what I typically want. But
"$@" is very very nice!
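
For anyone following along, the difference between the four spellings discussed above can be checked with a small script. This is only a sketch: count_args and demo are made-up helper names, not part of anyone's script in this thread.

```shell
#!/bin/bash
# count_args (hypothetical helper) prints how many arguments it received.
count_args() { echo "$#"; }

# demo (hypothetical helper) passes its own arguments on in four ways.
demo() {
    count_args $*     # unquoted: the words are re-split on whitespace (and globbed)
    count_args '$*'   # single quotes: the literal two characters $*, one argument
    count_args "$*"   # double quotes: all arguments joined into one word
    count_args "$@"   # double quotes: each original argument kept as its own word
}

demo 'fo+' 'matching string'
```

Called with a regex and a two-word string as above, this should print 3, 1, 1 and 2, which is why only "$@" hands the Java program the same two arguments the user typed.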


Regards,
Tom McGlynn
 
Tom Anderson

I should probably have phrased this a little more cautiously but I was
in a hurry. The other Tom clearly is more familiar than I with this,
but at least on my machine '$*' and '$@' are not expanded by bash.
Possibly they are in some of the other shells.

Argh! No, you're quite right, and i wasn't thinking straight. I had the
idea that they'd both be expanded, but with different amounts of expansion
of shell variables in them, but of course neither of them do any expansion
of anything inside the string. '' just quotes, and "" expands the variable
into a quoted string.

The funny thing is that i've spent the last week or two doing almost
nothing but shell, so you'd have thought i'd be on top of this. Maybe i've
gone snowblind instead.

"$*" expands as a single argument, so it doesn't quite do what I
typically want. But "$@" is very very nice!

"$@" is a fine invention. I'm not really sure what the use of "$*" is!

tom
 
Martin Gregorie

Argh! No, you're quite right, and i wasn't thinking straight. I had the
idea that they'd both be expanded, but with different amounts of
expansion of shell variables in them, but of course neither of them do
any expansion of anything inside the string. '' just quotes, and ""
expands the variable into a quoted string.

The funny thing is that i've spent the last week or two doing almost
nothing but shell, so you'd have thought i'd be on top of this. Maybe
i've gone snowblind instead.


"$@" is a fine invention. I'm not really sure what the use of "$*" is!

It can be useful if, after expansion, it's re-expanded, e.g. used to pass
all the script arguments into a function which is going to reinterpret
it. Passing the arguments as a single parameter will probably simplify
the preliminary handling.

It's also the way to go if you need to use argument separators other than
a space, e.g. IFS=","; commalist="$*" - that sort of use is not all that
common, but it could get you out of a hole one day.
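
A minimal sketch of that IFS trick, wrapped in a function so the changed IFS does not leak into the rest of the script. The name join_by_comma is made up for this illustration.

```shell
#!/bin/bash
# join_by_comma (hypothetical helper) joins its arguments with commas.
join_by_comma() {
    local IFS=","          # "$*" joins with the first character of IFS
    printf '%s\n' "$*"
}

join_by_comma red green blue   # prints red,green,blue
```

Declaring IFS as local is a design choice here: a bare IFS="," at the top level would also change how every later unquoted expansion in the script is split.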
 
Tom Anderson

It can be useful if, after expansion, it's re-expanded, e.g. used to pass
all the script arguments into a function which is going to reinterpret
it. Passing the arguments as a single parameter will probably simplify
the preliminary handling.

True. I guess i think writing a function which is going to do that is
unusual - that would put the whole command line in $1 in the function.
When would you want that?

It's also the way to go if you need to use argument separators other
than a space, e.g. IFS=","; commalist="$*" - that sort of use is not all
that common, but it could get you out of a hole one day.

Oh, that's nice. That's really nice. Useful for manipulating PATH,
classpaths, etc.

Thanks for the reminder about IFS; i vaguely knew about it, but have never
actually used it. I always find it awkward when writing code to loop over
all the entries in $PATH or something, because it's colon-separated - i
usually send it to tr to make it space-separated, then loop over that, but
IFS is a much better solution.
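
Going the other way (splitting rather than joining) might look like the sketch below. PATH entries are separated by colons, so IFS is set to ":" for a single read. The function name list_entries and the sample path are made up for illustration.

```shell
#!/bin/bash
# list_entries (hypothetical helper) prints each entry of a colon-separated list.
list_entries() {
    local -a parts
    # IFS is changed only for this one read command.
    IFS=":" read -r -a parts <<< "$1"
    local entry
    for entry in "${parts[@]}"; do
        printf '%s\n' "$entry"
    done
}

list_entries "/usr/bin:/bin:/opt/my tools/bin"
```

Unlike piping through tr to get spaces, this keeps an entry such as "/opt/my tools/bin" in one piece. In real use the argument would be "$PATH".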

tom
 
Roedy Green

Has anyone ever tackled the problem of specifying regexes on the
command line?

It occurred to me I would have to write docs on how to tunnel through
the quoting of all the major command processors, and throw users of
others to the wolves.

So I punted. I decided to use @regexisinthisfile.txt style parms for
regexes.

I allow you to split the regexes over multiple lines without penalty.

You have one LESS level of quoting than you would when writing Java code.

The problem with that is \uxxxx support will most likely be gone too.
That is not a big problem, since funny characters don't need quoting,
and they are more readable without it.

The code is written. I have some testing to do, and documenting how
the regexes work.
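
The tool described above is Java and its code is not shown in this thread, so the following is only the same idea sketched in bash. It assumes line breaks in the regex file carry no meaning and are simply dropped; load_regex and the sample pattern are made up for illustration.

```shell
#!/bin/bash
# load_regex (hypothetical helper) reads a regex from a file and joins its lines.
# Assumption: line breaks in the file carry no meaning and are simply dropped.
load_regex() {
    tr -d '\r\n' < "$1"
}

# Example with a throwaway file; the file name comes from mktemp.
regex_file="$(mktemp)"
printf '%s\n' 'fo+' 'ba[rz]' > "$regex_file"
regex="$(load_regex "$regex_file")"
rm -f "$regex_file"

# The regex reaches the matcher without any shell or Java-string quoting layer.
if [[ "foobaz" =~ $regex ]]; then
    echo "matched"
fi
```

Because the pattern never appears on a command line, the only quoting rules left are those of the regex syntax itself.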
--
Roedy Green Canadian Mind Products
http://mindprod.com

"If everyone lived the way people do in Vancouver,
we would need three more entire planets to support us."
~ Guy Dauncey (born: 1948 age: 61)
 
Martin Gregorie

True. I guess i think writing a function which is going to do that is
unusual - that would put the whole command line in $1 in the function.
When would you want that?


Oh, that's nice. That's really nice. Useful for manipulating PATH,
classpaths, etc.

Thanks for the reminder about IFS; i vaguely knew about it, but have
never actually used it. I always find it awkward when writing code to
loop over all the entries in $PATH or something, because it's
colon-separated - i usually send it to tr to make it space-separated,
then loop over that, but IFS is a much better solution.

tom

Actually, I used it today....

Here's the background, just to show this isn't an artificial use of "$*".

I have a Perl script which analyzes the performance of my spam purging
chain. It is normally run without arguments as a user-supplied logwatch
component, so its defaults must be set to allow that AND it must accept
input via STDIN. In this mode it shows the %age of spam for the day and
lists the 10 most used locally defined SpamAssassin rules. However, it can
generate other analyses, e.g. to list the rules that didn't fire, and
hence are candidates for removal.

The most useful summary processes all the maillog files to show the top
10 rules as well as any unused rules. So, I wrote a wrapper script for
it. The salient part boils down to these six lines:

args="$*"
if [ -z "$args" ]
then
args='-unused'
fi
cat /var/log/maillog* | spamscan $args

Yes, I could have tested $# as non-zero, but I think this is more elegant.
 
Tom Anderson

Actually, I used it today....

args="$*"
if [ -z "$args" ]
then
args='-unused'
fi
cat /var/log/maillog* | spamscan $args

Yes, I could have tested $# as non-zero, but I think this is more elegant.

You can use -z on $@ in exactly the same way as $*. By using $*, if any of
the args have spaces in them, you're hosed.

That said, i can't find a really clean way of doing this using $@, because
you can't assign $@ to a variable and have it keep its magic property of
being quoted as separate words. You also can't, AFAICT, modify it. The
best i could come up with was:

function dospamscan {
cat /var/log/maillog* | spamscan "$@"
}

if [[ -z "$@" ]]
then
dospamscan '-unused'
else
dospamscan "$@"
fi

Functionally, i humbly submit that this is better than your solution,
because it handles arguments containing spaces. In terms of readability,
though, it's not as good.
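
For what it's worth, bash does offer two ways around the limits mentioned above: set -- replaces the positional parameters, and an array copy keeps each argument as a separate word. The sketch below uses both. run_scan is a made-up name, and printf stands in for the real spamscan pipeline so the result is visible.

```shell
#!/bin/bash
# run_scan (hypothetical helper) applies a default option, then passes arguments on.
run_scan() {
    if [ "$#" -eq 0 ]; then
        set -- -unused        # replace the (empty) positional parameters
    fi
    local -a args=("$@")      # an array copy keeps each argument as one word
    # Stand-in for: cat /var/log/maillog* | spamscan "${args[@]}"
    printf '<%s>' "${args[@]}"
    echo
}

run_scan                      # prints <-unused>
run_scan -top 'two words'     # prints <-top><two words>
```

Testing $# rather than -z also sidesteps the question of how "$@" behaves inside a test when there are several arguments.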

tom
 
John B. Matthews

Tom Anderson said:
On Thu, 26 Feb 2009 23:32:39 +0000, Tom Anderson wrote:

"$@" is a fine invention. I'm not really sure what the use of "$*" is!

Actually, I used it today....

args="$*"
if [ -z "$args" ]
then
args='-unused'
fi
cat /var/log/maillog* | spamscan $args

Yes, I could have tested $# as non-zero, but I think this is more elegant.

You can use -z on $@ in exactly the same way as $*. By using $*, if any of
the args have spaces in them, you're hosed.

That said, i can't find a really clean way of doing this using $@, because
you can't assign $@ to a variable and have it keep its magic property of
being quoted as separate words. You also can't, AFAICT, modify it. The
best i could come up with was:

function dospamscan {
cat /var/log/maillog* | spamscan "$@"
}

if [[ -z "$@" ]]
then
dospamscan '-unused'
else
dospamscan "$@"
fi

Functionally, i humbly submit that this is better than your solution,
because it handles arguments containing spaces. In terms of readability,
though, it's not as good.

Interesting. I habitually use "${@}". I can't recall needing
double-digit positional parameters, but I often concatenate shell
variables and command line text.
 
Martin Gregorie

You can use -z on $@ in exactly the same way as $*. By using $*, if any
of the args have spaces in them, you're hosed.

True enough. The original problem never needs spaces in arguments. Just
now I'd thought that passing quoted arguments would preserve included
spaces but a quick test script shows that isn't the case.

Functionally, i humbly submit that this is better than your solution,
because it handles arguments containing spaces.

Good point for the general case.

What I didn't mention, and probably should have, was that my inner script
only accepts options on its command line, which determine the type of
analysis that it carries out.
 
