FAQ 6.9 How can I quote a variable to use in a regex?

PerlFAQ Server

This is an excerpt from the latest version of perlfaq6.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

6.9: How can I quote a variable to use in a regex?

The Perl parser will expand $variable and @variable references in
regular expressions unless the delimiter is a single quote. Remember,
too, that the right-hand side of a "s///" substitution is considered a
double-quoted string (see perlop for more details). Remember also that
any regex special characters in the variable will be acted on unless you
precede the variable with \Q. Here's an example:

$string = "Placido P. Octopus";
$regex = "P.";

$string =~ s/$regex/Polyp/;
# $string is now "Polypacido P. Octopus"

Because "." is special in regular expressions, and can match any single
character, the regex "P." here has matched the "Pl" in the original
string.

To escape the special meaning of ".", we use "\Q":

$string = "Placido P. Octopus";
$regex = "P.";

$string =~ s/\Q$regex/Polyp/;
# $string is now "Placido Polyp Octopus"

The use of "\Q" causes the "." in the regex to be treated as a regular
character, so that "P." matches a "P" followed by a dot.
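
As a small sketch of the same idea: "\Q" inside a pattern behaves like calling the built-in quotemeta() on the interpolated text, and "\E" ends the quoted span. The variable names $quoted, $with_q and $with_quotemeta are only illustrative and are not part of the FAQ text.

```perl
use strict;
use warnings;

my $string = "Placido P. Octopus";
my $regex  = "P.";

# quotemeta() backslash-escapes every non-word character,
# which is the same escaping \Q applies inside a pattern.
my $quoted = quotemeta $regex;    # the two characters "P." become 'P\.'

# Apply both forms to copies of the original string.
(my $with_q         = $string) =~ s/\Q$regex\E/Polyp/;
(my $with_quotemeta = $string) =~ s/$quoted/Polyp/;

print "$with_q\n";            # Placido Polyp Octopus
print "$with_quotemeta\n";    # Placido Polyp Octopus
```

The "\E" matters when more pattern follows the variable: anything after it is treated as regex syntax again.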



--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.
 
ccc31807

6.9: How can I quote a variable to use in a regex?

I have applications that process files in a remote directory that I
get using SCP. The files that I process are named like
'USA_20110412.txt' with each day's file having the date. Sometimes
they miss a day, and the next day I'll pick up two files from the
directory. I pick up my files with a file glob, like this: 'USA_*.txt'
and it's worked for five or six years without a hitch.

This application runs for a number of different kinds of files with
different prefixes, so I'll get files like 'USA_*.txt', 'USB_*.txt',
'USC_*.txt', etc. I use different rules for processing each kind of
file.

Users have complained from time to time about having to process
multiple files individually, so this morning I decided to fix this and
put the common code in a module and pass the file glob to the
function. (The variable $CONFIG{GLOB_A} contains a string like
'USA_*.txt'.) I call the function like this:

COMMON::process_files($CONFIG{GLOB_A});

and the function definition in COMMON.pm starts like this:

sub process_files
{
my $glob = shift;
...
do_something if $file =~ /$glob/;
...
}

Guess what, guys? It didn't work! Unfortunately, the OS file glob
'USA_*.txt' should have been given to the regex as 'USA_.*txt' (notice
that the dot and the star have swapped places).

Question: is there any way I can use the OS file glob in a regex
without changing it? Can I put my file descriptor in a variable and
pass it reliably to the regex?

CC.
 
Uri Guttman

c> Guess what, guys? It didn't work! Unfortunately, the OS file glob
c> 'USA_*.txt' should have been given to the regex as 'USA_.*txt' (notice
c> that the dot and the star have swapped places).

you don't have it right either way. . is not a meta char in globs, it
matches . (since . is a common part separator in file names). * in globs
matches 0 or more char in the current spot - it does not matter the char
to the left.

in regexes, . matches any one char and * matches 0 or more of the char
to its left.

so just swapping * and . makes no sense.

c> Question: is there any way I can use the OS file glob in a regex
c> without changing it? Can I put my file descriptor in a variable and
c> pass it reliably to the regex?

there is no direct way to use a glob pattern in a regex. there may be a
module that can convert it for you. the simplest solution is to replace
. with \. and * with .* (in that order). that will handle those two
chars. there are other glob things you may need but most users don't
know them.
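
a rough sketch of that conversion, for what it's worth. the helper name glob_to_regex is made up here, the handling of ? and the \A ... \z anchoring are assumptions added on top of the two replacements described above, and character classes, braces and the shell's special treatment of leading dots are not handled at all.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a simple shell glob into an anchored regex.
# Only '*' and '?' are treated as wildcards (the '?' case is an assumption);
# every other character is escaped so it matches literally.
sub glob_to_regex {
    my ($glob) = @_;
    my $pattern = '';
    for my $char (split //, $glob) {
        if    ($char eq '*') { $pattern .= '.*' }
        elsif ($char eq '?') { $pattern .= '.' }
        else                 { $pattern .= quotemeta $char }
    }
    # Anchor both ends, since a glob has to match the whole name.
    return qr/\A$pattern\z/s;
}

my $re = glob_to_regex('USA_*.txt');

for my $file (qw(USA_20110412.txt USA_txt USB_20110412.txt USA_20110412.txt.bak)) {
    printf "%-24s %s\n", $file, ($file =~ $re ? 'match' : 'no match');
}
```

working one character at a time avoids the ordering problem: the dot produced for * is never escaped by accident.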

uri
 
brian d foy

ccc31807 said:
Question: is there any way I can use the OS file glob in a regex
without changing it? Can I put my file descriptor in a variable and
pass it reliably to the regex?

Isn't that what glob() is for?
 
ccc31807

so just swapping * and . makes no sense.

As a file glob, 'USA_*.txt' gets all the files that begin with 'USA_'
and end with '.txt' with (perhaps) a few characters in between.

As a regular expression, /USA_.*txt/ matches a string that contains
the literal characters 'USA_', followed by zero or more characters,
followed by 'txt'.

As a matter of fact, swapping the dot and star actually didn't work in
my application (due to other reasons not material here). I solved the
problem by assigning all the literal characters before the star to a
variable, and then matched the variable. The fact that the file name
ends in 'txt' doesn't matter.
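
A minimal sketch of that prefix approach, assuming the glob contains at most one '*' and that everything before it is literal. The names $glob, $prefix, @files and @wanted are placeholders for illustration, not the names in the real application.

```perl
use strict;
use warnings;

my $glob = 'USA_*.txt';

# Keep only the literal characters before the first '*'.
my ($prefix) = split /\*/, $glob, 2;

# Made-up file names standing in for the directory listing.
my @files = qw(USA_20110412.txt USA_20110413.txt USB_20110412.txt);

# \Q...\E protects any regex metacharacters in the prefix;
# \A anchors the match at the start of the file name.
my @wanted = grep { /\A\Q$prefix\E/ } @files;

print "$_\n" for @wanted;
```

Here only the two USA_ names are printed.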

I would have liked to use the same value both as a file glob for the
purpose of getting the file (which I do by running pscp as an external
process) and as a regular expression to process just the files I need,
but in the end it doesn't matter -- except maybe as a reminder that
the file glob syntax and the regex syntax aren't identical.

CC.
 
Uri Guttman

c> As a file glob, 'USA_*.txt' gets all the files that begin with 'USA_'
c> and end with '.txt' with (perhaps) a few characters in between.

please don't tell me how globs work.

c> As a regular expression, /USA_.*txt/ matches a string that contains
c> the literal characters 'USA_', followed by zero or more characters,
c> followed by 'txt'.

please don't tell me how regexes work.


c> As a matter of fact, swapping the dot and star actually didn't work in
c> my application (due to other reasons not material here). I solved the
c> problem by assigning all the literal characters before the star to a
c> variable, and then matched the variable. The fact that the file name
c> ends in 'txt' doesn't matter.

it wouldn't work under any circumstances. regexes are not globs.

the regex will match USA_txt without the . which you should have. the
glob MUST have a . matched. they are different patterns.
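
a small check that makes the difference concrete. the file names are invented for illustration, and the anchored pattern in $literal is one assumed way of writing what the glob means as a regex.

```perl
use strict;
use warnings;

my $swapped = qr/USA_.*txt/;          # dot and star swapped, unanchored
my $literal = qr/\AUSA_.*\.txt\z/;    # '.' escaped, '*' turned into '.*', anchored

for my $name (qw(USA_20110412.txt USA_txt OLD_USA_20110412_txt.bak)) {
    printf "%-26s swapped=%-3s converted=%s\n",
        $name,
        ($name =~ $swapped ? 'yes' : 'no'),
        ($name =~ $literal ? 'yes' : 'no');
}
```

the swapped pattern accepts all three names. the converted one accepts only USA_20110412.txt.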

c> I would have liked to use the same value both as a file glob for the
c> purpose of getting the file (which I do by running pscp as an external
c> process) and as a regular expression to process just the files I need,
c> but in the end it doesn't matter -- except maybe as a reminder that
c> the file glob syntax and the regex syntax aren't identical.

did you see what i said about a proper way to fix it? i don't think
so. swapping is wrong on many levels. my solution is correct on all
levels.

uri
 
Willem

brian d foy wrote:
) In article
)<8e4f1b32-f977-4dac-8c83-101709aa95a1@r13g2000yqk.googlegroups.com>,
)
)> Question: is there any way I can use the OS file glob in a regex
)> without changing it? Can I put my file descriptor in a variable and
)> pass it reliably to the regex?
)
) Isn't that what glob() is for?

Wouldn't it be nice if there were a glob() that worked on strings ?


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Ilya Zakharevich

Wouldn't it be nice if there were a glob() that worked on strings ?

Did you actually inspect File::Glob before doing your wishful
thinking? (You claim you do not need learning how glob() operates,
right?)

Ilya
 
Ilya Zakharevich

Did you actually inspect File::Glob before doing your wishful
thinking? (You claim you do not need learning how glob() operates,
right?)

My apologies - in the last sentence I mixed you up with someone else. :-(

Yours,
Ilya
 
ccc31807

We need a reminder that the funny characters in different languages
mean different things?

Unfortunately, sometimes we do, especially when we don't concentrate
on the particular tool currently in use.

I use vi, emacs, and Textpad for different things, and sometimes I
have all three open and in use at the same time, and sometimes I type
a Ctl-p in vi, or a :x in Textpad, or a Ctl-v in emacs, and
momentarily wonder what's wrong with the command. I'm just not good
enough to juggle three balls at the same time. Hell, I'm often not
good enough to juggle ONE ball at the same time!

CC.
 
