Matching escaped delimiter chars

Witold Rugowski

Hi!
I want to use a regexp to match a substring that is delimited by, let's say, ". That is trivial, but I don't know how to handle quotes escaped with \. OK, an example will be better ;-))

Take a string such as:
AAAAAAA "blah blah \" blah blah" BBBBBB

How do I match everything between the quotes, not counting the escaped quote? In this case it should match:
blah blah \" blah blah

How can it be done?

Best regards,
Witold Rugowski
 
Paul Lalli

Witold said:
I want to match with regexp substring, which is delimited by, let's say ". It is trivial, but I don't know how to match escaped quotes with \. OK, example will be better ;-))

Let's take string such:
AAAAAAA "blah blah \" blah blah" BBBBBB

How to match all what is in between quotes, not counting escaped quote. In this case it should match to:
blah blah \" blah blah

How it can be done????

It's already been done. Don't reinvent wheels.

http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common/delimited.pm

Paul Lalli
 
Paul Lalli

Paul said:

For the heck of it, an example of the above module's usage:

#!/usr/bin/perl
use strict;
use warnings;

use Regexp::Common qw/delimited/;

$_ = 'AAAAAAA "blah blah \" blah blah" BBBBBB';
if (/$RE{delimited}{-delim=>'"'}{-keep}/) {
    print "Found: $1\n";
    print "Without quotes: $3\n";
}
__END__

Found: "blah blah \" blah blah"
Without quotes: blah blah \" blah blah
 
Sherm Pendley

Witold Rugowski said:
Let's take string such:
AAAAAAA "blah blah \" blah blah" BBBBBB

How to match all what is in between quotes, not counting escaped quote.
In this case it should match to:
blah blah \" blah blah

How it can be done????

A very simplistic way of doing that would be to use a zero-width look-behind,
so that the end of the match is (in English) "a quote character that's not
immediately preceded by a backslash." In Perl:

#!/usr/bin/perl

use strict;
use warnings;

my $string = 'AAAAAAA "blah blah \" blah blah" BBBBBB';

if ($string =~ /\"(.*)(?<!\\)\"/) {
    print $1, "\n";
}

As I said though, that's a pretty simplistic way of doing it. It works for
the specific example given, but may not work with real-world data. It does
not handle, for example, the case where the string you're interested in ends
in a backslash which is escaped - i.e. blah\\".
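To make that limitation concrete, here is a small sketch that runs the same look-behind pattern against a string whose quoted part ends in an escaped backslash. The test string is made up for illustration and is not from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In a single-quoted literal, \\\\ yields two backslashes, so the string is:
#   AAAAAAA "blah\\" BBBBBB
# i.e. the quoted text ends in an escaped backslash.
my $string = 'AAAAAAA "blah\\\\" BBBBBB';

# The look-behind rejects the real closing quote, because a backslash
# immediately precedes it, and there is no other quote to fall back on.
my $matched = ($string =~ /"(.*)(?<!\\)"/) ? 1 : 0;
print "look-behind pattern matched: $matched\n";    # prints 0
```

The pattern finds nothing here, even though the string contains a properly closed quoted section.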

The special cases and "what if" scenarios like the above can get out of hand
rather quickly. For production use, I'd use something like the Text::Balanced
module on CPAN, or even a full-blown parser using Parse::RecDescent.
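As a rough sketch of the Text::Balanced route, assuming the module's extract_delimited function with its default backslash escape character, something like the following should pull out the quoted section. The prefix pattern '[^"]*' is a choice made for this example so that the leading AAAAAAA is skipped; by default only leading whitespace is skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Text::Balanced qw(extract_delimited);

my $string = 'AAAAAAA "blah blah \" blah blah" BBBBBB';

# Arguments: the text, the delimiter, and a prefix pattern to skip
# before the opening delimiter.
my ($extracted, $remainder, $prefix) =
    extract_delimited($string, '"', '[^"]*');

if (defined $extracted) {
    # The extracted text still includes the surrounding quotes.
    print "Found: $extracted\n";
    print "Without quotes: ", substr($extracted, 1, -1), "\n";
}
```

In list context the function returns the extracted text, the remainder of the input, and the skipped prefix, so the surrounding text stays available if it is needed.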

sherm--
 
James

Witold said:
Hi!
I want to match with regexp substring, which is delimited by, let's say ". It is trivial, but I don't know how to match escaped quotes with \. OK, example will be better ;-))

Let's take string such:
AAAAAAA "blah blah \" blah blah" BBBBBB

How to match all what is in between quotes, not counting escaped quote. In this case it should match to:
blah blah \" blah blah

How it can be done????

Best regards,
Witold Rugowski

$ cat prog.pl
#!/bin/env perl -w
$_ = 'AAAAAAA "blah blah \" blah blah" BBBBBB';
print $& if /(?<=\").+\\\".+(?=\")/;

$ ./prog.pl
blah blah \" blah blah

James
 
xicheng

This is a very typical problem, solved in "Mastering Regular
Expressions" (by Jeffrey Friedl). The regex for this kind of problem
is:

opening_normal*(special_normal*)*closing

where for you:
opening: \"
normal : [^"\\]
special : \\"
closing : \"
The underscore _ here means a seamless connection between two parts.
So if I put it this way:
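
Assembling the parts listed above gives a pattern along these lines. This is a sketch, not necessarily the exact regex the poster had in mind. The variable names are illustrative, and $any_escape is an assumption that goes beyond the post: it widens "special" from an escaped quote to a backslash followed by any character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parts as named in the post; variable names are illustrative.
my $normal  = qr/[^"\\]/;    # neither a quote nor a backslash
my $special = qr/\\"/;       # an escaped quote

# opening normal* (special normal*)* closing, capturing the inside
my $quoted = qr/"($normal*(?:$special$normal*)*)"/;

my $string = 'AAAAAAA "blah blah \" blah blah" BBBBBB';
my ($inside) = $string =~ $quoted;
print "$inside\n";     # blah blah \" blah blah

# Assumption beyond the post: treating backslash plus any character as
# "special" also accepts an escaped backslash before the closing quote.
my $any_escape = qr/\\./;
my $quoted2    = qr/"($normal*(?:$any_escape$normal*)*)"/;

my $tricky = 'AAAAAAA "blah\\\\" BBBBBB';    # holds: "blah\\"
my ($inside2) = $tricky =~ $quoted2;
print "$inside2\n";    # blah\\
```

With "special" limited to an escaped quote, as in the parts list, the second string would not match at all, which is why the wider form is shown alongside it.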
 
robic0

Witold said:
Hi!
I want to match with regexp substring, which is delimited by, let's say ". It is trivial, but I don't know how to match escaped quotes with \. OK, example will be better ;-))

Let's take string such:
AAAAAAA "blah blah \" blah blah" BBBBBB

How to match all what is in between quotes, not counting escaped quote. In this case it should match to:
blah blah \" blah blah

How it can be done????

Best regards,
Witold Rugowski

my @regx_esc_codes = ( "\\", '/', '(', ')', '[', ']', '?', '|',
                       '+', '.', '*', '$', '^', '{', '}', '@' );

my $funky_string = 'AAAAAAA "blah blah \" blah blah" BBBBBB';
my $match_string = 'your_logic_here';
for (@regx_esc_codes)
{
    my $tc = $_;
    # code template for regex: backslash-escape every occurrence of $tc
    my $xx = "\$match_string =~ s/\\$tc/\\\\\\$tc/g;";
    eval $xx;
    #print "$xx\n";
}

## match should be ready now,
## be sure to trap regex violations

my $fnd = 0;
# -- #
my $ctmpl = "if (\$funky_string =~ /$match_string/) {\$fnd = 1;}";
eval $ctmpl;
# -- #

if ($@) {
    ## Check the $ctmpl, get the control code, log this error as a code issue.
    ## This shouldn't happen ... the compiler will show the escape char; add
    ## the char to "@regx_esc_codes", now it's fixed!
    $@ =~ s/^[\x20\n\t]+//; $@ =~ s/[\x20\n\t]+$//;
    print $@,"\n";
    exit;
}

Note - I may be wrong in this context; it's been a while.
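
For comparison, Perl's built-in quotemeta function (or the \Q...\E escape inside a pattern) also backslash-escapes regex metacharacters in a string before it is used in a match. A minimal sketch, in which the value of $match_string is a made-up literal chosen so that it occurs in the sample string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $funky_string = 'AAAAAAA "blah blah \" blah blah" BBBBBB';

# Hypothetical literal text to search for; the backslash and quote in it
# should be matched as-is, not treated as regex syntax.
my $match_string = 'blah \" blah';

# \Q...\E applies quotemeta to the interpolated variable.
my $fnd = ($funky_string =~ /\Q$match_string\E/) ? 1 : 0;
print "found: $fnd\n";    # found: 1
```

This only covers the literal-matching case the loop above is aimed at; it does not by itself find the closing quote of a delimited string.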
 
Wade Whitaker

Witold said:
Hi!
I want to match with regexp substring, which is delimited by, let's say
". It is trivial, but I don't know how to match escaped quotes with \.
OK, example will be better ;-))

Let's take string such:
AAAAAAA "blah blah \" blah blah" BBBBBB

How to match all what is in between quotes, not counting escaped quote.
In this case it should match to:
blah blah \" blah blah

How it can be done????

Best regards,
Witold Rugowski

Try:

/"(|.*?[^\\])"/ or /("(?|.*?[^\\])")/

This says: try the null case first, i.e. "".
If it is not null, slurp 0 characters plus one more that is not a
backslash, which you can because it is not the null case, then check for ".
Then try slurping 1 character plus one more that is not a backslash, and so
on. The idea is that, after checking for null first, you can slurp past a
backslash, which will always put the following " in the [^\\] position, so
you can't stop on it.

See what a year of learning can get you.

Regards,

Wade
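
A quick way to try the first pattern is sketched below, using the string from the question plus a made-up string containing an empty pair of quotes to exercise the null case. Like the look-behind approach earlier in the thread, this pattern does not appear to cope with a quoted string that ends in an escaped backslash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First pattern from the post: the empty alternative handles "".
my $re = qr/"(|.*?[^\\])"/;

my @strings = (
    'AAAAAAA "blah blah \" blah blah" BBBBBB',
    'empty: "" here',    # made-up input for the null case
);

my @found;
for my $string (@strings) {
    if ($string =~ $re) {
        push @found, $1;
        print "matched: [$1]\n";
    }
}
```

The first string yields blah blah \" blah blah and the second yields an empty match.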
 
Wade Whitaker

The second one should have been /("(?:|.*?[^\\])")/

That's what I get for just typing it in. :)

Regards,

Wade
 
robic0

Oh yeah, this is how it's done, et al. This is what works when
qr// doesn't, in case you don't think it works. Consider that
this puppy will handle any variable you can read in from
unknown string content.
Now how much is that worth? Where are the regulars now?
Bunch of blow-hard puffs....
 
robic0

In case you don't understand, "your_logic_here" could be an
unknown, variable subset string of that which you are looking for.
I believe I could write a book on it.
 
robic0

Witold said:
Hi!
I want to match with regexp substring, which is delimited by, let's say ". It is trivial, but I don't know how to match escaped quotes with \. OK, example will be better ;-))

Let's take string such:
AAAAAAA "blah blah \" blah blah" BBBBBB

How to match all what is in between quotes, not counting escaped quote. In this case it should match to:
blah blah \" blah blah

How it can be done????

Best regards,
Witold Rugowski

A little late (ignore the rant on regex escape codes;
that is for escaping a match string containing purely unintended,
random escape codes not used for logic).

Yet another way:

use strict;
my $uuu = 'AAAAAAA "blah blah \" blah\"blah" BBBBBB';
print $uuu,"\n";
my $string = '';
while ($uuu =~ s/\"(.*\\\")|\"(.*)\"/\"/) {$string .= $1.$2;}
print $string,"\n";

__END__

output:

AAAAAAA "blah blah \" blah\"blah" BBBBBB
blah blah \" blah\"blah
 
