Matching escaped delimiter chars

Discussion in 'Perl Misc' started by Witold Rugowski, Nov 28, 2005.

  1. Hi!
    I want to use a regexp to match a substring delimited by, let's say, ". That part is trivial, but I don't know how to handle quotes escaped with \. OK, an example will explain it better ;-))

    Take a string such as:
    AAAAAAA "blah blah \" blah blah" BBBBBB

    How can I match everything between the quotes, not counting the escaped quote? In this case it should match:
    blah blah \" blah blah

    How can it be done?

    Best regards,
    Witold Rugowski
    Witold Rugowski, Nov 28, 2005

  2. Witold Rugowski wrote:

    > I want to match with regexp substring, which is delimited by, let's say ". It is trivial, but I don't know how to match escaped quotes with \. OK, example will be better ;-))
    >
    > Let's take string such:
    > AAAAAAA "blah blah \" blah blah" BBBBBB
    >
    > How to match all what is in between quotes, not counting escaped quote. In this case it should match to:
    > blah blah \" blah blah
    >
    > How it can be done????


    It's already been done. Don't reinvent the wheel.

    http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common/delimited.pm

    Paul Lalli
    Paul Lalli, Nov 28, 2005

  3. Paul Lalli wrote:
    > Witold Rugowski wrote:
    >
    > > I want to match with regexp substring, which is delimited by, let's say "
    > >
    > > Let's take string such:
    > > AAAAAAA "blah blah \" blah blah" BBBBBB
    > >
    > > How to match all what is in between quotes, not counting escaped quote.
    > > In this case it should match to:
    > > blah blah \" blah blah
    > >
    > > How it can be done????

    >
    > It's already been done. Don't reinvent wheels.
    >
    > http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common/delimited.pm


    For the heck of it, an example of the above module's usage:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Regexp::Common qw/delimited/;

    $_ = 'AAAAAAA "blah blah \" blah blah" BBBBBB';
    if (/$RE{delimited}{-delim=>'"'}{-keep}/) {
        print "Found: $1\n";
        print "Without quotes: $3\n";
    }
    __END__

    Found: "blah blah \" blah blah"
    Without quotes: blah blah \" blah blah
    Paul Lalli, Nov 28, 2005
  4. Witold Rugowski writes:

    > Let's take string such:
    > AAAAAAA "blah blah \" blah blah" BBBBBB
    >
    > How to match all what is in between quotes, not counting escaped quote.
    > In this case it should match to:
    > blah blah \" blah blah
    >
    > How it can be done????


    A very simplistic way of doing that would be to use a zero-width negative
    look-behind, so that the end of the match is (in English) "a quote character
    that's not immediately preceded by a backslash." In Perl:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $string = 'AAAAAAA "blah blah \" blah blah" BBBBBB';

    if ($string =~ /\"(.*)(?<!\\)\"/) {
        print $1, "\n";
    }

    As I said though, that's a pretty simplistic way of doing it. It works for
    the specific example given, but may not work with real-world data. It does
    not handle, for example, the case where the string you're interested in ends
    in an escaped backslash, i.e. blah\\".
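The limitation is easy to check directly. The sketch below is an illustration, not code from the thread: it runs the look-behind pattern next to a swapped-in escape-aware pattern, "((?:[^"\\]|\\.)*)", which consumes a backslash together with the character after it. The helper names and sample strings are assumptions chosen to show two failure modes: an escaped backslash right before the closing quote, and two quoted strings on one line, where the greedy .* runs from the first quote to the last.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The look-behind pattern: a quote, then the longest run of characters that
# is followed by a quote NOT immediately preceded by a backslash.
sub lookbehind_match {
    my ($s) = @_;
    return $s =~ /"(.*)(?<!\\)"/ ? $1 : undef;
}

# Swapped-in alternative (not from this post): a body character is either
# anything but a quote or backslash, or a backslash plus any one character,
# so an escaped backslash cannot "hide" the closing quote.
sub escape_aware_match {
    my ($s) = @_;
    return $s =~ /"((?:[^"\\]|\\.)*)"/ ? $1 : undef;
}

# In single-quoted Perl, '\\' is one backslash, so $tricky holds: x "blah\\" y
my $tricky = 'x "blah\\\\" y';
my $two    = 'say "a" and "b"';

for my $s ('AAAAAAA "blah blah \" blah blah" BBBBBB', $tricky, $two) {
    print "$s\n";
    for my $try ([ 'look-behind',  \&lookbehind_match ],
                 [ 'escape-aware', \&escape_aware_match ]) {
        my $got = $try->[1]->($s);
        printf "  %-12s %s\n", $try->[0], defined $got ? "[$got]" : '(no match)';
    }
}
```

On the original example both patterns agree; on the escaped-backslash string the look-behind version finds nothing, and on the two-string line it returns a" and "b.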

    The special cases and "what if" scenarios like the above can get out of hand
    rather quickly. For production use, I'd use something like the Text::Balanced
    module on CPAN, or even a full-blown parser using Parse::RecDescent.

    sherm--
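For the Text::Balanced route, a minimal sketch using its extract_delimited function might look like this. The helper name inner_quoted and the prefix pattern '[^"]*' (skip everything before the first quote) are assumptions made for this example; backslash is extract_delimited's default escape character.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# Hypothetical helper: returns the text between the first pair of unescaped
# double quotes, or undef if there is none.
sub inner_quoted {
    my ($text) = @_;

    # List context: (extracted, remainder, skipped prefix); $text itself is
    # left unmodified. On failure the extracted value is undef.
    my ($quoted) = extract_delimited($text, '"', '[^"]*');
    return unless defined $quoted;

    # The extracted string keeps its delimiters, so strip the first and
    # last character.
    return substr($quoted, 1, -1);
}

my $inner = inner_quoted('AAAAAAA "blah blah \" blah blah" BBBBBB');
print defined $inner ? "$inner\n" : "(no match)\n";
```

The escape handling lives inside the module, so the caller only has to choose the delimiter and what to skip before it.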

    Sherm Pendley, Nov 28, 2005
  5. Witold Rugowski wrote:
    > Hi!
    > I want to match with regexp substring, which is delimited by, let's say ". It is trivial, but I don't know how to match escaped quotes with \. OK, example will be better ;-))
    >
    > Let's take string such:
    > AAAAAAA "blah blah \" blah blah" BBBBBB
    >
    > How to match all what is in between quotes, not counting escaped quote. In this case it should match to:
    > blah blah \" blah blah
    >
    > How it can be done????
    >
    > Best regards,
    > Witold Rugowski


    $ cat prog.pl
    #!/usr/bin/env perl
    use warnings;
    $_ = 'AAAAAAA "blah blah \" blah blah" BBBBBB';
    print $& if /(?<=\").+\\\".+(?=\")/;

    $ ./prog.pl
    blah blah \" blah blah

    James
    James, Nov 28, 2005
  6. This is a classic problem, solved in "Mastering Regular
    Expressions" (by Jeffrey Friedl). The regex for this kind of problem
    is:

    opening_normal*(special_normal*)*closing

    where, in your case:
    opening: \"
    normal : [^"\\]
    special : \\"
    closing : \"
    The underscore _ here just means the two parts are joined directly,
    with nothing in between.
    So if I put it this way:
    ------------------------------
    $_ = qq(AAAAAAA "blah \\" blah \\" blah blah\\"" BBBBBB);
    print "$_\n";
    print "$1\n" if /\"([^"\\]*(\\"[^"\\]*)*)\"/;
    -------------------------------
    OUTPUT will be as following:
    _________________________________________
    AAAAAAA "blah \" blah \" blah blah\"" BBBBBB
    blah \" blah \" blah blah\"
    ---------------------------------------------------------
    XC
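The template above can be written out in Perl with one qr// object per part, which makes its structure easier to see. This is a sketch: the variable names mirror the post, and the second pattern, where "special" is a backslash followed by any character, is a generalization added here, not something from the post. With only \" as special, a string that ends in an escaped backslash (x "a\\" y) finds no match at all, while the generalized version returns a\\.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The four parts named in the post, as qr// objects.
my $opening = qr/"/;
my $normal  = qr/[^"\\]/;    # any character except a quote or a backslash
my $special = qr/\\"/;       # as in the post: only \" counts as special
my $closing = qr/"/;

# opening normal* (special normal*)* closing
my $unrolled = qr/$opening($normal*(?:$special$normal*)*)$closing/;

# Generalization (an assumption, not from the post): let "special" be a
# backslash followed by any character, so \\ and \n are consumed whole.
my $special_any = qr/\\./s;
my $general = qr/$opening($normal*(?:$special_any$normal*)*)$closing/;

# Returns the contents of the first quoted string matched by $re, or undef.
sub first_quoted {
    my ($re, $s) = @_;
    return $s =~ $re ? $1 : undef;
}

my $xc_data = qq(AAAAAAA "blah \\" blah \\" blah blah\\"" BBBBBB);
my $tricky  = 'x "a\\\\" y';    # holds: x "a\\" y

for my $s ($xc_data, $tricky) {
    for my $re ($unrolled, $general) {
        my $got = first_quoted($re, $s);
        print defined $got ? "[$got]\n" : "(no match)\n";
    }
}
```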


    Witold Rugowski wrote:
    > Hi!
    > I want to match with regexp substring, which is delimited by, let's say ". It is trivial, but I don't know how to match escaped quotes with \. OK, example will be better ;-))
    >
    > Let's take string such:
    > AAAAAAA "blah blah \" blah blah" BBBBBB
    >
    > How to match all what is in between quotes, not counting escaped quote. In this case it should match to:
    > blah blah \" blah blah
    >
    > How it can be done????
    >
    > Best regards,
    > Witold Rugowski
    , Nov 29, 2005
  7. On Mon, 28 Nov 2005 21:21:09 +0100, Witold Rugowski wrote:

    >Hi!
    >I want to match with regexp substring, which is delimited by, let's say ". It is trivial, but I don't know how to match escaped quotes with \. OK, example will be better ;-))
    >
    >Let's take string such:
    >AAAAAAA "blah blah \" blah blah" BBBBBB
    >
    >How to match all what is in between quotes, not counting escaped quote. In this case it should match to:
    >blah blah \" blah blah
    >
    >How it can be done????
    >
    >Best regards,
    >Witold Rugowski


    my @regx_esc_codes = ( "\\", '/', '(', ')', '[', ']', '?', '|',
        '+', '.', '*', '$', '^', '{', '}', '@' );

    my $funky_string = 'AAAAAAA "blah blah \" blah blah" BBBBBB';
    my $match_string = 'your_logic_here';
    for (@regx_esc_codes)
    {
        my $tc = $_;
        # code template for regex: put a backslash in front of every $tc
        my $xx = "\$match_string =~ s/\\$tc/\\\\\\$tc/g;";
        eval $xx;
        #print "$xx\n";
    }

    ## match should be ready now,
    ## be sure to trap regex violations

    my $fnd = 0;
    # -- #
    my $ctmpl = "if (\$funky_string =~ /$match_string/) {\$fnd = 1;}";
    eval $ctmpl;
    # -- #

    if ($@) {
        ## Check the $ctmpl, get the control code, log this error as a code issue.
        ## This shouldn't happen ... the compiler will show the escape char, add
        ## the char to "@regx_esc_codes", now it's fixed!
        $@ =~ s/^[\x20\n\t]+//; $@ =~ s/[\x20\n\t]+$//;
        print $@,"\n";
        exit;
    }

    Note: I may be wrong in this context; it's been a while.
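If the goal is simply to match an arbitrary literal string read at run time, Perl's built-in quotemeta function and the \Q...\E escape already do what the eval loop above does by hand, without building code strings. A minimal sketch, reusing the post's $funky_string; the value of $match_string is a hypothetical stand-in for 'your_logic_here'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sample data as the post; $match_string stands in for a literal
# piece of text read at run time (hypothetical value).
my $funky_string = 'AAAAAAA "blah blah \" blah blah" BBBBBB';
my $match_string = '\" blah';

# quotemeta backslash-escapes every ASCII character that is not a word
# character, so the literal text cannot be misread as regex syntax.
my $escaped = quotemeta $match_string;
my $fnd = $funky_string =~ /$escaped/ ? 1 : 0;

# \Q...\E applies quotemeta inline, inside the pattern itself.
my $fnd2 = $funky_string =~ /\Q$match_string\E/ ? 1 : 0;

print "quotemeta: $fnd, \\Q...\\E: $fnd2\n";
```

Because every non-word character is escaped, no list of "known" metacharacters has to be maintained, and there is no eval whose errors need trapping.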
    robic0, Nov 29, 2005
  8. Witold Rugowski wrote:
    > Hi!
    > I want to match with regexp substring, which is delimited by, let's say
    > ". It is trivial, but I don't know how to match escaped quotes with \.
    > OK, example will be better ;-))
    >
    > Let's take string such:
    > AAAAAAA "blah blah \" blah blah" BBBBBB
    >
    > How to match all what is in between quotes, not counting escaped quote.
    > In this case it should match to:
    > blah blah \" blah blah
    >
    > How it can be done????
    >
    > Best regards,
    > Witold Rugowski


    Try:

    /"(|.*?[^\\])"/ or /("(?|.*?[^\\])")/

    This says: try the null case first, i.e. "". If it is not null, slurp 0
    characters plus one more that is not a backslash (which you can do
    because it is not the null case), then check for ". If that fails, try
    slurping 1 character plus one more that is not a backslash, and so on.
    The idea is that, after checking for null first, you can slurp past a
    backslash, which always puts the following " in the [^\\] position, so
    the match can't stop on it.
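A quick way to try the first, capturing form of the pattern is the sketch below; the helper name wade_match and the sample strings are illustrative assumptions. The non-greedy .*? stops at the first qualifying quote, so two quoted strings on one line are kept apart, and the empty alternative handles "". Like the look-behind version earlier in the thread, it still treats the closing quote in "blah\\" as escaped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the post: first try an empty body (""); otherwise take the
# shortest run of characters whose last character is not a backslash,
# followed by a quote.
sub wade_match {
    my ($s) = @_;
    return $s =~ /"(|.*?[^\\])"/ ? $1 : undef;
}

for my $s ('AAAAAAA "blah blah \" blah blah" BBBBBB',
           'empty: "" here',
           'x "a" and "b"') {
    my $got = wade_match($s);
    print defined $got ? "[$got]\n" : "(no match)\n";
}
```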

    See what a year of learning can get you.

    Regards,

    Wade
    Wade Whitaker, Dec 2, 2005
  9. Wade Whitaker wrote:
    > Witold Rugowski wrote:
    >
    >> Hi!
    >> I want to match with regexp substring, which is delimited by, let's
    >> say ". It is trivial, but I don't know how to match escaped quotes
    >> with \. OK, example will be better ;-))
    >>
    >> Let's take string such:
    >> AAAAAAA "blah blah \" blah blah" BBBBBB
    >>
    >> How to match all what is in between quotes, not counting escaped
    >> quote. In this case it should match to:
    >> blah blah \" blah blah
    >>
    >> How it can be done????
    >>
    >> Best regards,
    >> Witold Rugowski

    >
    >
    > Try:
    >
    > /"(|.*?[^\\])"/ or /("(?|.*?[^\\])")/
    >
    > This says try the null case first. i.e. ""
    > Then if it is not null slurp 0 characters and one more that is not a
    > backslash, which you can because it is not a null case. Then check for ".
    > Then try slurp 1 char and one more that is not a backslash. etc. The
    > Idea is that after checking for null first you can slurp past a
    > backslash which will always put the following " in the [^\\] position so
    > you can't stop on it.
    >
    > See what a year of learning can get you.
    >
    > Regards,
    >
    > Wade

    The second one should have been /("(?:|.*?[^\\])")/

    That's what I get for just typing it in. :)

    Regards,

    Wade
    Wade Whitaker, Dec 2, 2005
  10. On Tue, 29 Nov 2005 00:02:19 -0800, robic0 wrote:

    >On Mon, 28 Nov 2005 21:21:09 +0100, Witold Rugowski wrote:
    >
    >>Hi!
    >>I want to match with regexp substring, which is delimited by, let's say ". It is trivial, but I don't know how to match escaped quotes with \. OK, example will be better ;-))
    >>
    >>Let's take string such:
    >>AAAAAAA "blah blah \" blah blah" BBBBBB
    >>
    >>How to match all what is in between quotes, not counting escaped quote. In this case it should match to:
    >>blah blah \" blah blah
    >>
    >>How it can be done????
    >>
    >>Best regards,
    >>Witold Rugowski

    >
    >my @regx_esc_codes = ( "\\", '/', '(', ')', '[', ']', '?', '|',
    > '+', '.', '*', '$', '^', '{', '}', '@' );
    >
    >my $funky_string = 'AAAAAAA "blah blah \" blah blah" BBBBBB';
    >my $match_string = 'your_logic_here';
    >for (@regx_esc_codes)
    >{
    > my $tc = $_;
    > # code template for regex
    > my $xx = "\$match_string =~ s/\\$tc/\\\\\\$tc/g;";
    > eval $xx;
    > #print "$xx\n";
    >}
    >
    >## match should be ready now,
    >## be sure to trap regex violations
    >
    >$fnd = 0;
    ># -- #
    >my $ctmpl = "if (\$funky_string =~ /$match_string/ {\$fnd = 1;}";
    >eval $ctmpl;
    ># -- #
    >
    >if ($@) {
    > ## Check the $ctmpl, get the control code, log this error as a
    >code issue.
    > ## This shouldn't happen ... the compiler will show the escape
    >char, add
    > ## the char to "@regx_esc_codes", now its fixed!
    > $@ =~ s/^[\x20\n\t]+//; $@ =~ s/[\x20\n\t]+$//;
    > print $@,"\n";
    > exit;
    >}
    >
    >Note - I may be wrong in this context, its been a while
    >

    Oh yeah, this is how it's done, et al. This is what works when
    qr// doesn't. In case you don't think it works, consider that
    this puppy will handle any variable you can read in from
    unknown string content.
    Now how much is that worth? Where are the regulars now?
    Bunch of blowhard puffs....
    robic0, Dec 3, 2005
  11. On Sat, 03 Dec 2005 01:10:30 -0800, robic0 wrote:

    >Oh yeah, this is how its done et all. This is what works when
    >qr// doesen't. Incase you don't think it works. Consider that
    >this puppy will handle any variable you can read in from
    >unknown string content.
    >Now how much is that worth? Where's the regulars now?
    >Bunch of blow hard puffs....


    In case you don't understand: "your_logic_here" could be an
    unknown, variable substring of what you are looking for.
    I believe I could write a book on it.
    robic0, Dec 3, 2005
  12. Witold Rugowski wrote:
    > Hi!
    > I want to match with regexp substring, which is delimited by, let's say ". It is trivial, but I don't know how to match escaped quotes with \. OK, example will be better ;-))
    >
    > Let's take string such:
    > AAAAAAA "blah blah \" blah blah" BBBBBB
    >
    > How to match all what is in between quotes, not counting escaped quote. In this case it should match to:
    > blah blah \" blah blah
    >
    > How it can be done????
    >
    > Best regards,
    > Witold Rugowski


    A little late (ignore the rant on regex escape codes; that is for
    escaping a match string containing random, unintended escape
    characters that are not meant as regex logic).

    Yet another way:

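    use strict;
    use warnings;
    my $uuu = 'AAAAAAA "blah blah \" blah\"blah" BBBBBB';
    print $uuu,"\n";
    my $string = '';
    # Each pass replaces either the span from a quote through the last \"
    # (captured in $1) or a plain "..." pair (captured in $2) with a single ",
    # and appends the captured text to $string.
    while ($uuu =~ s/\"(.*\\\")|\"(.*)\"/\"/) { $string .= defined $1 ? $1 : $2; }
    print $string,"\n";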

    __END__

    output:

    AAAAAAA "blah blah \" blah\"blah" BBBBBB
    blah blah \" blah\"blah
    , Dec 7, 2005

Similar Threads
  1. Java script Dude (2 replies, 4,624 views; last post by Roedy Green, Sep 5, 2005)
  2. Floats to chars and chars to floats
    Kosio, Sep 16, 2005, in forum: C Programming (44 replies, 1,246 views; last post by Tim Rentsch, Sep 23, 2005)
  3. Hongyu (9 replies, 887 views; last post by James Kanze, Aug 8, 2008)
  4. receiving ??? chars instead of "special" chars
    M.Posseth, Nov 15, 2004, in forum: ASP .Net Web Services (3 replies, 215 views; last post by Dan Rogers, Nov 16, 2004)
  5. Gene Angelo (3 replies, 99 views; last post by Gene Angelo, Sep 9, 2010)