Help: How can I parse this properties file?

Discussion in 'Perl Misc' started by yuanyun.ken, Nov 5, 2008.

  1. yuanyun.ken

    yuanyun.ken Guest

    Hi, dear all perl users:
    Recently I needed to read in a properties file.
    Its format is key=value, and it uses \ to escape.
    For example:
    expression    means:  key    value
    a=b=c                 a      b=c
    a\=b=c                a=b    c
    a\\=b=c               a\     b=c
    a\\\=b=c              a\=b   c

    How can I parse this file?
    I think it should use a regex,
    but my knowledge of regular expressions is poor.
    Any help is greatly appreciated.
    yuanyun.ken, Nov 5, 2008
    #1

  2. "yuanyun.ken" <> wrote:
    >Recently I need read in a proerties file,
    >its format is key=value, and it uses \ to escape.
    >for example:
    >expression means: key value
    >a=b=c a b=c
    >a\=b=c a=b c
    >a\\=b=c a\ b=c
    >a\\\=b=c a\=b c
    >
    >How can I parse this file?
    >I think it should use regex.
    >but my knowledage in Regular expression is poor.


    No need for complex REs. Just have the tokenizer walk through the file
    character by character, and when it finds a backslash, immediately
    read another character and return that literal character as the next
    character in the token, no matter whether it is a normal character,
    another backslash, or an equal sign.

    jue
    Jürgen Exner, Nov 5, 2008
    #2
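A minimal sketch of the character-by-character approach described above. The helper name `parse_property` is hypothetical, and two behaviours are assumptions the thread does not settle: escapes are also processed on the value side, and a lone trailing backslash is kept as-is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one "key=value" line at the first unescaped '='.
# A backslash makes the following character literal, whatever it is.
sub parse_property {
    my ($line) = @_;
    my @field = ('', '');    # $field[0] is the key, $field[1] the value
    my $i     = 0;           # index of the field currently being filled
    my @chars = split //, $line;

    while (@chars) {
        my $c = shift @chars;
        if ($c eq '\\') {
            # Escape: take the next character literally. A lone trailing
            # backslash is kept (assumption, not specified in the thread).
            $field[$i] .= @chars ? shift @chars : '\\';
        }
        elsif ($c eq '=' and $i == 0) {
            $i = 1;          # the first unescaped '=' ends the key
        }
        else {
            $field[$i] .= $c;
        }
    }
    return @field;
}

# A quoted heredoc keeps every backslash exactly as typed.
my @lines = split /\n/, <<'END';
a=b=c
a\=b=c
a\\=b=c
a\\\=b=c
END

for my $line (@lines) {
    my ($key, $value) = parse_property($line);
    printf "%-10s %-6s %s\n", $line, $key, $value;
}
```

Because every character is consumed exactly once, no placeholder string is needed and nothing in the data can collide with one.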

  3. yuanyun.ken <> wrote:
    > Hi, dear all perl users:
    > Recently I need read in a proerties file,
    > its format is key=value, and it uses \ to escape.
    > for example:
    > expression means: key value
    > a=b=c a b=c
    > a\=b=c a=b c
    > a\\=b=c a\ b=c
    > a\\\=b=c a\=b c
    >
    > How can I parse this file?



    ----------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;

    foreach my $line ( <DATA> ) {
        chomp $line;
        $line =~ s/\\\\/&backslash;/g; # translate literal backslashes

        my($key, $value) = split /(?<!\\)=/, $line, 2; # use negative look-behind

        $key =~ tr/\\//d; # eliminate backslashes used for escaping

        $key =~ s/&backslash;/\\/g; # put the literal backslashes back in

        printf "%-10s %-10s\n", $key, $value;
    }

    __DATA__
    a=b=c
    a\=b=c
    a\\=b=c
    a\\\=b=c
    ----------------------------


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Nov 5, 2008
    #3
  4. yuanyun.ken

    Ted Zlatanov Guest

    On Wed, 5 Nov 2008 11:03:42 -0600 Tad J McClellan <> wrote:

    TJM> #!/usr/bin/perl
    TJM> use warnings;
    TJM> use strict;

    TJM> foreach my $line ( <DATA> ) {
    TJM> chomp $line;
    TJM> $line =~ s/\\\\/&backslash;/g; # translate literal backslashes

    TJM> my($key, $value) = split /(?<!\\)=/, $line, 2; # use negative look-behind

    TJM> $key =~ tr/\\//d; # eliminate backslashes used for escaping

    TJM> $key =~ s/&backslash;/\\/g; # put the literal backslashes back in

    TJM> printf "%-10s %-10s\n", $key, $value;
    TJM> }

    TJM> __DATA__
    TJM> a=b=c
    TJM> a\=b=c
    TJM> a\\=b=c
    TJM> a\\\=b=c
    TJM> ----------------------------

    I was thinking of a similar solution, but adding 256 (or some other
    large number) to each escaped character (in case there's a '&backslash;'
    in the data). As long as it's valid Unicode and the original data
    doesn't contain Unicode characters it should be a clean translation.

    Ted
    Ted Zlatanov, Nov 5, 2008
    #4
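One way the "add 256" idea above might look in practice. This is only a sketch: `parse_shifted` is a hypothetical name, and it assumes every character in the input has a code point below 256, as the post itself requires.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the "+256" idea; assumes every input
# character has a code point below 256.
sub parse_shifted {
    my ($line) = @_;

    # Move each escaped character out of band: "\x" becomes chr(ord(x) + 256).
    $line =~ s/\\(.)/chr(ord($1) + 256)/ges;

    # Any '=' still present was never escaped, so a plain split is safe.
    my ($key, $value) = split /=/, $line, 2;

    # Shift the marked characters back to their original code points.
    for ($key, $value) {
        next unless defined;
        s/([^\x00-\xFF])/chr(ord($1) - 256)/ge;
    }
    return ($key, $value);
}

my ($key, $value) = parse_shifted('a\=b=c');
print "$key $value\n";
```

The shifted characters only exist between the two substitutions, so nothing wide is ever printed.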
  5. On Wed, 05 Nov 2008 11:28:24 -0600, Ted Zlatanov <>
    wrote:

    >TJM> $line =~ s/\\\\/&backslash;/g; # translate literal backslashes
    >
    >TJM> my($key, $value) = split /(?<!\\)=/, $line, 2; # use negative look-behind
    >
    >TJM> $key =~ tr/\\//d; # eliminate backslashes used for escaping
    >
    >TJM> $key =~ s/&backslash;/\\/g; # put the literal backslashes back in
    >
    >TJM> printf "%-10s %-10s\n", $key, $value;
    >TJM> }
    >
    >TJM> __DATA__
    >TJM> a=b=c
    >TJM> a\=b=c
    >TJM> a\\=b=c
    >TJM> a\\\=b=c
    >TJM> ----------------------------
    >
    >I was thinking of a similar solution, but adding 256 (or some other
    >large number) to each escaped character (in case there's a '&backslash;'
    >in the data). As long as it's valid Unicode and the original data
    >doesn't contain Unicode characters it should be a clean translation.


    I like to be sure thus instead of adding "some other large number" I
    actually *find* something that *can't* be there:

    my $delim = "&". (sort @delims = $line =~ /&(\0+);/)[-1] . "\0;";
    $line =~ s/\\\\/$delim;/g; # translate literal backslashes
    # ...


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
    Michele Dondi, Nov 6, 2008
    #5
  6. On Thu, 06 Nov 2008 14:39:37 +0100, Michele Dondi
    <> wrote:

    >I like to be sure thus instead of adding "some other large number" I
    >actually *find* something that *can't* be there:
    >
    > my $delim = "&". (sort @delims = $line =~ /&(\0+);/)[-1] . "\0;";
    > $line =~ s/\\\\/$delim;/g; # translate literal backslashes


    Sorry! That's what you get out of posting such in a hurry; I meant:

    my $delim = "&". (sort "", $line =~ /&(\0+);/g)[-1] . "\0;";


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
    Michele Dondi, Nov 6, 2008
    #6
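For reference, a sketch of how the corrected placeholder line could be combined with the earlier split-based code. The helper name `parse_with_placeholder` and the variable `$mark` are hypothetical; the longest existing run of NULs is found with a length-based sort rather than the plain `sort` in the post, and, as in the earlier code, only the key is unescaped.

```perl
use strict;
use warnings;

sub parse_with_placeholder {
    my ($line) = @_;

    # Collect every run of NULs already wrapped as "&...;" in the line,
    # take the longest, and use one NUL more: that token cannot occur.
    my ($longest) = sort { length($b) <=> length($a) } ('', $line =~ /&(\0+);/g);
    my $mark = '&' . $longest . "\0;";

    $line =~ s/\\\\/$mark/g;                         # hide literal backslashes
    my ($key, $value) = split /(?<!\\)=/, $line, 2;  # first unescaped '='
    $key =~ tr/\\//d;                                # drop escaping backslashes
    $key =~ s/\Q$mark\E/\\/g;                        # restore literal backslashes
    return ($key, $value);
}

my ($key, $value) = parse_with_placeholder('a\\\\=b=c');   # the line a\\=b=c
print "$key $value\n";
```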
  7. yuanyun.ken

    cartercc Guest

    On Nov 5, 12:28 pm, Ted Zlatanov <> wrote:
    > I was thinking of a similar solution, but adding 256 (or some other
    > large number) to each escaped character (in case there's a '&backslash;'
    > in the data).  As long as it's valid Unicode and the original data
    > doesn't contain Unicode characters it should be a clean translation.


    I find absolutely nothing wrong with Tad's solution. The fact that it
    ^might^ be a little more verbose than necessary I regard as a mark in
    its favor, not a mark against.

    I might consider the string '\' rather than '&backslash;' but
    that's a simple quibble.

    Now, what about a one liner?

    CC
    cartercc, Nov 6, 2008
    #7
  8. yuanyun.ken

    Guest

    On Wed, 5 Nov 2008 05:23:58 -0800 (PST), "yuanyun.ken" <> wrote:

    >Hi, dear all perl users:
    >Recently I need read in a proerties file,
    >its format is key=value, and it uses \ to escape.
    >for example:
    >expression means: key value
    >a=b=c a b=c
    >a\=b=c a=b c
    >a\\=b=c a\ b=c
    >a\\\=b=c a\=b c
    >
    >How can I parse this file?
    >I think it should use regex.
    >but my knowledage in Regular expression is poor.
    >any help is greatly appreciated.



    Well, according to your template there, the valid
    equal sign separating Key from Value is the first
    non-escaped equal sign.

    So yes there is a regular expression to do that.
    You have data in the escaped form. It can be split up
    into key and value using those rules in a regexp, then
    unescape the key/val pair.

    Below is probably no different than the other suggestions
    in terms of how the split occurs using a regex.

    Where does this sequence take you? Do you expect the value
    side to be escaped?

    These appear possible as well, is this something that will be
    encountered?

    a\\\\=b=c a\\ b=c
    a\\\=b\\=c a\=b\ c
    a\\=b\\=c a\ b\=c

    Like the other possibilities, here is one more. It's hard to see how
    you would get a simple one-step solution though. Maybe..

    Good luck.
    sln

    # ** Original
    # expression means: key value
    # a=b=c a b=c
    # a\=b=c a=b c
    # a\\=b=c a\ b=c
    # a\\\=b=c a\=b c

    # ** Output
    # a=b=c a b=c
    # a\=b=c a=b c
    # a\\=b=c a\ b=c
    # a\\\=b=c a\=b c
    # a\\\\=b=c a\\ b=c
    # a\\\=b\\=c a\=b\ c
    # a\\=b\\=c a\ b\=c


    use strict;
    use warnings;

    my @property = ();

    foreach my $line ( <DATA> ) {
        chomp $line;
        push @property, $line if (length $line);
    }

    print "\nexpression means:\tkey\tvalue\n";

    for (@property)
    {
        if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/)
        {
            # unescape built in sequences
            my ($key, $val) = ($1,$2);
            $key =~ s/\\(.)/$1/g;
            $val =~ s/\\(.)/$1/g;
            printf "%-20s\t%s\t%s\n", $_, $key, $val;
        }
    }

    __DATA__
    a=b=c
    a\=b=c
    a\\=b=c
    a\\\=b=c
    a\\\\=b=c
    a\\\=b\\=c
    a\\=b\\=c
    , Nov 6, 2008
    #8
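The "first non-escaped equal sign" rule can also be written so that it does not depend on backtracking order. The following is a sketch with a hypothetical helper name, `split_property`; it spells the key out as a run of escaped pairs or plain characters that are neither a backslash nor `=`, and, like the code above, it unescapes both sides.

```perl
use strict;
use warnings;

# Hypothetical helper: the key is a run of "backslash plus any character" or
# "plain character that is neither backslash nor '='", so the '=' that
# follows it is necessarily unescaped.
sub split_property {
    my ($line) = @_;
    my ($key, $val) = $line =~ /^((?:\\.|[^\\=])*)=(.*)$/s
        or return;                       # no unescaped '=' on this line
    s/\\(.)/$1/gs for $key, $val;        # unescape both sides
    return ($key, $val);
}

my @lines = split /\n/, <<'END';
a\\\\=b=c
a\\\=b\\=c
a\\=b\\=c
END

for my $line (@lines) {
    my ($key, $val) = split_property($line);
    printf "%-20s\t%s\t%s\n", $line, $key, $val;
}
```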
  9. yuanyun.ken

    Guest

    On Thu, 06 Nov 2008 23:34:53 GMT, wrote:

    >On Wed, 5 Nov 2008 05:23:58 -0800 (PST), "yuanyun.ken" <> wrote:
    >
    >>Hi, dear all perl users:
    >>Recently I need read in a proerties file,
    >>its format is key=value, and it uses \ to escape.
    >>for example:
    >>expression means: key value
    >>a=b=c a b=c
    >>a\=b=c a=b c
    >>a\\=b=c a\ b=c
    >>a\\\=b=c a\=b c
    >>
    >>How can I parse this file?
    >>I think it should use regex.
    >>but my knowledage in Regular expression is poor.
    >>any help is greatly appreciated.

    >
    >
    >Well, according to your template there, the valid
    >equal sign separating Key from Value is the first
    >non-escaped equal sign.
    >
    >So yes there is a regular expression to do that.
    >You have data in the escaped form. It can be split up
    >into key and value using those rules in a regexp, then
    >unescape the key/val pair.
    >
    >Below is probably no different than the other suggestions
    >in terms of how the split occurs using a regex.
    >
    >Where does this sequence take you? Do you expect the value
    >side to be escaped?
    >
    >These appear possible as well, is this something that will be
    >encountered?
    >
    >a\\\\=b=c a\\ b=c
    >a\\\=b\\=c a\=b\ c
    >a\\=b\\=c a\ b\=c
    >
    >Like the other possiblities, here is one more. Its hard to see how
    >you would get a simple one-step solution though. Maybe..
    >
    >Good luck.
    >sln
    >
    ># ** Original
    ># expression means: key value
    ># a=b=c a b=c
    ># a\=b=c a=b c
    ># a\\=b=c a\ b=c
    ># a\\\=b=c a\=b c
    >
    ># ** Output
    ># a=b=c a b=c
    ># a\=b=c a=b c
    ># a\\=b=c a\ b=c
    ># a\\\=b=c a\=b c
    ># a\\\\=b=c a\\ b=c
    ># a\\\=b\\=c a\=b\ c
    ># a\\=b\\=c a\ b\=c
    >
    >
    >use strict;
    >use warnings;
    >

    [snip]

    You could minimalize it as well.

    foreach ( <DATA> ) {
        chomp;

        if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/)
        {
            # unescape built in sequences
            my ($key, $val) = ($1,$2);
            $key =~ s/\\(.)/$1/g;
            $val =~ s/\\(.)/$1/g;
            printf "%-20s\t%s\t%s\n", $_, $key, $val;
        }
    }

    >
    >__DATA__
    >a=b=c
    >a\=b=c
    >a\\=b=c
    >a\\\=b=c
    >a\\\\=b=c
    >a\\\=b\\=c
    >a\\=b\\=c
    , Nov 6, 2008
    #9
  10. yuanyun.ken

    Ted Zlatanov Guest

    On Thu, 06 Nov 2008 14:39:37 +0100 Michele Dondi <> wrote:

    MD> On Wed, 05 Nov 2008 11:28:24 -0600, Ted Zlatanov <>
    MD> wrote:

    TJM> $line =~ s/\\\\/&backslash;/g; # translate literal backslashes
    >>

    TJM> my($key, $value) = split /(?<!\\)=/, $line, 2; # use negative look-behind
    >>

    TJM> $key =~ tr/\\//d; # eliminate backslashes used for escaping
    >>

    TJM> $key =~ s/&backslash;/\\/g; # put the literal backslashes back in
    >>

    TJM> printf "%-10s %-10s\n", $key, $value;
    TJM> }
    >>

    TJM> __DATA__
    TJM> a=b=c
    TJM> a\=b=c
    TJM> a\\=b=c
    TJM> a\\\=b=c
    TJM> ----------------------------
    >>
    >> I was thinking of a similar solution, but adding 256 (or some other
    >> large number) to each escaped character (in case there's a '&backslash;'
    >> in the data). As long as it's valid Unicode and the original data
    >> doesn't contain Unicode characters it should be a clean translation.


    MD> I like to be sure thus instead of adding "some other large number" I
    MD> actually *find* something that *can't* be there:

    MD> my $delim = "&". (sort @delims = $line =~ /&(\0+);/)[-1] . "\0;";
    MD> $line =~ s/\\\\/$delim;/g; # translate literal backslashes
    MD> # ...

    Surely there's a CPAN module to do this... Or 10...

    From the docs of Encode::Escape, there's also String::Escape,
    Unicode::Escape, TeX::Encode, HTML::Mason::Escape,
    Template::plugin::XML::Escape, URI::Escape.

    Ted
    Ted Zlatanov, Nov 7, 2008
    #10
  11. yuanyun.ken

    yuanyun.ken Guest

    Thanks for all the replies; this problem has been solved.
    Sorry for my poor understanding of regexes and for troubling
    you again,
    but I have another little problem:
    if the content ends with a real single backslash, I need to read in the
    next line.

    How can I use a regex to do this?
    For example:
    line ends with    match
    \                 yes
    \\                no
    \\\               yes
    \\\\              no
    Thanks again for any help.
    yuanyun.ken, Nov 8, 2008
    #11
  12. yuanyun.ken <> wrote:
    > Thanks for all the reply. and this problem has been solved.
    > but sorry for my poor understanding on regex, and having to trouble
    > you again,
    > here I have another little problem:
    > if the content ends with a real single backslash, I need read in the
    > next line.
    >
    > How to use regex to do this?
    > for example:
    > line ends with match
    > \ yes
    > \\ no
    > \\\ yes
    > \\\\ no



    -------------------
    #!/usr/bin/perl
    use warnings;
    use strict;

    foreach my $line ( <DATA> ) {
        chomp $line;
        if ( $line =~ /(\\+)$/ and length($1) % 2 )
            { print "yes\n" }
        else
            { print "no\n" }
    }

    __DATA__
    \
    \\
    \\\
    \\\\
    -------------------


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Nov 8, 2008
    #12
  13. "yuanyun.ken" <> wrote:
    >Thanks for all the reply. and this problem has been solved.
    >but sorry for my poor understanding on regex, and having to trouble
    >you again,
    >here I have another little problem:
    >if the content ends with a real single backslash, I need read in the
    >next line.
    >
    >How to use regex to do this?


    You don't. When the tokenizer discovers a backslash it just reads
    another character and if that character is an EOL then just continue
    processing the next line instead of reporting an end-of-token.

    jue
    Jürgen Exner, Nov 8, 2008
    #13
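A sketch of the tokenizer with line continuation, as described above. The helper name `parse_properties` is hypothetical. It assumes that a backslash followed by a newline is dropped entirely and that lines without an unescaped `=` are skipped; neither point is specified in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a whole properties text character by character.
# Returns a list of [key, value] pairs.
sub parse_properties {
    my ($text) = @_;
    my @pairs;
    my @field = ('', '');
    my $i     = 0;           # 0 while reading the key, 1 while reading the value
    my @chars = split //, $text;

    while (@chars) {
        my $c = shift @chars;
        if ($c eq '\\') {
            my $next = @chars ? shift @chars : '';
            # Backslash-newline is a continuation: drop both and carry on.
            $field[$i] .= $next unless $next eq "\n";
        }
        elsif ($c eq "\n") {
            push @pairs, [@field] if $i == 1;   # only complete key=value records
            @field = ('', '');
            $i     = 0;
        }
        elsif ($c eq '=' and $i == 0) {
            $i = 1;                             # first unescaped '=' ends the key
        }
        else {
            $field[$i] .= $c;
        }
    }
    push @pairs, [@field] if $i == 1;           # last line without a newline
    return @pairs;
}

my @pairs = parse_properties("a\\\n=b=c\nx=y");
printf "%s\t%s\n", @$_ for @pairs;
```

An even number of trailing backslashes never reaches the continuation branch, because each pair is consumed as an escaped backslash before the newline is seen.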
  14. yuanyun.ken

    fB Guest

    "yuanyun.ken" <> wrote in <>:
    > Thanks for all the reply. and this problem has been solved.
    > but sorry for my poor understanding on regex, and having to trouble
    > you again,
    > here I have another little problem:
    > if the content ends with a real single backslash, I need read in the
    > next line.
    >
    > How to use regex to do this?
    > for example:
    > line ends with match
    > \ yes
    > \\ no
    > \\\ yes
    > \\\\ no
    > Thanks for any help again.


    I felt like doing your homework for you, and anyway it is not that difficult.

    #!/usr/bin/perl

    use strict;
    use warnings;
    use feature ':5.10';

    while (<::DATA>) {
        chomp;
        printf '%10s', $_;
        m{ (?: [^\\] | ^ )   # match a non-backslash character
                             # or the start of the string
           ( \\ (?:\\\\)* )  # match an odd number of backslashes
           $                 # followed by the end of the string
         }xms
            and say ' matched: '.$1
            or  say ' not matched';
    }

    exit;
    __DATA__

    \
    \\
    \\\
    \\\\
    \\\\\
    \\\\\\
    \\\\\\\
    \\\\\\\\
    a
    a\
    a\\
    a\\\
    a\\\\
    a\\\\\
    a\\\\\\
    a\\\\\\\
    a\\\\\\\\
    \a
    \\a
    \\\a
    \\\\a
    __END__



    --
    fB, Nov 8, 2008
    #14
  15. yuanyun.ken

    Guest

    On Fri, 7 Nov 2008 19:55:48 -0800 (PST), "yuanyun.ken" <> wrote:

    >Thanks for all the reply. and this problem has been solved.
    >but sorry for my poor understanding on regex, and having to trouble
    >you again,
    >here I have another little problem:
    >if the content ends with a real single backslash, I need read in the
    >next line.
    >
    >How to use regex to do this?
    >for example:
    >line ends with match
    >\ yes
    >\\ no
    >\\\ yes
    >\\\\ no
    >Thanks for any help again.


    I assume this pertains to the rules set out on the properties
    in the original problem statement.

    Tad's solution, checking the end of the line for an odd number of '\',
    works best for a line continuation.

    Be very cautious!! If you are trying to find a way to fix random
    line splits when this file was generated, there is absolutely
    NO solution available to you at all !!!
    The reason is you already have escaping rules in place

    The line split must be intelligently constructed, in that only
    an odd number of '\' at the end will determine line continuation.
    At the same time, those backslashes still take part in the general
    escaping rules after the lines are joined.

    You can't just add a '\' where you would like to split the line then
    remove it later without counting the existing escapes at the end.
    Either way it takes intelligence to construct the file given the
    existing escaping rules you laid out for yourself.

    Notice the places where the split occurs in DATA below.
    Even if you had an intelligent generator that splits the
    line on a '\', it could still split on an even boundary.
    Or say it adds a complement to make the split odd; even then,
    the original cannot be guaranteed to reassemble,
    because this conflicts with the original escape logic.

    There is no solution then!


    sln

    use strict;
    use warnings;

    # ** Original
    # expression means: key value
    # a=b=c a b=c
    # a\=b=c a=b c
    # a\\=b=c a\ b=c
    # a\\\=b=c a\=b c

    # ** Output
    # a=b=c a b=c
    # a\=b=c a=b c
    # a\\=b=c a\ b=c
    # a\\\=b=c a\=b c
    # a\\\\=b=c a\\ b=c
    # a\\\=b\\=c a\=b\ c
    # a\\=b\\=c a\ b\=c
    # a=b=c a b=c
    # a\=b=c a=b c
    # a\\=b=c a\ b=c
    # a\\\=b=c a\=b c
    # a\\\\=b=c a\\ b=c
    # a\\\=b\\=c a\=b\ c
    # a\\=b\\=c a\ b\=c



    my $buf = '';

    print "\nexpression means:\tkey\tvalue\n";

    foreach ( <DATA> ) {
        chomp;
        $_ = $buf . $_;

        if ( /(\\+)$/ and length($1) % 2 ) {
            # wouldn't want to do this -> s/\\$//;
            $buf .= $_; # cat this line to buffer
            next; # read next line
        }
        if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/) {
            # unescape built in sequences
            my ($key, $val) = ($1,$2);
            $key =~ s/\\(.)/$1/g;
            $val =~ s/\\(.)/$1/g;
            printf "%-20s\t%s\t%s\n", $_, $key, $val;
        }
        $buf = '';
    }

    __DATA__

    # no line splits
    a=b=c
    a\=b=c
    a\\=b=c
    a\\\=b=c
    a\\\\=b=c
    a\\\=b\\=c
    a\\=b\\=c

    # ok line splits
    a=b=c
    a\
    =b=c
    a\
    \=b=c
    a\\\
    =b=c
    a\\\
    \=b=c
    a\\\=b\
    \=c
    a\\=b\
    \=c

    #some good/bad line splits
    a=b=c
    a\
    =b=c
    a\\
    =b=c
    a\\\
    =b=c
    a\\\\
    =b=c
    a\\\
    =b\\=c
    a\\=b\\
    =c
    , Nov 8, 2008
    #15
  16. yuanyun.ken

    Guest

    On Sat, 08 Nov 2008 18:38:16 GMT, wrote:

    >On Fri, 7 Nov 2008 19:55:48 -0800 (PST), "yuanyun.ken" <> wrote:
    >
    >>Thanks for all the reply. and this problem has been solved.
    >>but sorry for my poor understanding on regex, and having to trouble
    >>you again,
    >>here I have another little problem:
    >>if the content ends with a real single backslash, I need read in the
    >>next line.
    >>
    >>How to use regex to do this?
    >>for example:
    >>line ends with match
    >>\ yes
    >>\\ no
    >>\\\ yes
    >>\\\\ no
    >>Thanks for any help again.

    >
    >I assume this pertains to the rules set out on the properties
    >in the original problem statement.
    >
    >Tad's solution to check then end for 'odd' number of '\' works best
    >for a line continuation.
    >
    >Be very cautious!! If you are trying to find a way to fix random
    >line splits when this file was generated, there is absolutely
    >NO solution available to you at all !!!
    >The reason is you already have escaping rules in place
    >
    >The line split must be intelligently constucted in that only
    >an odd number of '\' at the end will determine line continuation.
    >And at the same time be used in the general escaping rules after
    >it is joined.
    >
    >You can't just add a '\' where you would like to split the line then
    >remove it later without counting the existing escapes at the end.
    >Either way it takes intelligence to construct the file given the
    >existing escaping rules you laid out for yourself.
    >
    >Notice the places where the split occurs in DATA below..
    >Even if you had an intelligent generator that splits the
    >line on a '\', it could still split on an even boundry.
    >Or say it adds a complement to make the split odd, still,
    >even then, the original can not be guaranteed to reassemble
    >because this conflicts with the original escape logic..
    >
    >There is no solution then!
    >
    >
    >sln
    >
    >use strict;
    >use warnings;
    >
    ># ** Original
    ># expression means: key value
    ># a=b=c a b=c
    ># a\=b=c a=b c
    ># a\\=b=c a\ b=c
    ># a\\\=b=c a\=b c
    >

    # ** Output
    # a=b=c a b=c
    # a\=b=c a=b c
    # a\\=b=c a\ b=c
    # a\\\=b=c a\=b c
    # a\\\\=b=c a\\ b=c
    # a\\\=b\\=c a\=b\ c
    # a\\=b\\=c a\ b\=c
    # a=b=c a b=c
    # a\=b=c a=b c
    # a\\=b=c a\ b=c
    # a\\\=b=c a\=b c
    # a\\\\=b=c a\\ b=c
    # a\\\=b\\=c a\=b\ c
    # a\\=b\\=c a\ b\=c
    # a=b=c a b=c
    # a\=b=c a=b c
    # =b=c b=c
    # a\\\=b=c a\=b c
    # =b=c b=c
    # a\\\=b\\=c a\=b\ c
    # a\\=b\\ a\ b\
    # =c
    >
    >my $buf = '';
    >
    >print "\nexpression means:\tkey\tvalue\n";
    >
    >foreach ( <DATA> ) {
    > chomp;
    > $_ = $buf . $_;
    >
    > if ( /(\\+)$/ and length($1) % 2 ) {
    > # wouldn't want to do this -> s/\\$//;
    > $buf .= $_; # cat this line to buffer
    > next; # read next line
    > }
    > if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/) {
    > # unescape built in sequences
    > my ($key, $val) = ($1,$2);
    > $key =~ s/\\(.)/$1/g;
    > $val =~ s/\\(.)/$1/g;
    > printf "%-20s\t%s\t%s\n", $_, $key, $val;
    > }
    > $buf = '';
    >}
    >
    >__DATA__
    >
    ># no line splits
    >a=b=c
    >a\=b=c
    >a\\=b=c
    >a\\\=b=c
    >a\\\\=b=c
    >a\\\=b\\=c
    >a\\=b\\=c
    >
    ># ok line splits
    >a=b=c
    >a\
    >=b=c
    >a\
    >\=b=c
    >a\\\
    >=b=c
    >a\\\
    >\=b=c
    >a\\\=b\
    >\=c
    >a\\=b\
    >\=c
    >
    >#some good/bad line splits
    >a=b=c
    >a\
    >=b=c
    >a\\
    >=b=c
    >a\\\
    >=b=c
    >a\\\\
    >=b=c
    >a\\\
    >=b\\=c
    >a\\=b\\
    >=c
    , Nov 8, 2008
    #16
  17. yuanyun.ken

    Dr.Ruud Guest

    Tad J McClellan schreef:

    > if ( $line =~ /(\\+)$/ and length($1) % 2 )
    > { print "yes\n" }
    > else
    > { print "no\n" }
    > }


    /(?<!\\)(?:\\\\)*\\$/

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Nov 9, 2008
    #17
  18. Dr.Ruud <> wrote:
    > Tad J McClellan schreef:
    >
    >> if ( $line =~ /(\\+)$/ and length($1) % 2 )
    >> { print "yes\n" }
    >> else
    >> { print "no\n" }
    >> }

    >
    > /(?<!\\)(?:\\\\)*\\$/



    Which one do you want to figure out after not having
    seen this program for six months?


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Nov 9, 2008
    #18
  19. yuanyun.ken

    Dr.Ruud Guest

    Tad J McClellan schreef:
    > Dr.Ruud:
    >> Tad J McClellan:


    >>> if ( $line =~ /(\\+)$/ and length($1) % 2 )
    >>> { print "yes\n" }
    >>> else
    >>> { print "no\n" }
    >>> }

    >>
    >> /(?<!\\)(?:\\\\)*\\$/

    >
    > Which one do you want to figure out after not having
    > seen this program for six months?


    Probably this one:

    # even number of trailing backslashes
    print /(?<!\\)(?:\\\\)*$/ ? "no" : "yes";

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Nov 9, 2008
    #19
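The two styles debated above, counting the trailing run versus a single lookbehind pattern, can be checked against each other. This is a small sketch; the helper names `odd_by_count` and `odd_by_regex` are hypothetical, and the test lines are generated rather than taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical helpers: two ways to ask "does this line end in an odd
# number of backslashes?"
sub odd_by_count {
    my ($s) = @_;
    return 0 unless $s =~ /(\\+)$/;     # no trailing backslash at all
    return length($1) % 2;              # 1 if the trailing run is odd
}

sub odd_by_regex {
    my ($s) = @_;
    # Zero or more backslash pairs plus one more, not preceded by a backslash.
    return $s =~ /(?<!\\)(?:\\\\)*\\$/ ? 1 : 0;
}

for my $n (0 .. 8) {
    for my $prefix ('', 'a') {
        my $line = $prefix . ('\\' x $n);
        my ($x, $y) = (odd_by_count($line), odd_by_regex($line));
        die "mismatch on '$line'" unless $x == $y;
        printf "%-10s %s\n", $line, $x ? 'yes' : 'no';
    }
}
```

Both give the same answer on these inputs, so the choice comes down to which form is easier to read later.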
  20. yuanyun.ken

    Guest

    On Sat, 08 Nov 2008 18:38:16 GMT, wrote:

    >On Fri, 7 Nov 2008 19:55:48 -0800 (PST), "yuanyun.ken" <> wrote:
    >
    >>Thanks for all the reply. and this problem has been solved.
    >>but sorry for my poor understanding on regex, and having to trouble
    >>you again,
    >>here I have another little problem:
    >>if the content ends with a real single backslash, I need read in the
    >>next line.
    >>
    >>How to use regex to do this?
    >>for example:
    >>line ends with match
    >>\ yes
    >>\\ no
    >>\\\ yes
    >>\\\\ no
    >>Thanks for any help again.

    >
    >I assume this pertains to the rules set out on the properties
    >in the original problem statement.
    >
    >Tad's solution to check then end for 'odd' number of '\' works best
    >for a line continuation.
    >
    >Be very cautious!! If you are trying to find a way to fix random
    >line splits when this file was generated, there is absolutely
    >NO solution available to you at all !!!
    >The reason is you already have escaping rules in place
    >
    >The line split must be intelligently constucted in that only
    >an odd number of '\' at the end will determine line continuation.
    >And at the same time be used in the general escaping rules after
    >it is joined.
    >
    >You can't just add a '\' where you would like to split the line then
    >remove it later without counting the existing escapes at the end.
    >Either way it takes intelligence to construct the file given the
    >existing escaping rules you laid out for yourself.
    >
    >Notice the places where the split occurs in DATA below..
    >Even if you had an intelligent generator that splits the
    >line on a '\', it could still split on an even boundry.
    >Or say it adds a complement to make the split odd, still,
    >even then, the original can not be guaranteed to reassemble
    >because this conflicts with the original escape logic..
    >
    >There is no solution then!
    >
    >
    >sln
    >
    >use strict;
    >use warnings;
    >
    ># ** Original
    ># expression means: key value
    ># a=b=c a b=c
    ># a\=b=c a=b c
    ># a\\=b=c a\ b=c
    ># a\\\=b=c a\=b c
    >
    ># ** Output
    ># a=b=c a b=c
    ># a\=b=c a=b c
    ># a\\=b=c a\ b=c
    ># a\\\=b=c a\=b c
    ># a\\\\=b=c a\\ b=c
    ># a\\\=b\\=c a\=b\ c
    ># a\\=b\\=c a\ b\=c
    ># a=b=c a b=c
    ># a\=b=c a=b c
    ># a\\=b=c a\ b=c
    ># a\\\=b=c a\=b c
    ># a\\\\=b=c a\\ b=c
    ># a\\\=b\\=c a\=b\ c
    ># a\\=b\\=c a\ b\=c
    >
    >
    >
    >my $buf = '';
    >
    >print "\nexpression means:\tkey\tvalue\n";
    >
    >foreach ( <DATA> ) {
    > chomp;
    > $_ = $buf . $_;
    >
    > if ( /(\\+)$/ and length($1) % 2 ) {
    > # wouldn't want to do this -> s/\\$//;
    > $buf .= $_; # cat this line to buffer

    ^^^^^^^^^^^
    $buf = $_; # assign to buffer

    # see what happens when you don't test

    > next; # read next line
    > }
    > if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/) {
    > # unescape built in sequences
    > my ($key, $val) = ($1,$2);
    > $key =~ s/\\(.)/$1/g;
    > $val =~ s/\\(.)/$1/g;
    > printf "%-20s\t%s\t%s\n", $_, $key, $val;
    > }
    > $buf = '';
    >}
    >
    >__DATA__
    >
    ># no line splits
    >a=b=c
    >a\=b=c
    >a\\=b=c
    >a\\\=b=c
    >a\\\\=b=c
    >a\\\=b\\=c
    >a\\=b\\=c
    >
    ># ok line splits
    >a=b=c
    >a\
    >=b=c
    >a\
    >\=b=c
    >a\\\
    >=b=c
    >a\\\
    >\=b=c
    >a\\\=b\
    >\=c
    >a\\=b\
    >\=c
    >
    >#some good/bad line splits
    >a=b=c
    >a\
    >=b=c
    >a\\
    >=b=c
    >a\\\
    >=b=c
    >a\\\\
    >=b=c
    >a\\\
    >=b\\=c
    >a\\=b\\
    >=c
    , Nov 9, 2008
    #20