Manually parsing quoted characters

Discussion in 'Perl Misc' started by Arne Ruhnau, Aug 11, 2005.

  1. Arne Ruhnau

    Arne Ruhnau Guest

    Cheers,

    is there some simpler/more elegant solution to the following problem:

    given a string 'a\|b|c', transform it into the list ('a|b', 'c')

    Currently, I have an intermediate representation, but I doubt its
    generality. What if $string contains (for reasons I cannot predict)
    '{[[VerticalBar]]}' right from the start?
    I need a primitive pattern-language, and currently it contains, among others,

    (..|..|..) : Disjunction

    which I have to parse to a list of its elements (w/o the |'s).

    Arne Ruhnau

    Code follows:

    use strict;
    use warnings;
    use Test::More tests => 1;

    my $string = 'a\|b|c';
    is_deeply(string2list($string), ['a|b', 'c']);

    sub string2list {
        my $string = shift;
        $string =~ s/\\\|/{[[VerticalBar]]}/g;
        my @elements = split /\|/, $string;
        for (@elements) {
            s/\{\[\[VerticalBar\]\]\}/|/g;
        }
        return \@elements;
    }
     
    Arne Ruhnau, Aug 11, 2005
    #1

  2. Dave Weaver

    Dave Weaver Guest

    Arne Ruhnau <> wrote:
    > Cheers,
    >
    > is there some simpler/more elegant solution to the following problem:
    >
    > given a string 'a\|b|c', transform it into the list ('a|b', 'c')
    >


    Here's my attempt, splitting using a negative lookbehind assertion,
    i.e. only splitting on a | if it's not preceded by a \

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my @list = map { s/\\\|/|/g; $_ } split /(?<!\\)\|/, 'a\|b|c';
    print Dumper \@list;

    __END__

    $VAR1 = [
    'a|b',
    'c'
    ];
     
    Dave Weaver, Aug 11, 2005
    #2

  3. Arne Ruhnau

    Arne Ruhnau Guest

    Dave Weaver wrote:
    > Arne Ruhnau <> wrote:
    >
    >> is there some simpler/more elegant solution to the following problem:
    >>
    >> given a string 'a\|b|c', transform it into the list ('a|b', 'c')

    >
    > Here's my attempt, splitting using a negative lookbehind assertion,
    > i.e. only splitting on a | if it's not preceded by a \
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    > use Data::Dumper;
    >
    > my @list = map { s/\\\|/|/g; $_ } split /(?<!\\)\|/, 'a\|b|c';


    Nice. I changed it to

    map { s/\\(.)/$1/g; $_ } to be able to work with different quoted characters.
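
A minimal sketch of how the generalised version might look when wrapped in a sub. The name split_unescape and the test string are made up for illustration; the split pattern and the substitution are the ones quoted above. It still assumes that a backslash never needs to be quoted itself.

```perl
use strict;
use warnings;

# Hypothetical helper: split on every '|' that is not preceded by a
# backslash, then strip the backslash from each backslash-quoted character.
sub split_unescape {
    my ($string) = @_;
    my @pieces = split /(?<!\\)\|/, $string;
    s/\\(.)/$1/g for @pieces;    # modifies the elements in place
    return @pieces;
}

# String contents: a\|b|x\;y
my @list = split_unescape('a\|b|x\;y');
print join(' :: ', @list), "\n";    # prints "a|b :: x;y"
```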

    Thanks,

    Arne Ruhnau
     
    Arne Ruhnau, Aug 11, 2005
    #3
  4. In article <42fb3422$0$18200$>,
    Dave Weaver <> wrote:
    >Arne Ruhnau <> wrote:
    >> is there some simpler/more elegant solution to the following problem:
    >> given a string 'a\|b|c', transform it into the list ('a|b', 'c')

    >
    >Here's my attempt, splitting using a negative lookbehind assertion,
    >i.e. only splitting on a | if it's not preceded by a \
    >
    >#!/usr/bin/perl
    >use strict;
    >use warnings;
    >use Data::Dumper;
    >
    >my @list = map { s/\\\|/|/g; $_ } split /(?<!\\)\|/, 'a\|b|c';
    >print Dumper \@list;
    >
    >__END__
    >
    >$VAR1 = [
    > 'a|b',
    > 'c'
    > ];


    That runs into the question of how the string 'a\\|b|c' should be
    treated -- should it result in [ 'a\', 'b', 'c' ] ? If so, then
    a little more work needs to be put into the code.

    (note for the nit-pickers: strings above indicate actual string
    contents, not Perl string literals or Dumper output. Do we have
    a convention for that?)
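
To make the point concrete, here is a hedged sketch. The first part runs the lookbehind split on the string contents a\\|b|c and shows that it yields two elements instead of three. The second part is one possible left-to-right alternative, not something posted earlier in the thread: the name split_escaped is hypothetical, and it assumes that a backslash quotes whatever single character follows it.

```perl
use strict;
use warnings;

# String contents: a\\|b|c  (two backslashes, written as four in the literal)
my $string = 'a\\\\|b|c';

# The lookbehind version sees a backslash before the first '|' and
# refuses to split there.
my @naive = map { s/\\\|/|/g; $_ } split /(?<!\\)\|/, $string;
print scalar(@naive), " elements: ", join(' :: ', @naive), "\n";
# prints "2 elements: a\|b :: c"

# Hypothetical alternative: consume the string from left to right, treating
# "backslash + any character" as one unit, so '\\' is a quoted backslash.
sub split_escaped {
    my ($s) = @_;
    my @fields;
    while ($s =~ /\G((?:\\.|[^|])*)(\||\z)/gs) {
        my ($field, $sep) = ($1, $2);
        $field =~ s/\\(.)/$1/gs;    # drop the quoting backslashes
        push @fields, $field;
        last if $sep eq '';         # reached the end of the string
    }
    return @fields;
}

print join(' :: ', split_escaped($string)), "\n";    # prints "a\ :: b :: c"
```

A lone backslash at the very end of the input is kept as-is by this sketch; whether that should be an error instead is a design decision the thread does not settle.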

    Gary
    --
    The recipe says "toss lightly," but I suppose that depends
    on how much you eat and how bad the cramps get. - J. Lileks
     
    Gary E. Ansok, Aug 11, 2005
    #4
  5. ko

    ko Guest

    Gary E. Ansok wrote:
    > In article <42fb3422$0$18200$>,
    > Dave Weaver <> wrote:
    >
    >>Arne Ruhnau <> wrote:
    >>
    >>> is there some simpler/more elegant solution to the following problem:
    >>> given a string 'a\|b|c', transform it into the list ('a|b', 'c')

    >>
    >>Here's my attempt, splitting using a negative lookbehind assertion,
    >>i.e. only splitting on a | if it's not preceded by a \
    >>
    >>#!/usr/bin/perl
    >>use strict;
    >>use warnings;
    >>use Data::Dumper;
    >>
    >>my @list = map { s/\\\|/|/g; $_ } split /(?<!\\)\|/, 'a\|b|c';
    >>print Dumper \@list;
    >>
    >>__END__
    >>
    >>$VAR1 = [
    >> 'a|b',
    >> 'c'
    >> ];

    >
    >
    > That runs into the question of how the string 'a\\|b|c' should be
    > treated -- should it result in [ 'a\', 'b', 'c' ] ? If so, then
    > a little more work needs to be put into the code.


    One possible way:

    use strict;
    use warnings;
    use Text::ParseWords;

    chomp(my @data = <DATA>);
    foreach my $line (@data) {
        # parse_line honours backslash quoting; grep drops empty (false) fields
        print join(' :: ', grep { $_ } parse_line('\|', 0, $line) ) . "\n";
    }
    __DATA__
    a\|b|c
    |a\|b|c
    |a\\|b|c


    HTH - keith
     
    ko, Aug 11, 2005
    #5
  6. Arne Ruhnau

    Arne Ruhnau Guest

    Gary E. Ansok wrote:
    > In article <42fb3422$0$18200$>,
    > Dave Weaver <> wrote:
    >
    >>Arne Ruhnau <> wrote:
    >>
    >>> is there some simpler/more elegant solution to the following problem:
    >>> given a string 'a\|b|c', transform it into the list ('a|b', 'c')

    >>
    >>Here's my attempt, splitting using a negative lookbehind assertion,
    >>i.e. only splitting on a | if it's not preceded by a \
    >>
    >>#!/usr/bin/perl
    >>use strict;
    >>use warnings;
    >>use Data::Dumper;
    >>
    >>my @list = map { s/\\\|/|/g; $_ } split /(?<!\\)\|/, 'a\|b|c';
    >>print Dumper \@list;
    >>
    >>__END__

    <snip output>
    >
    > That runs into the question of how the string 'a\\|b|c' should be
    > treated -- should it result in [ 'a\', 'b', 'c' ] ? If so, then
    > a little more work needs to be put into the code.


    It should be read from left to right, thus resulting in ['a\', 'b', 'c'].
    After thinking all this over, I came to realize that there is nothing as
    elegant as lexing my (a|b|c)-lists into something like

    [OPENLIST],
    [ELEMENT, a],[DELIMITER],[ELEMENT, b],[DELIMITER],[ELEMENT, c],
    [CLOSELIST]

    and then parsing it into ['a', 'b', 'c'] - which is what I need for the
    rest of the language.

    'a\\|b|c' would be lexed as

    [ELEMENT, 'a\'], [DELIMITER], ...

    I just had to define a lexer which takes care of backslash-quoted
    characters, the rest would be simple enough.
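
A rough sketch of what such a lexer could look like, using the token names from above. Everything else is an assumption made for illustration: the sub name lex_list, tokens represented as array refs, and the rule that a backslash quotes exactly one following character. It is not the final solution promised below.

```perl
use strict;
use warnings;

# Hypothetical lexer: walks the input once with \G and /gc, so each
# alternative is tried at the current position without resetting it.
sub lex_list {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    while (pos($input) < length($input)) {
        if    ($input =~ /\G\(/gc) { push @tokens, ['OPENLIST']  }
        elsif ($input =~ /\G\)/gc) { push @tokens, ['CLOSELIST'] }
        elsif ($input =~ /\G\|/gc) { push @tokens, ['DELIMITER'] }
        elsif ($input =~ /\G((?:\\.|[^\\()|])+)/gcs) {
            my $text = $1;
            $text =~ s/\\(.)/$1/gs;    # remove the quoting backslashes
            push @tokens, ['ELEMENT', $text];
        }
        else {
            # e.g. a lone backslash at the end of the input
            die "Cannot lex input at offset " . pos($input) . "\n";
        }
    }
    return @tokens;
}

# String contents: (a\\|b|c)
my @tokens = lex_list('(a\\\\|b|c)');
print join(',', map { '[' . join(', ', @$_) . ']' } @tokens), "\n";

# A deliberately naive "parser": keep only the ELEMENT payloads.
my @elements = map { $_->[1] } grep { $_->[0] eq 'ELEMENT' } @tokens;
print join(' :: ', @elements), "\n";    # prints "a\ :: b :: c"
```

The parsing step here simply filters the token stream, so empty elements such as the one in (a||b) are lost; a real parser would track DELIMITER tokens as well.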

    The rest of my code to parse my simple language works this way, but naively
    I thought it would be possible to simply hack (..|..)-constructs into it.
    Too lazy, too much hubris and way too impatient...

    I'll post my solution when it is done the way I think it should be done.

    Thanks for your thoughts,

    Arne Ruhnau
     
    Arne Ruhnau, Aug 14, 2005
    #6