Manually parsing quoted characters

Arne Ruhnau

Cheers,

is there some simpler/more elegant solution to the following problem:

given a string 'a\|b|c', transform it into the list ('a|b', 'c')

Currently, I have an intermediate representation, but I doubt its
generality. What if $string contains (for reasons I cannot predict)
'{[[VerticalBar]]}' right from the start?
I need a primitive pattern-language, and currently it contains, among others,

(..|..|..) : Disjunction

which I have to parse to a list of its elements (w/o the |'s).

Arne Ruhnau

Code follows:

use strict;
use warnings;
use Test::More tests => 1;

my $string = 'a\|b|c';
is_deeply(string2list($string), ['a|b', 'c']);

sub string2list {
    my $string = shift;
    # Hide escaped bars behind a placeholder, split, then restore them.
    $string =~ s/\\\|/{[[VerticalBar]]}/g;
    my @elements = split /\|/, $string;
    for (@elements) {
        s/\{\[\[VerticalBar\]\]\}/|/g;
    }
    return \@elements;
}
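
The doubt about generality raised above can be reproduced directly. The following is only a sketch: it repeats string2list unchanged and feeds it an input (made up for illustration) that already contains the placeholder text, so the literal placeholder is turned into a bar on the way out.

```perl
use strict;
use warnings;

# Same placeholder approach as string2list in the post.
sub string2list {
    my $string = shift;
    $string =~ s/\\\|/{[[VerticalBar]]}/g;
    my @elements = split /\|/, $string;
    s/\{\[\[VerticalBar\]\]\}/|/g for @elements;
    return \@elements;
}

# Hypothetical input: no escaped bar, but the placeholder text appears literally.
my $result = string2list('x{[[VerticalBar]]}y|z');

# The first element is no longer the text that was passed in.
print join(' :: ', @$result), "\n";   # prints "x|y :: z"
```

Any placeholder chosen in advance has the same weakness, which is why the replies below avoid an intermediate representation altogether.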
 

Dave Weaver

Arne Ruhnau said:
Cheers,

is there some simpler/more elegant solution to the following problem:

given a string 'a\|b|c', transform it into the list ('a|b', 'c')

Here's my attempt, splitting using a negative lookbehind assertion,
i.e. only splitting on a | if it's not preceded by a \

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @list = map { s/\\\|/|/g; $_ } split /(?<!\\)\|/, 'a\|b|c';
print Dumper \@list;

__END__

$VAR1 = [
'a|b',
'c'
];
 

Arne Ruhnau

Dave said:
Here's my attempt, splitting using a negative lookbehind assertion,
i.e. only splitting on a | if it's not preceded by a \

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @list = map { s/\\\|/|/g; $_ } split /(?<!\\)\|/, 'a\|b|c';

Nice. I changed it to

map { s/\\(.)/$1/g; $_ }

to be able to work with different quoted characters.
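
For reference, a sketch of how that change combines with Dave's split. The sub name split_unescape and the sample input are made up for illustration; the two regexes are the ones from the thread.

```perl
use strict;
use warnings;

# Split on bars not preceded by a backslash, then remove one level of
# backslash quoting from every element.
sub split_unescape {
    my ($string) = @_;
    return map { my $copy = $_; $copy =~ s/\\(.)/$1/g; $copy }
           split /(?<!\\)\|/, $string;
}

# Hypothetical input with two different quoted characters.
my @list = split_unescape('a\|b|c\(d');
print join(' :: ', @list), "\n";   # prints "a|b :: c(d"
```

Note that the split itself still only protects the bar; the generic s/\\(.)/$1/g merely unquotes whatever else was escaped inside each element.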

Thanks,

Arne Ruhnau
 

Gary E. Ansok

Arne Ruhnau said:
is there some simpler/more elegant solution to the following problem:
given a string 'a\|b|c', transform it into the list ('a|b', 'c')

Dave Weaver said:
Here's my attempt, splitting using a negative lookbehind assertion,
i.e. only splitting on a | if it's not preceded by a \

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @list = map { s/\\\|/|/g; $_ } split /(?<!\\)\|/, 'a\|b|c';
print Dumper \@list;

__END__

$VAR1 = [
'a|b',
'c'
];

That runs into the question of how the string 'a\\|b|c' should be
treated -- should it result in [ 'a\', 'b', 'c' ] ? If so, then
a little more work needs to be put into the code.

(note for the nit-pickers: strings above indicate actual string
contents, not Perl string literals or Dumper output. Do we have
a convention for that?)
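
To make the 'a\\|b|c' case concrete, here is a sketch that compares the lookbehind split with a left-to-right scan. The sub name split_escaped is hypothetical, and the scan is only one possible way to do the "little more work"; it assumes a backslash quotes exactly the next character.

```perl
use strict;
use warnings;

my $input = 'a\\\\|b|c';   # Perl literal; the actual contents are a\\|b|c

# Lookbehind version from the thread: the first bar follows a backslash,
# so it is not treated as a separator even though that backslash is itself quoted.
my @naive = map { my $copy = $_; $copy =~ s/\\(.)/$1/g; $copy }
            split /(?<!\\)\|/, $input;

# Left-to-right scan: consume an escaped pair, a bare bar, or a run of
# ordinary characters, one token at a time.
sub split_escaped {
    my ($string) = @_;
    my @fields = ('');
    while ($string =~ /\G(?:\\(.)|(\|)|([^\\|]+)|(\\))/gcs) {
        if    (defined $1) { $fields[-1] .= $1 }    # escaped character, taken literally
        elsif (defined $2) { push @fields, '' }     # unescaped bar starts a new field
        elsif (defined $3) { $fields[-1] .= $3 }    # ordinary characters
        else               { $fields[-1] .= $4 }    # lone trailing backslash, kept as-is
    }
    return @fields;
}

my @scanned = split_escaped($input);

print scalar(@naive),   " fields: ", join(' :: ', @naive),   "\n";   # 2 fields: a\|b :: c
print scalar(@scanned), " fields: ", join(' :: ', @scanned), "\n";   # 3 fields: a\ :: b :: c
```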

Gary
 

ko

Gary said:
Dave Weaver said:
Arne Ruhnau said:
is there some simpler/more elegant solution to the following problem:
given a string 'a\|b|c', transform it into the list ('a|b', 'c')

Here's my attempt, splitting using a negative lookbehind assertion,
i.e. only splitting on a | if it's not preceded by a \

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @list = map { s/\\\|/|/g; $_ } split /(?<!\\)\|/, 'a\|b|c';
print Dumper \@list;

__END__

$VAR1 = [
'a|b',
'c'
];


That runs into the question of how the string 'a\\|b|c' should be
treated -- should it result in [ 'a\', 'b', 'c' ] ? If so, then
a little more work needs to be put into the code.

One possible way:

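use strict;
use warnings;
use Text::ParseWords;

chomp(my @data = <DATA>);
foreach my $line (@data) {
    # parse_line(delimiter regex, keep flag, line); keep = 0 strips the backslashes.
    # The grep drops empty fields but keeps a field that is just "0".
    print join(' :: ', grep { defined($_) && length($_) } parse_line('\|', 0, $line)), "\n";
}
__DATA__
a\|b|c
|a\|b|c
|a\\|b|c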


HTH - keith
 

Arne Ruhnau

That runs into the question of how the string 'a\\|b|c' should be
treated -- should it result in [ 'a\', 'b', 'c' ] ? If so, then
a little more work needs to be put into the code.

It should be read from left to right, thus resulting in ['a\', 'b', 'c'].
After thinking all this over, I came to realize that there is nothing as
elegant as lexing my (a|b|c)-lists into something like

[OPENLIST],
[ELEMENT, a],[DELIMITER],[ELEMENT, b],[DELIMITER],[ELEMENT, c],
[CLOSELIST]

and then parsing that into ['a', 'b', 'c'], which is what I need for the
rest of the language.

'a\\|b|c' would be lexed as

[ELEMENT, 'a\'], [DELIMITER], ...

I just had to define a lexer which takes care of backslash-quoted
characters, the rest would be simple enough.
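
A minimal sketch of such a lexer and parser, not the solution announced in this post. The token names follow the listing above; the sub names lex_list and parse_list, the %TYPE_OF table, and the sample input are assumptions, and nested lists and error handling are left out.

```perl
use strict;
use warnings;

# Structural characters and the token type each one produces.
my %TYPE_OF = ('(' => 'OPENLIST', ')' => 'CLOSELIST', '|' => 'DELIMITER');

# Lexer: read left to right; a backslash quotes the next character, which
# then becomes part of an ELEMENT instead of a structural token.
# A lone trailing backslash simply ends the scan in this sketch.
sub lex_list {
    my ($string) = @_;
    my @tokens;
    while ($string =~ /\G(?:\\(.)|([()|])|([^\\()|]+))/gcs) {
        if (defined $2) {
            push @tokens, [ $TYPE_OF{$2} ];
            next;
        }
        my $text = defined $1 ? $1 : $3;
        if (@tokens && $tokens[-1][0] eq 'ELEMENT') {
            $tokens[-1][1] .= $text;             # extend the current element
        }
        else {
            push @tokens, [ ELEMENT => $text ];  # start a new element
        }
    }
    return @tokens;
}

# Parser: collect element texts, starting a new one at each DELIMITER.
# OPENLIST and CLOSELIST only bracket the list here and are skipped.
sub parse_list {
    my @tokens = @_;
    my @elements = ('');
    for my $token (@tokens) {
        my ($type, $text) = @$token;
        if    ($type eq 'ELEMENT')   { $elements[-1] .= $text }
        elsif ($type eq 'DELIMITER') { push @elements, '' }
    }
    return \@elements;
}

my @tokens = lex_list('(a\\\\|b|c)');   # actual contents: (a\\|b|c)
my $list   = parse_list(@tokens);

print join(' ', map { '[' . join(', ', @$_) . ']' } @tokens), "\n";
print join(' :: ', @$list), "\n";       # prints "a\ :: b :: c"
```

Because quoting is resolved in the lexer, the parser never sees a backslash and only has to deal with token types.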

The rest of my code to parse my simple language works this way, but naively
I thought it would be possible to simply hack (..|..)-constructs into it.
Too lazy, too much hubris and way too impatient...

I'll post my solution when it is done the way I think it should be done.

Thanks for your thoughts,

Arne Ruhnau
 
