Help: How can I parse this properties file?

yuanyun.ken

Hi, dear all Perl users:
I recently needed to read in a properties file.
Its format is key=value, and it uses \ as the escape character.
For example:

expression    key     value
a=b=c         a       b=c
a\=b=c        a=b     c
a\\=b=c       a\      b=c
a\\\=b=c      a\=b    c

How can I parse this file?
I think it should use a regex,
but my knowledge of regular expressions is poor.
Any help is greatly appreciated.
 
Jürgen Exner

yuanyun.ken said:
> How can I parse this file?
> I think it should use a regex,
> but my knowledge of regular expressions is poor.

No need for complex REs. Just have the tokenizer walk through the file
character by character. When it finds a backslash, it immediately
reads the next character and adds that character literally to the
current token, no matter whether it is a normal character, another
backslash, or an equals sign.

jue
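A minimal Perl sketch of the character-by-character tokenizer described above, for a single already-chomped line. The sub name `parse_property` is hypothetical, and resolving escapes in the value as well as in the key is an assumption (the examples in the question never escape the value). A trailing lone backslash is kept literally.

```perl
use strict;
use warnings;

# Walk the line one character at a time: a backslash makes the next
# character literal, and the first unescaped '=' ends the key.
sub parse_property {
    my ($line) = @_;
    my ($key, $value) = ('', undef);
    my @chars = split //, $line;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($c eq '\\' && $i + 1 < @chars) {
            $c = $chars[++$i];          # take the next character literally
        }
        elsif ($c eq '=' && !defined $value) {
            $value = '';                # first unescaped '=': key is complete
            next;
        }
        # (a lone backslash at the very end falls through and is kept)
        if (defined $value) { $value .= $c } else { $key .= $c }
    }
    return ($key, $value);              # $value stays undef if there was no '='
}

for my $line ('a=b=c', 'a\=b=c', 'a\\\\=b=c', 'a\\\\\\=b=c') {
    my ($k, $v) = parse_property($line);
    printf "%-10s %-6s %s\n", $line, $k, defined $v ? $v : '(none)';
}
```

No regex is needed for the split itself; the loop only has to remember whether it is still inside the key.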
 
Tad J McClellan

yuanyun.ken said:
> How can I parse this file?


----------------------------
#!/usr/bin/perl
use warnings;
use strict;

foreach my $line ( <DATA> ) {
    chomp $line;
    $line =~ s/\\\\/&backslash;/g;                   # translate literal backslashes

    my($key, $value) = split /(?<!\\)=/, $line, 2;   # use negative look-behind

    $key =~ tr/\\//d;               # eliminate backslashes used for escaping

    $key =~ s/&backslash;/\\/g;     # put the literal backslashes back in

    printf "%-10s %-10s\n", $key, $value;
}

__DATA__
a=b=c
a\=b=c
a\\=b=c
a\\\=b=c
 
Ted Zlatanov

TJM> #!/usr/bin/perl
TJM> use warnings;
TJM> use strict;

TJM> foreach my $line ( <DATA> ) {
TJM> chomp $line;
TJM> $line =~ s/\\\\/&backslash;/g; # translate literal backslashes

TJM> my($key, $value) = split /(?<!\\)=/, $line, 2; # use negative look-behind

TJM> $key =~ tr/\\//d; # eliminate backslashes used for escaping

TJM> $key =~ s/&backslash;/\\/g; # put the literal backslashes back in

TJM> printf "%-10s %-10s\n", $key, $value;
TJM> }

TJM> __DATA__
TJM> a=b=c
TJM> a\=b=c
TJM> a\\=b=c
TJM> a\\\=b=c
TJM> ----------------------------

I was thinking of a similar solution, but adding 256 (or some other
large number) to each escaped character (in case there's a '&backslash;'
in the data). As long as it's valid Unicode and the original data
doesn't contain Unicode characters it should be a clean translation.

Ted
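A minimal sketch of Ted's idea, assuming the input never contains characters above chr(255): each `\X` escape is replaced by the single character `chr(ord(X) + 256)`, so every `=` left over is unescaped and a plain `split` finds the separator; the shifted characters are then moved back down. The sub name `parse_shifted` is hypothetical. Unlike Tad's code, this removes the escapes from the value as well, and literal text such as `&backslash;` passes through untouched.

```perl
use strict;
use warnings;

# Assumption: no input character is above chr(255), so a shifted
# character can never be confused with an original one.
sub parse_shifted {
    my ($line) = @_;
    # Each "\X" becomes the single character chr(ord(X) + 256)
    (my $enc = $line) =~ s/\\(.)/chr(ord($1) + 256)/ge;
    # Every '=' still present is unescaped, so a plain split works
    my ($key, $value) = split /=/, $enc, 2;
    for ($key, $value) {
        next unless defined;
        s/([\x{100}-\x{1FF}])/chr(ord($1) - 256)/ge;   # shift back down
    }
    return ($key, $value);
}

for my $line ('a=b=c', 'a\=b=c', 'a\\\\=b=c', 'a\\\\\\=b=c', 'x&backslash;y=1') {
    my ($k, $v) = parse_shifted($line);
    printf "%-16s %-14s %s\n", $line, $k, $v;
}
```

With Tad's fixed `&backslash;` placeholder, the last sample key would come back as `x\y`; here it stays `x&backslash;y`.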
 
Michele Dondi

> I was thinking of a similar solution, but adding 256 (or some other
> large number) to each escaped character (in case there's a '&backslash;'
> in the data). As long as it's valid Unicode and the original data
> doesn't contain Unicode characters it should be a clean translation.

I like to be sure, so instead of adding "some other large number" I
actually *find* something that *can't* be there:

my $delim = "&". (sort @delims = $line =~ /&(\0+);/)[-1] . "\0;";
$line =~ s/\\\\/$delim;/g; # translate literal backslashes
# ...


Michele
 
Michele Dondi

> I like to be sure, so instead of adding "some other large number" I
> actually *find* something that *can't* be there:
>
> my $delim = "&". (sort @delims = $line =~ /&(\0+);/)[-1] . "\0;";
> $line =~ s/\\\\/$delim;/g; # translate literal backslashes

Sorry! That's what I get for posting in such a hurry. I meant:

my $delim = "&". (sort "", $line =~ /&(\0+);/g)[-1] . "\0;";


Michele
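A sketch of a full round trip built on Michele's corrected `$delim` expression together with Tad's look-behind split. Two details differ from the snippets above: since `$delim` already ends in `;`, the substitution uses `$delim` rather than `$delim;`, and unescaping and restoring happen in one left-to-right pass, so that dropping an escaping backslash cannot create a new copy of the placeholder (for example from input containing `&`, `\`, NUL, `;`). The sub name `parse_with_delim` is hypothetical.

```perl
use strict;
use warnings;

sub parse_with_delim {
    my ($line) = @_;
    # One NUL longer than the longest "&\0...\0;" already in the line,
    # so this placeholder cannot occur in the original text
    my $delim = "&" . (sort "", $line =~ /&(\0+);/g)[-1] . "\0;";
    $line =~ s/\\\\/$delim/g;                         # hide literal backslashes
    my ($key, $value) = split /(?<!\\)=/, $line, 2;   # Tad's look-behind split
    for ($key, $value) {
        next unless defined;
        # single pass: placeholder -> "\", any other "\X" -> "X"
        s/\Q$delim\E|\\(.)/defined $1 ? $1 : '\\'/ge;
    }
    return ($key, $value);
}

for my $line ('a=b=c', 'a\=b=c', 'a\\\\=b=c', 'a\\\\\\=b=c') {
    my ($k, $v) = parse_with_delim($line);
    printf "%-10s %-6s %s\n", $line, $k, $v;
}
```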
 
cartercc

> I was thinking of a similar solution, but adding 256 (or some other
> large number) to each escaped character (in case there's a '&backslash;'
> in the data).  As long as it's valid Unicode and the original data
> doesn't contain Unicode characters it should be a clean translation.

I find absolutely nothing wrong with Tad's solution. The fact that it
^might^ be a little more verbose than necessary I regard as a mark in
its favor, not a mark against.

I might consider the string '&#92;' rather than '&backslash;', but
that's a simple quibble.

Now, what about a one liner?

CC
 
sln

> Its format is key=value, and it uses \ as the escape character.
> For example:
>
> expression    key     value
> a=b=c         a       b=c
> a\=b=c        a=b     c
> a\\=b=c       a\      b=c
> a\\\=b=c      a\=b    c


Well, according to your template there, the equals sign
separating key from value is the first unescaped
equals sign.

So yes, there is a regular expression to do that.
You have data in the escaped form. It can be split
into key and value using those rules in a regexp, and then
the key/value pair can be unescaped.

Below is probably no different from the other suggestions
in terms of how the split occurs using a regex.

Where does this sequence take you? Do you expect the value
side to be escaped?

These appear possible as well; is this something that will be
encountered?

a\\\\=b=c     a\\     b=c
a\\\=b\\=c    a\=b\   c
a\\=b\\=c     a\      b\=c

Like the other possibilities, here is one more. It's hard to see how
you would get a simple one-step solution, though. Maybe...

Good luck.
sln

# ** Original
# expression    key     value
# a=b=c         a       b=c
# a\=b=c        a=b     c
# a\\=b=c       a\      b=c
# a\\\=b=c      a\=b    c

# ** Output
# a=b=c         a       b=c
# a\=b=c        a=b     c
# a\\=b=c       a\      b=c
# a\\\=b=c      a\=b    c
# a\\\\=b=c     a\\     b=c
# a\\\=b\\=c    a\=b\   c
# a\\=b\\=c     a\      b\=c

use strict;
use warnings;

my @property = ();

foreach my $line ( <DATA> ) {
    chomp $line;
    push @property, $line if (length $line);
}

print "\nexpression means:\tkey\tvalue\n";

for (@property)
{
    if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/)
    {
        # unescape built in sequences
        my ($key, $val) = ($1,$2);
        $key =~ s/\\(.)/$1/g;
        $val =~ s/\\(.)/$1/g;
        printf "%-20s\t%s\t%s\n", $_, $key, $val;
    }
}

__DATA__
a=b=c
a\=b=c
a\\=b=c
a\\\=b=c
a\\\\=b=c
a\\\=b\\=c
a\\=b\\=c
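sln's pattern `^((?:(?:\\.)*?|.*?)+)=(.*)$` finds the right `=` because of the order in which the engine backtracks, but on a line with no unescaped `=` at all, such as `a\=b`, it falls back to treating the backslash as an ordinary character and matches with key `a\` and value `b`. Below is a sketch of a more explicit pattern, where the key can only consist of characters other than backslash and `=`, or complete `\X` escapes, so the `=` that ends it is the first unescaped one by construction. The sub name `split_property` is hypothetical; unescaping both sides follows sln's code.

```perl
use strict;
use warnings;

sub split_property {
    my ($line) = @_;
    # key: plain characters or "\X" escapes; value: the rest of the line
    return unless $line =~ /^((?:[^\\=]|\\.)*)=(.*)$/;
    my ($key, $val) = ($1, $2);
    s/\\(.)/$1/g for $key, $val;    # unescape built in sequences
    return ($key, $val);
}

for my $line ('a=b=c', 'a\\\\\\\\=b=c', 'a\\\\\\=b\\\\=c', 'a\\=b') {
    my @kv = split_property($line);
    printf "%-12s %s\n", $line, @kv ? join("\t", @kv) : '(no unescaped =)';
}
```

Because a backslash can only be consumed together with the character after it, the match never needs to backtrack into an escape, which also keeps it linear on long lines.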
 
sln

[snip]

You could minimize it as well:

foreach ( <DATA> ) {
    chomp;

    if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/)
    {
        # unescape built in sequences
        my ($key, $val) = ($1,$2);
        $key =~ s/\\(.)/$1/g;
        $val =~ s/\\(.)/$1/g;
        printf "%-20s\t%s\t%s\n", $_, $key, $val;
    }
}
 
Ted Zlatanov

MD> On Wed, 05 Nov 2008 11:28:24 -0600, Ted Zlatanov wrote:

TJM> $line =~ s/\\\\/&backslash;/g; # translate literal backslashes
TJM> printf "%-10s %-10s\n", $key, $value;
TJM> }
TJM> __DATA__
TJM> a=b=c
TJM> a\=b=c
TJM> a\\=b=c
TJM> a\\\=b=c
TJM> ----------------------------
MD> I like to be sure thus instead of adding "some other large number" I
MD> actually *find* something that *can't* be there:

MD> my $delim = "&". (sort @delims = $line =~ /&(\0+);/)[-1] . "\0;";
MD> $line =~ s/\\\\/$delim;/g; # translate literal backslashes
MD> # ...

Surely there's a CPAN module to do this... Or 10...

The Encode::Escape docs also mention String::Escape,
Unicode::Escape, TeX::Encode, HTML::Mason::Escape,
Template::Plugin::XML::Escape, and URI::Escape.

Ted
 
yuanyun.ken

Thanks for all the replies; this problem has been solved.
Sorry for my poor understanding of regexes, and for having to trouble
you again, but I have another little problem:
if the content ends with a real (unescaped) backslash, I need to read in
the next line as well.

How can I use a regex to do this?
For example:

line ends with    match
\                 yes
\\                no
\\\               yes
\\\\              no

Thanks again for any help.
 
Tad J McClellan

yuanyun.ken said:
> If the content ends with a real (unescaped) backslash, I need to read in
> the next line as well.
>
> How can I use a regex to do this?
> For example:
>
> line ends with    match
> \                 yes
> \\                no
> \\\               yes
> \\\\              no


-------------------
#!/usr/bin/perl
use warnings;
use strict;

foreach my $line ( <DATA> ) {
    chomp $line;
    # an odd number of trailing backslashes means the last one is unescaped
    if ( $line =~ /(\\+)$/ and length($1) % 2 )
        { print "yes\n" }
    else
        { print "no\n" }
}

__DATA__
\
\\
\\\
\\\\
 
Jürgen Exner

yuanyun.ken said:
> If the content ends with a real (unescaped) backslash, I need to read in
> the next line as well.
>
> How can I use a regex to do this?

You don't. When the tokenizer discovers a backslash, it just reads
another character; if that character is an end-of-line, it simply
continues processing the next line instead of reporting an end-of-token.

jue
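A sketch of that tokenizer extended to a whole file read into one string, assuming Unix `\n` line endings; the sub name `parse_properties` is hypothetical. A backslash followed by a newline is dropped, so the logical line simply continues. Unlike the buffer-joining approach later in the thread, the continuation backslash therefore does not take part in the escape rules of the joined line. Lines without an unescaped `=` are skipped.

```perl
use strict;
use warnings;

# "\" + newline continues the logical line, any other "\X" yields a
# literal X, the first unescaped '=' ends the key, and an unescaped
# newline ends the property.
sub parse_properties {
    my ($text) = @_;
    my (@props, $value);
    my $key = '';
    my @chars = split //, $text;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($c eq '\\' && $i + 1 < @chars) {
            $c = $chars[++$i];
            next if $c eq "\n";                     # continuation: drop "\" + newline
        }
        elsif ($c eq "\n") {                        # end of the logical line
            push @props, [$key, $value] if defined $value;
            ($key, $value) = ('', undef);
            next;
        }
        elsif ($c eq '=' && !defined $value) {      # first unescaped '='
            $value = '';
            next;
        }
        if (defined $value) { $value .= $c } else { $key .= $c }
    }
    push @props, [$key, $value] if defined $value;  # last line had no "\n"
    return @props;
}

my $text = "a\\\\\\=b=c\n" . "a\\\nb=c\n";
printf "%-6s %s\n", @$_ for parse_properties($text);
```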
 
fB

yuanyun.ken said:
> How can I use a regex to do this?
> For example:
>
> line ends with    match
> \                 yes
> \\                no
> \\\               yes
> \\\\              no

I felt like doing your homework for you, and anyway it is not that difficult.

#!/usr/bin/perl

use strict;
use warnings;
use feature ':5.10';

while (<::DATA>) {
chomp;
printf '%10s', $_;
m{ (?: [^\\] | ^ ) # match a non-backslash character
# or the start of the string
( \\ (?:\\\\)* ) # match an odd number of backslashes
$ # followed by the end of the string
}xms
and say ' matched: '.$1
or say ' not matched';
}

exit;
__DATA__

\
\\
\\\
\\\\
\\\\\
\\\\\\
\\\\\\\
\\\\\\\\
a
a\
a\\
a\\\
a\\\\
a\\\\\
a\\\\\\
a\\\\\\\
a\\\\\\\\
\a
\\a
\\\a
\\\\a



 
sln

> If the content ends with a real (unescaped) backslash, I need to read in
> the next line as well.

I assume this pertains to the escaping rules set out for the
properties in the original problem statement.

Tad's solution, checking the end of the line for an odd number
of '\', works best for line continuation.

Be very cautious!! If you are trying to find a way to fix random
line splits made when this file was generated, there is absolutely
NO solution available to you at all!!! The reason is that you
already have escaping rules in place.

The line split must be constructed intelligently, so that only
an odd number of '\' at the end signals line continuation,
while the joined line still obeys the general escaping rules.

You can't just add a '\' where you would like to split the line and
then remove it later without counting the existing escapes at the end.
Either way, it takes intelligence to construct the file given the
existing escaping rules you laid out for yourself.

Notice the places where the splits occur in DATA below.
Even if you had an intelligent generator that splits the
line on a '\', it could still split on an even boundary.
Or say it adds a complementary '\' to make the split odd; even
then, the original cannot be guaranteed to reassemble,
because this conflicts with the original escape logic.

There is no solution then!


sln

use strict;
use warnings;

# ** Original
# expression    key     value
# a=b=c         a       b=c
# a\=b=c        a=b     c
# a\\=b=c       a\      b=c
# a\\\=b=c      a\=b    c

# ** Output
# a=b=c         a       b=c
# a\=b=c        a=b     c
# a\\=b=c       a\      b=c
# a\\\=b=c      a\=b    c
# a\\\\=b=c     a\\     b=c
# a\\\=b\\=c    a\=b\   c
# a\\=b\\=c     a\      b\=c
# a=b=c         a       b=c
# a\=b=c        a=b     c
# a\\=b=c       a\      b=c
# a\\\=b=c      a\=b    c
# a\\\\=b=c     a\\     b=c
# a\\\=b\\=c    a\=b\   c
# a\\=b\\=c     a\      b\=c

my $buf = '';

print "\nexpression means:\tkey\tvalue\n";

foreach ( <DATA> ) {
    chomp;
    $_ = $buf . $_;

    if ( /(\\+)$/ and length($1) % 2 ) {
        # wouldn't want to do this -> s/\\$//;
        $buf .= $_;     # cat this line to buffer
        next;           # read next line
    }
    if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/) {
        # unescape built in sequences
        my ($key, $val) = ($1,$2);
        $key =~ s/\\(.)/$1/g;
        $val =~ s/\\(.)/$1/g;
        printf "%-20s\t%s\t%s\n", $_, $key, $val;
    }
    $buf = '';
}

__DATA__

# no line splits
a=b=c
a\=b=c
a\\=b=c
a\\\=b=c
a\\\\=b=c
a\\\=b\\=c
a\\=b\\=c

# ok line splits
a=b=c
a\
=b=c
a\
\=b=c
a\\\
=b=c
a\\\
\=b=c
a\\\=b\
\=c
a\\=b\
\=c

#some good/bad line splits
a=b=c
a\
=b=c
a\\
=b=c
a\\\
=b=c
a\\\\
=b=c
a\\\
=b\\=c
a\\=b\\
=c
 
sln

[snip]
# ** Output
# a=b=c a b=c
# a\=b=c a=b c
# a\\=b=c a\ b=c
# a\\\=b=c a\=b c
# a\\\\=b=c a\\ b=c
# a\\\=b\\=c a\=b\ c
# a\\=b\\=c a\ b\=c
# a=b=c a b=c
# a\=b=c a=b c
# a\\=b=c a\ b=c
# a\\\=b=c a\=b c
# a\\\\=b=c a\\ b=c
# a\\\=b\\=c a\=b\ c
# a\\=b\\=c a\ b\=c
# a=b=c a b=c
# a\=b=c a=b c
# =b=c b=c
# a\\\=b=c a\=b c
# =b=c b=c
# a\\\=b\\=c a\=b\ c
# a\\=b\\ a\ b\
# =c
 
Tad J McClellan

Dr.Ruud said:
> Tad J McClellan wrote:
>
> /(?<!\\)(?:\\\\)*\\$/


Which one do you want to figure out after not having
seen this program for six months?
 
Dr.Ruud

Tad J McClellan wrote:
> Which one do you want to figure out after not having
> seen this program for six months?

Probably this one:

# even number of trailing backslashes
print /(?<!\\)(?:\\\\)*$/ ? "no" : "yes";
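To compare the end-of-line tests from Tad, fB, and Dr.Ruud side by side, here is a small harness (the sub names are hypothetical). Each sub returns 1 when the line ends in an unescaped backslash, i.e. an odd number of trailing backslashes, and the loop prints the four answers for a handful of inputs. Dr.Ruud's second regex tests for the even case, so its result is inverted.

```perl
use strict;
use warnings;

sub by_count {                      # Tad: count the trailing backslashes
    my ($s) = @_;
    return 0 unless $s =~ /(\\+)$/;
    return length($1) % 2;
}
sub by_fb {                         # fB: non-backslash or start, then an odd run
    my ($s) = @_;
    return $s =~ m{ (?: [^\\] | ^ ) \\ (?:\\\\)* $ }x ? 1 : 0;
}
sub by_ruud_odd {                   # Dr.Ruud: odd run anchored at the end
    my ($s) = @_;
    return $s =~ /(?<!\\)(?:\\\\)*\\$/ ? 1 : 0;
}
sub by_ruud_even {                  # Dr.Ruud: a match means an even run ("no")
    my ($s) = @_;
    return $s =~ /(?<!\\)(?:\\\\)*$/ ? 0 : 1;
}

for my $s ('\\', '\\\\', '\\\\\\', '\\\\\\\\', 'a', 'a\\', 'a\\\\', '\\a') {
    printf "%-6s %d %d %d %d\n", $s,
        by_count($s), by_fb($s), by_ruud_odd($s), by_ruud_even($s);
}
```

All four agree on these inputs; the counting version is the one whose intent is visible at a glance, which is Tad's point about reading the program six months later.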
 
sln

[snip]
> foreach ( <DATA> ) {
>     chomp;
>     $_ = $buf . $_;
>
>     if ( /(\\+)$/ and length($1) % 2 ) {
>         # wouldn't want to do this -> s/\\$//;
>         $buf .= $_; # cat this line to buffer
          ^^^^^^^^^^^
          $buf = $_;  # assign to buffer

# see what happens when you don't test
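The correction matters because `$_` has already been prefixed with `$buf`, so `$buf .= $_` doubles the buffered text as soon as a logical line spans three or more physical lines. Below is a sketch of the buffering loop with `$buf = $_` applied, taking the physical lines as arguments instead of reading `DATA` (the sub name `join_and_parse` is hypothetical) and swapping in the explicit key pattern `(?:[^\\=]|\\.)*` in place of sln's. Like sln's version, it keeps the continuation backslash, so it takes part in the escape rules of the joined line, which is one of the ambiguities sln warns about.

```perl
use strict;
use warnings;

sub join_and_parse {
    my @lines = @_;
    my ($buf, @props) = ('');
    for my $line (@lines) {
        chomp $line;
        $line = $buf . $line;
        if ($line =~ /(\\+)$/ and length($1) % 2) {
            $buf = $line;               # odd trailing backslashes: keep reading
            next;
        }
        $buf = '';
        if ($line =~ /^((?:[^\\=]|\\.)*)=(.*)$/) {
            my ($key, $val) = ($1, $2);
            s/\\(.)/$1/g for $key, $val;   # unescape
            push @props, [$key, $val];
        }
    }
    # a continuation still pending in $buf at the end of input is dropped
    return @props;
}

for my $p (join_and_parse('a\\', '=b=c', 'k\\', '\\x\\', 'y=v')) {
    printf "%-6s %s\n", @$p;
}
```

The second sample spans three physical lines; with `$buf .= $_` its key would pick up an extra copy of the first line.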
 
