Help: How can I parse this properties file?

yuanyun.ken · Nov 5, 2008

Hi, dear Perl users:
Recently I needed to read in a properties file.
Its format is key=value, and it uses \ to escape.
For example:
expression   means:   key    value
a=b=c                 a      b=c
a\=b=c                a=b    c
a\\=b=c               a\     b=c
a\\\=b=c              a\=b   c

How can I parse this file?
I think it should use a regex,
but my knowledge of regular expressions is poor.
Any help is greatly appreciated.

Jürgen Exner · Nov 5, 2008

yuanyun.ken said:
Recently I needed to read in a properties file.
Its format is key=value, and it uses \ to escape.
For example:
expression   means:   key    value
a=b=c                 a      b=c
a\=b=c                a=b    c
a\\=b=c               a\     b=c
a\\\=b=c              a\=b   c

How can I parse this file?
I think it should use a regex,
but my knowledge of regular expressions is poor.

No need for complex REs. Just have the tokenizer walk through the file
character by character; when it finds a backslash, it immediately
reads another character and returns that literal character as the next
character in the token, no matter whether it is a normal character, another
backslash, or an equals sign.

jue
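
A minimal sketch of the character-by-character approach described above, for a single line. The helper name parse_property is hypothetical, and the sketch assumes that escapes are honored in the value as well as in the key, which the original question does not specify.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_property is a hypothetical helper name, not something from the thread.
# It walks the line one character at a time. A backslash makes the next
# character literal; the first unescaped '=' separates key from value.
sub parse_property {
    my ($line) = @_;
    my ($key, $value, $seen_eq) = ('', '', 0);
    my @chars = split //, $line;
    while (@chars) {
        my $c = shift @chars;
        if ($c eq '\\' && @chars) {
            $c = shift @chars;      # take the escaped character literally
        }
        elsif ($c eq '=' && !$seen_eq) {
            $seen_eq = 1;           # first unescaped '=' ends the key
            next;
        }
        # Assumption: escapes are processed on the value side too.
        if ($seen_eq) { $value .= $c } else { $key .= $c }
    }
    return ($key, $value);
}

# In single-quoted Perl strings, '\\' is one backslash and '\=' is kept as is.
for my $line ('a=b=c', 'a\=b=c', 'a\\\\=b=c', 'a\\\\\\=b=c') {
    my ($key, $value) = parse_property($line);
    printf "%-10s %-5s %s\n", $line, $key, $value;
}
```

Because the scanner consumes the character after each backslash in one step, it never has to count backslashes or look behind.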

Tad J McClellan · Nov 5, 2008

yuanyun.ken said:
Hi, dear Perl users:
Recently I needed to read in a properties file.
Its format is key=value, and it uses \ to escape.
For example:
expression   means:   key    value
a=b=c                 a      b=c
a\=b=c                a=b    c
a\\=b=c               a\     b=c
a\\\=b=c              a\=b   c

How can I parse this file?

----------------------------
#!/usr/bin/perl
use warnings;
use strict;

foreach my $line ( <DATA> ) {
    chomp $line;
    $line =~ s/\\\\/&backslash;/g;  # translate literal backslashes

    my($key, $value) = split /(?<!\\)=/, $line, 2;  # use negative look-behind

    $key =~ tr/\\//d;  # eliminate backslashes used for escaping

    $key =~ s/&backslash;/\\/g;  # put the literal backslashes back in

    printf "%-10s %-10s\n", $key, $value;
}

__DATA__
a=b=c
a\=b=c
a\\=b=c
a\\\=b=c

Ted Zlatanov · Nov 5, 2008

TJM> #!/usr/bin/perl
TJM> use warnings;
TJM> use strict;

TJM> foreach my $line ( <DATA> ) {
TJM> chomp $line;
TJM> $line =~ s/\\\\/&backslash;/g; # translate literal backslashes

TJM> my($key, $value) = split /(?<!\\)=/, $line, 2; # use negative look-behind

TJM> $key =~ tr/\\//d; # eliminate backslashes used for escaping

TJM> $key =~ s/&backslash;/\\/g; # put the literal backslashes back in

TJM> printf "%-10s %-10s\n", $key, $value;
TJM> }

TJM> __DATA__
TJM> a=b=c
TJM> a\=b=c
TJM> a\\=b=c
TJM> a\\\=b=c
TJM> ----------------------------

I was thinking of a similar solution, but adding 256 (or some other
large number) to each escaped character (in case there's a '&backslash;'
in the data). As long as it's valid Unicode and the original data
doesn't contain Unicode characters it should be a clean translation.

Ted
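
A rough sketch of the "add 256" idea, under the stated assumption that every character in the input has a code point below 256. The helper name split_shifted is hypothetical, and unlike the program quoted above this sketch also unescapes the value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split_shifted is a hypothetical helper name.
# Assumption: every input character has a code point below 256.
sub split_shifted {
    my ($line) = @_;

    # Replace each backslash-escaped character with a stand-in that is
    # 256 code points higher, dropping the backslash. An escaped '='
    # then no longer looks like '=' to split.
    (my $marked = $line) =~ s/\\(.)/chr(ord($1) + 256)/ge;

    my ($key, $value) = split /=/, $marked, 2;

    # Shift the stand-ins back down to the original characters.
    for ($key, $value) {
        next unless defined;
        s/([^\x00-\xFF])/chr(ord($1) - 256)/ge;
    }
    return ($key, $value);
}

for my $line ('a=b=c', 'a\=b=c', 'a\\\\=b=c', 'a\\\\\\=b=c') {
    my ($key, $value) = split_shifted($line);
    printf "%-10s %-5s %s\n", $line, $key, $value;
}
```

No placeholder string such as '&backslash;' is needed, so the data cannot collide with it; the price is the restriction on the input character range.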

Michele Dondi · Nov 6, 2008

TJM> $line =~ s/\\\\/&backslash;/g; # translate literal backslashes

TJM> my($key, $value) = split /(?<!\\)=/, $line, 2; # use negative look-behind

TJM> $key =~ tr/\\//d; # eliminate backslashes used for escaping

TJM> $key =~ s/&backslash;/\\/g; # put the literal backslashes back in

TJM> printf "%-10s %-10s\n", $key, $value;
TJM> }

TJM> __DATA__
TJM> a=b=c
TJM> a\=b=c
TJM> a\\=b=c
TJM> a\\\=b=c
TJM> ----------------------------

I was thinking of a similar solution, but adding 256 (or some other
large number) to each escaped character (in case there's a '&backslash;'
in the data). As long as it's valid Unicode and the original data
doesn't contain Unicode characters it should be a clean translation.

I like to be sure thus instead of adding "some other large number" I
actually *find* something that *can't* be there:

my $delim = "&". (sort @delims = $line =~ /&(\0+);/)[-1] . "\0;";
$line =~ s/\\\\/$delim;/g; # translate literal backslashes
# ...

Michele

Michele Dondi · Nov 6, 2008

I like to be sure thus instead of adding "some other large number" I
actually *find* something that *can't* be there:

my $delim = "&". (sort @delims = $line =~ /&(\0+);/)[-1] . "\0;";
$line =~ s/\\\\/$delim;/g; # translate literal backslashes

Sorry! That's what you get out of posting such in a hurry; I meant:

my $delim = "&". (sort "", $line =~ /&(\0+);/g)[-1] . "\0;";

Michele

cartercc · Nov 6, 2008

I was thinking of a similar solution, but adding 256 (or some other
large number) to each escaped character (in case there's a '&backslash;'
in the data). As long as it's valid Unicode and the original data
doesn't contain Unicode characters it should be a clean translation.

I find absolutely nothing wrong with Tad's solution. The fact that it
^might^ be a little more verbose than necessary I regard as a mark in
its favor, not a mark against.

I might consider the string '\' rather than '&backslash;' but
that's a simple quibble.

Now, what about a one liner?

CC

sln · Nov 6, 2008

Hi, dear Perl users:
Recently I needed to read in a properties file.
Its format is key=value, and it uses \ to escape.
For example:
expression   means:   key    value
a=b=c                 a      b=c
a\=b=c                a=b    c
a\\=b=c               a\     b=c
a\\\=b=c              a\=b   c

How can I parse this file?
I think it should use a regex,
but my knowledge of regular expressions is poor.
Any help is greatly appreciated.

Well, according to your template there, the valid
equal sign separating Key from Value is the first
non-escaped equal sign.

So yes there is a regular expression to do that.
You have data in the escaped form. It can be split up
into key and value using those rules in a regexp, then
unescape the key/val pair.

Below is probably no different than the other suggestions
in terms of how the split occurs using a regex.

Where does this sequence take you? Do you expect the value
side to be escaped?

These appear possible as well, is this something that will be
encountered?

a\\\\=b=c a\\ b=c
a\\\=b\\=c a\=b\ c
a\\=b\\=c a\ b\=c

Like the other possibilities, here is one more. It's hard to see how
you would get a simple one-step solution, though. Maybe..

Good luck.
sln

# ** Original
# expression    means:   key     value
# a=b=c                  a       b=c
# a\=b=c                 a=b     c
# a\\=b=c                a\      b=c
# a\\\=b=c               a\=b    c

# ** Output
# a=b=c                  a       b=c
# a\=b=c                 a=b     c
# a\\=b=c                a\      b=c
# a\\\=b=c               a\=b    c
# a\\\\=b=c              a\\     b=c
# a\\\=b\\=c             a\=b\   c
# a\\=b\\=c              a\      b\=c

use strict;
use warnings;

my @property = ();

foreach my $line ( <DATA> ) {
    chomp $line;
    push @property, $line if (length $line);
}

print "\nexpression means:\tkey\tvalue\n";

for (@property)
{
    if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/)
    {
        # unescape built in sequences
        my ($key, $val) = ($1,$2);
        $key =~ s/\\(.)/$1/g;
        $val =~ s/\\(.)/$1/g;
        printf "%-20s\t%s\t%s\n", $_, $key, $val;
    }
}

__DATA__
a=b=c
a\=b=c
a\\=b=c
a\\\=b=c
a\\\\=b=c
a\\\=b\\=c
a\\=b\\=c

sln · Nov 6, 2008

You could minimize it as well.

foreach ( <DATA> ) {
    chomp;

    if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/)
    {
        # unescape built in sequences
        my ($key, $val) = ($1,$2);
        $key =~ s/\\(.)/$1/g;
        $val =~ s/\\(.)/$1/g;
        printf "%-20s\t%s\t%s\n", $_, $key, $val;
    }
}

Ted Zlatanov · Nov 7, 2008

MD> On Wed, 05 Nov 2008 11:28:24 -0600, Ted Zlatanov <[email protected]>
MD> wrote:

TJM> $line =~ s/\\\\/&backslash;/g; # translate literal backslashes
TJM> printf "%-10s %-10s\n", $key, $value;
TJM> }
TJM> __DATA__
TJM> a=b=c
TJM> a\=b=c
TJM> a\\=b=c
TJM> a\\\=b=c
TJM> ----------------------------
MD> I like to be sure thus instead of adding "some other large number" I
MD> actually *find* something that *can't* be there:

MD> my $delim = "&". (sort @delims = $line =~ /&(\0+);/)[-1] . "\0;";
MD> $line =~ s/\\\\/$delim;/g; # translate literal backslashes
MD> # ...

Surely there's a CPAN module to do this... Or 10...

From the docs of Encode::Escape, there's also String::Escape,
Unicode::Escape, TeX::Encode, HTML::Mason::Escape,
Template::Plugin::XML::Escape, URI::Escape.

Ted

yuanyun.ken · Nov 8, 2008

Thanks for all the replies; this problem has been solved.
Sorry for my poor understanding of regexes, and for troubling
you again, but I have another small problem:
if the content ends with a real single backslash, I need to read in the
next line.

How can I use a regex to do this?
For example:
line ends with   match
\                yes
\\               no
\\\              yes
\\\\             no
Thanks again for any help.

Tad J McClellan · Nov 8, 2008

yuanyun.ken said:
Thanks for all the replies; this problem has been solved.
Sorry for my poor understanding of regexes, and for troubling
you again, but I have another small problem:
if the content ends with a real single backslash, I need to read in the
next line.

How can I use a regex to do this?
For example:
line ends with   match
\                yes
\\               no
\\\              yes
\\\\             no

-------------------
#!/usr/bin/perl
use warnings;
use strict;

foreach my $line ( <DATA> ) {
    chomp $line;
    if ( $line =~ /(\\+)$/ and length($1) % 2 )
        { print "yes\n" }
    else
        { print "no\n" }
}

__DATA__
\
\\
\\\
\\\\

Jürgen Exner · Nov 8, 2008

yuanyun.ken said:
Thanks for all the replies; this problem has been solved.
Sorry for my poor understanding of regexes, and for troubling
you again, but I have another small problem:
if the content ends with a real single backslash, I need to read in the
next line.

How can I use a regex to do this?

You don't. When the tokenizer discovers a backslash it just reads
another character and if that character is an EOL then just continue
processing the next line instead of reporting an end-of-token.

jue
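
A sketch of how such a tokenizer could treat a backslash followed by end-of-line, working over the whole file contents at once. The name parse_properties is hypothetical. The sketch assumes that the backslash and the newline are both dropped when a line is continued, that escapes also apply to the value, and that lines without an unescaped '=' are skipped; none of that is settled in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_properties is a hypothetical helper name.
# It scans the whole text character by character and returns a list of
# [key, value] pairs.
sub parse_properties {
    my ($text) = @_;
    my @pairs;
    my ($key, $value, $seen_eq) = ('', '', 0);
    my @chars = split //, $text;
    while (@chars) {
        my $c = shift @chars;
        if ($c eq '\\' && @chars) {
            $c = shift @chars;
            # Assumption: backslash + newline is a continuation and both
            # characters are dropped.
            next if $c eq "\n";
        }
        elsif ($c eq "\n") {
            # Unescaped end of line: the record is complete.
            push @pairs, [$key, $value] if $seen_eq;
            ($key, $value, $seen_eq) = ('', '', 0);
            next;
        }
        elsif ($c eq '=' && !$seen_eq) {
            $seen_eq = 1;           # first unescaped '=' ends the key
            next;
        }
        if ($seen_eq) { $value .= $c } else { $key .= $c }
    }
    push @pairs, [$key, $value] if $seen_eq;   # last line without newline
    return @pairs;
}

# Two records: the first is split across two physical lines.
my $text = "a\\\n=b=c\na\\\\=b=c\n";
for my $pair (parse_properties($text)) {
    printf "%s\t%s\n", @$pair;
}
```

Since each backslash consumes the character after it, a line ending in an even number of backslashes is never mistaken for a continuation.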

fB · Nov 8, 2008

yuanyun.ken said:
Thanks for all the replies; this problem has been solved.
Sorry for my poor understanding of regexes, and for troubling
you again, but I have another small problem:
if the content ends with a real single backslash, I need to read in the
next line.

How can I use a regex to do this?
For example:
line ends with   match
\                yes
\\               no
\\\              yes
\\\\             no
Thanks again for any help.

I felt like doing your homework for you, and anyway it is not that difficult.

#!/usr/bin/perl

use strict;
use warnings;
use feature ':5.10';

while (<DATA>) {
    chomp;
    printf '%10s', $_;
    m{ (?: [^\\] | ^ )    # match a non-backslash character
                          # or the start of the string
       ( \\ (?:\\\\)* )   # match an odd number of backslashes
       $                  # followed by the end of the string
     }xms
        and say ' matched: '.$1
        or  say ' not matched';
}

exit;
__DATA__

\
\\
\\\
\\\\
\\\\\
\\\\\\
\\\\\\\
\\\\\\\\
a
a\
a\\
a\\\
a\\\\
a\\\\\
a\\\\\\
a\\\\\\\
a\\\\\\\\
\a
\\a
\\\a
\\\\a
__END__

--

sln · Nov 8, 2008

Thanks for all the replies; this problem has been solved.
Sorry for my poor understanding of regexes, and for troubling
you again, but I have another small problem:
if the content ends with a real single backslash, I need to read in the
next line.

How can I use a regex to do this?
For example:
line ends with   match
\                yes
\\               no
\\\              yes
\\\\             no
Thanks again for any help.

I assume this pertains to the rules set out on the properties
in the original problem statement.

Tad's solution of checking the end for an odd number of '\' works best
for a line continuation.

Be very cautious!! If you are trying to find a way to fix random
line splits made when this file was generated, there is absolutely
NO solution available to you at all !!!
The reason is that you already have escaping rules in place.

The line split must be intelligently constructed, in that only
an odd number of '\' at the end will determine line continuation.
And at the same time it will be used in the general escaping rules after
the line is joined.

You can't just add a '\' where you would like to split the line and then
remove it later without counting the existing escapes at the end.
Either way it takes intelligence to construct the file given the
existing escaping rules you laid out for yourself.

Notice the places where the split occurs in DATA below.
Even if you had an intelligent generator that splits the
line on a '\', it could still split on an even boundary.
Or say it adds a complement to make the split odd; still,
even then, the original cannot be guaranteed to reassemble,
because this conflicts with the original escape logic.

There is no solution then!

sln

use strict;
use warnings;

# ** Original
# expression means: key value
# a=b=c a b=c
# a\=b=c a=b c
# a\\=b=c a\ b=c
# a\\\=b=c a\=b c

# ** Output
# a=b=c a b=c
# a\=b=c a=b c
# a\\=b=c a\ b=c
# a\\\=b=c a\=b c
# a\\\\=b=c a\\ b=c
# a\\\=b\\=c a\=b\ c
# a\\=b\\=c a\ b\=c
# a=b=c a b=c
# a\=b=c a=b c
# a\\=b=c a\ b=c
# a\\\=b=c a\=b c
# a\\\\=b=c a\\ b=c
# a\\\=b\\=c a\=b\ c
# a\\=b\\=c a\ b\=c

my $buf = '';

print "\nexpression means:\tkey\tvalue\n";

foreach ( <DATA> ) {
    chomp;
    $_ = $buf . $_;

    if ( /(\\+)$/ and length($1) % 2 ) {
        # wouldn't want to do this -> s/\\$//;
        $buf .= $_; # cat this line to buffer
        next; # read next line
    }
    if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/) {
        # unescape built in sequences
        my ($key, $val) = ($1,$2);
        $key =~ s/\\(.)/$1/g;
        $val =~ s/\\(.)/$1/g;
        printf "%-20s\t%s\t%s\n", $_, $key, $val;
    }
    $buf = '';
}

__DATA__

# no line splits
a=b=c
a\=b=c
a\\=b=c
a\\\=b=c
a\\\\=b=c
a\\\=b\\=c
a\\=b\\=c

# ok line splits
a=b=c
a\
=b=c
a\
\=b=c
a\\\
=b=c
a\\\
\=b=c
a\\\=b\
\=c
a\\=b\
\=c

#some good/bad line splits
a=b=c
a\
=b=c
a\\
=b=c
a\\\
=b=c
a\\\\
=b=c
a\\\
=b\\=c
a\\=b\\
=c

sln · Nov 8, 2008

# ** Output
# a=b=c a b=c
# a\=b=c a=b c
# a\\=b=c a\ b=c
# a\\\=b=c a\=b c
# a\\\\=b=c a\\ b=c
# a\\\=b\\=c a\=b\ c
# a\\=b\\=c a\ b\=c
# a=b=c a b=c
# a\=b=c a=b c
# a\\=b=c a\ b=c
# a\\\=b=c a\=b c
# a\\\\=b=c a\\ b=c
# a\\\=b\\=c a\=b\ c
# a\\=b\\=c a\ b\=c
# a=b=c a b=c
# a\=b=c a=b c
# =b=c b=c
# a\\\=b=c a\=b c
# =b=c b=c
# a\\\=b\\=c a\=b\ c
# a\\=b\\ a\ b\
# =c

Dr.Ruud · Nov 9, 2008

Tad J McClellan schreef:

if ( $line =~ /(\\+)$/ and length($1) % 2 )
{ print "yes\n" }
else
{ print "no\n" }
}

/(?<!\\)(?:\\\\)*\\$/

Tad J McClellan · Nov 9, 2008

Dr.Ruud said:
Tad J McClellan schreef:

/(?<!\\)(?:\\\\)*\\$/

Which one do you want to figure out after not having
seen this program for six months?

Dr.Ruud · Nov 9, 2008

Tad J McClellan schreef:

Dr.Ruud:

Which one do you want to figure out after not having
seen this program for six months?

Probably this one:

# even number of trailing backslashes
print /(?<!\\)(?:\\\\)*$/ ? "no" : "yes";

sln · Nov 9, 2008

foreach ( <DATA> ) {
chomp;
$_ = $buf . $_;

if ( /(\\+)$/ and length($1) % 2 ) {
# wouldn't want to do this -> s/\\$//;
$buf .= $_; # cat this line to buffer

^^^^^^^^^^^
$buf = $_; # assign to buffer

# see what happens when you don't test

