Wes Groleau
I have a script to process a certain file format.
It was working at one time, but it doesn't now.
Obviously I changed something important, but
I have no memory of doing so, nor can I see anything wrong.
The input format: Each line has up to four parts:
Level, ID, Tag, and Text; separated by white space.
White space is optional _before_ the level.
ID is optional; if present, it is the second part.
It consists of printable characters, starting and
ending with '@'.
Tag is always there. It is a single "word" of
letters, no white space. For simplicity, I am
matching any run of non-whitespace characters.
Text is optional and is everything after the
tag.
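
To make the format concrete, here is a minimal sketch of one way to split such a line into its four parts. It is not the script in question: `parse_line` is a hypothetical helper name, the sample lines are made up, and the pattern assumes that the ID, when present, is followed by white space and that nothing at all may follow the tag.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into (Level, ID, Tag, Text).
# Returns an empty list when the line does not fit the format.
sub parse_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;    # drop any trailing CR/LF characters
    my ($level, $id, $tag, $text) =
        $line =~ /^\s*(\d+)\s+(?:(@\S+@)\s+)?(\S+)(?:\s+(.*))?$/
        or return;
    # ID and Text are optional, so their captures may be undefined.
    return ($level, $id // '', $tag, $text // '');
}

# Illustrative sample lines (assumed, not taken from a real file).
for my $sample ('0 HEAD', '0 @I1@ INDI', '  1 NAME John /Smith/') {
    my @parts = parse_line($sample);
    print join('|', @parts), "\n";
}
# Prints:
#   0||HEAD|
#   0|@I1@|INDI|
#   1||NAME|John /Smith/
```

The optional parts are wrapped in non-capturing groups together with their separating white space, so a missing ID or missing Text does not stop the rest of the line from matching.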
Here is what is not working:
@lines = <STDIN>;   # Read all lines into array @lines
foreach (@lines)    # Each line is placed into predefined scalar $_
{
    # Get rid of CR/LF chars for originating platform.
    # (So this platform's style can be put back on later.)
    chomp;
    s/[\r\n]//;
    ($Level, $ID, $Tag, $XRef, $Text, $Comment) = ("", "", "", "", "", "");

    # Get the component parts of the line.
    # Can UTF-8 input break this regexp???
    #($Level, $ID, $Tag, $XRef, $Text, $Comment) =
    #      ---     -----      ---    -----      --     --
    #/^\s*(\d+)\s+(@\S+@)?\s*(\S+)s+(@\S+@)?\s+(.*)?({{.*}})?/;
    #       1        2         3       4        5      6
    # Let's just use four (standard GEDCOM) for now.
    #      1        2         3      4
    /^\s*(\d+)\s+(@\S+@)?\s*(\S+)\s+(.*)/;

    # Get level of subrecord
    if ( ! defined ( $1 ) )
    {
        $Level = 0;
        $LineEOL = "NO LEVEL: $_\n";
    }
    else
    {
        $Level = $1;
    }
    print $Level . "\n"; # This part (Level) seems to work

    # Save the label (if specified)
    if ( ! defined ( $2 ) )
    {
        $ID = "";
    }
    else
    {
        $ID = $2;
    }
    print $ID . "\n";

    # Uppercase the tag
    if ( ! defined ( $3 ) )
    {
        $Tag = "";
        $LineEOL = "NO TAG: $_\n";
    }
    else
    {
        $Tag = $3;
    }
    print $Tag . "\n"; # When input line is "0 HEAD" this should be "HEAD"
                       # but it is blank instead and the diagnostic "NO TAG"
                       # is added.

    # Save everything else
    if ( ! defined ( $4 ) )
    {
        $Text = "";
    }
    else
    {
        $Text = $4;
    }
    print $Text . "\n";
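
A minimal check, separate from the script, of how the four-part pattern behaves on a line with nothing after the tag. This is only a sketch: the sample lines are made up for illustration, and it assumes the line endings have already been stripped, as the chomp and substitution in the loop do. The pattern requires at least one white-space character after the tag (`\s+`), so it is worth seeing whether "0 HEAD" matches at all, since $1 through $4 are only set by a successful match.

```perl
use strict;
use warnings;

# The four-part pattern, copied from the loop in the post.
my $pattern = qr/^\s*(\d+)\s+(@\S+@)?\s*(\S+)\s+(.*)/;

# Illustrative lines: the second differs only by one trailing space.
for my $line ('0 HEAD', '0 HEAD ', '1 NAME John') {
    if ($line =~ $pattern) {
        print "match:    [$line] tag=$3\n";
    }
    else {
        print "no match: [$line]\n";
    }
}
# Prints:
#   no match: [0 HEAD]
#   match:    [0 HEAD ] tag=HEAD
#   match:    [1 NAME John] tag=NAME
```

If the script used to see a trailing CR or LF on each line, that character would have satisfied the `\s+` after the tag, which could explain why the same pattern appeared to work before the line endings were removed.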