Charles Shannon Hendrix
I have been writing some code to parse log files, and I used regular
expressions to build arrays of fields. Those arrays were inserted
verbatim into an SQL insert command.
I assign the result of the match to an array, like this:
    @array = $line =~ /$rex_extract/x;
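A match in list context returns one element per capture group, in order, and a failed match returns an empty list. That is what makes the assignment above work. Here is a minimal sketch. The `job "<id>" <status>` log format is a made-up stand-in for the real one:

```perl
use strict;
use warnings;

# Hypothetical log format: job "<id>" <status>
my $rex_extract = q{
    ^job\s"([0-9]+)"    # job id
    \s(\w+)             # status word
    \s*$                # end of line
};

# List context: one element per capture group, in pattern order.
my @array = 'job "17" done' =~ /$rex_extract/x;
print "@array\n";             # prints: 17 done

# A failed match in list context yields an empty list.
my @none = 'garbage' =~ /$rex_extract/x;
print scalar(@none), "\n";    # prints: 0
```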
Then I found that some lines had a variable ending. There were three
possible endings:
    "N" warnings
    "N" errors
    "N" errors, error code = "N"
I also want the regex to fail on lines like these:
    "N" warnings, "N" errors
    "N" warnings, "N" errors, error code = "N"
    "N" errors, "N" warnings
    "N" errors, "N" warnings, error code = "N"
I found the following regex works and keeps my array in order so I don't
have to do ugly array parsing later:
<expressions for first N non-variable fields snipped>
(?:
    (?:
        ,\s
        "([0-9]+)"      # number of...
        \s
        warnings        # warnings
    )?
    |
    (?:
        ,\s
        "([0-9]+)"      # number of...
        \s
        errors          # errors
        (?:             # error code
            ,\s
            error\scode\s=\s"([0-9]+)"
        )?
    )?
)
\s*$'                   # end of line
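You can run the pattern above as a self-contained harness. The leading fields were snipped in the post, so this sketch substitutes a hypothetical `^job\s"([0-9]+)"\sdone` prefix. The sample lines are invented examples of the three accepted endings and the four rejected ones. The helper name `extract_fields` is also just for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the snipped leading fields: job "<id>" done
my $rex_extract = q{
    ^job\s"([0-9]+)"\sdone          # hypothetical fixed fields
    (?:
        (?:
            ,\s
            "([0-9]+)"              # number of...
            \s
            warnings                # warnings
        )?
        |
        (?:
            ,\s
            "([0-9]+)"              # number of...
            \s
            errors                  # errors
            (?:                     # error code
                ,\s
                error\scode\s=\s"([0-9]+)"
            )?
        )?
    )
    \s*$                            # end of line
};

# Returns all four captures in fixed order, or an empty list on failure.
sub extract_fields {
    my ($line) = @_;
    return $line =~ /$rex_extract/x;
}

for my $line (
    'job "1" done, "3" warnings',                                # accepted
    'job "2" done, "4" errors',                                  # accepted
    'job "3" done, "5" errors, error code = "7"',                # accepted
    'job "4" done, "3" warnings, "4" errors',                    # rejected
    'job "5" done, "3" warnings, "4" errors, error code = "7"',  # rejected
    'job "6" done, "4" errors, "3" warnings',                    # rejected
    'job "7" done, "4" errors, "3" warnings, error code = "7"',  # rejected
) {
    my @array = extract_fields($line);
    if (@array) {
        # Show groups that took no part in the match as "undef".
        print join(' | ', map { defined $_ ? $_ : 'undef' } @array), "\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

The accepted lines always produce four elements, so the errors count and error code keep their positions even when the warnings group is empty.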
Question:
Do capture groups inside a non-capturing group that fails to match always
still produce an empty position in the array? I want to make sure I'm not
depending on an unreliable side effect.
The reason I like this is that it preserves the order in my array, so I
don't have to parse it to see which line ending was found.
I'm interested in seeing better ways of doing this.
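One possible alternative, assuming Perl 5.10 or later, is named captures. Each field is read by name from %+, so its position in an array no longer carries the meaning. This is a different technique from the positional pattern above, not a drop-in replacement. The `job` prefix and the sample line are hypothetical stand-ins for the snipped fields.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

# Accepts the same three endings (or none) as the positional pattern.
my $ending = qr{
    (?:
        ,\s "(?<warnings>[0-9]+)" \s warnings
      |
        ,\s "(?<errors>[0-9]+)" \s errors
        (?: ,\s error\scode\s=\s"(?<code>[0-9]+)" )?
    )?
    \s*$
}x;

# Hypothetical line; the "job ... done" prefix replaces the snipped fields.
my $line = 'job "3" done, "5" errors, error code = "7"';

my %row;
if ($line =~ /^job\s"(?<job>[0-9]+)"\sdone$ending/) {
    # %+ holds only the named groups that matched, and it always refers
    # to the last successful match, so copy it right away.
    %row = %+;
}
print join(', ', map { "$_=$row{$_}" } sort keys %row), "\n";
# prints: code=7, errors=5, job=3
```

A missing ending then shows up as an absent key rather than an undef in a fixed slot. Which of the two is easier to feed into the SQL insert depends on how that command is built.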
I would also like a pointer to where this behavior is documented. I've
not been able to find an explicit mention.
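On the documentation question, perlop's entry for m// describes the list-context return value (without /g) as ($1, $2, $3...), with one element for each pair of capturing parentheses. A group that took no part in the match appears there as undef, not as an empty string. You can check this with defined(). The pattern below is a made-up, minimal two-alternative example:

```perl
use strict;
use warnings;

# Made-up pattern: two alternatives, each with its own capture group.
my @fields = ', "4" errors' =~ /^(?:,\s"([0-9]+)"\swarnings|,\s"([0-9]+)"\serrors)$/;

print scalar(@fields), "\n";                          # prints 2: one slot per group
print defined($fields[0]) ? "defined\n" : "undef\n";  # prints undef: warnings group unused
print "$fields[1]\n";                                 # prints 4
```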