Matching a variable number of bytes

mikeharrison56

Hello group, I'm converting a very non-structured binary log file (lots
of variable length records, embedded in arrays, etc.) and I have a
pattern matching question. To do the conversion I run a set of
substitutions on the data. One substitution I would like to do is to
add a start indicator and then an end indicator around a variable
length array of again variable length records. I.e. add a start
indicator and gobble up a variable number of bytes and then add an end
indicator. I successfully add start indicators of the form:
<record_N length=nnnnn>
Where nnnnn is the number of bytes in the record to follow. I want to
enclose the record with an end indicator of the form </record_N>. I've
tried various forms of matches looking like:
my $record_N_start = '<record_N length=';
my $record_N_end = '</record_N>';
s/($record_N_start)(\d+)(.{\2+4})/$1$2$3$record_N_end/sg;

Any suggestions on why this does not work, or alternate substitutions?
 
xhoster

Hello group, I'm converting a very non-structured binary log file (lots
of variable length records, embedded in arrays, etc.) and I have a
pattern matching question. To do the conversion I run a set of
substitutions on the data. One substitution I would like to do is to
add a start indicator and then an end indicator around a variable
length array of again variable length records. I.e. add a start
indicator and gobble up a variable number of bytes and then add an end
indicator. I successfully add start indicators of the form:
<record_N length=nnnnn>

Why parse it out, add the start tag, then reparse with the start tag?
Why not just add the end tag at the same time you add the start tag?

Where nnnnn is the number of bytes in the record to follow. I want to
enclose the record with an end indicator of the form </record_N>. I've
tried various forms of matches looking like:
my $record_N_start = '<record_N length=';
my $record_N_end = '</record_N>';
s/($record_N_start)(\d+)(.{\2+4})/$1$2$3$record_N_end/sg;

Any suggestions on why this does not work,

You can't do arithmetic or backreferences inside curlies. So the curlies
above are interpreted as literal '{' and '}' characters.

or alternate substitutions?

The same question was asked here a few weeks ago. I don't remember if
someone came up with a direct workaround or not, as I wouldn't go about
doing it this way at all in the first place.

google groups for "Regex: Backreferences do not work inside quantifiers?"

Xho
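
One way to sidestep the quantifier entirely is to match only the start tag and let substr take the counted bytes. This is a minimal sketch, not the poster's code: it assumes the length counts exactly the bytes that follow the '>' (the +4 in the question suggests the real offset differs, so adjust accordingly), and close_records is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: append '</record_N>' after each counted record body.
# Assumption: "length=N" counts exactly the N bytes that follow the '>'.
sub close_records {
    my ($data) = @_;
    my ($out, $last) = ('', 0);
    while ($data =~ /<record_N length=(\d+)>/g) {
        my $len        = $1;
        my $body_start = pos($data);              # first byte after '>'
        last if $body_start + $len > length $data; # truncated record: stop
        $out .= substr($data, $last, $body_start - $last)  # text up to and including the tag
              . substr($data, $body_start, $len)           # the counted bytes
              . '</record_N>';
        $last = $body_start + $len;
        pos($data) = $last;                       # resume scanning after the body
    }
    return $out . substr($data, $last);           # whatever is left over
}
```

Because the body is taken with substr, its contents are never scanned, so bytes inside a record that happen to look like a start tag are left alone.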
 
robic0

Hello group, I'm converting a very non-structured binary log file (lots
of variable length records, embedded in arrays, etc.) and I have a
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I won't assume I know what you are talking about. Nothing can proceed until
this statement is explained in either another context or greater detail...
This statement represents a conceptual error that, without resolution, kills
anything past it. Any responder is in error!!
pattern matching question.

[snip]
 
mikeharrison56

This statement represents a conceptual error

Wow, this group is tough. I used the wrong words, it should have been
something like each record is variable length with hard to find flags
and sync bytes to determine which record I'm dealing with.

Why parse it out, add the start tag, then reparse with the start tag?
Why not just add the end tag at the same time you add the start tag?

Yes, this would be great if I could do it. Here is what I would like
to do:
$record_N_start = '<record_N length=';
$record_N_end = '</record_N>\n';
s/(.)(.)\x33\x00(.{\1*2**8+\2-4})/$record_N_start\1\2>\n\3$record_N_end/sg;


I would then substitute out the binary length. I'm currently doing
this in 2 steps, using a substitution with sprintf.

A question: Why are variables allowed in {} qualifiers but not
backreferences and numerical expressions?
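
The two steps described above (find the binary header, then rewrite the length in decimal) can be folded into one pass if the length is decoded in Perl code rather than inside the pattern. This is a rough sketch under assumptions: the two bytes before \x33\x00 are a big-endian length, the payload is that value minus 4 as in the expression above, and tag_binary_records is a made-up name. A payload that happens to contain \x33\x00 is not a problem, since payloads are skipped rather than scanned, but a false sync in stray data between records would be.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each binary record in start/end tags.
# Assumed layout: 2 length bytes (high byte first), then \x33\x00,
# then (length - 4) payload bytes.
sub tag_binary_records {
    my ($data) = @_;
    my ($out, $last) = ('', 0);
    while ($data =~ /(..)\x33\x00/gs) {
        my $hdr_start  = $-[0];                  # offset of the length bytes
        my $body_start = pos($data);             # first payload byte
        my $len        = unpack('n', $1) - 4;    # 16-bit big-endian, minus header
        next if $len < 0 || $body_start + $len > length $data;  # not a usable header
        $out .= substr($data, $last, $hdr_start - $last)   # bytes before the header
              . "<record_N length=$len>\n"
              . substr($data, $body_start, $len)
              . "</record_N>\n";
        $last = $body_start + $len;
        pos($data) = $last;                      # skip over the payload
    }
    return $out . substr($data, $last);
}
```

The binary length and sync bytes are dropped from the output here and replaced by the decimal length in the tag, which is what the sprintf step was doing.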
 
Peter J. Holzer

Yes, this would be great if I could do it. Here is what I would like
to do:
$record_N_start = '<record_N length=';
$record_N_end = '</record_N>\n';
s/(.)(.)\x33\x00(.{\1*2**8+\2-4})/$record_N_start\1\2>\n\3$record_N_end/sg;

You have to do the replacement in-string? I'd just read the records and
do the replacement as I go:

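my $sep = "\x33\x00(";   # sync bytes, then the '(' that this code takes literally
$/ = $sep;               # read up to and including each record marker
binmode(STDIN);
while (<STDIN>) {
    if (/(.*)(..)\Q$sep\E\z/s) {
        # $2 holds the two length bytes, high byte first
        my $len = ord($2) * 256 + ord(substr($2, 1)) - 4;
        read(STDIN, my $buf, $len);
        read(STDIN, my $paren, 1); die unless $paren eq ')';
        print "$1<record_N length=$len>$buf</record_N>\n";
    } else {
        print;   # trailing data with no marker
    }
}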

I would then substitute out the binary length. I'm currently doing
this in 2 steps, using a substitution with sprintf.

A question: Why

You would have to ask Larry.

are variables allowed in {} qualifiers

Variables are interpolated in regexps just like[0] in double-quoted
strings. If you use $foo anywhere in a regexp, it will be replaced with
its value. It doesn't have anything to do with {} qualifiers.

but not backreferences

These are not known at the time the regexp is compiled.

and numerical expressions?

Why are numerical expressions not allowed in double-quoted strings?

hp

[0] Somebody will surely jump in now and explain the differences ;-)
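
Since a captured value is only known at match time, the part that depends on it has to be built at match time too. Perl's postponed subexpression, (??{ ... }), described in perlre, does that: its code runs during the match and the string it returns is compiled and matched in place. Take this as a sketch rather than a recommendation. It assumes the length counts exactly the bytes after the '>', close_records_deferred is a made-up name, and the generated {N} quantifier is subject to Perl's upper limit on repeat counts, so very long records will not work this way.

```perl
use strict;
use warnings;

# Hypothetical helper: add '</record_N>' using a postponed subexpression.
# Assumption: "length=N" counts exactly the N bytes after the '>'.
sub close_records_deferred {
    my ($data) = @_;
    my $end = '</record_N>';
    # The code block runs during matching, when $2 already holds the digits.
    # (?s:...) lets '.' match newlines inside the generated sub-pattern.
    $data =~ s/(<record_N length=)(\d+)(>)((??{ '(?s:.{' . $2 . '})' }))/$1$2$3$4$end/g;
    return $data;
}
```

The pattern deliberately contains no interpolated variables; mixing run-time interpolation with code blocks may require "use re 'eval'", depending on the Perl version.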
 
Matt Garrish

Wow, this group is tough. I used the wrong words, it should have been
something like each record is variable length with hard to find flags
and sync bytes to determine which record I'm dealing with.

Don't waste the effort. You're dealing with a troll.

Matt
 
xhoster

A question: Why are variables allowed in {} qualifiers but not
backreferences and numerical expressions?

Variables are not allowed in {}. /fooa{$x}bar/ is first interpreted as
a double-quotish string constructor, so the variable will be interpolated
the way double-quotish strings are interpolated. Only after this
interpolation does the regex get compiled. As far as the regex is
concerned there is no variable there.

Xho
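
A small illustration of the point above: because interpolation happens before the regex is compiled, any number that is already sitting in a variable can end up inside the curlies, and the regex engine only ever sees digits. This sketch uses a made-up "N:payload" format and a made-up function name, take_counted, purely to show the mechanism in two steps.

```perl
use strict;
use warnings;

# Hypothetical helper: return the N bytes that follow an "N:" prefix.
sub take_counted {
    my ($data) = @_;
    return unless $data =~ /^(\d+):/;   # first match: learn the count
    my $n    = $1;
    my $skip = length($1) + 1;          # arithmetic is done in Perl, not in the regex
    # $skip and $n are interpolated into the pattern text before compilation,
    # so for "3:abcdef" the engine sees /^.{2}(.{3})/s
    return $data =~ /^.{$skip}(.{$n})/s ? $1 : undef;
}
```

For example, take_counted("3:abcdef") returns "abc". The cost is that the second pattern is recompiled whenever the interpolated values change.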
 
