Peter said:
I am a relative newbie to Perl. I am reading Programming Perl to learn
Perl. In the chapter on pattern matching I came across the following
substitutions that I can't understand completely. It would be great if
someone could explain these.
Thanks in advance
a)
#put commas in the right place in an integer
1 while s/(\d) (\d\d\d) (?!\d)/$1,$2;
# what does (?!\d) mean, and what purpose does it serve?
The correct form of the line, with no spaces inside the pattern and
with the closing delimiter in place, is:
1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/;
The (?!\d) is what is known as a zero-width negative lookahead
assertion. It means that whatever comes right after the (\d) and the
(\d\d\d) is _not_ another \d; the end of the string satisfies it too.
That it is "zero-width" means that the thing it looks at doesn't count
as part of the match; it's just checked.
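To see the "checked but not consumed" behaviour directly, here is a
minimal sketch. The helper name consumed and the sample strings are
made up for illustration; it uses Perl's @- and @+ arrays to pull out
exactly the text the pattern consumed.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text the pattern consumed,
# or nothing if the pattern does not match at all.
sub consumed {
    my ($text) = @_;
    return unless $text =~ /(\d)(\d\d\d)(?!\d)/;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    return substr($text, $-[0], $+[0] - $-[0]);
}

print consumed("1234,"), "\n";   # 1234 (the comma is looked at, not consumed)
print consumed("12345"), "\n";   # 2345 (a match starting at '1' would be followed by '5')
```

In both cases only four digits are consumed, so the replacement
$1,$2 never touches whatever follows them.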
Let's say that we are processing 12345678.
We try the match. The first place the whole pattern works is at the '5'
(which matches '(\d)'), then the '678' (which matches '(\d\d\d)'), then
the end of the string, where there is no \d.
That changes $_ to '12345,678'. Because the s/.../.../ worked, we
repeat the while. This time, the first thing that works is the '2'
(which matches '(\d)'), the '345' (which matches '(\d\d\d)'), and the
',', which is not a \d.
That changes $_ to '12,345,678'. The comma after the '5' is not changed
because '(?!\d)' is a zero-width assertion, and therefore doesn't count
as part of the match, and therefore is not part of what is replaced.
Because the s/.../.../ worked, we try the match a third time, but no
remaining run of digits is four long, so there isn't another match and
the while terminates.
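The passes described above can be reproduced with a short sketch that
records the string after every successful substitution. The function
name commify_trace is made up for this example; the substitution
itself is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution until it fails,
# keeping a copy of the string after each successful pass.
sub commify_trace {
    my ($n) = @_;
    my @steps;
    push @steps, $n while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/;
    return @steps;
}

print "$_\n" for commify_trace("12345678");
# 12345,678
# 12,345,678
```

Two lines are printed because two passes succeed; the third attempt
fails and ends the loop.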
b)
#remove (nested (even deeply nested (like this))) remarks
1 while s/\([^()]*\)//g;
# why escape the first ( and second ), what about the ( or ) in
between
The escapes are there to indicate that the outer ( and ) are literal
parentheses to be scanned for, not grouping operators in
regular-expression language. The parentheses within the [] are not
escaped because parentheses have no special meaning inside a character
class, and are therefore automatically taken as literal.
To expand, the regular expression means this:
Match on a (, followed by zero or more characters that are not ( or ),
followed by a ).
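Take the string "remove (nested (even deeply nested (like this)))
remarks". Each pass removes only the innermost pair of parentheses,
because that is the only pair with no ( or ) between them.
The first time, we get "remove (nested (even deeply nested )) remarks".
The second time, we get "remove (nested ) remarks".
The third time, we get "remove  remarks" (with two spaces, because the
spaces on either side of the removed text are left behind).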
The fourth time, there is no match, and the while terminates.
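The same inside-out peeling can be watched with a small sketch. The
function name strip_trace is made up for this example, and the results
are printed inside brackets so that leftover spaces are visible.

```perl
use strict;
use warnings;

# Hypothetical helper: remove innermost parenthesised text repeatedly,
# keeping a copy of the string after each successful pass.
sub strip_trace {
    my ($s) = @_;
    my @steps;
    push @steps, $s while $s =~ s/\([^()]*\)//g;
    return @steps;
}

my $remark = "remove (nested (even deeply nested (like this))) remarks";
print "[$_]\n" for strip_trace($remark);
# [remove (nested (even deeply nested )) remarks]
# [remove (nested ) remarks]
# [remove  remarks]
```

The /g flag removes every innermost pair found in a single pass, so a
string with several separate parenthesised remarks needs only as many
passes as its deepest nesting.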