m/(\/\*[.|\n]*\*\/)/ to try and match C-Style multiline comments. No matches found.

Vijai Kalyan

Hi,

I am learning perl so was working my way through the Camel book's
first chapter. I wrote the following program to match all C-style
multiline comments.

---------------
$filename = "testinput.txt";

-e $filename or die "Invalid file. File does not exist. $!\n";
-r $filename or die "Invalid file. File is not readable. $!\n";
-f $filename or die "Invalid file. File is not a regular file. $!\n";
-T $filename or die "Invalid file. File is not a text file. $!\n";

open (FILE,"<$filename");

while ($line = <FILE>){
    if($line =~ m/(\/\*[.|\n]*\*\/)/){
        print "Found comment $1\n";
    }
}
---------------

For RE: m/(\/\*.*?\*\/)/

I got matches for multi-line comments containing no \n. For example:

/* ... */

but not

/*
*/

I looked up "." and found that . won't match a \n. So, I modified the
RE to

m/(\/\*[.|\n]*\*\/)/

but this didn't help either. If I got it right, the above RE says:

"Match a /*. After that match as many characters as possible but match
any character or newlines followed by a */"

So, this should match

/* ... */

as well as

/*
*/

but, I find that now, nothing is matched.

Any suggestions?

thanx,

-vijai.
 
Paul Lalli

Vijai said:
I am learning perl so was working my way through the Camel book's
first chapter. I wrote the following program to match all C-style
multiline comments.

---------------

For RE: m/(\/\*.*?\*\/)/

I got matches for multi-line comments containing no \n. For example:

/* ... */

but not

/*
*/

I looked up "." and found that . won't match a \n. So, I modified the
RE to

m/(\/\*[.|\n]*\*\/)/

but this didn't help either. If I got it right

Obviously, you didn't get it right. You're using a character class
which matches an actual period, vertical-bar, or a newline. Keep
reading the documentation in your Camel or in
perldoc perlretut
perldoc perlre

The correct way to get the period to match a newline is to add the /s
modifier onto your regexp.
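
To make the difference concrete, here is a minimal sketch comparing the two patterns from the question. The three sample strings are made up for illustration, and each is matched as a single string rather than read line by line from a file.

```perl
use strict;
use warnings;

# Hypothetical sample inputs, each held in one string.
my $one_line  = "/* ... */";
my $two_lines = "/*\n*/";
my $with_text = "/* some text\n more text */";

# [.|\n] is a character class: a literal '.', a literal '|', or a newline.
my $class_re = qr/(\/\*[.|\n]*\*\/)/;

# With the /s modifier, '.' matches any character, including a newline.
my $dot_s_re = qr/(\/\*.*?\*\/)/s;

for my $text ($one_line, $two_lines, $with_text) {
    (my $shown = $text) =~ s/\n/\\n/g;    # make newlines visible when printing
    printf "%-30s class: %-3s  /s: %s\n",
        $shown,
        ($text =~ $class_re ? "yes" : "no"),
        ($text =~ $dot_s_re ? "yes" : "no");
}
```

The character-class version only matches the second string, because spaces and letters are not in the class. The /s version matches all three.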

Vijai said:
Any suggestions?

1) Keep reading the documentation. (That's not a criticism, it's a
compliment - too many questioners in this group don't read and just
expect everyone to read the docs to them). Specifically, you need to
read up on character classes and regexp modifiers.
2) If you're doing this as a learning exercise, make . match the newline
the correct way.
3) If you're doing this for production code, don't bother rolling your
own solution - check the Perl FAQ before doing something like this:
perldoc -q comment

Paul Lalli
 
Vijayaraghavan Kalyanapasupathy

Hello,

Paul Lalli said:
Obviously, you didn't get it right. You're using a character class
which matches an actual period, vertical-bar, or a newline. Keep
reading the documentation in your Camel or in
perldoc perlretut
perldoc perlre

Oh I see, I figured that meta-characters will work inside a character
class specification as well. My mistake.

Paul Lalli said:
The correct way to get the period to match a newline is to add the /s
modifier onto your regexp.

1) Keep reading the documentation. (That's not a criticism, it's a
compliment - too many questioners in this group don't read and just
expect everyone to read the docs to them). Specifically, you need to
read up on character classes and regexp modifiers.
2) If you're doing this as a learning exercise, make . match the newline
the correct way.

Yep. I am working my way through the Camel book; a couple hours of
experimenting every day. That's why I seem to rush off and write a
program without reading everything completely.

Paul Lalli said:
3) If you're doing this for production code, don't bother rolling your
own solution - check the Perl FAQ before doing something like this:
perldoc -q comment

Paul Lalli

Thanx for the comments,

-vijai.
 
Vijayaraghavan Kalyanapasupathy

Hello,
Well, as long as you read in the file line-by-line and match each line
separately, you will *never* find a multiline match.
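
A minimal sketch of that point, and not the program from the quoted post: the same pattern is applied once per line and once to the whole input. The sample text is invented, and an in-memory filehandle stands in for testinput.txt so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical sample input; \$source is opened as an in-memory file.
my $source = "int a; /* one line */\nint b; /*\n spans lines\n*/\nint c;\n";

# Line by line: each match only ever sees a single line.
open my $fh, '<', \$source or die "Cannot open in-memory file: $!";
my @per_line;
while (my $line = <$fh>) {
    push @per_line, $1 if $line =~ m{(/\*.*?\*/)}s;
}
close $fh;

# Whole file at once: with $/ undefined, <$fh> returns everything.
open $fh, '<', \$source or die "Cannot open in-memory file: $!";
my $content = do { local $/; <$fh> };
close $fh;

# /s lets '.' cross newlines, /g collects every comment.
my @slurped = $content =~ m{(/\*.*?\*/)}sg;

print scalar(@per_line), " comment(s) found line by line\n";
print scalar(@slurped),  " comment(s) found after slurping\n";
```

Even with /s, the line-by-line loop misses the comment that spans lines. This simple pattern is still fooled by comment-like text inside string literals, which is one reason the thread points to perldoc -q comment for real code.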

Now, why didn't I think of that. Thanx for pointing it out.
I'd write the program as follows (untested!):

use strict; # I'm still learning, I need strict.
use warnings; # Cause I ain't perfect.

I will look this up. Maybe I should be using these too?

Thanx for your example, but I will work my way to figuring it out on my
own. It wouldn't do to copy-paste your example; although I understand
some of it, I am not familiar enough with Perl yet to try this on my own.

-vijai.
 
Paul Lalli

[Attributions added back in - please don't delete the bits that tell us
who said what]
Abigail wrote:

I will look this up. Maybe I should be using these too?

No maybe about it. You should always be using strict and warnings,
especially when teaching yourself Perl.

Paul Lalli
 
Tad McClellan

Oh I see, I figured that meta-characters will work inside a character
class specification as well. My mistake.


Meta-characters *do* work inside of a character class, but since it
is its own language distinct from the regular expression language
(and from the Perl language), which characters are meta is different.

There are only 4 meta-characters in a character class:

] ends the class if it is not the 1st char in the class

^ negates the class if it is the 1st char, else it is not meta

- forms a range unless it is 1st or last, in which case it
is not meta

\ used to remove the meta-ness of the other three
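
The four rules above can be checked with a short sketch like the following. The variable names and test strings are made up for illustration.

```perl
use strict;
use warnings;

# ']' as the first character is literal: this class is "']' or 'a'".
my $bracket_first = "]" =~ /[]a]/ ? 1 : 0;

# '^' negates only when it comes first; elsewhere it is a literal caret.
my $caret_negates = "b" =~ /[^a]/ ? 1 : 0;
my $caret_literal = "^" =~ /[a^]/ ? 1 : 0;

# '-' forms a range between two characters; first or last it is literal.
my $dash_range   = "m" =~ /[a-z]/ ? 1 : 0;
my $dash_literal = "-" =~ /[az-]/ ? 1 : 0;
my $m_in_az_dash = "m" =~ /[az-]/ ? 1 : 0;   # no range here, so 'm' fails

# '\' removes the meta-ness of the others.
my $escaped = "]" =~ /[a\]]/ ? 1 : 0;

# '.' and '|' are ordinary characters inside a class.
my $x_in_dot_bar = "x" =~ /[.|]/ ? 1 : 0;

printf "%-14s %d\n", @$_ for (
    [ bracket_first => $bracket_first ],
    [ caret_negates => $caret_negates ],
    [ caret_literal => $caret_literal ],
    [ dash_range    => $dash_range    ],
    [ dash_literal  => $dash_literal  ],
    [ m_in_az_dash  => $m_in_az_dash  ],
    [ escaped       => $escaped       ],
    [ x_in_dot_bar  => $x_in_dot_bar  ],
);
```

Every check prints 1 except m_in_az_dash and x_in_dot_bar, which print 0.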



To compare a few meta-chars in the 3 languages:

. (dot, period, full-stop)

Perl: string concatenation
regex: matches any char except newline
char class: is not meta


^ (caret)

Perl: bitwise exclusive or
regex: matches beginning of string
char class: negates the class
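
As a rough illustration of the comparison above, this sketch uses each character once in each of the three contexts. The variable names are hypothetical; \A and \z are used so the dot test is not affected by how $ treats a trailing newline.

```perl
use strict;
use warnings;

# '.' in Perl code: string concatenation.
my $joined = "foo" . "bar";

# '.' in a regex: any single character except newline (no /s here).
my $dot_matches_a  = "a"  =~ /\A.\z/ ? 1 : 0;
my $dot_matches_nl = "\n" =~ /\A.\z/ ? 1 : 0;

# '.' in a character class: only a literal period.
my $class_matches_a   = "a" =~ /[.]/ ? 1 : 0;
my $class_matches_dot = "." =~ /[.]/ ? 1 : 0;

# '^' in Perl code: bitwise exclusive or (6 is 110, 3 is 011).
my $xor = 6 ^ 3;

# '^' in a regex: anchors the match at the beginning of the string.
my $anchored_b = "abc" =~ /^b/ ? 1 : 0;

# '^' first in a character class: negates the class.
my $not_a = "b" =~ /[^a]/ ? 1 : 0;

print "joined=$joined xor=$xor\n";
print "dot: a=$dot_matches_a newline=$dot_matches_nl\n";
print "class: a=$class_matches_a dot=$class_matches_dot\n";
print "caret: anchored_b=$anchored_b not_a=$not_a\n";
```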
 
Tad McClellan

Vijayaraghavan Kalyanapasupathy said:
I will look this up. Maybe I should be using these too?


No "maybe" about it.

They are Perl's "seatbelts". They help find common mistakes.

Wearing seatbelts is important all the time, but doubly so when
you are just learning how to drive. :)



You can read up on them via perldoc:

perldoc strict

perldoc warnings

perldoc perllexwarn
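
For a feel of what the "seatbelts" catch, here is a small sketch with two deliberate mistakes. The variable names are invented; the __WARN__ handler and the string eval are only there so the script can capture both messages and keep running.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go to STDERR.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

# Mistake 1: using a variable that was never given a value.
my $count;
my $message = "count is $count";    # warnings reports an uninitialized value

# Mistake 2: a typo in a variable name ($totl instead of $total).
# strict rejects it at compile time; the string eval traps that error.
my $ok = eval q{ use strict; my $total = 1; $totl = 2; 1 };
my $strict_error = $@;

print "warnings said: $caught[0]";
print "strict said:   $strict_error";
```

Without the two pragmas, both mistakes would pass silently and the program would carry on with wrong data.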
 
