Regular expression help

lochuanjiang

Hi all,

I'm new to perl and i was looking through this piece of code

But can't seems to figure out what it means.

if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i )

Need some gurus help here.

Does it means. if $fileline contains tt regular expression? <input with
any number of characters and contains = "or' hidden 'or" and any number
of characters till >

appreciate any help
 
Paul Lalli

: I'm new to perl and i was looking through this piece of code
:
: But can't seems to figure out what it means.

You need to read some documentation:
perldoc perlretut
perldoc perlre
perldoc perlreref

: if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i )
:
: Need some gurus help here.

No you don't. You need the most basic level of knowledge about regular
expressions. You do not need a "guru".

: Does it means. if $fileline contains tt regular expression?

I have no idea what you mean by "tt".

: <input with
: any number of characters and contains = "or' hidden 'or" and any number
: of characters till >

I can't parse what your description is, so I can't tell you if you're
right or wrong.

: if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i )

The regular expression will match if $fileline contains the pattern:
a less-than sign
input
anything (other than a newline)
any amount of whitespace
type
any amount of whitespace
an equals sign
any number of whitespace characters, single quotes, or double quotes
hidden
anything (other than a newline)
any amount of whitespace
a greater-than sign
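
The same breakdown can be written into the pattern itself with the /x modifier, which lets a regex carry whitespace and comments. This is only a sketch: the variable name $hidden_input_re and the sample line are made up for illustration, and the pattern is the original one with its needless backslashes dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The original pattern, one element per line. Under /x, unescaped
# whitespace and comments inside the pattern are ignored.
my $hidden_input_re = qr{
    <input      # a less-than sign, then "input"
    .*          # anything (other than a newline)
    \s*         # any amount of whitespace
    type        # "type"
    \s*         # any amount of whitespace
    =           # an equals sign
    [\s"']*     # any number of whitespace chars, single or double quotes
    hidden      # "hidden"
    .*          # anything (other than a newline)
    \s*         # any amount of whitespace
    >           # a greater-than sign
}xi;

# Hypothetical sample line, not taken from the original poster's data.
my $sample  = qq{<input name="token" type="hidden" value="abc">\n};
my $matched = $sample =~ $hidden_input_re ? 1 : 0;

print $matched ? "Match\n" : "No Match\n";
```

Writing it this way does not make the pattern any better, but it does make each piece readable on its own.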

Note that this is an incredibly poorly written regular expression. If
the goal is to find all HTML input tags of type "hidden", this will
both match things it shouldn't and not match things it should. I
strongly suggest you do not use this, and throw away whatever source
you found it from.
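
To make "matches things it shouldn't and doesn't match things it should" concrete, here is a small sketch. The three sample lines are invented for illustration; "want" records whether the line really is part of a hidden input tag, and "got" is what the regex says.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $re = qr/\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i;

# Hypothetical lines: [ description, line, is it part of a hidden input? ]
my @cases = (
    [ 'text input, "hidden" only inside its value',
      qq{<input type="text" value="type=hidden">\n}, 0 ],
    [ 'the hidden attribute belongs to a different tag',
      qq{<input type="text"><span data-type="hidden">x</span>\n}, 0 ],
    [ 'second line of a hidden input split over two lines',
      qq{    type="hidden" value="1">\n}, 1 ],
);

my @wrong;   # descriptions of the cases the regex gets wrong
for my $case (@cases) {
    my ( $label, $line, $want ) = @$case;
    my $got = $line =~ $re ? 1 : 0;
    push @wrong, $label if $got != $want;
    printf "%-52s want=%d got=%d\n", $label, $want, $got;
}
```

The first two lines are false positives because .* happily skips over attribute and tag boundaries; the third is a false negative because a line-at-a-time match never sees the opening <input.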

For real HTML parsing, use an HTML parser:
Search search.cpan.org for "HTML::Parser"
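
A minimal sketch of what that can look like, assuming the HTML::Parser module from CPAN is installed. The sample document and the @hidden array are invented for illustration; the handler is asked for the tag name and the attribute hash of every start tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

my @hidden;   # attribute hashes of the hidden inputs found

my $parser = HTML::Parser->new(
    api_version => 3,
    # Called for every start tag with the lowercased tag name and a
    # hash reference of its attributes.
    start_h => [
        sub {
            my ( $tagname, $attr ) = @_;
            return unless $tagname eq 'input';
            return unless lc( $attr->{type} // '' ) eq 'hidden';
            push @hidden, $attr;
        },
        'tagname, attr',
    ],
);

# Hypothetical sample document: one text input that merely mentions
# "hidden", and one real hidden input split across two lines.
my $html = <<'HTML';
<form>
  <input type="text" value="type=hidden">
  <input name="token"
         type="hidden" value="abc">
</form>
HTML

$parser->parse($html);
$parser->eof;

print "hidden input: $_->{name}\n" for @hidden;
```

Because the parser works on tags and attributes rather than on lines, neither the misleading value nor the line break inside the tag gets in the way.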

Paul Lalli
 
David Squire

: Hi all,
:
: I'm new to perl and i was looking through this piece of code
:
: But can't seems to figure out what it means.
:
: if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i )
:
: Need some gurus help here.
:
: Does it means. if $fileline contains tt regular expression?

if $fileline *matches* the regular expression.

: <input with
: any number of characters and contains = "or' hidden 'or" and any number
: of characters till >
:
: appreciate any help

Here's an example script showing some of the things that match, and one
that doesn't:

#!/usr/bin/perl
use strict;
use warnings;

while (my $fileline = <DATA>) {
    print $fileline;
    if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i ) {
        print "\tMatch\n";
    }
    else {
        print "\tNo Match\n";
    }
}

__DATA__
<INPUTFROG type="HiDdENDUCK >
<inputCattype= HiDdENBAT >
<iNpUtDogtype=HiDdENBAT>
<INPUTCattype=-HiDdENBAT>

Output:
<INPUTFROG type="HiDdENDUCK >
        Match
<inputCattype= HiDdENBAT >
        Match
<iNpUtDogtype=HiDdENBAT>
        Match
<INPUTCattype=-HiDdENBAT>
        No Match


Somehow I doubt that that RE is really doing what its author wanted...

See perldoc perlre for an explanation of why this works the way it does.

DS

PS. There's no need to escape the '<' and '>' characters in the RE.
 
Brian Wakem

David said:
PS. There's no need to escape the '<' and '>' characters in the RE.


Same for the " and ' in the character class.
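
A quick way to check both points is to run the escaped and unescaped forms side by side. This sketch reuses the four test lines from David's script; the variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern as posted, and the same pattern without the backslashes
# in front of < > " and '.
my $escaped   = qr/\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i;
my $unescaped = qr/<input.*\s*type\s*=[\s"']*hidden.*\s*>/i;

my @lines = (
    qq{<INPUTFROG type="HiDdENDUCK >\n},
    qq{<inputCattype= HiDdENBAT >\n},
    qq{<iNpUtDogtype=HiDdENBAT>\n},
    qq{<INPUTCattype=-HiDdENBAT>\n},
);

my $differences = 0;   # lines on which the two patterns disagree
for my $line (@lines) {
    my $with    = $line =~ $escaped   ? 1 : 0;
    my $without = $line =~ $unescaped ? 1 : 0;
    $differences++ if $with != $without;
    printf "%-9s %-9s %s",
        ( $with    ? 'Match' : 'No Match' ),
        ( $without ? 'Match' : 'No Match' ),
        $line;
}
print "differences: $differences\n";
```

The two columns agree on every line, since none of those four characters is special in a /.../ pattern or inside a character class.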
 
Tad McClellan

: I'm new to perl and i was looking through this piece of code
:
: But can't seems to figure out what it means.
:
: if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i )
                     ^                      ^ ^              ^
                     ^                      ^ ^              ^

It means that whoever wrote it is neither a good software designer
nor a good Perl programmer.

A good designer would have known the futility of attempting
to "parse" a context free grammar with a regular expression,
and would have used a module from CPAN to do the grunt work.

A good Perl programmer would not backslash characters that
do not need backslashing, and would have known that

.*\s*

is equivalent to

.*

if $fileline truly is "a line".

If $fileline is not a line, then a good designer would not
have mislabeled it when choosing a variable name.
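
The equivalence, and the condition attached to it, can be checked with a short sketch. The two sample strings are made up: one is a single line, the other has a newline in the middle of the tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern with and without the \s* that follows each .*
my $original   = qr/<input.*\s*type\s*=[\s"']*hidden.*\s*>/i;
my $simplified = qr/<input.*type\s*=[\s"']*hidden.*>/i;

# Hypothetical data: a real "line", and a string that is not one.
my $one_line  = qq{<input name="token" type="hidden" value="abc">\n};
my $two_lines = qq{<input name="token"\ntype="hidden" value="abc">\n};

my %result;
for my $name (qw(one_line two_lines)) {
    my $text = $name eq 'one_line' ? $one_line : $two_lines;
    $result{$name} = {
        original   => ( $text =~ $original   ? 1 : 0 ),
        simplified => ( $text =~ $simplified ? 1 : 0 ),
    };
    print "$name: original=$result{$name}{original} ",
          "simplified=$result{$name}{simplified}\n";
}
```

On a single line the only newline is at the very end, so \s* can never match anything that .* could not. Once a newline sits inside the tag, .* stops there and only the \s* in the original pattern can step over it, which is why the two patterns differ on the second string.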

: Need some gurus help here.


No you don't.

"guru" is at an extremely high level.

You only need the help of anyone who knows a little more than you do.

: appreciate any help


Throw out that hobbyist code and write a Real Program instead.
 
lochuanjiang

Thanks for the advice.

I'm still relatively new to Perl, and reading a regular
expression out in English is a daunting task for me.



 
Guest

(e-mail address removed) wrote:
: Thanks for the advice.

: I'm still relatively new to Perl, and reading a regular
: expression out in English is a daunting task for me.

Redefine your approach to regular expressions. Carefully read the
Perl documentation (perlre) and study the examples given, not only by
passively staring at them, but by crafting your own sample data
(things you want to find) and the REs that match them. Go from
simple to complicated, rather than trying to understand the awful,
cluttered and obfuscated regex you presented here, which means
going from complicated to simple and never getting there.

Oliver.
 
Eric Schwartz

: Redefine your approach to regular expressions. Carefully read the
: Perl documentation (perlre), study the examples given (not only by
: passively staring at them, but by actually trying to craft your
: own sample data, things you want to find and matching REs) and go
: from simple to complicated, rather than trying to understand this
: awful, cluttered and obfuscated regex code you presented here, and
: thus going from complicated to simple (which will be unreachable).

In support of going from complicated to simple, you can also use qr,
the x modifier, and judicious use of whitespace to build up regular
expressions out of smaller ones (I believe this is in PBP, but I
simplified and expanded this example from the course notes that I
assume he wrote the book from):

# A digit is a
my $DIGIT = qr{   \d+              # number (sequence of digits)
                  (?: \. \d* )?    # optionally followed by a . and
                                   # more digits (the ?: means group, but
                                   # don't capture)
              |   \. \d+           # or else it's a . followed by digits
              }x;                  # x says "allow whitespace and comments
                                   # within a regex"

my $SIGN = qr{ [+-]                # a + or -; here the [] is used to
                                   # make the + not a quantifier
             }x;

# And finally, a number is:
my $NUM = qr{
              ( ($SIGN?) ($DIGIT) )   # an optional sign followed
                                      # by some digits
            }x;

Notice how, in this example, each regex is relatively small and
easily explainable. Furthermore, $NUM, if written that way, is easy to
understand; the comments are superfluous, really. If you wrote it
out as one long regex, it would be much harder to read:

my $NUM = qr{(([+-])?(\d+(?:\.\d*)?|\.\d+))}; # WTF?

It may take more lines, but who cares? Linefeeds are not expensive.
Write your program to be read, not run.
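
Small pieces are also easy to try out on sample data. The sketch below repeats the three definitions in compact form, assuming the fractional part is meant to be optional (hence the ? after the group), and anchors $NUM with \A and \z so the whole string has to be a number. The test strings and the %parsed hash are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DIGIT = qr{   \d+ (?: \. \d* )?   # digits, optionally a . and more digits
              |   \. \d+              # or a . followed by digits
              }x;
my $SIGN  = qr{ [+-] }x;
my $NUM   = qr{ ( ($SIGN?) ($DIGIT) ) }x;

my %parsed;   # text => [ sign, digits ] for every string that is a number
for my $text ( '42', '-3.14', '+.5', '7.', 'abc' ) {
    if ( $text =~ /\A$NUM\z/ ) {
        $parsed{$text} = [ $2, $3 ];
        print "'$text': sign='$2' digits='$3'\n";
    }
    else {
        print "'$text': not a number\n";
    }
}
```

Each qr object keeps its own modifiers and its own grouping when it is interpolated, so the alternation inside $DIGIT cannot leak out into $NUM.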

-=Eric
 
