Regular expression help

lochuanjiang · May 17, 2006

Hi all,

I'm new to Perl and I was looking through this piece of code,
but I can't seem to figure out what it means.

if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i )

Need some guru's help here.

Does it mean: if $fileline contains tt regular expression? <input with
any number of characters and contains = "or' hidden 'or" and any number
of characters till >

Appreciate any help.

Paul Lalli · May 17, 2006

lochuanjiang said:
> I'm new to Perl and I was looking through this piece of code,
> but I can't seem to figure out what it means.

You need to read some documentation:
perldoc perlretut
perldoc perlre
perldoc perlreref

> if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i )
>
> Need some guru's help here.

No you don't. You need the most basic level of knowledge about regular
expressions. You do not need a "guru".

> Does it mean: if $fileline contains tt regular expression?

I have no idea what you mean by "tt".

> <input with
> any number of characters and contains = "or' hidden 'or" and any number
> of characters till >

I can't parse your description, so I can't tell you whether you're
right or wrong.
if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i )
The regular expression will match if $fileline contains, in order
(and case-insensitively, because of the /i modifier):
  a less-than sign
  input
  anything (other than a newline)
  any amount of whitespace
  type
  any amount of whitespace
  an equals sign
  any number of whitespace characters, single quotes, or double quotes
  hidden
  anything (other than a newline)
  any amount of whitespace
  a greater-than sign
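One way to check Paul's reading is to write the same pattern with the /x modifier, which lets whitespace and comments sit inside a regex. This is a sketch for illustration, not code from the thread; it compares the annotated form against the pattern exactly as posted, using two of David Squire's test lines from further down:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern exactly as posted.
my $original = qr/\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i;

# The same pattern written with /x, so each piece can carry a comment.
# Under /x, whitespace and comments outside character classes are ignored.
my $annotated = qr{
    <input      # a less-than sign, then "input"
    .*          # anything (other than a newline)
    \s*         # any amount of whitespace
    type        # "type"
    \s*         # any amount of whitespace
    =           # an equals sign
    [\s"']*     # any number of whitespace chars, single or double quotes
    hidden      # "hidden"
    .*          # anything (other than a newline)
    \s*         # any amount of whitespace
    >           # a greater-than sign
}xi;

# Two lines from David Squire's test data later in the thread.
for my $line ('<INPUTFROG type="HiDdENDUCK >', '<INPUTCattype=-HiDdENBAT>') {
    my $m1 = $line =~ $original  ? 'Match' : 'No Match';
    my $m2 = $line =~ $annotated ? 'Match' : 'No Match';
    print "$line  original: $m1  annotated: $m2\n";
}
```

Both forms should report the same result for each line: a match for the first and no match for the second.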

Note that this is an incredibly poorly written regular expression. If
the goal is to find all HTML input tags of type "hidden", this will
both match things it shouldn't and not match things it should. I
strongly suggest you do not use this, and throw away whatever source
you found it in.
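To make Paul's point concrete, here is a sketch with made-up input. The first two strings contain no hidden input field, yet they match, because .* runs straight past the end of the <input ...> tag. The last string is a genuine hidden input split over two lines, which HTML allows; testing one line at a time, as the variable name $fileline suggests the original code does, never sees the whole tag:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $re = qr/\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i;

# Hypothetical lines that contain no hidden input, yet still match.
my @false_positives = (
    '<input type="text"><span>type=hidden</span>',
    '<input type="text"> <!-- type=hidden -->',
);
for my $s (@false_positives) {
    print "matched, but not a hidden input: $s\n" if $s =~ $re;
}

# A hypothetical hidden input split over two lines.
my $html = qq{<input name="id"\n       type="hidden" value="42">\n};

# Testing one line at a time finds nothing: neither line holds the whole tag.
my $found = grep { $_ =~ $re } split /\n/, $html;
print "hidden inputs found line by line: $found\n";
```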

For real HTML parsing, use an HTML parser:
Search search.cpan.org for "HTML::Parser"

Paul Lalli

David Squire · May 17, 2006

> Hi all,
>
> I'm new to Perl and I was looking through this piece of code,
> but I can't seem to figure out what it means.
>
> if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i )
>
> Need some guru's help here.
>
> Does it mean: if $fileline contains tt regular expression?

if $fileline *matches* the regular expression.

> <input with
> any number of characters and contains = "or' hidden 'or" and any number
> of characters till >
>
> Appreciate any help.

Here's an example script showing some of the things that match, and one
that doesn't:

#!/usr/bin/perl
use strict;
use warnings;

while (my $fileline = <DATA>) {
    print "$fileline";
    if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i ) {
        print "\tMatch\n";
    }
    else {
        print "\tNo Match\n";
    }
}

__DATA__
<INPUTFROG type="HiDdENDUCK >
<inputCattype= HiDdENBAT >
<iNpUtDogtype=HiDdENBAT>
<INPUTCattype=-HiDdENBAT>

Output:
<INPUTFROG type="HiDdENDUCK >
	Match
<inputCattype= HiDdENBAT >
	Match
<iNpUtDogtype=HiDdENBAT>
	Match
<INPUTCattype=-HiDdENBAT>
	No Match

Somehow I doubt that that RE is really doing what its author wanted...

See perldoc perlre for an explanation of why this works the way it does.

DS

PS. There's no need to escape the '<' and '>' characters in the RE.

Brian Wakem · May 17, 2006

David said:
> PS. There's no need to escape the '<' and '>' characters in the RE.

Same for the " and ' in the character class.
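Following David's and Brian's notes: Perl treats a backslash in front of a non-word character as matching that character literally, neither < nor > is special in a Perl regex, and " and ' have no special meaning inside a character class. So the pattern with those backslashes dropped should accept exactly the same strings. A sketch comparing both forms on David's test lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $escaped   = qr/\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i;
# Same pattern without the unnecessary backslashes.
my $unescaped = qr/<input.*\s*type\s*=[\s"']*hidden.*\s*>/i;

my @lines = (
    '<INPUTFROG type="HiDdENDUCK >',
    '<inputCattype= HiDdENBAT >',
    '<iNpUtDogtype=HiDdENBAT>',
    '<INPUTCattype=-HiDdENBAT>',
);

for my $line (@lines) {
    my $old = $line =~ $escaped   ? 'Match' : 'No Match';
    my $new = $line =~ $unescaped ? 'Match' : 'No Match';
    print "$line\t$old\t$new\n";
}
```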

Tad McClellan · May 17, 2006

> I'm new to Perl and I was looking through this piece of code,
> but I can't seem to figure out what it means.
>
> if ( $fileline =~ /\<input.*\s*type\s*=[\s\"\']*hidden.*\s*\>/i )
                     ^                      ^ ^              ^

It means that whoever wrote it is neither a good software designer
nor a good Perl programmer.

A good designer would have known the futility of attempting
to "parse" a context free grammar with a regular expression,
and would have used a module from CPAN to do the grunt work.

A good Perl programmer would not backslash characters that
do not need backslashing, and would have known that

.*\s*

is equivalent to

.*

if $fileline truly is "a line".
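Tad's equivalence depends on the "a line" condition: . never matches a newline, but \s does. The sketch below (hypothetical strings, with the unnecessary backslashes already dropped) shows both forms agreeing on a single line, while only the .*\s* form can step over a newline that falls between "input" and "type":

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $with_ws    = qr/<input.*\s*type\s*=[\s"']*hidden.*\s*>/i;
my $without_ws = qr/<input.*type\s*=[\s"']*hidden.*>/i;

# Hypothetical inputs for illustration.
my $one_line  = qq{<input type="hidden" name="id">\n};
my $two_lines = qq{<input\ntype="hidden" name="id">\n};

for my $s ($one_line, $two_lines) {
    (my $shown = $s) =~ s/\n/\\n/g;    # show newlines as \n when printing
    my $r1 = $s =~ $with_ws    ? 'match' : 'no match';
    my $r2 = $s =~ $without_ws ? 'match' : 'no match';
    print "$shown\n    .*\\s* version: $r1\n    .* version:    $r2\n";
}
```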

If $fileline is not a line, then a good designer would not
have mislabeled it when choosing a variable name.

> Need some guru's help here.

No you don't.

"guru" is at an extremely high level.

You only need the help of anyone who knows a little more than you do.

> Appreciate any help.

Throw out that hobbyist code and write a Real Program instead.

lochuanjiang · May 18, 2006

Thanks for the advice.

I'm still relatively new to Perl, and trying to read the regular
expression out in English is a daunting task for me.


Guest · May 18, 2006

lochuanjiang wrote:
: Thanks for the advice.
:
: I'm still relatively new to Perl, and trying to read the regular
: expression out in English is a daunting task for me.

Redefine your approach to regular expressions. Carefully read the
Perl documentation (perlre), study the examples given (not only by
passively staring at them, but by actually trying to craft your
own sample data, things you want to find and matching REs) and go
from simple to complicated, rather than trying to understand this
awful, cluttered and obfuscated regex code you presented here, and
thus going from complicated to simple (which will be unreachable).
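A minimal sketch of that workflow, using made-up sample data: write down strings along with whether each should match, then grow the pattern one piece at a time and watch the number of unexpected results. The intermediate patterns here only illustrate the process; they are not a recommended way to find hidden inputs:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: each string paired with whether it should match.
my @samples = (
    [ '<input type="hidden" name="id">', 1 ],
    [ "<INPUT TYPE='HIDDEN'>",           1 ],
    [ '<input type="text">',             0 ],
);

# Candidate patterns, grown from simple to complicated one piece at a time.
my @steps = (
    qr/<input/i,
    qr/<input\s[^>]*type/i,
    qr/<input\s[^>]*type\s*=\s*["']?hidden/i,
);

my @failures;
for my $i (0 .. $#steps) {
    # Count samples whose actual result differs from the expected one.
    my $n = grep { ($_->[0] =~ $steps[$i] ? 1 : 0) != $_->[1] } @samples;
    push @failures, $n;
    print "step $i: $n unexpected result(s)\n";
}
```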

Oliver.

Eric Schwartz · May 18, 2006

> Redefine your approach to regular expressions. Carefully read the
> Perl documentation (perlre), study the examples given (not only by
> passively staring at them, but by actually trying to craft your
> own sample data, things you want to find and matching REs) and go
> from simple to complicated, rather than trying to understand this
> awful, cluttered and obfuscated regex code you presented here, and
> thus going from complicated to simple (which will be unreachable).

In support of going from complicated to simple, you can also use qr,
the x modifier, and judicious use of whitespace to build up regular
expressions out of smaller ones (I believe this is in PBP, Damian
Conway's Perl Best Practices, but I simplified and expanded this example
from the course notes that I assume he wrote the book from):

# A digit is a
my $DIGIT = qr{ \d+             # number (sequence of digits)
                (?: \. \d* )?   # optionally followed by a . and
                                # more digits (the ?: means group, but
                                # don't capture)
              | \. \d+          # or else it's a . followed by digits
              }x;               # x says "allow whitespace and comments
                                # within a regex"

my $SIGN = qr{ [+-]     # a + or -; here the [] is used to
                        # make the + not a quantifier
             }x;

# And finally, a number is:
my $NUM = qr{
    ( ($SIGN?) ($DIGIT) )   # an optional sign followed
                            # by some digits
}x;

Notice how, in this example, each regex is relatively small, and
easily explainable. Furthermore, $NUM, if written that way is easy to
understand-- the comments are superfluous, really. If you wrote it
out as one long regex, it would be much harder to read:

my $NUM = qr{(([+-])?(\d+(?:\.\d*)?|\.\d+))}; # WTF?
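As a quick check (a sketch, not part of Eric's post), the composed $NUM and the one-line version should accept the same strings. Both include a ? after the (?: \. \d* ) group so that the fractional part is optional, as the comment describes; the sample strings are arbitrary, and \A and \z are added so the whole string has to be a number:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Composed version: the ? makes the fractional part optional.
my $DIGIT = qr{ \d+ (?: \. \d* )?   # digits, optionally followed by . and more digits
              | \. \d+              # or a . followed by digits
              }x;
my $SIGN  = qr{ [+-] }x;
my $NUM   = qr{ ( ($SIGN?) ($DIGIT) ) }x;

# One-line version, with the same ? in place.
my $NUM_ONE_LINE = qr{(([+-])?(\d+(?:\.\d*)?|\.\d+))};

# Arbitrary test strings; \A and \z force the whole string to be a number.
for my $s ('42', '-3.14', '+.5', '7.', 'abc', '.') {
    my $composed = $s =~ /\A$NUM\z/          ? 'yes' : 'no';
    my $one_line = $s =~ /\A$NUM_ONE_LINE\z/ ? 'yes' : 'no';
    print "$s: composed=$composed one-line=$one_line\n";
}
```

Because $SIGN and $DIGIT contain only non-capturing groups, the captures line up in both versions: $1 is the whole number, $2 the sign, $3 the digits. One small difference: with no sign present, $2 is an empty string in the composed version but undefined in the one-liner.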

It may take more lines, but who cares? Linefeeds are not expensive.
Write your program to be read, not run.

-=Eric

Help in hangman game	1	Jul 24, 2023
Help with a function	8	Mar 11, 2020
Regular expression for BOM required	6	Jan 12, 2013
Requesting regular expression help	12	Feb 26, 2010
Serious Perl Regular Expression deficiency?	15	Dec 23, 2005
Regular Expression Help	2	Dec 15, 2004
Help in regular expression	4	Nov 7, 2006
Regular expression help	2	Sep 24, 2009
