How would I create a Regular Expression to check

Nathan · Jan 3, 2008

How would I create a regular expression to check a street address for
any of the items below:

If the first character is a P ...
p.o. box
po box
po. box
p.o box
post office box
POB
POX
PODRAWER
POSTOFFICE
PO BX
POBOX
P/O

If the first character is a B ...
BX
BOX
Buzon -- (Means 'Box' in Spanish)

If the first character is an A ...
Apartado -- (is 'PO Box' in Spanish)
Aptdo -- (is POB abbreviated in Spanish)

Thanks,
Nathan

Ted Zlatanov · Jan 3, 2008

CW> The short answer: you can't. At least not one single, reasonably
CW> short regex that can cover it in one go. I'd simply iterate
CW> over all the possibilities and compare each one to the street address,
CW> like:

An alternate approach is to use Parse::RecDescent. It's really good in
my experience for parsing this kind of disparate input, and will
organize it for you (so you can tell that the street address was in
Spanish, for example).

Ted

jjcassidy · Jan 3, 2008

It feels like I'm doing your homework, but here:

(Ap(?>artado|tdo)|B(?>O?X|uzon)|p(?>ost office|\.?o\.?) box|P(?>\/O|O(?:B|X|DRAWER|STOFFICE|[ ]BX|BOX)))

It's just simple decomposition.
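A quick way to sanity-check a decomposition like this is to run it over Nathan's sample list. The sketch below does that. It assumes the outer group is closed (the pattern has to end in `)))`, otherwise Perl rejects it as an unmatched parenthesis) and tries `ost office` before `\.?o\.?`: inside an atomic group `(?>...)` the first alternative that succeeds is kept for good, so `\.?o\.?` would grab just the `o` of `post` and `post office box` could never match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# jjcassidy's decomposition, balanced, with "ost office" tried first
# inside the atomic group. Still case-sensitive, as posted.
my $po_box = qr{(Ap(?>artado|tdo)|B(?>O?X|uzon)|p(?>ost office|\.?o\.?) box|P(?>/O|O(?:B|X|DRAWER|STOFFICE|[ ]BX|BOX)))};

# Nathan's sample terms, copied from the question.
my @samples = (
    'p.o. box', 'po box', 'po. box', 'p.o box', 'post office box',
    'POB', 'POX', 'PODRAWER', 'POSTOFFICE', 'PO BX', 'POBOX', 'P/O',
    'BX', 'BOX', 'Buzon', 'Apartado', 'Aptdo',
);

# Collect any sample the pattern fails to match.
my @missed = grep { !/$po_box/ } @samples;
print @missed ? "missed: @missed\n" : "all samples matched\n";
```

The pattern is still case-sensitive, so a mixed-case spelling such as `Po Box` slips through; adding `/i` would close that gap.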

Nathan · Jan 7, 2008

It feels like I'm doing your homework, but here:

(Ap(?>artado|tdo)|B(?>O?X|uzon)|p(?>ost office|\.?o\.?) box|P(?>\/O|O(?:B|X|DRAWER|STOFFICE|[ ]BX|BOX)))

It's just simple decomposition.

J. Gleixner · Jan 7, 2008

Nathan said:
You did not do my homework, but thanks... I will try yours as well...

Here is what I came up with, but I like yours better; I might try yours
instead of mine...

^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Ever hear of case-insensitive pattern matching?

perldoc perlop

Search for "m/PATTERN/cgimosx".

Uri Guttman · Jan 7, 2008

JG> Nathan said:
You did not do my homework, but thanks... I will try yours as well...
Here is what I came up with, but I like yours better; I might try yours
instead of mine...
^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])


JG> Ever hear of case-insensitive pattern matching?

JG> perldoc perlop

beyond that, note the [.\s] which is just . with the /s modifier. and it
has * after it which may not be correct (or just slower than +). [/] is
noisy and will break it unless alternate delimiters are used. beyond
that it is impossible to read (and /i will help there). and the way the
words are jammed together makes no sense or is impossible to parse out
visually. altogether a most horrible regex. i will copy it for training
purposes. i don't expect its author to claim this is proprietary code
just out of embarrassment.
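Folding Gleixner's `/i` hint and Uri's readability complaints back into Nathan's pattern gives something like the sketch below. It is an illustrative rewrite, not code from the thread: `/i` replaces every `[Pp]`-style pair, `/x` allows whitespace and comments, `qr{...}` delimiters let `p/o` go unescaped, and capturing groups become `(?:...)`. Nathan's alternatives are otherwise kept as written, including the fact that `^` anchors only the first one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nathan's alternatives with /i (case-insensitive) and /x (whitespace
# and comments ignored). Braces as delimiters mean the "/" in "p/o"
# needs no escaping. Under /x a space inside [ ] is still literal.
# As in the original, "^" applies only to the first alternative.
my $po_box = qr{
      ^ (?: p (?:ost)? [.\s]* o (?:ffice)? [.\s]* box )  # p.o. box, post office box
    | po (?: b | x | drawer | stoffice | [ ]bx | box )   # POB, POX, PODRAWER, PO BX
    | p/o                                                # P/O
    | b (?: x | ox | uzon )                              # BX, BOX, Buzon
    | a (?: partado | ptdo )                             # Apartado, Aptdo
}xi;

for my $addr ('P.O. Box 12', 'Post Office Box 7', 'pobox 3', 'Apartado 45', '12 Elm St') {
    printf "%-18s %s\n", $addr, ($addr =~ $po_box ? 'PO box' : 'street');
}
```

Because only the first alternative is anchored, the others can still fire inside ordinary words: `12 Boxwood Ln`, for example, matches on `Box`. Anchoring every alternative and adding a word boundary would tighten it, at the cost of changing what Nathan's pattern accepts.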

uri

Peter J. Holzer · Jan 8, 2008

beyond that, note the [.\s] which is just . with the /s modifier.

What? A "." in a character class matches only a literal ".", and \s still
matches any whitespace character, so [.\s] matches either a "." or a
whitespace character. The /s modifier won't change that; it only affects
a "." outside a character class.
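Holzer's point is easy to check directly. The sketch below compares the bracketed class `[.\s]` with a bare `.` under `/s` on a few single characters; only the bare dot accepts an ordinary letter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Label => single character to test.
my @cases = ( ['dot', '.'], ['space', ' '], ['tab', "\t"], ['newline', "\n"], ['x', 'x'] );

for my $case (@cases) {
    my ($label, $ch) = @$case;
    # Bracketed class: "." is a literal dot, \s is whitespace; /s changes nothing.
    my $class = $ch =~ /\A[.\s]\z/s ? 'yes' : 'no';
    # Bare "." with /s: any single character, newline included.
    my $dot   = $ch =~ /\A.\z/s     ? 'yes' : 'no';
    printf "%-8s [.\\s]: %-3s  bare . with /s: %s\n", $label, $class, $dot;
}
```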

hp

Jürgen Exner · Jan 8, 2008

[Please do not top-post, trying to correct]

Nathan said:
Here is what I came up with, but I like yours better; I might try yours
instead of mine...

^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Sorry, but that's a great example for what not to do. Absolutely
unmaintainable. Within 4 weeks you will have no idea what that RE does and
how to modify it if you need to add another term.

IMO regular expressions are the wrong tool for the job. It would be far
better to put those terms in a hash (as keys), then extract the street name
from your address and simply check whether that street name exists() in the
hash. Or put the terms in an array and just loop through them.
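Jürgen's hash suggestion could look something like the sketch below. Two parts are assumptions not in his post: a hypothetical `normalize` helper that lowercases and strips dots, slashes and whitespace (so `P.O. Box` and `POBOX` become the same key), and a hypothetical `street_name` helper that simply takes everything before the first digit. Real address parsing is considerably messier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical normalization: lowercase, then drop dots, slashes and
# whitespace, so "P.O. Box", "po box" and "POBOX" all become "pobox".
sub normalize {
    my ($s) = @_;
    $s = lc $s;
    $s =~ s{[./\s]+}{}g;
    return $s;
}

# Hypothetical extraction: the "street name" is everything before the
# first digit, e.g. "P.O. Box 123" -> "P.O. Box ".
sub street_name {
    my ($addr) = @_;
    my ($name) = $addr =~ /\A(\D*)/;
    return $name;
}

# Nathan's terms as hash keys, normalized once up front.
my %po_term = map { ( normalize($_) => 1 ) } (
    'p.o. box', 'post office box', 'POB', 'POX', 'PODRAWER', 'POSTOFFICE',
    'PO BX', 'POBOX', 'P/O', 'BX', 'BOX', 'Buzon', 'Apartado', 'Aptdo',
);

# Hash variant: one exists() lookup per address.
sub is_po_box {
    my ($addr) = @_;
    return exists $po_term{ normalize(street_name($addr)) };
}

# Array variant: the same terms, compared one by one.
my @po_terms = keys %po_term;
sub is_po_box_loop {
    my ($addr) = @_;
    my $name = normalize(street_name($addr));
    for my $term (@po_terms) {
        return 1 if $name eq $term;
    }
    return 0;
}

for my $addr ('P.O. Box 123', 'Post Office Box 7', 'Apartado 45', '123 Main St') {
    printf "%-18s %s\n", $addr, is_po_box($addr) ? 'PO box' : 'street';
}
```

Adding a new term is then a one-line change to the list. The loop variant only pays off if you later want checks that a plain `exists` can't express, such as prefix matches.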

Maybe that's not as smart as an RE approach, but it's much more intelligent.

jue

Ted Zlatanov · Jan 8, 2008

N> You did not do my homework, but thanks... I will try yours as well...
N> Here is what I came up with, but I like yours better; I might try yours
N> instead of mine...

N> ^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
N> [Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
N> [Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
N> [Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Good god, doesn't this bother you even a little bit? You should at
least submit it to the Daily WTF.

Ted

David Combs · Jan 31, 2008

CW> The short answer: you can't. At least not one single, reasonably
CW> short regex that can cover it in one go. I'd simply iterate
CW> over all the possibilities and compare each one to the street address,
CW> like:

An alternate approach is to use Parse::RecDescent. It's really good in
my experience for parsing this kind of disparate input, and will
organize it for you (so you can tell that the street address was in
Spanish, for example).

Ted

A late response/request. *If* you find doing that pretty easy and
quick to do, *please* show us how you'd do it.

I've read the doc on it, and come away with neither facility nor understanding
for actually being able to use it in a real problem.

THANKS MUCH (from all of us?)

David

David Combs · Jan 31, 2008

Nathan said:

You did not do my homework, but thanks... I will try yours as well...

Here is what I came up with, but I like yours better; I might try yours
instead of mine...

^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])


Ever hear of case-insensitive pattern matching?

Without first going to perlop, I ask: even in *character classes*?!

perldoc perlop

Search for "m/PATTERN/cgimosx".

david

Gunnar Hjalmarsson · Jan 31, 2008

David said:
Nathan said:

^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])


Ever hear of case-insensitive pattern matching?


Without first going to perlop, I ask: even in *character classes*?!

You should have tried it instead of asking hundreds of people.

C:\home>type test.pl
$_ = 'abc';
print "Yes\n" if /[A-Z]/i;

C:\home>test.pl
Yes

Ted Zlatanov · Jan 31, 2008

On Thu, 31 Jan 2008 13:46:10 +0000 (UTC), David Combs wrote:

DC> In article said:
CW> The short answer: you can't. At least not one single, reasonably
CW> short regex that can cover it in one go. I'd simply iterate
CW> over all the possibilities and compare each one to the street address,
CW> like:
DC> A late response/request. *If* you find doing that pretty easy and
DC> quick to do, *please* show us how you'd do it.

DC> I've read the doc on it, and come away with neither facility nor understanding
DC> for actually being able to use it in a real problem.

I wrote a tutorial on P::RD a while ago, and it should still be valid.
IBM developerWorks seems to be down right this moment; use the Google
cache if you have to. I don't mention auto_tree, which is really handy
if you want to process the data yourself.

http://www.ibm.com/developerworks/library/l-perl-speak.html

Here's another good one (and many others will come up in a web search):

http://www.perl.com/pub/a/2001/06/13/recdecent.html

Are you asking specifically for the mailing address example originally
posted to be implemented in P::RD, or do you need more information on
how to use P::RD for your own applications? I can certainly give a
P::RD grammar for the full list of address rules, but it's tedious work
to implement every rule the OP wanted and I don't want to spend hours of
my time doing it just to prove it's easy.

Thanks
Ted
