How would I create a Regular Expression to check


Nathan

How would I create a regular expression to check a street address for
any of the items below:

If the first character is a P ...
p.o. box
po box
po. box
p.o box
post office box
POB
POX
PODRAWER
POSTOFFICE
PO BX
POBOX
P/O

If the first character is a B ...
BX
BOX
Buzon -- (Means 'Box' in Spanish)

If the first character is an A ...
Apartado -- ('PO Box' in Spanish)
Aptdo -- (abbreviation of Apartado, i.e. 'POB' in Spanish)



Thanks,
Nathan
 

Ted Zlatanov

CW> The short answer: you can't. At least not one single, reasonably
CW> short regex that can cover it in one go. I'd simply iterate
CW> over all the possibilities and compare each one to the street address,
CW> like:

An alternate approach is to use Parse::RecDescent. It's really good in
my experience for parsing this kind of disparate input, and will
organize it for you (so you can tell that the street address was in
Spanish, for example).
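The "iterate over all the possibilities" approach quoted above can be sketched in Perl as follows. This is a minimal sketch, not the code elided from the quoted reply: the term list is the question's list lower-cased, "matches" is assumed to mean "the address starts with one of the terms, ignoring case", and the name po_box_term is illustrative only.

```perl
use strict;
use warnings;

# Terms from the question, lower-cased so one lc() on the address
# makes the comparison case-insensitive.
my @po_box_terms = (
    'p.o. box', 'po box', 'po. box', 'p.o box', 'post office box',
    'pob', 'pox', 'podrawer', 'postoffice', 'po bx', 'pobox', 'p/o',
    'bx', 'box', 'buzon',
    'apartado', 'aptdo',
);

# Returns the first term the address starts with, or undef (scalar context).
sub po_box_term {
    my ($address) = @_;
    my $lc = lc $address;
    $lc =~ s/^\s+//;    # ignore leading whitespace
    for my $term (@po_box_terms) {
        return $term if index($lc, $term) == 0;    # prefix test
    }
    return;
}

for my $address ('P.O. Box 123', 'Apartado 45', '123 Main St') {
    my $term = po_box_term($address);
    printf "%-15s %s\n", $address,
        defined $term ? "matches '$term'" : 'no match';
}
```

Because this is a plain prefix test, a street such as "Boxwood Ln" is flagged as well; term order only affects which term is reported, not whether an address matches.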

Ted
 

jjcassidy

It feels like I'm doing your homework, but here:

(Ap(?>artado|tdo)|B(?>O?X|uzon)|p(?>\.?o\.?|ost office) box|P(?>\/O|
O(?:B|X|DRAWER|STOFFICE|[ ]BX|BOX))

It's just simple decomposition.
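For anyone trying the decomposition above, here is a minimal Perl test harness (an editor's sketch, not part of the original reply). Two corrections are assumed: the posted pattern is one closing parenthesis short, so one is added at the end, and "ost office" is moved ahead of "\.?o\.?" inside the atomic group, because (?>...) commits to the first alternative that matches and never tries the other one. The ^ anchor and the /i modifier are further assumptions, based on the question's "if the first character is..." wording; is_po_box is an illustrative name.

```perl
use strict;
use warnings;

# The decomposition from the reply, with the missing final ")" added,
# "ost office" tried before "\.?o\.?" inside the atomic group,
# anchored at the start and made case-insensitive (assumptions).
my $po_box_re = qr/^(Ap(?>artado|tdo)|B(?>O?X|uzon)|p(?>ost office|\.?o\.?) box|P(?>\/O|O(?:B|X|DRAWER|STOFFICE|[ ]BX|BOX)))/i;

sub is_po_box { return $_[0] =~ $po_box_re ? 1 : 0 }

my @samples = (
    'p.o. box 1', 'po box 1', 'po. box 1', 'p.o box 1', 'post office box 1',
    'POB 1', 'POX 1', 'PODRAWER 1', 'POSTOFFICE 1', 'PO BX 1', 'POBOX 1',
    'P/O 1', 'BX 1', 'BOX 1', 'Buzon 1', 'Apartado 1', 'Aptdo 1',
    '123 Main St', 'Boulevard Ave', 'Pine Rd',
);

for my $address (@samples) {
    printf "%-18s %s\n", $address, is_po_box($address) ? 'PO box' : 'street';
}
```

With the original (?>\.?o\.?|ost office) order, the same harness reports "post office box 1" as a street: the atomic group locks in the lone "o", and " box" then fails against "st office".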
 

Nathan

You did not do my homework, but thanks... I will try yours as well...

Here is what I came up with, but I like yours better; I might use yours
instead of mine....

^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])



 

J. Gleixner

Nathan said:
You did not do my homework but thanks... I will try yours as well...

Here is what I came up with but I like yours better I might try yours
instead of mine....

^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Ever hear of case-insensitive pattern matching?

perldoc perlop

Search for "m/PATTERN/cgimosx".
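To illustrate the point, here is a sketch of the question's alternatives written with /i (so each [Pp]-style class becomes a plain letter) and /x (so the branches can be laid out and commented). Curly braces are used as delimiters so P/O needs no escaping. This is an editor's sketch, not Nathan's pattern: it puts every branch inside one group anchored by ^, which is an assumption about the intent, since in the posted pattern the ^ applies only to the first alternative.

```perl
use strict;
use warnings;

# The question's alternatives under /x (layout, comments) and /i (case).
# Every branch sits inside one group anchored by ^ (an assumption).
my $po_box_re = qr{
    ^ \s*
    (?:
        p (?: ost \s+ office | \.? o \.? ) \s* box       # p.o. box, po box, post office box, POBOX
      | p (?: / o                                        # P/O
            | o (?: b | x | drawer | stoffice | \s bx )  # POB, POX, PODRAWER, POSTOFFICE, PO BX
          )
      | b (?: o? x | uzon )                              # BX, BOX, Buzon
      | ap (?: artado | tdo )                            # Apartado, Aptdo
    )
}xi;

for my $address ('P.O. Box 12', 'Post Office Box 12', 'PO BX 12',
                 'P/O 12', 'Aptdo 12', '12 Sandbox Rd') {
    printf "%-20s %s\n", $address, $address =~ $po_box_re ? 'PO box' : 'street';
}
```

Because the whole group is anchored, "12 Sandbox Rd" is reported as a street here, whereas the posted pattern (joined onto one line) would match the "box" inside "Sandbox" through its unanchored [Bb] branch.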
 

Uri Guttman

JG> Nathan said:
You did not do my homework but thanks... I will try yours as well...
Here is what I came up with but I like yours better I might try yours
instead of mine....
^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

JG> Ever hear of case-insensitive pattern matching?

JG> perldoc perlop

beyond that, note the [.\s] which is just . with the /s modifier. and it
has * after it which may not be correct (or just slower than +). [/] is
noisy and will break it unless alternate delimiters are used. beyond
that it is impossible to read (and /i will help there). and the way the
words are jammed together makes no sense or is impossible to parse out
visually. altogether a most horrible regex. i will copy it for training
purposes. i don't expect its author to claim this is proprietary code
just out of embarrassment. :)

uri
 

Peter J. Holzer

JG> Nathan said:
You did not do my homework but thanks... I will try yours as well...
Here is what I came up with but I like yours better I might try yours
instead of mine....
^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

JG> Ever hear of case-insensitive pattern matching?

JG> perldoc perlop

beyond that, note the [.\s] which is just . with the /s modifier.

What? A "." in a character class matches only a literal ".", but \s
still matches any whitespace character, so [.\s] matches a "." or a
whitespace character. A /s modifier won't change its meaning.
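The difference is easy to check. In this minimal sketch (classify is an illustrative name), [.\s] matches only a literal dot or a whitespace character whether or not /s is given, while a bare . matches any character except newline, and newline as well under /s.

```perl
use strict;
use warnings;

# Returns four flags for a one-character string:
# [.\s] without /s, [.\s] with /s, bare . without /s, bare . with /s.
sub classify {
    my ($char) = @_;
    return (
        ($char =~ /\A[.\s]\z/  ? 1 : 0),
        ($char =~ /\A[.\s]\z/s ? 1 : 0),
        ($char =~ /\A.\z/      ? 1 : 0),
        ($char =~ /\A.\z/s     ? 1 : 0),
    );
}

# Expected rows: '.': 1 1 1 1 / ' ': 1 1 1 1 / \n: 1 1 0 1 / 'x': 0 0 1 1
for my $char ('.', ' ', "\n", 'x') {
    my $label = $char eq "\n" ? '\n' : "'$char'";
    print "$label: ", join(' ', classify($char)), "\n";
}
```

The "x" row is the telling one: [.\s] rejects it with or without /s, while a bare . accepts it either way.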

hp
 

Jürgen Exner

[Please do not top-post, trying to correct]
Nathan said:
Here is what I came up with but I like yours better I might try yours
instead of mine....

^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Sorry, but that's a great example of what not to do. Absolutely
unmaintainable. Within 4 weeks you will have no idea what that RE does or
how to modify it if you need to add another term.

IMO regular expressions are the wrong tool for the job. Far better would be
to put those terms in a hash (as keys), then extract the street name from
your address, and simply check if this street name exists() in the hash.
Or put the terms in an array and just loop through them.
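A minimal sketch of the hash lookup described above. The post does not say how to "extract the street name"; this sketch assumes it is everything before the first digit, lower-cased and trimmed. The helpers extract_street_name and is_po_box are hypothetical names for illustration.

```perl
use strict;
use warnings;

# Terms from the question, lower-cased, stored as hash keys only;
# exists() is all the lookup needs.
my @terms = (
    'p.o. box', 'po box', 'po. box', 'p.o box', 'post office box',
    'pob', 'pox', 'podrawer', 'postoffice', 'po bx', 'pobox', 'p/o',
    'bx', 'box', 'buzon', 'apartado', 'aptdo',
);
my %po_box_terms;
@po_box_terms{@terms} = ();

# Hypothetical extraction rule: everything before the first digit,
# lower-cased, with leading/trailing whitespace removed.
sub extract_street_name {
    my ($address) = @_;
    my ($name) = $address =~ /\A([^0-9]*)/;
    $name = lc $name;
    $name =~ s/\A\s+|\s+\z//g;
    return $name;
}

sub is_po_box {
    my ($address) = @_;
    return exists $po_box_terms{ extract_street_name($address) } ? 1 : 0;
}

for my $address ('PO Box 123', 'Apartado 45', 'POBOX123', '123 Main St') {
    printf "%-12s => %s\n", $address, is_po_box($address) ? 'PO box' : 'street';
}
```

Unlike a prefix match, the exact lookup does not flag "Boxwood Ln 5", because "boxwood ln" is not a key; the trade-off is that any spelling not in the list (say "P. O. Box") is missed until it is added.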

Maybe that's not as clever as an RE approach, but it's much more sensible.

jue
 

Ted Zlatanov

N> You did not do my homework but thanks... I will try yours as well...
N> Here is what I came up with but I like yours better I might try yours
N> instead of mine....

N> ^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
N> [Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
N> [Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
N> [Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Good god, doesn't this bother you even a little bit? You should at
least submit it to the Daily WTF.

Ted
 

David Combs

CW> The short answer: you can't. At least not one single, reasonably
CW> short regex that can cover it in one go. I'd simply iterate
CW> over all the possibilities and compare each one to the street address,
CW> like:

An alternate approach is to use Parse::RecDescent. It's really good in
my experience for parsing this kind of disparate input, and will
organize it for you (so you can tell that the street address was in
Spanish, for example).

Ted

A late response/request. *If* you find doing that pretty easy and
quick to do, *please* show us how you'd do it.

I've read the docs on it, and come away with neither the facility nor the
understanding to actually use it on a real problem.

THANKS MUCH (from all of us?)

David
 

David Combs

Nathan said:
You did not do my homework but thanks... I will try yours as well...

Here is what I came up with but I like yours better I might try yours
instead of mine....

^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Ever hear of case-insensitive pattern matching?

Without first going to perlop, I ask: even in *character classes*?!
perldoc perlop

Search for "m/PATTERN/cgimosx".

david
 

Gunnar Hjalmarsson

David said:
Nathan said:
^([Pp]([Oo][Ss][Tt])?[.\s]*[Oo]([Ff][Ff][Ii][Cc][Ee])?[.\s]*[Bb][Oo]
[Xx])|[Pp][Oo]([Bb]|[Xx]|[Dd][Rr][Aa][Ww][Ee][Rr]|[Ss][Tt][Oo][Ff][Ff]
[Ii][Cc][Ee]|[ ][Bb][Xx]|[Bb][Oo][Xx])|[Pp][/][Oo]|[Bb]([Xx]|[Oo][Xx]|
[Uu][Zz][Oo][Nn])|[Aa]([Pp][Aa][Rr][Tt][Aa][Dd][Oo]|[Pp][Tt][Dd][Oo])

Ever hear of case-insensitive pattern matching?

Without first going to perlop, I ask: even in *character classes*?!

You should have tried it instead of asking hundreds of people.

C:\home>type test.pl
$_ = 'abc';
print "Yes\n" if /[A-Z]/i;

C:\home>test.pl
Yes
 

Ted Zlatanov

On Thu, 31 Jan 2008 13:46:10 +0000 (UTC) (e-mail address removed) (David Combs) wrote:

DC> In article said:
CW> The short answer: you can't. At least not one single, reasonably
CW> short regex that can cover it in one go. I'd simply iterate
CW> over all the possibilities and compare each one to the street address,
CW> like:
DC> A late response/request. *If* you find doing that pretty easy and
DC> quick to do, *please* show us how you'd do it.

DC> I've read the doc on it, and come away with neither facility nor understanding
DC> for actually being able to use it in a real problem.

I wrote a tutorial on P::RD a while ago, and it should still be valid.
IBM dW seems to be down at the moment; use the Google cache if you
have to. The tutorial doesn't mention auto_tree, which is really handy if
you want to process the data yourself.

http://www.ibm.com/developerworks/library/l-perl-speak.html

Here's another good one (and many others will come up in a web search):

http://www.perl.com/pub/a/2001/06/13/recdecent.html

Are you asking specifically for the mailing address example originally
posted to be implemented in P::RD, or do you need more information on
how to use P::RD for your own applications? I can certainly give a
P::RD grammar for the full list of address rules, but it's tedious work
to implement every rule the OP wanted and I don't want to spend hours of
my time doing it just to prove it's easy.

Thanks
Ted
 
