Phone number regular expression...

Discussion in 'Perl' started by joemono, Sep 26, 2003.

  1. joemono

    joemono Guest

    Hello everyone!

    First, I apologize if this posting isn't proper "netiquette" for this
    group.

    I've been working with perl for almost 2 years now. However, my regular
    expression knowledge is pretty limited. I wrote the following expression to
    take (hopefully) any _reasonable_ phone number input, and format it as
    (999) 999-9999 x 9999.

    Here's what I've come up with. I would like your comments, if you've got the
    time. I'm really interested in regular expressions, and I want to know if
    what I'm doing is inefficient, slow, etc...

    # area code
    \({0,1}\s*(\d{3}){0,1}\s*\){0,1}
    # optional parentheses, 3 digits, optional parentheses
    (?=[-| ]*(\d{3}){1}[-| ]*(\d{4}){1})
    # match only if the first match is followed by
    # what looks like a phone number
    # (this is the same match as the standard 7 digit phone number below)

    # main phone number
    [-| ]*
    (\d{3}){1} # first 3 digits
    [-| ]*
    (\d{4}){0,1} # second 4 digits

    # extension
    [-| |x|X]*
    (\d{3,4}){0,1} # extension

    For example, here's a question I have. Is there a way to use the look-ahead
    match in the area code section _again_ for matching the main number, since
    they are the same? I also know that I could use ? instead of {0,1}
    (correct?), but I always get confused between that and non-greedy
    quantifier. Does that make sense?
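
    A minimal sketch of the two questions above, not a definitive answer.
    It assumes plain Perl 5 and shows that ? after an atom behaves like
    {0,1}, that ? after another quantifier makes it non-greedy, and that a
    sub-pattern stored with qr// can be interpolated twice. The variable
    names ($main, $full, and so on) and the sample inputs are made up for
    illustration. Also note that | inside a bracketed class is a literal
    pipe, so the sketch uses [- ] rather than [-| ].

```perl
#!perl
use strict;
use warnings;

# "?" placed after an atom means the same as {0,1}: zero or one occurrence.
my $with_braces   = qr/^\({0,1}\d{3}\){0,1}$/;
my $with_question = qr/^\(?\d{3}\)?$/;

for my $input ('(310)', '310', '31') {
    my $braces   = $input =~ $with_braces   ? 'yes' : 'no';
    my $question = $input =~ $with_question ? 'yes' : 'no';
    printf "%-6s braces=%s question=%s\n", $input, $braces, $question;
}

# "?" placed after another quantifier makes that quantifier non-greedy.
my ($greedy) = '3101234567' =~ /^(\d+)/;    # takes as many digits as it can
my ($lazy)   = '3101234567' =~ /^(\d+?)/;   # takes as few digits as it can
print "greedy=$greedy lazy=$lazy\n";

# A sub-pattern compiled once with qr// can be interpolated repeatedly,
# here in the look-ahead and again in the main match.
my $main = qr/[- ]*\d{3}[- ]*\d{4}/;                 # hypothetical name
my $full = qr/^\(?\s*(\d{3})?\s*\)?(?=$main)$main/;  # hypothetical name

for my $input ('(310) 123-4567', '123-4567', '310 123') {
    print "$input: ", ($input =~ $full ? "match" : "no match"), "\n";
}
```

    The first two inputs of the last loop match and the third does not,
    because it lacks the final four digits.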

    I wrote a script to test it (it generates many different possible phone
    number inputs, and then applies the regular expression), and it _seems_ to
    work. But like I said, I kinda don't know what I'm doing. I've been using
    http://www.perldoc.com/perl5.6/pod/perlre.html heavily. It's pretty useful.

    Here's another question, do people ever have extensions less than 3, or
    greater than 4 numbers?

    Thanks for your help!

    Joe
    joemono, Sep 26, 2003
    #1

  2. joemono

    Purl Gurl Guest

    joemono wrote:

    (snipped)

    > I wrote the following expression to take (hopefully) any _reasonable_
    > phone number input, and format it as
    > (999) 999-9999 x 9999.


    Parameter is "reasonable" American style phone numbers.


    > what I'm doing is inefficient, slow, etc...


    (snipped a lot of regex matching)

    Yes, very slow, very inefficient. Do not invoke a
    regex engine unless you have no choice, or a regex
    actually "proves" to be the most efficient method
    found within a collection of tested methods.


    > Is there a way to use the look-ahead match


    Never use look-ahead unless you have no choice.
    Using any style of look-ahead will almost always
    be slow and inefficient compared to other methods.

    Note my "almost always" does not mean "always" as some
    might ignorantly claim. In some cases, a look-ahead
    could be your only choice, or most efficient choice.


    > do people ever have extensions less than 3, or greater than 4 numbers?


    Extensions cannot be predicted. Length of an extension is
    directly controlled by an internal PBX system. An extension
    length can literally be any length.

    What is the length of those extensions you hear during a
    recorded menu selection? Is there more than one extension?
    These types of numbers could be a problem.

    1-800-tru-idiots
    if you are stupid, press 1 now
    *next menu*
    if you are stupid and gullible, press 2 now
    *next menu*
    if you are stupid, gullible and tired of this, press 3 now
    *next menu*
    Thank you for calling America Onlame! You are an idiot! Goodbye!
    *dial tone*

    I count three extensions each with a length of one.

    Your methodology allows parentheses, hyphens and such, then
    tries to match for all possible combinations. This is quite
    inefficient and prone to error.

    Remove all characters except numbers, then work with your data.
    You are interested in phone numbers, are you not? So work with
    numbers, nothing else.

    Keep in mind, regardless of what methodology you employ, there
    is a good chance there will be false positives and false negatives.
    Parsing phone numbers is similar to parsing email addresses; it
    is difficult and unpredictable.

    Look over my method below. This method eliminates all characters
    except numbers, then generates a very uniform output appropriate
    for a data file. Output is also easy on the human eye.


    Ever wonder why people use "spelled" phone numbers, like

    1-800-bite-me

    When someone tries to give me a spelled number, I say,

    "Don't bother. I will not call you."


    Purl Gurl
    --
    Rock Midis! Science Fiction! Amazing Androids!
    http://www.purlgurl.net/~callgirl

    My $test_it is used to exemplify a non-destructive
    method, needed for a print of invalid numbers. You
    could easily use $_ throughout as well, but this
    defeats "full" printing of an invalid phone number.

    #!perl

    while (<DATA>)
    {
        my $test_it = $_;
        $test_it =~ s/[^\d+]//g;

        if ($test_it =~ tr/0-9// == 7)
        {
            substr ($test_it, 3, 0, " ");
            print "$test_it\n";
        }
        elsif ($test_it =~ tr/0-9// == 10)
        {
            substr ($test_it, 3, 0, " ");
            substr ($test_it, 7, 0, " ");
            print "$test_it\n";
        }
        elsif ($test_it =~ tr/0-9// > 10)
        {
            substr ($test_it, 3, 0, " ");
            substr ($test_it, 7, 0, " ");
            substr ($test_it, 12, 0, " ");
            print "$test_it\n";
        }
        else
        { print "Phone Number Appears Invalid: $_\n"; }
    }


    __DATA__
    123-4567
    123 4567
    (310) 123 4567
    310-123-4567
    310-123-4567 ext 890
    310 123 4567 890
    123-4567FUBAR
    310 123 FUBAR



    PRINTED RESULTS:
    ________________

    123 4567
    123 4567
    310 123 4567
    310 123 4567
    310 123 4567 890
    310 123 4567 890
    123 4567
    Phone Number Appears Invalid: 310 123 FUBAR
    Purl Gurl, Sep 28, 2003
    #2

  3. joemono

    Roy Johnson Guest

    I thought that you made a few odd (either esoteric or not Lazy enough)
    implementation decisions.

    Purl Gurl wrote:
    > [...]You could easily use $_ throughout as well, but this
    > defeats "full" printing of an invalid phone number.


    Instead of preserving $_ and working on $test_it, you could have saved
    a copy and then worked on $_ itself.

    You used s/[^\d+]//g instead of tr/0-9//dc to remove all non-digits.

    You used tr/0-9// instead of length.
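
    To make the difference concrete, here is a small sketch. It assumes an
    input with a leading "+1", which is not in the original test data, and
    the variable names are made up for illustration. Inside a bracketed
    class the + is a literal character, so [^\d+] leaves plus signs in
    place, while tr/0-9//cd deletes everything that is not a digit.

```perl
#!perl
use strict;
use warnings;

# Assumed sample input; the "+1" prefix is not part of the thread's data.
my $input = "+1 (310) 123-4567\n";

# [^\d+] is a negated class of digits AND a literal "+", so "+" survives.
(my $by_regex = $input) =~ s/[^\d+]//g;

# tr with c (complement) and d (delete) removes every non-digit,
# including the trailing newline.
(my $by_tr = $input) =~ tr/0-9//cd;

# Once only digits remain, length() is enough; no second tr/// is needed.
printf "regex: %s (length %d)\n", $by_regex, length $by_regex;
printf "tr:    %s (length %d)\n", $by_tr,    length $by_tr;
```

    The first line prints +13101234567 with length 12 and the second prints
    13101234567 with length 11.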

    The use of the 4-argument version of substr() was neat, but a
    judicious pattern match instead of length-checking makes for tighter
    code:

    while (<DATA>) {
        my $save = $_;
        tr/0-9//dc;
        if (/(...)?(...)(....)/) {
            printf "%3s %s %s %s\n", $1, $2, $3, $';
        }
        else {
            print "Invalid phone number: $save\n";
        }
    }

    Now let's go back to the issue of stripping all non-numerics. If you
    do that, you can't distinguish 123-4567 x890 from (123) 456 7890.
    Granted, when you dial, the phone doesn't know the difference, but
    there may be some difference in how the person doing the dialing has
    to behave.

    If, instead of stripping the non-digits, you just look for groups of
    digits (optional 3, then mandatory 3 and 4, then optional however
    many) amongst the non-digits, you can address that:

    #!perl
    while (<DATA>) {
        my $save = $_;
        if (/^\D*(?:(\d{3})\D+)?(\d{3})\D+(\d{4})(?:\D+(\d+))?/) {
            printf "%3s %s %s %s\n", $1, $2, $3, $4;
        }
        else {
            print "Invalid phone number: $save\n";
        }
    }

    __DATA__
    123-4567
    123 4567
    123 4567 x890 <-- note
    (310) 123 4567
    310-123-4567
    310-123-4567 ext 890
    310 123 4567 890
    123-4567FUBAR
    310 123 FUBAR


    Output is:
    123 4567
    123 4567
    123 4567 890
    310 123 4567
    310 123 4567
    310 123 4567 890
    310 123 4567 890
    123 4567
    Invalid phone number: 310 123 FUBAR
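
    The same pattern can also produce the (999) 999-9999 x 9999 layout asked
    for in the first post. This is only a sketch under that assumption:
    format_phone is a hypothetical helper name, and the optional parts are
    left out when they were not captured.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper: returns the formatted number, or undef if the
# input does not look like a phone number to the pattern above.
sub format_phone {
    my ($raw) = @_;
    return undef
        unless $raw =~ /^\D*(?:(\d{3})\D+)?(\d{3})\D+(\d{4})(?:\D+(\d+))?/;
    my ($area, $prefix, $line, $ext) = ($1, $2, $3, $4);

    my $out = '';
    $out .= "($area) " if defined $area;   # area code is optional
    $out .= "$prefix-$line";
    $out .= " x $ext"  if defined $ext;    # extension may be any length
    return $out;
}

for my $raw ('(310) 123 4567', '310-123-4567 ext 890',
             '123 4567 x890', '310 123 FUBAR') {
    my $formatted = format_phone($raw);
    print defined $formatted ? "$formatted\n"
                             : "Invalid phone number: $raw\n";
}
```

    Because the pattern requires at least one non-digit between the groups,
    a bare run such as 3101234567 is reported as invalid by this sketch.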
    Roy Johnson, Oct 3, 2003
    #3
  4. joemono wrote:
    > I wrote the following expression to take (hopefully) any
    > _reasonable_ phone number input, and format it as (999) 999-9999 x
    > 9999.


    Hi Joe,

    I don't know the likelihood in your case that people outside the US
    are asked to enter their phone numbers. The reason why I mention it is
    that I have tried to enter my non-US number at quite a few US based
    web sites, resulting in error messages...

    So, from that experience, I'd say that strict phone number
    checking is sometimes a really bad idea. ;-)

    Gunnar
    (Sweden)

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Oct 3, 2003
    #4
  5. joemono

    Purl Gurl Guest

    Roy Johnson wrote:

    > Purl Gurl wrote in message


    > I thought that you made a few odd (either esoteric or not Lazy enough)
    > implementation decisions.


    I have no interest in reading Code Cop Crap.

    It is annoying to open an article only to discover
    this type of troll mule manure you write.

    Respond to the originating author as you should.

    You are wasting your time and the time of readers.


    Purl Gurl
    Purl Gurl, Oct 3, 2003
    #5
  6. joemono

    Roy Johnson Guest

    Purl Gurl wrote:

    > I have no interest in reading Code Cop Crap.


    Interesting. I have no interest in your critiques of my posts that
    have nothing to do with Perl.

    It's not "trolling" to point out that you're doing bizarre things when
    straightforward methods are available. My code was much more clear
    than yours, as well as being shorter.

    delete $shoulder->{'chip'}
    Roy Johnson, Oct 3, 2003
    #6