What do you need to have to be considered a Master at Perl?

Charlton Wilbur

PJH> And I hope it isn't. Local conventions for phone numbers,
PJH> street addresses, etc. differ a lot, and software tends to be
PJH> used outside that local context. Forcing a non-US customer to
PJH> guess US conventions, or worse, rejecting a phone number
PJH> because it has too many or too few digits is a major annoyance.

Of course - but if we tried to get a candidate to write code to validate
*any* phone number worldwide, it would take unreasonably long for an
interview. And we'd need to have a considerably more extensive spec.

The question is useful because it's slightly open-ended, especially
phrased that way. Candidates who ask "Hm, are we matching US phone
numbers or phone numbers from anywhere?" are showing that they are aware
that the practices in the US are not universal, which is a very good
trait to have when you're developing software that's used around the
world.

But it's an interview question aimed at assessing the candidate's
fluency with regular expressions. Because of that, it helps to keep it
in a domain that people are familiar with, both in terms of valid input
(you can reasonably have both 7-digit and 10/11-digit US phone numbers)
and in terms of what fool things people are likely to do (do they write
the phone number 1 (617) 555-1212? 617.555.1212? 1-617-555-1212? is
16-17-5551-212 correct?)

So the goal of the exercise -- what you need to pass with flying colors
-- is to come up with a snippet of code or a regular expression that
matches a US phone number, and an acknowledgement that phone systems in
other countries have different formats for their phone numbers.

Charlton
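
For illustration only, here is one way a candidate's answer might look. It is a sketch, not code from this thread: the names `$US_PHONE` and `normalize_us_phone` are made up, and it assumes the strict reading in which digits must be grouped 3-3-4, so that 16-17-5551-212 is rejected.

```perl
use strict;
use warnings;

# Assumed formats: optional leading 1, optional 3-digit area code
# (with or without parentheses), then a 3-digit and a 4-digit group.
# Separators may be a hyphen, a dot, or whitespace.
my $US_PHONE = qr{
    \A \s*
    (?:
        (?: \+?1 [-.\s]? )?            # optional leading 1
        (?: \( [0-9]{3} \) \s?         # (617)
          | [0-9]{3} [-.\s]?           # 617- 617. 617
        )
    )?
    [0-9]{3} [-.\s]? [0-9]{4}          # 555-1212
    \s* \z
}x;

# Returns the 7 or 10 significant digits, or undef if there is no match.
sub normalize_us_phone {
    my ($input) = @_;
    return unless defined($input) && $input =~ $US_PHONE;
    (my $digits = $input) =~ tr/0-9//cd;      # delete everything but 0-9
    $digits =~ s/\A1(?=[0-9]{10}\z)//;        # drop the leading 1 of an 11-digit form
    return $digits;
}

for my $candidate ('1 (617) 555-1212', '617.555.1212', '555-1212', '16-17-5551-212') {
    my $digits = normalize_us_phone($candidate);
    printf "%-18s %s\n", $candidate, defined $digits ? $digits : 'rejected';
}
```

Using [0-9] rather than \d keeps the match to ASCII digits regardless of Perl version.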
 
darkon

I can't find Tom Christiansen's original, so I'll go with Nat's
reconstruction of:

   The Seven Stages of a Perl Programmer

(From http://prometheus.frii.com/~gnat/yapc/2000-stages/)

That has to be incomplete, because it would put me mostly in the
"Adept" category, and I don't consider myself that good.

And anyway, the longest program or module I've written in Perl is fewer
than 1000 lines, and fewer still if you remove comments and whitespace
inserted for readability. Should that count in some way? After all,
one necessary (but not sufficient) way to become better at a
programming language is actually programming in it.
 
Dr.Ruud

Tad said:
I think my solution will allow plenty damnfoolery and still
yield a usable US telephone number:

sub validate_phonenumber {
    my($phone) = @_;

    $phone =~ s/\D+//g; # allow hyphens, dots, parens, spaces etc

s/allow/remove/

Another test question: How many elements does the \d character set have?

s/\\D/[^0-9]/
 
Ted Zlatanov

R> Another test question: How many elements does the \d character set
R> have?

R> s/\\D/[^0-9]/

That's a nasty one; you need to test with several Perl versions to be
sure. Looking at perlunicode, it seems like \d will match Unicode
digits, but [\d] will use byte semantics and thus only match [0-9]. But
that fails in this example:

# U+FF10 is the Fullwidth zero
perl -e'$x = chr 0xFF10; print "Yes normal\n" if $x =~ m/\d/; print "Yes in byte\n" if $x =~ m/[\d]/;'

Yes normal
Yes in byte

So [\d] still matches Unicode digits (this is against 5.8.8). Anyhow,
the answer to the original question is "depends on Perl version."
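
A small script makes the difference visible. This is a sketch, not part of the original test: the variable names are made up, and it assumes Perl 5.14 or later, because the /a modifier (which restricts \d to ASCII) does not exist in older versions such as the 5.8.8 used above.

```perl
use 5.014;
use strict;
use warnings;

my $fullwidth_zero = chr 0xFF10;    # U+FF10 FULLWIDTH DIGIT ZERO

# \d follows Unicode rules for this string, [0-9] is always ASCII only,
# and /a forces \d back to ASCII only.
my $plain_d     = $fullwidth_zero =~ /\d/    ? 1 : 0;
my $ascii_class = $fullwidth_zero =~ /[0-9]/ ? 1 : 0;
my $ascii_flag  = $fullwidth_zero =~ /\d/a   ? 1 : 0;

printf "%-6s %s\n", @$_
    for [ '\d',    $plain_d     ? 'matches' : 'does not match' ],
        [ '[0-9]', $ascii_class ? 'matches' : 'does not match' ],
        [ '\d/a',  $ascii_flag  ? 'matches' : 'does not match' ];
```

If only ASCII digits are wanted, spelling out [0-9] sidesteps the version question entirely.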

Ted
 
Peter J. Holzer

PJH> And I hope it isn't. Local conventions for phone numbers,
PJH> street addresses, etc. differ a lot, and software tends to be
PJH> used outside that local context. Forcing a non-US customer to
PJH> guess US conventions, or worse, rejecting a phone number
PJH> because it has too many or too few digits is a major annoyance.

CW> Of course - but if we tried to get a candidate to write code to validate
CW> *any* phone number worldwide, it would take unreasonably long for an
CW> interview. And we'd need to have a considerably more extensive spec.

I don't think so. There are so many conventions that the spec probably
boils down to "optional (or maybe mandatory) '+', then a string of
numbers interspersed with a small set ([- /.()], any more?) of
reasonable punctuation characters (oh, and if there are parentheses,
they must match)". You really can't validate more than
that unless you know *all* conventions, and even if you do, one of them
might change next week.
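
The loose spec above can be sketched in Perl roughly like this. It is only an illustration: the sub name `looks_like_phone_number` is made up, the punctuation set is the one listed above, and "parentheses must match" is read here as "balanced, never closing before opening".

```perl
use strict;
use warnings;

sub looks_like_phone_number {
    my ($input) = @_;
    return 0 unless defined $input;

    # Optional '+', then nothing but ASCII digits and [- /.()].
    return 0 unless $input =~ m{\A\+?[0-9 ./()-]+\z};

    # Punctuation alone is not a number: require at least one digit.
    return 0 unless $input =~ /[0-9]/;

    # Walk the string and make sure parentheses balance.
    my $depth = 0;
    for my $ch (split //, $input) {
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        return 0 if $depth < 0;     # ')' before its '('
    }
    return $depth == 0 ? 1 : 0;
}

for my $candidate ('+43 (1) 58801-0', '1 (617) 555-1212', '(617 555-1212', 'call me') {
    printf "%-18s %s\n", $candidate,
        looks_like_phone_number($candidate) ? 'plausible' : 'rejected';
}
```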

CW> The question is useful because it's slightly open-ended, especially
CW> phrased that way.

I fully agree with that.

CW> Candidates who ask "Hm, are we matching US phone
CW> numbers or phone numbers from anywhere?" are showing that they are aware
CW> that the practices in the US are not universal, which is a very good
CW> trait to have when you're developing software that's used around the
CW> world.

I also agree with that.

CW> But it's an interview question aimed at assessing the candidate's
CW> fluency with regular expressions. Because of that, it helps to keep it
CW> in a domain that people are familiar with, both in terms of valid input
CW> (you can reasonably have both 7-digit and 10/11-digit US phone numbers)
CW> and in terms of what fool things people are likely to do (do they write
CW> the phone number 1 (617) 555-1212? 617.555.1212? 1-617-555-1212? is
CW> 16-17-5551-212 correct?)

But you didn't specify this problem domain. You just ask for "a telephone
number". In my opinion, a candidate who assumed (without asking for
clarification) that you meant "a US telephone number" should lose points
for that assumption, and not "pass with flying colors".

hp
 
A. Sinan Unur

PJH> And I hope it isn't. Local conventions for phone numbers,
....
CW> Of course - but if we tried to get a candidate to write code to
CW> validate *any* phone number worldwide, it would take unreasonably
CW> long for an interview. And we'd need to have a considerably more
CW> extensive spec.

PJH> I don't think so. There are so many conventions that the spec probably
PJH> boils down to "optional (or maybe mandatory) '+', then a string of
PJH> numbers interspersed with a small set ([- /.()], any more?) of
PJH> reasonable punctuation characters (oh, and if there are parentheses,
PJH> they must match)".

CW> The question is useful because it's slightly open-ended, especially
CW> phrased that way.
....
CW> But it's an interview question aimed at assessing the candidate's
CW> fluency with regular expressions. Because of that, it helps to keep
CW> it in a domain that people are familiar with, both in terms of valid
CW> input (you can reasonably have both 7-digit and 10/11-digit US phone
CW> numbers) and in terms of what fool things people are likely to do (do
CW> they write the phone number 1 (617) 555-1212? 617.555.1212?
CW> 1-617-555-1212? is 16-17-5551-212 correct?)

PJH> But you didn't specify this problem domain. You just ask for "a
PJH> telephone number". In my opinion, a candidate who assumed (without
PJH> asking for clarification) that you meant "a US telephone number"
PJH> should lose points for that assumption, and not "pass with flying
PJH> colors".

IMHO, phone numbers should be solicited in three parts:

Country Code: ___

Area Code: ___

Number: ___

because each is subject to different validation mechanisms. To validate
a country code, you need a list of valid country codes (which may change
over time but by now are probably relatively stable).

The number of digits in an area code can depend on the country. So, you
need a set of rules that depend on the country code.

Similarly, the number of digits in a telephone number can depend both on
the country code and the area code. For example, not too long ago, area
codes in Turkey could be anywhere from 2 to 5 digits and phone numbers
could be 4 to 6 digits. Now that we have standardized on 3-digit area codes
and 7-digit phone numbers, the universe makes sense again. However,
there are still numbers that don't take area codes such as 444 0 489.
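
A rough sketch of per-country rules for the three-part form, to make the idea concrete. Everything named here is hypothetical (`%RULES`, `validate_parts`, the `no_area` key), the table holds only two illustrative entries, and the rule for numbers without an area code is generalized from the single 444 0 489 example, which is an assumption.

```perl
use strict;
use warnings;

# Illustrative rules only; a real table would cover every country code.
my %RULES = (
    '1'  => { area => qr/\A[0-9]{3}\z/, number => qr/\A[0-9]{7}\z/ },
    '90' => {
        area    => qr/\A[0-9]{3}\z/,
        number  => qr/\A[0-9]{7}\z/,
        no_area => qr/\A444[0-9]{4}\z/,   # assumed pattern for area-less numbers
    },
);

sub validate_parts {
    my ($country, $area, $number) = @_;

    # Keep ASCII digits only, so "+90" and "555-1212" are accepted as typed.
    for ($country, $area, $number) {
        $_ = '' unless defined;
        tr/0-9//cd;
    }

    my $rule = $RULES{$country} or return 0;    # unknown country code

    if ($area eq '') {
        return ($rule->{no_area} && $number =~ $rule->{no_area}) ? 1 : 0;
    }
    return ($area =~ $rule->{area} && $number =~ $rule->{number}) ? 1 : 0;
}

print validate_parts('+90', '312', '555 12 12') ? "ok\n" : "rejected\n";
print validate_parts('90',  '',    '444 0 489') ? "ok\n" : "rejected\n";
print validate_parts('1',   '',    '555-1212')  ? "ok\n" : "rejected\n";
```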

Clever(!) tricks such as trying to see if the phone number goes with the
address are useless. In the U.S., many don't bother to change cell
numbers when they move. In Turkey, cell area codes have their own domain
and are not directly related to the subscriber's address.

If one is given a generic string which may be a phone number, one's
options are limited. Sure, one can split on inter-digit space and
punctuation, then check whether the first cluster can definitely be
identified as a country code (for example, because it starts with a
+ sign), and so on.

But, if I were asked the question to check a phone number without any
further specification, I would state the issues above and go ahead and
write code to remove non-digits (only using [0-9] as digit characters)
and see if the resulting string is between, say, 6 and 17 digits long.
That should cover most variations.
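
In Perl that fallback might look like the sketch below. The sub name `plausible_phone_number` is made up, and the 6 and 17 limits are just the "say" values from the paragraph above, not a standard.

```perl
use strict;
use warnings;

sub plausible_phone_number {
    my ($input) = @_;
    return 0 unless defined $input;

    # Remove everything that is not an ASCII digit. [^0-9] is used instead
    # of \D so that non-ASCII digits are stripped too.
    (my $digits = $input) =~ s/[^0-9]+//g;

    my $length = length $digits;
    return ($length >= 6 && $length <= 17) ? 1 : 0;
}

for my $candidate ('+90 (312) 555 12 12', '1-617-555-1212', '12345') {
    printf "%-20s %s\n", $candidate,
        plausible_phone_number($candidate) ? 'plausible' : 'rejected';
}
```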

IMHO, the interview question can be made better by splitting it into two
parts: 1) To see if the candidate has any understanding of localization
issues, ask the candidate about the types of phone number inputs one
might encounter when processing data for a company doing business
internationally and 2) To check the candidate's understanding of regular
expressions, give him a specification and ask him to write a pattern or
patterns to validate against that.

Sinan

--
A. Sinan Unur

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 
