Code for title-casing (US) snail addresses?

rpardee

Hey All,

I've got to suck some address info out of a db, where it is stored in
all caps, like so:

14 WEST 21ST ST
742 NW EVERGREEN TERRACE
PO BOX 13

And I want to pretty up the casing:

14 West 21st St
742 NW Evergreen Terrace
PO Box 13

So it's mostly just title-casing, and then dealing w/certain special
components like 'NE', 'SW', 'PO' and so forth.

I've actually got some vb.net code that I could convert, but I thought
I'd ask here whether someone's already got something they've sweated
over before I did.

Also--more concretely: for simple title-casing, is this the
easiest/best approach?

pretty_address = my_address_var.split.each do |c| c.capitalize
end.join(" ")

Or is there something more graceful/efficient?

Thanks!

-Roy
 
James Edward Gray II

My thought is something like this:

abbreviations = %w{NW SW PO}  # ... etc ...
my_address.gsub!(/[A-Z]+/) { |w| abbreviations.include?(w) ? w : w.capitalize }
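
As a minimal sketch of the difference between the approaches discussed so far (the method names and the short abbreviation list are illustrative assumptions, not code from the thread): the split/each version in the question returns the words unchanged, because each returns its receiver and capitalize does not modify the string. Using map fixes that, and the gsub version additionally leaves the listed abbreviations alone.

```ruby
# Hypothetical names throughout; the abbreviation list is deliberately short.
ABBREVIATIONS = %w[NW SW NE SE PO].freeze

# each returns its receiver, so the capitalized copies are discarded.
def with_each(address)
  address.split.each { |word| word.capitalize }.join(" ")
end

# map collects the block results, which is what title-casing needs.
def with_map(address)
  address.split.map { |word| word.capitalize }.join(" ")
end

# The gsub approach from the reply, written without mutating the argument.
def title_case_address(address)
  address.gsub(/[A-Z]+/) { |word| ABBREVIATIONS.include?(word) ? word : word.capitalize }
end

puts with_each("PO BOX 13")           # PO BOX 13 (unchanged)
puts with_map("PO BOX 13")            # Po Box 13
puts title_case_address("PO BOX 13")  # PO Box 13
```

Note that the gsub version still turns "21ST" into "21St", since the pattern matches the letters after the digits as a word of their own.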

Hope that helps.

James Edward Gray II
 
rpardee

Niiice! I'm so glad I asked.

I think the only thing this doesn't get me is keeping the suffixes on
ordinals in lowercase (I need 21st, not 21St). But I should be able to
get that together...

Thanks!

-Roy
 
James Edward Gray II

Add this line, before the other gsub!():

my_address.gsub!(/(\d)([A-Z]+)/) { $1 + $2.downcase }

James Edward Gray II
 
David A. Black

Hi --

You can safely downcase digits along with the letters:

my_address.gsub!(/\d[A-Z]+/) { |s| s.downcase }

or something like that.
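
A small sketch of that ordinal step on its own, assuming a hypothetical helper name and a non-mutating gsub. It also shows a side effect worth knowing about: any letters that directly follow a digit are downcased, not only ordinal suffixes.

```ruby
# Hypothetical helper: downcase any run of capitals that directly follows a digit.
def downcase_ordinals(address)
  address.gsub(/\d[A-Z]+/) { |s| s.downcase }
end

puts downcase_ordinals("14 WEST 21ST ST")  # 14 WEST 21st ST
puts downcase_ordinals("9 ELM ST APT 5B")  # 9 ELM ST APT 5b
```

Whether "5b" is acceptable for unit numbers depends on the data; a stricter pattern such as /\d(ST|ND|RD|TH)\b/ would be one possible alternative.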


David
 
rpardee

And thanks again!

If I can beg your indulgence a bit further... I'm having trouble
elaborating this so it does the right thing for street names like
'McGrath'. The simpler version is:

def prettify(address)
  abbreviations = %w(PO NE NW SE SW)
  prefixes2 = %w(Mc O' D')
  prefixes3 = %w(Mac)
  address.gsub!(/(\d)([A-Z]+)/) { $1 + $2.downcase }
  address.gsub(/[A-Z]+/) do |w|
    # abbreviations.include?(w) ? w : w.capitalize
    if abbreviations.include?(w) then
      w
    else
      w.capitalize
    end
  end
end

Which works admirably, except it gives back "Mcgrath", rather than
"McGrath".

I *thought* I'd be able to tack a second Array.include? on there, like
so:

def prettify(address)
  abbreviations = %w(PO NE NW SE SW)
  prefixes2 = %w(Mc O' D')
  prefixes3 = %w(Mac)
  address.gsub!(/(\d)([A-Z]+)/) { $1 + $2.downcase }
  address.gsub(/[A-Z]+/) do |w|
    # abbreviations.include?(w) ? w : w.capitalize
    if abbreviations.include?(w) then
      w
    else
      w.capitalize
    end
    if prefixes2.include?(w[0..1]) then
      w[2..2] = w[2..2].upcase
    end
  end
end

But when I do that, calls like prettify("20 E MCGRAW ST") just return
"20"

I've read over the gsub docs, but don't see why the above does not work
as expected. Anybody got a clue for me?

Thanks!

-Roy
 
James Edward Gray II

Glad to help.

See if this works (untested):
def prettify(address)
  abbreviations = %w(PO NE NW SE SW)
  prefixes2 = %w(Mc O' D')
  prefixes3 = %w(Mac)
  address.gsub!(/(\d)([A-Z]+)/) { $1 + $2.downcase }
  # gsub! here, so the title-cased result is kept rather than discarded.
  address.gsub!(/[A-Z]+/) do |w|
    abbreviations.include?(w) ? w : w.capitalize
  end
  # Re-capitalize the letter that follows a known name prefix.
  address.gsub!(/\b(#{prefixes2.join("|")})(\w)/) { $1 + $2.capitalize }
  # gsub! returns nil when nothing matched, so return the string explicitly.
  address
end
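
One possible variation on the prefix pass, sketched under the assumption that the address has already been title-cased (so "MCGRAW" has become "Mcgraw"). The constant and method names are hypothetical. Regexp.union escapes each prefix before joining them with "|", which matters if a prefix ever contains a character that is special in a regular expression.

```ruby
# Hypothetical names; prefixes taken from the thread.
NAME_PREFIXES = %w[Mc O' D'].freeze

# Regexp.union(...).source yields "Mc|O'|D'" with each entry escaped.
PREFIX_RE = /\b(#{Regexp.union(NAME_PREFIXES).source})(\w)/

# Upcase the first letter after a known prefix in already title-cased text.
def fix_name_prefixes(text)
  text.gsub(PREFIX_RE) { $1 + $2.upcase }
end

puts fix_name_prefixes("20 E Mcgraw St")   # 20 E McGraw St
puts fix_name_prefixes("5 O'brien Ave")    # 5 O'Brien Ave
```

The match is case-sensitive, so this pass does nothing useful on the raw all-caps input; it has to run after the capitalization step.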

James Edward Gray II
 
David A. Black

Hi --

Two things go wrong inside the gsub block in your second version. First:

if abbreviations.include?(w) then
  w
else
  w.capitalize
end

That whole if statement is a no-op: you don't actually change w (capitalize
returns a new string and leaves w alone), and you don't return anything.

Second:

if prefixes2.include?(w[0..1]) then
  w[2..2] = w[2..2].upcase
end

An if without an else returns nil when its condition is false. Since this is
the last expression in the block, the return value of the whole block will be
nil whenever the prefix test fails, and gsub substitutes nil as an empty
string. That's why prettify("20 E MCGRAW ST") gives you just "20".

See notes above, and also see if this helps:

def prettify(address)
  abbreviations = %w(PO NE NW SE SW)
  prefixes2 = %w(Mc O' D')
  prefixes3 = %w(Mac)
  address.gsub!(/\d[A-Z]+/) {|w| w.downcase }
  address.gsub(/[A-Z]+/) do |w|
    w.capitalize! unless abbreviations.include?(w)
    w[2..2] = w[2..2].upcase if prefixes2.include?(w[0..1])
    w
  end
end

(Note that last 'w', which ensures that w is the return value of the
block.)
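
To make the block-return-value point concrete, here is a stripped-down sketch (hypothetical method names, a single hard-coded "Mc" prefix). gsub replaces each match with whatever the block's last expression evaluates to, so a trailing modifier-if that does not fire yields nil, and the match is replaced with an empty string.

```ruby
# Last expression is a modifier-if; it evaluates to nil when the test fails,
# and w.capitalize does not change w, so the test always fails here.
def broken_prettify(address)
  address.gsub(/[A-Z]+/) do |w|
    w.capitalize
    w[2..2] = w[2..2].upcase if w[0..1] == "Mc"
  end
end

# capitalize! changes w in place, and the final w is the block's value.
def fixed_prettify(address)
  address.gsub(/[A-Z]+/) do |w|
    w.capitalize!
    w[2..2] = w[2..2].upcase if w[0..1] == "Mc"
    w
  end
end

p broken_prettify("20 E MCGRAW ST")  # every word replaced by an empty string
p fixed_prettify("20 E MCGRAW ST")   # "20 E McGraw St"
```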

I look forward to seeing how you'll handle people named de Forest who
live in Delaware :)


David
 
rpardee

Oooooh. Okay. I thought that gsub was going to take whatever 'w'
evaluated to on each loop, but I guess it's taking the last value
returned? Far out. Thanks for clearing that up.

Yeah, Mr. De Forest is going to have to live w/poor casing on his name
I'm afraid. ;-)

Thanks!

-Roy
 
William James

def prettify(address)
  abbreviations = %w(PO NE NW SE SW)
  prefixes = %w(Mc O' D' Mac)
  address.gsub(/\S+/) do |w|
    case
    when abbreviations.include?(w)
      w
    when w =~ /^\d/
      w.downcase
    else
      w = w.capitalize
      prefixes.inject(w){|wrd,pre|
        wrd[pre.size,1]=wrd[pre.size,1].upcase if wrd.index(pre)==0
        wrd
      }
    end
  end
end
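
For readers unfamiliar with inject, a brief sketch of what it is doing in the code above (the helper name is hypothetical). inject passes an accumulator through the block once per element, and each block result becomes the accumulator for the next element. Here the accumulator is the word itself, and each prefix gets a chance to fix it up.

```ruby
# The classic use: fold a list into one value, starting from 0.
puts [1, 2, 3].inject(0) { |sum, n| sum + n }  # 6

PREFIXES = %w[Mc O' D' Mac].freeze

# Hypothetical helper: capitalize a word, then let each prefix in turn
# upcase the letter that follows it. The block returns wrd each time so
# the (possibly modified) word is handed to the next prefix.
def apply_prefixes(word)
  PREFIXES.inject(word.capitalize) do |wrd, pre|
    wrd[pre.size, 1] = wrd[pre.size, 1].upcase if wrd.start_with?(pre)
    wrd
  end
end

puts apply_prefixes("MCGRATH")    # McGrath
puts apply_prefixes("MACDONALD")  # MacDonald
puts apply_prefixes("MACY")       # MacY
```

The last line shows a limitation of treating "Mac" as a prefix: names that merely start with those letters are affected too.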
 
rpardee

Far out--I'm going to have to ponder that one--learn what inject()
does.

For now, here's what I've settled on:

@abbreviations = %w(PO NE NW SE SW P.O. N.E. N.W. S.E. S.W. C/O P.O)
@suffixes = %w(ST ND RD TH)
@prefixes = %w(Mc O' D')
@dash_reg = Regexp.new("([-/\"][A-Za-z])")

def prettify(address)
  ret = Array.new
  address.split(' ').each do |word|
    word.capitalize! unless (@abbreviations.include?(word) or word[0..0] =~ /[0-9#]/)
    word[2..2] = word[2..2].upcase if @prefixes.include?(word[0..1])
    word[-2, 2] = word[-2, 2].downcase if @suffixes.include?(word[-2, 2])
    ret << word
  end
  ret = ret.join(' ')
  # puts ret
  if ret =~ @dash_reg then ret.gsub(@dash_reg, $1.upcase) else ret end
end

This seems to handle both apartment numbers, and hyphenated street
names:

prettify("1234 SLEATER-KINNEY RD #A5")
 
David A. Black

Hi --

Just a few suggestions to prettify prettify :)

At the end, you don't need an 'if' clause; you can just do the gsub
operation, and if there's no match, nothing will happen:

ret.gsub(@dash_reg) { $1.upcase }

Actually, you don't need ret at all. split returns an array, so all
you have to do is join that array and filter it through the gsub. So
the whole method could be:

def prettify(address)
  address.split(' ').each do |word|
    word.capitalize! unless (@abbreviations.include?(word) or word[0..0] =~ /[0-9#]/)
    word[2..2] = word[2..2].upcase if @prefixes.include?(word[0..1])
    word[-2, 2] = word[-2, 2].downcase if @suffixes.include?(word[-2, 2])
  end.join(' ').gsub(@dash_reg) { $1.upcase }
end
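
The block form of gsub also changes behavior when a string contains more than one match, which this sketch tries to show (method and constant names are hypothetical). With a string argument, $1.upcase is evaluated once, before gsub runs, so every match receives the same replacement. With a block, $1 is re-read for each match.

```ruby
# Same pattern as @dash_reg in the thread, written as a regex literal.
DASH_REG = /([-\/"][A-Za-z])/

# String-argument form: the replacement is computed once from the first match.
def upcase_with_string(str)
  if str =~ DASH_REG
    str.gsub(DASH_REG, $1.upcase)
  else
    str
  end
end

# Block form: the block runs per match, so $1 is always the current match.
def upcase_with_block(str)
  str.gsub(DASH_REG) { $1.upcase }
end

puts upcase_with_string("Sleater-kinney-olympia")  # Sleater-Kinney-Klympia
puts upcase_with_block("Sleater-kinney-olympia")   # Sleater-Kinney-Olympia
```

For addresses with a single hyphen the two forms give the same result, which is probably why the difference did not show up in the example above.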


David
 
Dominic Sisneros

I think Perl has the most robust module for this and other geocoding-type
tasks. I was thinking of porting it, but I am not too comfortable with
regular expressions. Anyway, the link is below:
http://search.cpan.org/author/SDERLE/Geo-StreetAddress-US-0.99/US.pm
 
James Edward Gray II

That's about as easy as a port gets. Take a peek at the source.
It's about 80% documentation and simple Hash declarations.

And if you get stuck, we'll be here... ;)

James Edward Gray II
 
