Capitalization

Discussion in 'Ruby' started by Jason Vogel, Dec 8, 2006.

  1. Jason Vogel

    Jason Vogel Guest

    Disclaimer: I'm a Ruby Nuby and basically don't know RegEx at all. I know
    RegEx is the answer; I just don't know where to start.

    Current Source:
    str.split(' ').each {|w| w.capitalize!}.join(' ')

    Text:
    ADDITIONAL SPA (ONLY AVAILABLE W/PURCHASE OF POOL OR SPA)
    SELLER HEAT/AC/DUCTWORK

    Result:
    Additional Spa (only Available W/purchase Of Pool Or Spa)
    Seller Heat/ac/ductwork

    Desired:
    Additional Spa (Only Available w/Purchase of Pool or Spa)
    Seller Heat/AC/Ductwork

    Issues:
    - Need to capitalize after a "/"
    - Need specific word case handling (e.g. "Ac" => "AC","or" => "or",
    "w/[a]" => "w/[A]")

    Thanks,
    Jason
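
For reference, a minimal sketch of why the current source gives that result. `String#capitalize` upcases the first character and downcases everything after it, and `split(' ')` breaks only on whitespace, so "(" and "/" stay inside the words. The method name `naive_capitalize` is made up for this illustration; the sample text is the one from the post.

```ruby
# Same idea as the current source: split on whitespace, capitalize each word.
def naive_capitalize(str)
  str.split(' ').map { |w| w.capitalize }.join(' ')
end

# "(ONLY" starts with "(", so no letter ends up capitalized in that word.
puts "(ONLY".capitalize                      # (only
# Everything after the first character is downcased, including text after "/".
puts "HEAT/AC/DUCTWORK".capitalize           # Heat/ac/ductwork
puts naive_capitalize("SELLER HEAT/AC/DUCTWORK")   # Seller Heat/ac/ductwork
```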
     
    Jason Vogel, Dec 8, 2006
    #1

  2. On 12/9/06, Jason Vogel wrote:
    >
    > Issues:
    > - Need to capitalize after a "/"
    > - Need specific word case handling (e.g. "Ac" => "AC","or" => "or",
    > "w/[a]" => "w/[A]")


    Take a look at http://zem.novylen.net/ruby/titlecase.rb (especially
    the icap method).

    martin
     
    Martin DeMello, Dec 8, 2006
    #2

  3. Try this:
    str.gsub(/[A-Za-z]+/) {|x| x.capitalize}

    If you want the W of W/ uncapitalized:
    str.downcase.gsub(/[A-Za-z]+(?!\/)/) {|x| x.capitalize}

    Paul Lutus wrote:
    >
    > How many special cases? In the worst case, you would have to use a
    > dictionary to avoid treating acronyms as a word. You already have two
    > rather difficult rules, one having to do with acronyms, another having to
    > do with special treatment of the sequence "w/".
    >
    > What I am saying is this is likely to be more difficult than it seems,
    > especially because we only have one example of what might end up being
    > thousands of examples of free-form text.
    >
     
    Daniel Finnie, Dec 8, 2006
    #3
  4. Oops, forgot to paste this one in:
    To keep words like "of" and "is" lowercase (basically anything
    under 3 letters):
    text.downcase.gsub(/[A-Za-z]{3,}(?!\/)/) {|x| x.capitalize}


     
    Daniel Finnie, Dec 8, 2006
    #4
  5. Jacob Fugal

    Jacob Fugal Guest

    On 12/8/06, Daniel Finnie wrote:
    > Oops, forgot to paste this one in:
    > To keep words like "of" and "is" lowercase (basically anything
    > under 3 letters):
    > text.downcase.gsub(/[A-Za-z]{3,}(?!\/)/) {|x| x.capitalize}


    I agree with Paul Lutus: there are too many special cases, and
    Daniel's regex here is a good example. I can spot at least three (to
    me) obvious errors:

    1) Anything with a '/' trailing will not get capitalized, so in the
    OP's example, neither "heat" nor "ac" would be capitalized at all.

    2) There are plenty of words with fewer than three letters that should
    be capitalized. The first person pronoun "I", for instance. Or even
    "of" or "is", if they're the first word in the sentence.

    3) In the absence of 1 and 2, "ac" would still get turned into "Ac"
    rather than "AC".
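
A small sketch of points 2 and 3, using Daniel's pattern as posted. The method name `capitalize_long_words` and the sample phrases are invented for illustration only.

```ruby
# Daniel's pattern: only runs of three or more letters get capitalized.
def capitalize_long_words(text)
  text.downcase.gsub(/[A-Za-z]{3,}(?!\/)/) { |x| x.capitalize }
end

# Point 2: a leading "of" and the pronoun "I" both stay lowercase.
puts capitalize_long_words("OF MICE AND I")   # of Mice And i
# Point 3: "AC" is under three letters, so it is simply left as "ac".
puts capitalize_long_words("AC UNIT")         # ac Unit
```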

    Jacob Fugal
     
    Jacob Fugal, Dec 9, 2006
    #5
  6. Jacob Fugal wrote:
    > On 12/8/06, Daniel Finnie wrote:
    >> Oops, forgot to paste this one in:
    >> To keep words like "of" and "is" lowercase (basically anything
    >> under 3 letters):
    >> text.downcase.gsub(/[A-Za-z]{3,}(?!\/)/) {|x| x.capitalize}

    >
    > I agree with Paul Lutus, there are too many special cases. And
    > Daniel's regex here is a good example. I can spot at least three (to
    > me) obvious errors:
    >
    > 1) Anything with a '/' trailing will not get capitalized, so in the
    > OP's example, neither "heat" nor "ac" would be capitalized at all.

    Words with a trailing / do get capitalized, as long as the word before
    the / is at least 3 letters long.
    irb(main):004:0> src.downcase.gsub(/[A-Za-z]{3,}(?!\/)/) {|x| x.capitalize}
    => "Additional Spa (Only Available w/Purchase of Pool or Spa) Seller
    Heat/ac/Ductwork "
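
Why that works can be seen by looking at what the pattern actually matches. This is only a sketch: the constant `PATTERN` and the method `matched_pieces` are names made up here. The negative lookahead rejects "heat" because a "/" follows, so the regex backtracks one letter and matches "hea" instead. Capitalizing "hea" in place still yields "Heat". A two-letter word like "ac" has no room to backtrack and is skipped.

```ruby
PATTERN = /[A-Za-z]{3,}(?!\/)/

# Return the substrings the pattern matches in the downcased text.
def matched_pieces(text)
  text.downcase.scan(PATTERN)
end

p matched_pieces("HEAT/AC/DUCTWORK")   # ["hea", "ductwork"]
p matched_pieces("W/PURCHASE")         # ["purchase"]

# Replacing "hea" with "Hea" leaves the trailing "t" where it was.
puts "heat/ac/ductwork".gsub(PATTERN) { |x| x.capitalize }   # Heat/ac/Ductwork
```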

    > 2) There are plenty of words with fewer than three letters that should
    > be capitalized. The first person pronoun "I", for instance. Or even
    > "of" or "is", if they're the first word in the sentence.
    >
    > 3) In the absence of 1 and 2, "ac" would still get turned into "Ac"
    > rather than "AC".


    These are valid points that I feel shouldn't be incorporated into the
    original regexp.
     
    Daniel Finnie, Dec 9, 2006
    #6
  7. Jason Vogel wrote:
    > Issues:
    > - Need to capitalize after a "/"
    > - Need specific word case handling (e.g. "Ac" => "AC","or" => "or",
    > "w/[a]" => "w/[A]")


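    # Map each lowercased word to the spelling it should end up with.
    specials = %w( of or w AC ).
    inject({}){|h,s| h.update(s.downcase => s) }

    # Split on non-letters, keeping the separators, then fix up each piece.
    puts DATA.read.downcase.split( /([^a-z]+)/ ).map{|s|
    specials[s] or s.capitalize }.join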

    __END__
    ADDITIONAL SPA (ONLY AVAILABLE W/PURCHASE OF POOL OR SPA)
    SELLER HEAT/AC/DUCTWORK


    --- output -----
    Additional Spa (Only Available w/Purchase of Pool or Spa)
    Seller Heat/AC/Ductwork
     
    William James, Dec 9, 2006
    #7
  8. Jason Vogel

    Jason Vogel Guest

    William,

    This is exactly what I'm looking for. I don't understand it, but it's
    what I'm looking for.

    Would you mind explaining what your code does?

    Thanks,
    Jason



     
    Jason Vogel, Dec 10, 2006
    #8
  9. Jason Vogel wrote:
    > William,
    >
    > This is exactly what I'm looking for. I don't understand it, but it's
    > what I'm looking for.
    >
    > Would you mind explaining what your code does?
    >
    > Thanks,
    > Jason


    It helps to inspect the data structures.

    Try:

    specials = %w( of or w AC ).
    inject({}){|h,s| h.update(s.downcase => s) }

    p specials

    text = DATA.read.downcase
    p text.split( /([^a-z]+)/ )
    puts text.split( /([^a-z]+)/ ).map{|s|
    specials[s] or s.capitalize }.join

    __END__
    ADDITIONAL SPA (ONLY AVAILABLE W/PURCHASE OF POOL OR SPA)
    SELLER HEAT/AC/DUCTWORK
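
For readers without the `DATA` section handy, here is a rough sketch of the same steps on a plain string. It assumes the lookup is `specials[s]`, and it wraps the two steps in methods named `build_specials` and `title_case`, which are invented for this illustration. The hash maps each lowercased special word to its preferred spelling. The capture group in the split pattern keeps the separators in the array, so `join` can put the text back together unchanged apart from the case fixes.

```ruby
# Map each lowercased word to the spelling it should end up with.
def build_specials(words)
  words.inject({}) { |h, s| h.update(s.downcase => s) }
end

def title_case(text, specials)
  # Letters and non-letters alternate in the array; separators such as
  # " (" or "/" are unaffected by capitalize.
  text.downcase.split(/([^a-z]+)/).map { |s| specials[s] || s.capitalize }.join
end

specials = build_specials(%w(of or w AC))
p specials["ac"]                                      # "AC"
p "HEAT/AC W/PURCHASE".downcase.split(/([^a-z]+)/)
# ["heat", "/", "ac", " ", "w", "/", "purchase"]
puts title_case("SELLER HEAT/AC/DUCTWORK", specials)  # Seller Heat/AC/Ductwork
```

Any word not listed in the specials falls through to `capitalize`, so a new acronym has to be added to the list by hand.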
     
    William James, Dec 10, 2006
    #9
  10. Jason Vogel

    Jason Vogel Guest



    Paul and William,

    Thank you both for taking the time to respond and explain. I really
    appreciate it.

    Thanks,
    Jason
     
    Jason Vogel, Dec 13, 2006
    #10
