Probably a dumb s/// question.

Discussion in 'Perl Misc' started by Mark Healey, Mar 16, 2005.

  1. Mark Healey

    Mark Healey Guest

    I'm trying to craft a search that capitalizes letters depending on their
    context, specifically after a space or the beginning of a string.

    For example I'd like to turn

    the quick brown fox jumped over the lazy dogs.

    to

    The Quick Brown Fox Jumped Over the Lazy Dogs.

    Is this doable on a single line?


    --
    Mark Healey
    marknews(at)healeyonline(dot)com
    Mark Healey, Mar 16, 2005
    #1

  2. Mark Healey

    Paul Lalli Guest

    "Mark Healey" <> wrote in message
    news:p...
    > I'm trying to craft a search that capitalizes letters depending on their
    > context, specifically after a space or the beginning of a string.
    >
    > For example I'd like to turn
    >
    > the quick brown fox jumped over the lazy dogs.
    >
    > to
    >
    > The Quick Brown Fox Jumped Over the Lazy Dogs.
    >
    > Is this doable on a single line?


    What have you tried so far?

    Have you read the posting guidelines for this group, posted twice a
    week?

    Because I'm feeling generous (and bored) anyway:

    s/(^|\s)([a-z])/$1\u$2/g;


    for more information on ^, |, (), $1 & $2:
    perldoc perlre
    perldoc perlretut
    perldoc perlreref

    for more information on \u:
    perldoc -f ucfirst
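
As a quick check of the substitution above, here is a minimal sketch applied to the original poster's sentence. The variable names are illustrative only. Note that it capitalizes every word, including the second "the", which differs from the desired output in the question.

```perl
use strict;
use warnings;

my $string = 'the quick brown fox jumped over the lazy dogs.';

# Copy first, then modify the copy so the original string stays intact.
(my $titled = $string) =~ s/(^|\s)([a-z])/$1\u$2/g;

print "$titled\n";   # The Quick Brown Fox Jumped Over The Lazy Dogs.
```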

    Paul Lalli
    Paul Lalli, Mar 16, 2005
    #2

  3. At 2005-03-16 11:14AM, Mark Healey <> wrote:
    > For example I'd like to turn
    > the quick brown fox jumped over the lazy dogs.
    > to
    > The Quick Brown Fox Jumped Over the Lazy Dogs.
    >
    > Is this doable on a single line?


    my $string = 'the quick brown fox jumped over the lazy dogs.';
    my $String = join ' ', map {ucfirst lc} split ' ', $string;

    That forces your string to lower case first then capitalizes the first
    letter. It won't preserve whitespace though.
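
To make the whitespace caveat concrete, here is a small sketch using a made-up input with a double space, a tab, and an all-caps word. The sample string is an assumption chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: double space, a tab, and an upper-case word.
my $string = "the  quick\tBROWN fox";

# split ' ' breaks on any run of whitespace; join then puts back single spaces.
my $titled = join ' ', map { ucfirst lc $_ } split ' ', $string;

print "$titled\n";   # The Quick Brown Fox
```

The double space and the tab both come back as a single space, and "BROWN" is lowercased before its first letter is capitalized.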

    --
    Glenn Jackman
    NCF Sysadmin
    Glenn Jackman, Mar 16, 2005
    #3
  4. Mark Healey

    Ted Zlatanov Guest

    On Wed, 16 Mar 2005, wrote:

    > I'm trying to craft a search that capitalizes letters depending on their
    > context, specifically after a space or the beginning of a string.
    >
    > For example I'd like to turn
    >
    > the quick brown fox jumped over the lazy dogs.
    >
    > to
    >
    > The Quick Brown Fox Jumped Over the Lazy Dogs.
    >
    > Is this doable on a single line?


    Generally no, because of strange combinations like "Dog+Cat" or
    "yes/no". There is a tool that will do it, but the internals are much
    more than a single line :)

    http://search.cpan.org/~doom/Text-Capitalize-0.4/Capitalize.pm

    Always check CPAN first.

    Ted
    Ted Zlatanov, Mar 16, 2005
    #4
  5. Mark Healey wrote:
    > I'm trying to craft a search that capitalizes letters depending on their
    > context, specifically after a space or the beginning of a string.
    >
    > For example I'd like to turn
    >
    > the quick brown fox jumped over the lazy dogs.
    >
    > to
    >
    > The Quick Brown Fox Jumped Over the Lazy Dogs.

    ----------------------------------^

    What determines that "the" is *not* converted to "The"?

    > Is this doable on a single line?


    If it is, I suppose it would be a *very* long line. :)

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Mar 16, 2005
    #5
  6. Mark Healey

    Ted Zlatanov Guest

    On 16 Mar 2005, wrote:

    > my $string = 'the quick brown fox jumped over the lazy dogs.';
    > my $String = join ' ', map {ucfirst lc} split ' ', $string;
    >
    > That forces your string to lower case first then capitalizes the first
    > letter. It won't preserve whitespace though.


    It's probably better to do something like this:

    perl -p -e's/(\w+)/ucfirst($1)/eg'

    Text::Capitalize is even better, but the above will be closer to what
    the OP wanted I think.

    HTH
    Ted
    Ted Zlatanov, Mar 16, 2005
    #6
  7. Mark Healey

    Ted Zlatanov Guest

    On Wed, 16 Mar 2005, wrote:

    > s/(^|\s)([a-z])/$1\u$2/g;


    I always find it better to work with Perl built-ins such as ucfirst
    and \w, which respect locale and know about Unicode uppercasing rules.
    Any time I see a range like [a-z] or [A-Za-z] I try to reduce it to at
    least a POSIX class like [:alpha:] unless I must only accept [a-z].
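
A short sketch of why the character class matters, assuming a string that starts with a non-ASCII letter (Greek small omega, U+03C9, picked only as an example). Inside a bracketed class the POSIX form is written `[[:alpha:]]`.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';   # avoid "Wide character" warnings

my $string = "\x{3c9}mega point";     # begins with Greek small omega

# ASCII-only range: the leading omega is not in [a-z], so it is skipped.
(my $ascii_only = $string) =~ s/(^|\s)([a-z])/$1\u$2/g;

# POSIX class: matches any alphabetic character, including omega.
(my $any_alpha = $string) =~ s/(^|\s)([[:alpha:]])/$1\u$2/g;

print "$ascii_only\n";   # first character is still U+03C9
print "$any_alpha\n";    # first character is now U+03A9
```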

    Ted
    Ted Zlatanov, Mar 16, 2005
    #7
  8. Glenn Jackman wrote:
    > At 2005-03-16 11:14AM, Mark Healey <> wrote:
    >
    >> For example I'd like to turn
    >> the quick brown fox jumped over the lazy dogs.
    >> to
    >> The Quick Brown Fox Jumped Over the Lazy Dogs.
    >>
    >> Is this doable on a single line?

    >
    >
    > my $string = 'the quick brown fox jumped over the lazy dogs.';
    > my $String = join ' ', map {ucfirst lc} split ' ', $string;
    >
    > That forces your string to lower case first then capitalizes the first
    > letter. It won't preserve whitespace though.


    If you want to preserve whitespace:

    my $String = join '', map {ucfirst lc} split /(\s+)/, $string;
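
A brief sketch of why this version keeps the spacing: with capturing parentheses in the pattern, split returns the separators as well as the words, so joining on the empty string restores them. The sample input is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: double space, a tab, and an upper-case word.
my $string = "the  quick\tBROWN fox";

# The captured whitespace runs pass through ucfirst/lc unchanged.
my $titled = join '', map { ucfirst lc $_ } split /(\s+)/, $string;

print "$titled\n";   # "The  Quick<TAB>Brown Fox" with the original spacing
```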


    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Mar 16, 2005
    #8
  9. Ted Zlatanov wrote:

    > On Wed, 16 Mar 2005, wrote:
    >
    >
    >>s/(^|\s)([a-z])/$1\u$2/g;

    >
    >
    > I always find it better to work with Perl built-ins such as ucfirst


    Err... is \u not the same thing as ucfirst?
    Brian McCauley, Mar 16, 2005
    #9
  10. Ted Zlatanov <> wrote:
    > On Wed, 16 Mar 2005, wrote:
    >
    >> s/(^|\s)([a-z])/$1\u$2/g;

    >
    > I always find it better to work with Perl built-ins such as ucfirst
    > and \w, which respect locale and know about Unicode uppercasing rules.


    \u _is_ ucfirst(), it respects locales too.
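
A small sketch comparing the two spellings on a few sample strings (the samples are arbitrary). It only checks that the interpolated `\u` escape and the `ucfirst` function give the same result on them.

```perl
use strict;
use warnings;

my @samples = ('fox', 'lazy dogs', '3rd place', '');

# Collect any sample where the escape and the function disagree.
my @mismatches = grep { "\u$_" ne ucfirst($_) } @samples;

for my $word (@samples) {
    printf "%-12s -> %s\n", "'$word'", "'\u$word'";
}
print @mismatches ? "differences found\n" : "no differences\n";
```

Both forms change only the first character, so 'lazy dogs' becomes 'Lazy dogs', not 'Lazy Dogs'.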


    > Any time I see a range like [a-z] or [A-Za-z]



    Ahh, now that is a different thing.


    > I try to reduce it to at
    > least a POSIX class like [:alpha:] unless I must only accept [a-z].



    For this application, we don't need to ensure that it is a letter at all:

    s/(^|\s)(.)/$1\u$2/g;

    :)


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Mar 16, 2005
    #10
  11. Ted Zlatanov <> wrote:
    > On 16 Mar 2005, wrote:
    >
    >> my $string = 'the quick brown fox jumped over the lazy dogs.';
    >> my $String = join ' ', map {ucfirst lc} split ' ', $string;
    >>
    >> That forces your string to lower case first then capitalizes the first
    >> letter. It won't preserve whitespace though.

    >
    > It's probably better to do something like this:
    >
    > perl -p -e's/(\w+)/ucfirst($1)/eg'



    Try it on this string:

    it's a wonderful life!
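
Running that string through both substitutions from this thread shows the difference. This is a sketch with illustrative variable names.

```perl
use strict;
use warnings;

my $string = "it's a wonderful life!";

# \w+ stops at the apostrophe, so the "s" is treated as its own word.
(my $by_word = $string) =~ s/(\w+)/ucfirst($1)/eg;

# Anchoring on start-of-string or whitespace leaves the "s" alone.
(my $by_space = $string) =~ s/(^|\s)(.)/$1\u$2/g;

print "$by_word\n";    # It'S A Wonderful Life!
print "$by_space\n";   # It's A Wonderful Life!
```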


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Mar 17, 2005
    #11
  12. Mark Healey

    Ted Zlatanov Guest

    On Wed, 16 Mar 2005, wrote:

    Ted Zlatanov <> wrote:

    >> It's probably better to do something like this:
    >>
    >> perl -p -e's/(\w+)/ucfirst($1)/eg'

    >
    > Try it on this string:
    >
    > it's a wonderful life!


    Like I said, Text::Capitalize is better than any of the other
    solutions, unless the OP really did mean the original requirements.
    Natural language, such as it is, can be (and often is) very tricky! :)

    Ted
    Ted Zlatanov, Mar 17, 2005
    #12
  13. Mark Healey

    Ted Zlatanov Guest

    On Wed, 16 Mar 2005, wrote:

    > \u _is_ ucfirst(), it respects locales too.


    I see this in perldoc perlre:

    " \u uppercase next char (think vi)
    ....
    If "use locale" is in effect, the case map used by "\l", "\L",
    "\u" and "\U" is taken from the current locale. See
    perllocale."

    This is not, however, the same as ucfirst. It's
    uc($char1) . $rest
    not
    ucfirst($char1 . $rest)

    In fact, ucfirst() should do a Titlecase, as it's called in Unicode,
    although I don't know if the internal Perl implementation does exactly
    that. Titlecase is not the uppercasing of the first character,
    although in English it usually works that way.
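
One way to check what a given perl actually does is to look at a character whose titlecase and uppercase forms differ. The sketch below uses U+01C6 (the "dz with caron" digraph) and assumes a Unicode-aware perl (5.8 or later). On such a perl, both `ucfirst` and `\u` should produce the titlecase form rather than the uppercase one.

```perl
use strict;
use warnings;

my $dz = "\x{1C6}";   # LATIN SMALL LETTER DZ WITH CARON

# Report each result as a code point to sidestep terminal encoding issues.
my $upper  = sprintf 'U+%04X', ord(uc($dz));
my $title  = sprintf 'U+%04X', ord(ucfirst($dz));
my $escape = sprintf 'U+%04X', ord("\u$dz");

print "uc:      $upper\n";    # U+01C4 (uppercase digraph)
print "ucfirst: $title\n";    # U+01C5 (titlecase digraph)
print "\\u:      $escape\n";  # U+01C5 (same as ucfirst)
```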

    Why is this important? If you look at the Unicode standard, there is
    a good example:
    (from http://www.unicode.org/reports/tr21/tr21-3.html)

    "Characters may also have different case mappings, depending on the context.

    For example, 03A3 capital sigma lowercases to 03C3 small sigma if it
    is followed by another letter, but lowercases to 03C2 small final
    sigma if it is not."

    and later

    "Converting to Titlecase

    Map each character to its titlecase or lowercase. If the preceding
    letter is cased, choose the lowercase mapping; otherwise choose the
    titlecase mapping (in most cases, this will be the same as the
    uppercase, but not always)."

    The Unicode standard feels strongly enough about titlecasing to define
    different terms for it and to explicitly warn about just uppercasing
    the first character. That's why I brought it up. Sorry I didn't
    expand on it earlier.

    Sorry to get into technicalities like this, but I feel the point is
    important for anyone interested in Unicode programming. Often, things
    that seem obvious in English are radically different in other
    languages and writing systems.

    > For this application, we don't need to ensure that it is a letter at all:
    >
    > s/(^|\s)(.)/$1\u$2/g;


    Yes, noted by others too. As I said, Text::Capitalize is probably
    what the OP wants. My point was just that [a-z] is a good sign of trouble.

    Ted
    Ted Zlatanov, Mar 17, 2005
    #13
  14. Mark Healey

    Joe Smith Guest

    Mark Healey wrote:

    > the quick brown fox jumped over the lazy dogs.
    > to
    > The Quick Brown Fox Jumped Over the Lazy Dogs.


    After doing the Titlecase conversions as others have mentioned,
    you then have to do some postprocessing to undo some of it.

    To lowercase 'the' when it is not at the beginning of the line:
    s/(\s)(of|at|the|and)\b/$1\l$2/gi;
    But that does not do the right thing if there is an end-of-sentence
    period (as opposed to an end-of-abbreviation period) preceding 'the'.
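
A runnable sketch of this post-processing step, assuming the pattern is meant to capture the leading whitespace and the word separately and to match case-insensitively (the words are already capitalized at this point). The word list is the one from the post, and the second input is a made-up example of the end-of-sentence problem.

```perl
use strict;
use warnings;

my $title = 'The Quick Brown Fox Jumped Over The Lazy Dogs.';

# Lowercase the first letter of listed words that follow whitespace.
# The leading "The" is untouched because no whitespace precedes it.
(my $fixed = $title) =~ s/(\s)(of|at|the|and)\b/$1\l$2/gi;
print "$fixed\n";   # The Quick Brown Fox Jumped Over the Lazy Dogs.

# The caveat: a "The" that starts a new sentence is lowercased too.
my $two_sentences = 'See Spot Run. The End.';
(my $wrong = $two_sentences) =~ s/(\s)(of|at|the|and)\b/$1\l$2/gi;
print "$wrong\n";   # See Spot Run. the End.
```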

    -Joe
    Joe Smith, Mar 28, 2005
    #14
