str.title question after '

Discussion in 'Python' started by Antoon Pardon, Nov 13, 2006.

  1. I have a text in ASCII, and I use ' as the apostrophe. This causes
    problems with the title method: I don't want letters after a ' to
    be uppercased. Here are some examples:

    argument        result          expected

    't smidje       'T Smidje       't Smidje
    na'ama          Na'Ama          Na'ama
    al pi tnu'at    Al Pi Tnu'At    Al Pi Tnu'at


    Is there an easy way to get what I want?

    Should the current behaviour be considered a bug?
    I would be inclined to answer yes, but that may be
    because this behaviour would be wrong in Dutch. I'm
    not so sure about English.

    --
    Antoon Pardon
     
    Antoon Pardon, Nov 13, 2006
    #1

  2. John Machin

    Antoon Pardon wrote:
    > I have a text in ASCII, and I use ' as the apostrophe. This causes
    > problems with the title method: I don't want letters after a ' to
    > be uppercased. Here are some examples:
    >
    > argument        result          expected
    >
    > 't smidje       'T Smidje       't Smidje
    > na'ama          Na'Ama          Na'ama
    > al pi tnu'at    Al Pi Tnu'At    Al Pi Tnu'at
    >
    >
    > Is there an easy way to get what I want?


    Depends on your definition of "easy". Writing your own function that
    will regard the apostrophe as a letter would be "easy" in my book.
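
    One way such a function might look, as a minimal sketch rather than a
    definitive answer: it assumes plain ASCII text and treats any run of
    letters and apostrophes as a single word. The name
    title_keep_apostrophes is made up for this illustration.

```python
import re

def title_keep_apostrophes(text):
    """Titlecase `text`, treating an apostrophe as part of a word."""
    # Assumption: a word is a run of ASCII letters and apostrophes.
    # str.capitalize() uppercases the first character (a no-op when that
    # character is an apostrophe) and lowercases everything after it.
    return re.sub(r"[A-Za-z']+", lambda m: m.group(0).capitalize(), text)

for arg in ("'t smidje", "na'ama", "al pi tnu'at", "didn't"):
    print(repr(arg), '->', repr(title_keep_apostrophes(arg)))
```

    The trade-off is that names such as O'Brien come out as "O'brien", so
    this only suits data where a letter after an apostrophe is never
    capitalised.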

    >
    > Should the current behaviour be considered a bug?


    Its limitations could use some documentation.

    > I would be inclined to answer yes, but that may be
    > because this behaviour would be wrong in Dutch. I'm
    > not so sure about English.
    >


    It's not very appropriate for English, either:

    | >>> "didn't".title()
    | "Didn'T"

    It's OK for the English way of writing Irish surnames e.g. O'Brien, but
    not IMHO very good behaviour for anything else.

    The docs say: "Return a titlecased version of the string: words start
    with uppercase characters, all remaining cased characters are
    lowercase." Evidently the definition of "word" is the culprit.
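
    A quick way to see what counts as a "word" here is to run title() on a
    few strings that contain non-letters. The sample strings below are
    chosen for illustration only:

```python
# str.title() starts a new "word" after any character that is not a cased
# letter, so apostrophes, hyphens, digits and dots all act as boundaries.
samples = ["didn't", "o'brien", "out-of-the-box", "x2y", "bar.bar.spam"]
results = [s.title() for s in samples]

for s, r in zip(samples, results):
    print(repr(s), '->', repr(r))
```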

    Doing titlecasing properly depends heavily on the language/locale and
    what data you are working on. For example, in the UK and anywhere that
    Scots have migrated in reasonable numbers, you would probably want to
    do McDonald and MacDonald. Avoiding nonsenses like MacE and MacHin :)
    takes some effort and a look-up table, and may not be cost-effective.
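
    A look-up table of that kind could be sketched as follows. The table
    contents and the name title_surname are hypothetical, and a real table
    would have to be built from the data at hand:

```python
# Hypothetical exceptions table: lowercase surname -> preferred spelling.
SPECIAL_SURNAMES = {
    "mcdonald": "McDonald",
    "macdonald": "MacDonald",
    "o'brien": "O'Brien",
}

def title_surname(name):
    """Titlecase a surname, consulting the exceptions table first."""
    key = name.lower()
    if key in SPECIAL_SURNAMES:
        return SPECIAL_SURNAMES[key]
    # Fall back to capitalising only the first letter, so "machin"
    # becomes "Machin" and never "MacHin".
    return name[:1].upper() + name[1:].lower()

print(title_surname("MACDONALD"), title_surname("machin"), title_surname("mace"))
```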

    A related problem: some people mistakenly try too hard to correct
    perceived data entry errors and also produce nonsenses -- a colleague
    of Dutch extraction occasionally received mail addressed to Mr O'Belt
    :)

    Cheers,
    John
     
    John Machin, Nov 13, 2006
    #2

  3. Leo Kislov

    Antoon Pardon wrote:
    > I have a text in ASCII, and I use ' as the apostrophe. This causes
    > problems with the title method: I don't want letters after a ' to
    > be uppercased. Here are some examples:
    >
    > argument        result          expected
    >
    > 't smidje       'T Smidje       't Smidje
    > na'ama          Na'Ama          Na'ama
    > al pi tnu'at    Al Pi Tnu'At    Al Pi Tnu'at
    >
    >
    > Is there an easy way to get what I want?


    import re

    def title_words(s):
        # Split on whitespace; the capturing group keeps the separators.
        words = re.split(r'(\s+)', s)
        return ''.join(word[0:1].upper() + word[1:] for word in words)

    >
    > Should the current behaviour be considered a bug?


    I believe it follows the definition of \w from the re module.

    > I would be inclined to answer yes, but that may be
    > because this behaviour would be wrong in Dutch. I'm
    > not so sure about English.


    The problem is more complicated. First of all, why should title() be
    limited to human languages? What about programming languages? Is
    "bar.bar.spam" three tokens or one in a foo programming language? There
    are some problems with human languages too: how are you going to
    process "out-of-the-box" and "italian-american"?
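
    Since the right answer depends on the data, one option is to let the
    caller decide what separates tokens. The sketch below assumes the
    separators can be described by a regular expression without capturing
    groups of its own; title_tokens and its separators parameter are names
    invented for this example.

```python
import re

def title_tokens(s, separators=r"\s+"):
    """Capitalise the first character of each token.

    `separators` is a regex describing what lies between tokens.
    """
    # Wrapping the pattern in a capturing group makes re.split keep the
    # separators, so the original spacing and punctuation are preserved.
    parts = re.split("(" + separators + ")", s)
    return "".join(p[0:1].upper() + p[1:] for p in parts)

print(title_tokens("out-of-the-box"))                # hyphen is part of the token
print(title_tokens("italian-american", r"[\s-]+"))   # hyphen separates tokens
print(title_tokens("bar.bar.spam", r"[\s.]+"))       # dot separates tokens
```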

    -- Leo
     
    Leo Kislov, Nov 13, 2006
    #3
  4. Leo Kislov wrote:

    >> Is there an easy way to get what I want?

    >
    > def title_words(s):
    >     words = re.split(r'(\s+)', s)
    >     return ''.join(word[0:1].upper() + word[1:] for word in words)


    nit: to work well also for Unicode strings using arbitrary alphabets,
    you should use title() instead of upper(). a naive upper() will do the
    wrong thing in some cases, as can be seen in the following example:

    >>> import unicodedata
    >>> u = u"\u01C9"
    >>> unicodedata.name(u)
    'LATIN SMALL LETTER LJ'
    >>> unicodedata.name(u.upper())
    'LATIN CAPITAL LETTER LJ'
    >>> unicodedata.name(u.title())
    'LATIN CAPITAL LETTER L WITH SMALL LETTER J'
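
    Applied to the function quoted above, the change might look like the
    following sketch. The sample string is made up for illustration:

```python
import re
import unicodedata

def title_words(s):
    # Split on whitespace; the capturing group keeps the separators.
    words = re.split(r'(\s+)', s)
    # title() on the first character selects the titlecase form, which
    # differs from the uppercase form for digraphs such as U+01C9.
    return ''.join(word[0:1].title() + word[1:] for word in words)

result = title_words("\u01C9ubav na'ama")
print(unicodedata.name(result[0]))
print(result.split()[1])
```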

    </F>
     
    Fredrik Lundh, Nov 13, 2006
    #4
