Re: using split for a string : error

Discussion in 'Python' started by Chris Angelico, Jan 24, 2013.

  1. On Thu, Jan 24, 2013 at 10:16 PM, Tobias M. wrote:
    > Chris Angelico wrote:
    >> The other thing you may want to consider, if the values are supposed
    >> to be integers, is to convert them to Python integers before
    >> comparing.
    >
    > I thought of this too and I wonder if there are any major differences
    > regarding performance compared to using the strip() method when parsing
    > large files.
    >
    > In addition I guess one should catch the ValueError that might be raised by
    > the cast if there is something else than a number in the file.


    I'd not consider the performance, but the correctness. If you're
    expecting them to be integers, just cast them, and specifically
    _don't_ catch ValueError. Any non-integer value will then noisily
    abort the script. (It may be worth checking for blank first, though,
    depending on the data origin.)
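A minimal sketch of that advice, assuming the input arrives as an iterable of text lines with one value per line. The helper name `parse_ints` is made up for illustration and is not from the thread. Blank lines are skipped up front, and ValueError is deliberately left uncaught:

```python
def parse_ints(lines):
    """Convert each non-blank line to an int; bad data raises ValueError."""
    values = []
    for line in lines:
        text = line.strip()
        if not text:
            # Skip blank lines, e.g. an empty line at the end of a file.
            continue
        # No try/except here: a non-integer value should abort noisily
        # rather than be silently ignored.
        values.append(int(text))
    return values


print(parse_ints(["1\n", "\n", " 42 \n"]))
```

A line such as "12a" makes int() raise ValueError, and the traceback stops the script at the offending value.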

    It's usually fine to have int() complain about any non-numerics in the
    string, but I must confess, I do sometimes yearn for atoi() semantics:
    atoi("123asd") == 123, and atoi("qqq") == 0. I've not seen a
    convenient Python function for doing that. Usually it involves
    manually getting the digits off the front. All I want is to suppress
    the error on finding a non-digit. Oh well.

    ChrisA
     
    Chris Angelico, Jan 24, 2013

  2. Chris Angelico wrote:

    > It's usually fine to have int() complain about any non-numerics in the
    > string, but I must confess, I do sometimes yearn for atoi() semantics:
    > atoi("123asd") == 123, and atoi("qqq") == 0. I've not seen a
    > convenient Python function for doing that. Usually it involves
    > manually getting the digits off the front. All I want is to suppress
    > the error on finding a non-digit. Oh well.


    It's easy enough to write your own. All you need do is decide what you
    mean by "suppress the error on finding a non-digit".

    Should atoi("123xyz456") return 123 or 123456?

    Should atoi("xyz123") return 0 or 123?

    And here's a good one:

    Should atoi("1OOl") return 1, 100, or 1001?

    That last is a serious suggestion by the way. There are still many people
    who do not distinguish between 1 and l or 0 and O.

    Actually I lied. It's not that easy. Consider:

    py> s = '໑໒໙'
    py> int(s)
    129


    Actually I lied again. It's not that hard:


    def atoi(s):
        from unicodedata import digit
        i = 0
        for c in s:
            i *= 10
            i += digit(c, 0)
        return i


    Variations that stop on the first non-digit, instead of treating them as
    zero, are not much more difficult.
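One such variation might look like the sketch below. The name `atoi_stop` and the use of -1 as a "not a digit" sentinel are choices made for this illustration only. Unlike C's atoi(), it does not handle leading whitespace or a sign:

```python
from unicodedata import digit


def atoi_stop(s):
    """Return the integer formed by the leading digits of s, or 0 if none."""
    i = 0
    for c in s:
        # digit() returns the second argument when c has no digit value;
        # -1 is used here as a sentinel for "not a digit".
        d = digit(c, -1)
        if d < 0:
            break  # stop at the first non-digit instead of counting it as 0
        i = i * 10 + d
    return i


print(atoi_stop("123asd"), atoi_stop("qqq"), atoi_stop("123xyz456"))
```

Because it relies on unicodedata.digit(), it accepts non-ASCII digits such as '໑໒໙' as well.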



    --
    Steven
     
    Steven D'Aprano, Jan 25, 2013

  3. On Fri, Jan 25, 2013 at 11:20 AM, Steven D'Aprano wrote:
    > Chris Angelico wrote:
    >
    >> It's usually fine to have int() complain about any non-numerics in the
    >> string, but I must confess, I do sometimes yearn for atoi() semantics:
    >> atoi("123asd") == 123, and atoi("qqq") == 0. I've not seen a
    >> convenient Python function for doing that. Usually it involves
    >> manually getting the digits off the front. All I want is to suppress
    >> the error on finding a non-digit. Oh well.
    >
    > It's easy enough to write your own. All you need do is decide what you
    > mean by "suppress the error on finding a non-digit".
    >
    > Should atoi("123xyz456") return 123 or 123456?
    >
    > Should atoi("xyz123") return 0 or 123?
    >
    > And here's a good one:
    >
    > Should atoi("1OOl") return 1, 100, or 1001?


    123, 0, and 1. That's standard atoi semantics.

    > That last is a serious suggestion by the way. There are still many people
    > who do not distinguish between 1 and l or 0 and O.


    Sure. But I'm not trying to cater to people who get it wrong; that's a
    job for a DWIM.

    > def atoi(s):
    >     from unicodedata import digit
    >     i = 0
    >     for c in s:
    >         i *= 10
    >         i += digit(c, 0)
    >     return i
    >
    > Variations that stop on the first non-digit, instead of treating them as
    > zero, are not much more difficult.


    And yes, I'm fully aware that I can roll my own. Here's a shorter
    version (ASCII digits only, feel free to expand to Unicode), not
    necessarily better:

    def atoi(s):
        # Slice by prefix length: s[:-0] would be "" for an all-digit string.
        n = len(s) - len(s.lstrip("0123456789"))
        return int("0" + s[:n])

    It just seems silly that this should have to be done separately, when
    it's really just a tweak to the usual string-to-int conversion: when
    you come to a non-digit, take one of three options (throw error, skip,
    or terminate).
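To make the three options concrete, here is a rough sketch of a single function with a switch. The name `to_int` and the `on_nondigit` parameter are hypothetical, invented for this example, and nothing like this exists in the standard library. It handles ASCII digits only, with no sign or whitespace handling:

```python
def to_int(s, on_nondigit="error"):
    """Parse ASCII digits from s; on_nondigit is 'error', 'skip' or 'terminate'."""
    if on_nondigit not in ("error", "skip", "terminate"):
        raise ValueError("unknown on_nondigit mode: %r" % (on_nondigit,))
    digits = []
    for c in s:
        if c in "0123456789":
            digits.append(c)
        elif on_nondigit == "error":
            raise ValueError("non-digit %r in %r" % (c, s))
        elif on_nondigit == "terminate":
            break  # atoi-style: stop at the first non-digit
        # 'skip' falls through: ignore the character and keep going
    if not digits and on_nondigit == "error":
        # Assumption: in strict mode an empty string is an error, like int("").
        raise ValueError("no digits in %r" % (s,))
    return int("".join(digits) or "0")


print(to_int("123xyz456", "terminate"), to_int("123xyz456", "skip"))
```

With "terminate" this gives the atoi-style answers (123 for "123xyz456", 0 for "xyz123"), while "skip" gives 123456 and 123 respectively.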

    Anyway, not a big deal.

    ChrisA
     
    Chris Angelico, Jan 25, 2013
