feature request: a better str.endswith

Discussion in 'Python' started by Michele Simionato, Jul 18, 2003.

  1. I often feel the need to extend the string method ".endswith" to tuple
    arguments, so that it automatically checks for multiple endings.
    For instance, here is a typical use case:

    if filename.endswith(('.jpg','.jpeg','.gif','.png')):
        print "This is a valid image file"

    Currently this does not work (endswith() rejects a tuple argument), and I must use the ugly

    if filename.endswith('.jpg') or filename.endswith('.jpeg') \
       or filename.endswith('.gif') or filename.endswith('.png'):
        print "This is a valid image file"
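For comparison: this later became standard behaviour, since Python 2.5 str.startswith() and str.endswith() accept a tuple of candidate strings. A minimal Python 3 sketch of the use case above; the helper name is_image_file and the constant IMAGE_SUFFIXES are illustrative, not from the thread:

```python
# Python 3: str.endswith() accepts a tuple and returns True
# if the string ends with any of the given suffixes.
IMAGE_SUFFIXES = ('.jpg', '.jpeg', '.gif', '.png')


def is_image_file(filename):
    """Illustrative helper: True if filename ends with one of IMAGE_SUFFIXES."""
    return filename.endswith(IMAGE_SUFFIXES)


for name in ('photo.jpeg', 'notes.txt'):
    if is_image_file(name):
        print(name, 'is a valid image file')  # prints only for photo.jpeg
```

The match is case-sensitive: 'PHOTO.JPG' is only accepted after lowercasing it.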

    Of course a direct implementation is quite easy:

    import sys

    class Str(str):
        def endswith(self,suffix,start=0,end=sys.maxint): # not sure about sys.maxint
            endswith=super(Str,self).endswith
            if isinstance(suffix,tuple):
                # multi-or: a nonzero count means at least one suffix matched
                return sum([endswith(s,start,end) for s in suffix])
            return endswith(suffix,start,end)

    if Str(filename).endswith(('.jpg','.jpeg','.gif','.png')):
        print "This is a valid image file"
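The Str subclass above is Python 2 code (sys.maxint no longer exists in Python 3). A hedged Python 3 sketch of the same idea follows; it replaces the sum() of booleans with any(), which stops at the first matching suffix and returns a real bool. In Python 3 the override is redundant, since the built-in method already accepts tuples; it is kept only to mirror the post:

```python
class Str(str):
    """str subclass whose endswith() also accepts a tuple of suffixes."""

    def endswith(self, suffix, start=0, end=None):
        if end is None:
            end = len(self)  # replaces the sys.maxint default of the original
        base_endswith = super().endswith
        if isinstance(suffix, tuple):
            # multi-or: any() short-circuits on the first matching suffix
            return any(base_endswith(s, start, end) for s in suffix)
        return base_endswith(suffix, start, end)


print(Str('photo.png').endswith(('.jpg', '.jpeg', '.gif', '.png')))  # True
```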

    Nevertheless, I think this kind of check is quite common, and it would be
    worth having it in standard Python.

    Any reactions or comments?


    Michele
    Michele Simionato, Jul 18, 2003
    #1

  2. Jp Calderone

    On Fri, Jul 18, 2003 at 05:01:47AM -0700, Michele Simionato wrote:
    > I often feel the need to extend the string method ".endswith" to tuple
    > arguments, in such a way to automatically check for multiple endings.
    > For instance, here is a typical use case:
    >
    > if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    >     print "This is a valid image file"
    >
    > Currently this is not valid Python and I must use the ugly
    >
    > if filename.endswith('.jpg') or filename.endswith('.jpeg') \
    >    or filename.endswith('.gif') or filename.endswith('.png'):
    >     print "This is a valid image file"


    extensions = ('.jpg', '.jpeg', '.gif', '.png')
    if filter(filename.endswith, extensions):
        print "This is a valid image file"
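One caveat if this idiom is carried over to Python 3: there filter() returns a lazy iterator, and an iterator object is always true, so the if test above would succeed for every filename. A minimal sketch of the difference (filename is a sample value):

```python
extensions = ('.jpg', '.jpeg', '.gif', '.png')
filename = 'notes.txt'  # sample value that matches none of the extensions

# Python 3: filter() returns an iterator, which is truthy even when empty.
always_true = bool(filter(filename.endswith, extensions))

# Build the list explicitly to keep the Python 2 meaning of the test...
matches = list(filter(filename.endswith, extensions))

# ...or let any() stop at the first hit and return a bool.
has_match = any(map(filename.endswith, extensions))

print(always_true, matches, has_match)  # True [] False
```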

    Jp

    --
    "Pascal is Pascal is Pascal is dog meat."
    -- M. Devine and P. Larson, Computer Science 340
    Jp Calderone, Jul 18, 2003
    #2

  3. Michele Simionato wrote:

    > I often feel the need to extend the string method ".endswith" to tuple
    > arguments, in such a way to automatically check for multiple endings.
    > For instance, here is a typical use case:
    >
    > if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    >     print "This is a valid image file"
    >
    > Currently this is not valid Python and I must use the ugly
    >
    > if filename.endswith('.jpg') or filename.endswith('.jpeg') \
    >    or filename.endswith('.gif') or filename.endswith('.png'):
    >     print "This is a valid image file"
    >
    > Of course a direct implementation is quite easy:
    >
    > import sys
    >
    > class Str(str):
    >     def endswith(self,suffix,start=0,end=sys.maxint):#not sure about sys.maxint
    >         endswith=super(Str,self).endswith
    >         if isinstance(suffix,tuple):
    >             return sum([endswith(s,start,end) for s in suffix]) # multi-or
    >         return endswith(suffix,start,end)
    >
    > if Str(filename).endswith(('.jpg','.jpeg','.gif','.png')):
    >     print "This is a valid image file"
    >
    > nevertheless I think this kind of checking is quite common and it would be
    > worth to have it in standard Python.


    Hi,

    I like this feature request.

    If the argument to endswith is not a string,
    it should try to treat the argument as a list or tuple.

    thomas
    Thomas Güttler, Jul 18, 2003
    #3
  4. Michele> I often feel the need to extend the string method ".endswith"
    Michele> to tuple arguments, in such a way to automatically check for
    Michele> multiple endings. For instance, here is a typical use case:

    Michele> if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    Michele>     print "This is a valid image file"

    This is analogous to how isinstance works, where its second arg can be a
    class or type or a tuple containing classes and types.

    I suggest you submit a feature request to SF (SourceForge). A patch to stringobject.c and
    unicodeobject.c would help improve chances of acceptance, and for symmetry
    you should probably also modify the startswith methods of both types.
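To make the analogy concrete, here is a small Python 3 check of both tuple-accepting calls; the startswith/endswith form exists in later Python versions (2.5 onward), not in the 2.3 release under discussion. The sample values are illustrative:

```python
# isinstance(): the second argument may be a tuple of types.
value = 3
is_number = isinstance(value, (int, float))

# str.endswith()/startswith(): the argument may be a tuple of strings
# (any single match is enough, just like isinstance with a tuple).
name = 'report.pdf'
ends_ok = name.endswith(('.doc', '.pdf'))
starts_ok = name.startswith(('draft', 'report'))

print(is_number, ends_ok, starts_ok)  # True True True
```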

    Skip
    Skip Montanaro, Jul 18, 2003
    #4
  5. Irmen de Jong <> wrote in message news:<3f17f883$0$49107$4all.nl>...
    > Jp Calderone wrote:
    > > On Fri, Jul 18, 2003 at 05:01:47AM -0700, Michele Simionato wrote:
    > >
    > >>I often feel the need to extend the string method ".endswith" to tuple
    > >>arguments, in such a way to automatically check for multiple endings.
    > >>For instance, here is a typical use case:
    > >>
    > >>if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    > >> print "This is a valid image file"
    > >>
    > >>Currently this is not valid Python and I must use the ugly
    > >>
    > >>if filename.endswith('.jpg') or filename.endswith('.jpeg') \
    > >> or filename.endswith('.gif') or filename.endswith('.png'):
    > >> print "This is a valid image file"

    > >
    > >
    > > extensions = ('.jpg', '.jpeg', '.gif', '.png')
    > > if filter(filename.endswith, extensions):
    > >     print "This is a valid image file"
    > >
    > > Jp
    > >

    >
    > Using filter Michele's original statement becomes:
    >
    > if filter(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    >     print "This is a valid image file"
    >
    > IMHO this is simple enough to not require a change to the
    > .endswith method...
    >
    > --Irmen


    I hadn't thought of "filter". It is true that it works, but is it really
    readable? I had to think to understand what it is doing.
    My (implicit) rationale for

    filename.endswith(('.jpg','.jpeg','.gif','.png'))

    was that it works exactly like "isinstance", so it is quite
    obvious what it is doing. I am asking just for a convenience,
    which already has a precedent in the language and respects
    the Principle of Least Surprise.

    Michele
    Michele Simionato, Jul 19, 2003
    #5
  6. Skip Montanaro <> wrote in message news:<>...
    > Michele> I often feel the need to extend the string method ".endswith"
    > Michele> to tuple arguments, in such a way to automatically check for
    > Michele> multiple endings. For instance, here is a typical use case:
    >
    > Michele> if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    > Michele>     print "This is a valid image file"
    >
    > This is analogous to how isinstance works, where its second arg can be a
    > class or type or a tuple containing classes and types.
    >
    > I suggest you submit a feature request to SF. A patch to stringobject.c and
    > unicodeobject.c would help improve chances of acceptance, and for symmetry
    > you should probably also modify the startswith methods of both types.
    >
    > Skip


    Too bad my C skills are essentially nonexistent :-(


    Michele
    Michele Simionato, Jul 19, 2003
    #6
  7. >> I suggest you submit a feature request to SF. A patch to
    >> stringobject.c and unicodeobject.c would help improve chances of
    >> acceptance, and for symmetry you should probably also modify the
    >> startswith methods of both types.


    Michele> Too bad my C skills are essentially nonexistent :-(

    Look at it as an opportunity to enhance those skills. You have plenty of
    time until 2.4. ;-)

    In any case, even if you can't whip up the actual C code, a complete feature
    request on SF would keep it from being entirely forgotten.

    Skip
    Skip Montanaro, Jul 19, 2003
    #7
  8. [Michele Simionato]
    > > >>I often feel the need to extend the string method ".endswith" to tuple
    > > >>arguments, in such a way to automatically check for multiple endings.
    > > >>For instance, here is a typical use case:
    > > >>
    > > >>if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    > > >> print "This is a valid image file"


    [Jp]
    > > > extensions = ('.jpg', '.jpeg', '.gif', '.png')
    > > > if filter(filename.endswith, extensions):
    > > >     print "This is a valid image file"
    > > >
    > > > Jp



    [Irmen]
    > > Using filter Michele's original statement becomes:
    > >
    > > if filter(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    > >     print "This is a valid image file"
    > >
    > > IMHO this is simple enough to not require a change to the
    > > .endswith method...


    [Michele]
    > I hadn't thought of "filter". It is true that it works, but is it really
    > readable? I had to think to understand what it is doing.
    > My (implicit) rationale for
    >
    > filename.endswith(('.jpg','.jpeg','.gif','.png'))
    >
    > was that it works exactly like "isinstance", so it is quite
    > obvious what it is doing. I am asking just for a convenience,
    > which already has a precedent in the language and respects
    > the Principle of Least Surprise.


    I prefer that this feature not be added. Convenience functions
    like this one rarely pay for themselves because:

    -- The use case is not that common (after all, endswith() isn't even
    used that often).

    -- It complicates the heck out of the C code

    -- Checking for optional arguments results in a slight slowdown
    for the normal case.

    -- It is easy to implement a readable version in only two or three
    lines of pure python.

    -- It is harder to read because it requires background knowledge
    of how endswith() handles a tuple (quick, does it take any
    iterable or just a tuple, how about a subclass of tuple; is it
    like min() and max() in that *args works just as well as an
    argument tuple; which Python version implemented it, etc.).

    -- It is a pain to keep the language consistent. Change endswith()
    and you should change startswith(). Change the string object and
    you should also change the unicode object and UserString and
    perhaps mmap. Update the docs for each and add test cases for
    each (including weird cases with zero-length tuples and such).

    -- The use case above encroaches on scanning patterns that are
    already efficiently implemented by the re module.

    -- Worst of all, it increases the sum total of the Python language to be
    learned without providing much in return.

    -- In general, the language can be kept more compact, efficient, and
    maintainable by not trying to vectorize everything (the recent addition
    of the __builtin__.sum() is a rare exception that is worth it). It is
    better to use a general purpose vectorizing function (like map, filter,
    or reduce). This particular case is best implemented in terms of the
    some() predicate documented in the examples for the new itertools module
    (though any() might have been a better name for it):

    some(filename.endswith, ('.jpg','.jpeg','.gif','.png'))

    The implementation of some() is better than the filter version because
    it provides an "early-out" upon the first successful hit.
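The some() recipe itself is not reproduced in the thread, so the following is only a minimal Python 3 sketch of a predicate with the described early-out behaviour (the built-in any(), added in Python 2.5, later covered the same ground as any(map(pred, seq))). The helper noisy_endswith and the sample filename are hypothetical, used only to record which suffixes get tested:

```python
def some(pred, iterable):
    """Return True as soon as pred(item) is true for some item (early out)."""
    for item in iterable:
        if pred(item):
            return True
    return False


calls = []


def noisy_endswith(suffix, filename='holiday.jpg'):
    """Hypothetical predicate that records each suffix it is asked about."""
    calls.append(suffix)
    return filename.endswith(suffix)


suffixes = ('.jpg', '.jpeg', '.gif', '.png')

result = some(noisy_endswith, suffixes)
early_calls = len(calls)  # 1: stopped after the first suffix matched

calls.clear()
matched = list(filter(noisy_endswith, suffixes))
filter_calls = len(calls)  # 4: filter() tests every suffix

print(result, early_calls, matched, filter_calls)  # True 1 ['.jpg'] 4
```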


    Raymond Hettinger
    Raymond Hettinger, Jul 20, 2003
    #8
  9. "Raymond Hettinger" <> wrote in message news:<NpkSa.16049$>..
    > I prefer that this feature not be added. Convenience functions
    > like this one rarely pay for themselves because:
    >
    > -- The use case is not that common (after all, endswith() isn't even
    > used that often).


    This is arguable.

    > -- It complicates the heck out of the C code


    Really? Of course, you are the expert. I would do it by analogy with
    "isinstance", internally calling "ifilter" as you suggest.

    > -- Checking for optional arguments results in a slight slowdown
    > for the normal case.


    Perhaps slight enough to be negligible? Of course, without an
    implementation we cannot say, but I would be surprised to see a
    noticeable slowdown.

    > -- It is easy to implement a readable version in only two or three
    > lines of pure python.


    Yes, but not immediately obvious. See later.

    > -- It is harder to read because it requires background knowledge
    > of how endswith() handles a tuple (quick, does it take any
    > iterable or just a tuple, how about a subclass of tuple; is it
    > like min() and max() in that *args works just as well as an
    > argument tuple; which Python version implemented it, etc.).


    I have used "isinstance" and never wondered about these technicalities,
    so I guess the average user would not be more concerned with .endswith.
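For reference, the questions raised in the quoted objection do have concrete answers in Python 3's str.endswith(): a tuple subclass is accepted, an empty tuple matches nothing, and other iterables such as lists are rejected. A short sketch checking this (the namedtuple Suffixes and the sample filename are illustrative):

```python
from collections import namedtuple

name = 'archive.tar.gz'

# A tuple subclass (here a namedtuple) is accepted like a plain tuple.
Suffixes = namedtuple('Suffixes', ['first', 'second'])
subclass_ok = name.endswith(Suffixes('.zip', '.gz'))

# An empty tuple contains no suffix, so nothing can match.
empty_result = name.endswith(())

# A list is not accepted: only str or a tuple of str.
try:
    name.endswith(['.zip', '.gz'])
    list_error = None
except TypeError as exc:
    list_error = exc

print(subclass_ok, empty_result, type(list_error).__name__)  # True False TypeError
```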

    > -- It is a pain to keep the language consistent. Change endswith()
    > and you should change startswith(). Change the string object and
    > you should also change the unicode object and UserString and
    > perhaps mmap. Update the docs for each and add test cases for
    > each (including weird cases with zero-length tuples and such).


    This is true for any modification of the language. One has to balance
    costs and benefits. The balance is still largely subjective.

    > -- The use case above encroaches on scanning patterns that are
    > already efficiently implemented by the re module.


    I think the general rule is to avoid regular expressions when
    possible.
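For comparison, a regular-expression version of the image check, as a Python 3 sketch. IMAGE_RE is an illustrative pattern; \Z is used instead of $ because $ also matches just before a trailing newline, which endswith() would not:

```python
import re

# One anchored alternation; \Z matches only at the very end of the string.
IMAGE_RE = re.compile(r'\.(?:jpe?g|gif|png)\Z')


def is_image_re(filename):
    return IMAGE_RE.search(filename) is not None


def is_image_endswith(filename):
    return filename.endswith(('.jpg', '.jpeg', '.gif', '.png'))


for name in ('a.jpg', 'a.jpeg', 'a.png\n', 'a.png.bak', 'a.JPG'):
    # Both checks agree on every sample (and both are case-sensitive).
    print(repr(name), is_image_re(name), is_image_endswith(name))
```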

    > -- Worst of all, it increases the sum total of the Python language to be
    > learned without providing much in return.


    That is exactly what I am arguing *against*: no additional learning
    effort is needed, since a similar feature is already present in
    "isinstance", and a user could even be surprised that it is not
    implemented in .endswith.

    > -- In general, the language can be kept more compact, efficient, and
    > maintainable by not trying to vectorize everything (the recent addition
    > of the __builtin__.sum() is a rare exception that is worth it). It is
    > better to use a general purpose vectorizing function (like map, filter,
    > or reduce). This particular case is best implemented in terms of the
    > some() predicate documented in the examples for the new itertools module
    > (though any() might have been a better name for it):
    >
    > some(filename.endswith, ('.jpg','.jpeg','.gif','.png'))


    Uhm... don't like "some", nor "any"; what about "the"?

    import itertools
    import os
    the=lambda pred,seq: list(itertools.ifilter(pred,seq))
    for filename in os.listdir('.'):
        if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
            print "This is a valid image"

    That's readable enough for me, though still not completely obvious. The
    first time, I got it wrong by defining "the=itertools.ifilter". I had the
    idea that "ifilter" acted just like "filter", which of course is not the
    case here: an ifilter object is an iterator, and it is always true, even
    when nothing matches.
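In Python 3, itertools.ifilter is gone and the built-in filter() is itself lazy, so the=filter would fall into exactly the trap described here. A sketch of an early-out the(), in the spirit of the islice fix that appears later in the thread, returning a list with at most one element:

```python
from itertools import islice


def the(pred, seq):
    """Return [first item satisfying pred] or []; stops at the first hit."""
    return list(islice(filter(pred, seq), 1))


extensions = ('.jpg', '.jpeg', '.gif', '.png')
print(the('cat.gif'.endswith, extensions))   # ['.gif']
print(the('cat.txt'.endswith, extensions))   # []
print(bool(filter('cat.txt'.endswith, extensions)))  # True: the trap
```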

    > The implementation of some() is better than the filter version because
    > it provides an "early-out" upon the first successful hit.


    No objection to that.
    >
    > Raymond Hettinger


    Michele Simionato

    P.S. I am not going to pursue this further, since I quite like

    if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
        dosomething()

    Instead, I will suggest adding this example to the itertools
    documentation ;)
    I could also submit it as a cookbook recipe, since I think it is
    quite a useful trick.
    Also, it is good to make people aware of the itertools goodies
    (I myself have learned something in this thread).
    Michele Simionato, Jul 20, 2003
    #9
  10. Skip Montanaro wrote:

    > I suggest you submit a feature request to SF.


    +1 from me :)

    This is a commonly used case. Using things like stripext() is only a
    solution for this specific case where filename-extensions are matched.

    Michele: I suggest mentioning this in the feature request, or simply using
    a different example (not based on filename extensions).
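For the filename-extension case specifically, a Python 3 sketch using the standard os.path.splitext() plus a set lookup, which also makes case-insensitive matching easy. IMAGE_EXTENSIONS and has_image_extension are illustrative names. As noted above, this only helps for extensions, whereas a tuple-accepting endswith() is general:

```python
import os.path

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.gif', '.png'}


def has_image_extension(filename):
    # splitext('a/b/photo.JPEG') -> ('a/b/photo', '.JPEG'); compare in lower case.
    return os.path.splitext(filename)[1].lower() in IMAGE_EXTENSIONS


for name in ('photo.JPEG', 'archive.tar.gz', '.png'):
    # Note: splitext('.png') -> ('.png', ''), a leading dot is not an extension.
    print(name, has_image_extension(name))
```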

    Regards
    Hartmut Goebel
    --
    | Hartmut Goebel | IT-Security -- efficient |
    | | www.goebel-consult.de |
    Hartmut Goebel, Jul 21, 2003
    #10
  11. (Michele Simionato) wrote in message news:<>...
    > Oops! My mistake, I forgot the islice; it should be
    >
    > the=lambda pred,seq: list(itertools.islice(itertools.ifilter(pred,seq),0,1))
    >
    > in such a way that we exit at the first hit, otherwise one could just use
    > the standard "filter".


    How about:

    def the(pred,seq): return True in itertools.imap(pred,seq)

    if you really want to use the name "the" ("any" makes much more sense to me).
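In Python 3 imap became the built-in map, so this one-liner ports directly. One subtlety, shown in the sketch below: "True in ..." compares results with ==, so it only works for predicates that return real booleans (endswith does); any() tests truthiness instead. The sample list names is illustrative:

```python
def the(pred, seq):
    # Chris Perkins' version with Python 3's lazy map(): stops at the first True.
    return True in map(pred, seq)


extensions = ('.jpg', '.jpeg', '.gif', '.png')
bool_pred = the('x.png'.endswith, extensions)  # True: endswith returns bools

names = ['', 'readme']
in_result = the(str.strip, names)          # False: 'readme' == True is false
any_result = any(map(str.strip, names))    # True: 'readme' is truthy

print(bool_pred, in_result, any_result)  # True False True
```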

    Chris
    Chris Perkins, Jul 21, 2003
    #11
  12. (Chris Perkins) wrote in message news:<>...
    > (Michele Simionato) wrote in message news:<>...
    > > Oops! My mistake, I forgot the islice; it should be
    > >
    > > the=lambda pred,seq: list(itertools.islice(itertools.ifilter(pred,seq),0,1))
    > >
    > > in such a way that we exit at the first hit, otherwise one could just use
    > > the standard "filter".

    >
    > How about:
    >
    > def the(pred,seq): return True in itertools.imap(pred,seq)
    >
    > if you really want to use the name "the" ("any" makes much more sense to me).
    >
    > Chris



    That's a good idea, indeed. BTW, in this context I feel that

    if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
        dosomething()

    is clearer than

    if any(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
        dosomething()

    which confuses me, since "any" seems to refer to "filename",
    whereas it refers to the tuple elements.


    M.S.
    Michele Simionato, Jul 21, 2003
    #12
  13. (Chris Perkins) wrote in message news:<>...
    > def the(pred,seq): return True in itertools.imap(pred,seq)


    BTW, this suggests to me two short idioms for a multiple "or" and a multiple
    "and", with short-circuit behavior:

    import itertools

    def imultior(pred,iterable):
        return True in itertools.imap(pred,iterable)

    def imultiand(pred,iterable):
        return not(False in itertools.imap(pred,iterable))

    Nevertheless, they seem to be slower than the non-iterator-based
    implementation :-( (at least in some preliminary profiling I did
    using a list and a custom defined predicate function)

    def multiand(pred,iterable):
        istrue=True
        for item in iterable:
            istrue=istrue and pred(item)
            if not istrue: return False
        return True

    def multior(pred,iterable):
        istrue=False
        for item in iterable:
            istrue=istrue or pred(item)
            if istrue: return True
        return False
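Since Python 2.5 the built-ins any() and all() provide exactly this short-circuit behaviour, so in Python 3 the two helpers (keeping the names from the post) reduce to a sketch like the following. The generator digits_then_error is a hypothetical test input that raises if consumed past its first item:

```python
def multior(pred, iterable):
    """True if pred(item) is true for some item; stops at the first true."""
    return any(pred(item) for item in iterable)


def multiand(pred, iterable):
    """True if pred(item) is true for every item; stops at the first false."""
    return all(pred(item) for item in iterable)


def digits_then_error():
    yield '7'
    raise RuntimeError('never reached if the caller short-circuits')


print(multior(str.isdigit, ['a', '1']))           # True
print(multiand(str.isdigit, ['1', 'b']))          # False
print(multior(str.isdigit, digits_then_error()))  # True, without the error
```

Like the loop versions above, multiand() of an empty iterable is True and multior() of an empty iterable is False.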

    M.
    Michele Simionato, Jul 21, 2003
    #13
  14. On Tue, 22 Jul 2003 08:34:13 +0000 (UTC), Duncan Booth <> wrote:

    > (Bengt Richter) wrote in news:bfi4ir$t21$0@216.39.172.122:
    >
    >> I guess the full set of functions might be
    >> any_true, any_false, all_true, and all_false.
    >>
    >> or maybe someone can think of better short phrase?
    >>

    >
    >'all_false(...)' is simply 'not any_true(...)'
    >'any_false(...)' is 'not all_true(...)'
    >
    >So you could get by with just two of these functions, in which case
    >'any_of', and 'all_of' might be suitable names.
    >

    I don't think they're equivalent if they do short-circuiting.

    Regards,
    Bengt Richter
    Bengt Richter, Jul 22, 2003
    #14
  15. (Bengt Richter) wrote in message news:<bfi4ir$t21$0@216.39.172.122>...
    > I think I'd prefer
    >
    > if any_true(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    > dosomething()
    >
    > I suspect it will more often make sense read aloud in the general
    >
    > if any_true(pred, seq):
    >
    > than
    >
    > if the(pred, seq)
    >
    > I guess the full set of functions might be
    > any_true, any_false, all_true, and all_false.
    >
    > or maybe someone can think of better short phrase?
    >
    > Regards,
    > Bengt Richter


    I think in the specific case I was talking about "the" was quite
    readable; however, I agree that in the general case "any_true" etc.
    would be better.
    I would not be opposed to adding these convenience functions to
    itertools. The advantage is standardization (i.e. I don't have to
    invent my own name, different from the name chosen by everybody else);
    the disadvantage is more things to learn. However, with such descriptive
    names, it would be hard not to grasp what those functions are doing,
    even without looking at the documentation. Anyway, I am sure many will
    be opposed, saying that such functions are so simple that they do not
    deserve to be in the library. That would be a sensible opinion, BTW.
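The four names are Bengt's proposal, not an existing library API. A minimal Python 3 sketch of them on top of the built-in any() and all(), with the two negative forms expressed through De Morgan's laws (the equivalence Duncan Booth points out in this thread); the sample filename is illustrative:

```python
def any_true(pred, seq):
    return any(pred(x) for x in seq)


def all_true(pred, seq):
    return all(pred(x) for x in seq)


def any_false(pred, seq):
    # Some item fails pred  <=>  not every item satisfies pred.
    return not all_true(pred, seq)


def all_false(pred, seq):
    # No item satisfies pred  <=>  not (some item satisfies pred).
    return not any_true(pred, seq)


exts = ('.jpg', '.jpeg', '.gif', '.png')
name = 'diagram.png'
print(any_true(name.endswith, exts))   # True
print(all_true(name.endswith, exts))   # False
print(any_false(name.endswith, exts))  # True
print(all_false(name.endswith, exts))  # False
```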


    Michele
    Michele Simionato, Jul 23, 2003
    #15
  18. Duncan Booth

    (Bengt Richter) wrote in news:bfjokm$kbc$0@216.39.172.122:

    >>'all_false(...)' is simply 'not any_true(...)'
    >>'any_false(...)' is 'not all_true(...)'
    >>
    >>So you could get by with just two of these functions, in which case
    >>'any_of', and 'all_of' might be suitable names.
    >>

    > I don't think they're equivalent if they do short-circuiting.
    >


    any_true short circuits as soon as it finds one that is true.
    all_false short circuits as soon as it finds one that is true.

    all_true short circuits as soon as it finds one that is false.
    any_false ditto.

    Why aren't they equivalent?
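Duncan's point can be checked directly. In this Python 3 sketch (make_recorder and the sample data are illustrative), any_true and all_false are written as explicit loops, and a recording predicate shows that both stop at the same element, so all_false is just not any_true, short-circuit included:

```python
def any_true(pred, seq):
    for x in seq:
        if pred(x):
            return True   # first true item decides
    return False


def all_false(pred, seq):
    for x in seq:
        if pred(x):
            return False  # the same first true item decides
    return True


def make_recorder():
    seen = []

    def pred(x):
        seen.append(x)
        return x > 2

    return pred, seen


data = [1, 5, 0, 7]
pred1, seen1 = make_recorder()
pred2, seen2 = make_recorder()
a = any_true(pred1, data)
b = all_false(pred2, data)
print(a, b, seen1, seen2)  # True False [1, 5] [1, 5]
```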

    --
    Duncan Booth
    int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
    "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
    Duncan Booth, Jul 23, 2003
    #18
  19. On Wed, 23 Jul 2003 08:06:17 +0000 (UTC), Duncan Booth <> wrote:

    > (Bengt Richter) wrote in news:bfjokm$kbc$0@216.39.172.122:
    >
    >>>'all_false(...)' is simply 'not any_true(...)'
    >>>'any_false(...)' is 'not all_true(...)'
    >>>
    >>>So you could get by with just two of these functions, in which case
    >>>'any_of', and 'all_of' might be suitable names.
    >>>

    >> I don't think they're equivalent if they do short-circuiting.
    >>

    >
    >any_true short circuits as soon as it finds one that is true.
    >all_false short circuits as soon as it finds one that is true.
    >
    >all_true short circuits as soon as it finds one that is false.
    >any_false ditto.
    >
    >Why aren't they equivalent?
    >

    Oops, d'oh ... well, they're not spelled the same ;-)

    Regards,
    Bengt Richter
    Bengt Richter, Jul 23, 2003
    #19
  20. Michele Simionato, Jul 29, 2003
    #20
