feature request: a better str.endswith

Michele Simionato

I often feel the need to extend the string method ".endswith" to tuple
arguments, in such a way to automatically check for multiple endings.
For instance, here is a typical use case:

if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"

Currently this is not valid Python and I must use the ugly

if filename.endswith('.jpg') or filename.endswith('.jpeg') \
   or filename.endswith('.gif') or filename.endswith('.png'):
    print "This is a valid image file"

Of course a direct implementation is quite easy:

import sys

class Str(str):
    def endswith(self,suffix,start=0,end=sys.maxint):#not sure about sys.maxint
        endswith=super(Str,self).endswith
        if isinstance(suffix,tuple):
            return sum([endswith(s,start,end) for s in suffix]) # multi-or
        return endswith(suffix,start,end)

if Str(filename).endswith(('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"

Nevertheless, I think this kind of check is quite common and it would be
worth having in standard Python.

Any reactions or comments?


Michele
 
Jp Calderone

Michele said:
I often feel the need to extend the string method ".endswith" to tuple
arguments, in such a way to automatically check for multiple endings.
For instance, here is a typical use case:

if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"

Currently this is not valid Python and I must use the ugly

if filename.endswith('.jpg') or filename.endswith('.jpeg') \
   or filename.endswith('.gif') or filename.endswith('.png'):
    print "This is a valid image file"

extensions = ('.jpg', '.jpeg', '.gif', '.png')
if filter(filename.endswith, extensions):
    print "This is a valid image file"
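A note for anyone trying the filter version on a newer interpreter: the test relies on filter() returning a list, which is what Python 2 does. In Python 3, filter() returns a lazy iterator, which is always truthy, so the result has to be materialised first. A minimal sketch, where has_image_extension and the filenames are made up for illustration:

```python
# Sketch for Python 3; has_image_extension is a hypothetical helper name.
extensions = ('.jpg', '.jpeg', '.gif', '.png')

def has_image_extension(filename):
    # list() forces the filter iterator, so an empty result is falsy again.
    return bool(list(filter(filename.endswith, extensions)))

print(has_image_extension('photo.jpeg'))
print(has_image_extension('notes.txt'))
```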

Jp
 
Thomas Güttler

Michele said:
I often feel the need to extend the string method ".endswith" to tuple
arguments, in such a way to automatically check for multiple endings.
For instance, here is a typical use case:

if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"

Currently this is not valid Python and I must use the ugly

if filename.endswith('.jpg') or filename.endswith('.jpeg') \
   or filename.endswith('.gif') or filename.endswith('.png'):
    print "This is a valid image file"

Of course a direct implementation is quite easy:

import sys

class Str(str):
    def endswith(self,suffix,start=0,end=sys.maxint):#not sure about sys.maxint
        endswith=super(Str,self).endswith
        if isinstance(suffix,tuple):
            return sum([endswith(s,start,end) for s in suffix]) # multi-or
        return endswith(suffix,start,end)

if Str(filename).endswith(('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"

Nevertheless, I think this kind of check is quite common and it would be
worth having in standard Python.

Hi,

I like this feature request.

If the argument to endswith is not a string,
it should try to treat the argument as a list or tuple.

thomas
 
Skip Montanaro

Michele> I often feel the need to extend the string method ".endswith"
Michele> to tuple arguments, in such a way to automatically check for
Michele> multiple endings. For instance, here is a typical use case:

Michele> if filename.endswith(('.jpg','.jpeg','.gif','.png')):
Michele>     print "This is a valid image file"

This is analogous to how isinstance works, where its second arg can be a
class or type or a tuple containing classes and types.

I suggest you submit a feature request to SF. A patch to stringobject.c and
unicodeobject.c would help improve chances of acceptance, and for symmetry
you should probably also modify the startswith methods of both types.
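
The isinstance precedent mentioned here looks like this in practice. This is only an illustration: the sample values are made up, and print is written as a function so the snippet also runs on Python 3.

```python
# isinstance() accepts a single type or a tuple of types as its second argument.
values = [3, 2.5, 'spam']

for value in values:
    # True if value is an instance of any type in the tuple.
    print(value, isinstance(value, (int, float)))

is_number = [isinstance(value, (int, float)) for value in values]
```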

Skip
 
Michele Simionato

Irmen de Jong said:
Using filter Michele's original statement becomes:

if filter(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"

IMHO this is simple enough to not require a change to the
.endswith method...

--Irmen

I hadn't thought of "filter". It is true, it works, but is it really
readable? I had to think to understand what it is doing.
My (implicit) rationale for

filename.endswith(('.jpg','.jpeg','.gif','.png'))

was that it works exactly like "isinstance", so it is quite
obvious what it is doing. I am asking just for a convenience,
which already has a precedent in the language and respects
the Principle of Least Surprise.

Michele
 
Michele Simionato

Skip Montanaro said:
Michele> I often feel the need to extend the string method ".endswith"
Michele> to tuple arguments, in such a way to automatically check for
Michele> multiple endings. For instance, here is a typical use case:

Michele> if filename.endswith(('.jpg','.jpeg','.gif','.png')):
Michele>     print "This is a valid image file"

This is analogous to how isinstance works, where its second arg can be a
class or type or a tuple containing classes and types.

I suggest you submit a feature request to SF. A patch to stringobject.c and
unicodeobject.c would help improve chances of acceptance, and for symmetry
you should probably also modify the startswith methods of both types.

Skip

Too bad my skills with C are essentially nonexistent :-(


Michele
 
Skip Montanaro

I suggest you submit a feature request to SF. A patch to
Michele> Too bad my skills with C are essentially nonexistent :-(

Look at it as an opportunity to enhance those skills. You have plenty of
time until 2.4. ;-)

In any case, even if you can't whip up the actual C code, a complete feature
request on SF would keep it from being entirely forgotten.

Skip
 
Raymond Hettinger

[Michele Simionato]
[Jp]
[Irmen]
Using filter Michele's original statement becomes:

if filter(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"

IMHO this is simple enough to not require a change to the
.endswith method...
[Michele]
I hadn't thought of "filter". It is true, it works, but is it really
readable? I had to think to understand what it is doing.
My (implicit) rationale for

filename.endswith(('.jpg','.jpeg','.gif','.png'))

was that it works exactly like "isinstance", so it is quite
obvious what it is doing. I am asking just for a convenience,
which already has a precedent in the language and respects
the Principle of Least Surprise.

I prefer that this feature not be added. Convenience functions
like this one rarely pay for themselves because:

-- The use case is not that common (after all, endswith() isn't even
used that often).

-- It complicates the heck out of the C code

-- Checking for optional arguments results in a slight slowdown
for the normal case.

-- It is easy to implement a readable version in only two or three
lines of pure python.

-- It is harder to read because it requires background knowledge
of how endswith() handles a tuple (quick, does it take any
iterable or just a tuple, how about a subclass of tuple; is it
like min() and max() in that *args works just as well as
argtuple; which python version implemented it, etc).

-- It is a pain to keep the language consistent. Change endswith()
and you should change startswith(). Change the string object and
you should also change the unicode object and UserString and
perhaps mmap. Update the docs for each and add test cases for
each (including weird cases with zero-length tuples and such).

-- The use case above encroaches on scanning patterns that are
already efficiently implemented by the re module.

-- Worst of all, it increases the sum total of python language to be
learned without providing much in return.

-- In general, the language can be kept more compact, efficient, and
maintainable by not trying to vectorize everything (the recent addition
of the __builtin__.sum() is a rare exception that is worth it). It is
better to use a general purpose vectorizing function (like map, filter,
or reduce). This particular case is best implemented in terms of the
some() predicate documented in the examples for the new itertools module
(though any() might have been a better name for it):

some(filename.endswith, ('.jpg','.jpeg','.gif','.png'))

The implementation of some() is better than the filter version because
it provides an "early-out" upon the first successful hit.
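
A rough sketch of the early-out behaviour described above. This is a plain loop written for illustration, not necessarily the recipe from the itertools documentation; counting_endswith and the filename are made up so that the snippet can record which suffixes actually get tested.

```python
def some(pred, seq):
    """Return True as soon as pred(item) is true for one item, else False."""
    for item in seq:
        if pred(item):
            return True   # early-out: the remaining items are never tested
    return False

calls = []

def counting_endswith(suffix):
    # Hypothetical wrapper that records each suffix it is asked about.
    calls.append(suffix)
    return 'holiday.jpeg'.endswith(suffix)

found = some(counting_endswith, ('.jpg', '.jpeg', '.gif', '.png'))
print(found, calls)
```

With filter, all four suffixes would be tested before the result could be inspected; here the loop stops at the first match.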


Raymond Hettinger
 
Michele Simionato

Raymond Hettinger said:
I prefer that this feature not be added. Convenience functions
like this one rarely pay for themselves because:

-- The use case is not that common (after all, endswith() isn't even
used that often).

This is arguable.

-- It complicates the heck out of the C code

Really? Of course, you are the expert. I would do it in analogy to
"isinstance" and internally calling "ifilter" as you suggest.

-- Checking for optional arguments results in a slight slowdown
for the normal case.

Perhaps slight enough to be negligible? Of course, without an
implementation we cannot say, but I would be surprised to see a
noticeable slowdown.

-- It is easy to implement a readable version in only two or three
lines of pure python.

Yes, but not immediately obvious. See later.

-- It is harder to read because it requires background knowledge
of how endswith() handles a tuple (quick, does it take any
iterable or just a tuple, how about a subclass of tuple; is it
like min() and max() in that *args works just as well as
argtuple; which python version implemented it, etc).

I have used "isinstance" and never wondered about these technicalities,
so I guess the average user should not be more concerned with .endswith.

-- It is a pain to keep the language consistent. Change endswith()
and you should change startswith(). Change the string object and
you should also change the unicode object and UserString and
perhaps mmap. Update the docs for each and add test cases for
each (including weird cases with zero-length tuples and such).

This is true for any modification of the language. One has to balance
costs and benefits. The balance is still largely subjective.

-- The use case above encroaches on scanning patterns that are
already efficiently implemented by the re module.

I think the general rule is to avoid regular expressions when possible.

-- Worst of all, it increases the sum total of python language to be
learned without providing much in return.

That is exactly what I am arguing *against*: there is no additional
learning effort needed, since a similar feature is already present in
"isinstance", and a user could even be surprised that it is not
implemented in .endswith.

-- In general, the language can be kept more compact, efficient, and
maintainable by not trying to vectorize everything (the recent addition
of the __builtin__.sum() is a rare exception that is worth it). It is
better to use a general purpose vectorizing function (like map, filter,
or reduce). This particular case is best implemented in terms of the
some() predicate documented in the examples for the new itertools module
(though any() might have been a better name for it):

some(filename.endswith, ('.jpg','.jpeg','.gif','.png'))

Uhm... don't like "some", nor "any"; what about "the"?

import os
import itertools
the=lambda pred,seq: list(itertools.ifilter(pred,seq))
for filename in os.listdir('.'):
    if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
        print "This is a valid image"

That's readable enough for me, though still not completely obvious. The
first time, I got it wrong by defining "the=itertools.ifilter". I had the
idea that "ifilter" was acting just like "filter", which of course is not
the case in this example.

The implementation of some() is better than the filter version because
it provides an "early-out" upon the first successful hit.

No point against that.
Raymond Hettinger

Michele Simionato

P.S. I am not going to pursue this further, since I quite like

if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    dosomething()

Instead, I will suggest that this example be added to the itertools
documentation ;)
I could also submit it as a cookbook recipe, since I think it is
quite a useful trick.
Also, it is good to make people aware of the itertools goodies
(I myself have learned something in this thread).
 
Hartmut Goebel

Skip said:
I suggest you submit a feature request to SF.

+1 from me :)

This is a commonly used case. Using things like stripext() is only a
solution for this specific case where filename-extensions are matched.

Michele: I suggest mentioning this in the feature request, or simply using
a different example (not based on filename extensions).

Regards
Hartmut Goebel
 
Chris Perkins

Michele said:
Oops! My mistake, I forgot the islice; it should be

the=lambda pred,seq: list(itertools.islice(itertools.ifilter(pred,seq),0,1))

so that we exit at the first hit; otherwise one could just use
the standard "filter".

How about:

def the(pred,seq): return True in itertools.imap(pred,seq)

if you really want to use the name "the" ("any" makes much more sense to me).
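
One caveat with the "True in ..." spelling, sketched under the assumption that the predicate may return something other than a real bool: the membership test compares each result with True by equality, so a truthy value such as 2 does not count as a hit. The name the_truthy is made up for this illustration, and map stands in for itertools.imap so that the snippet runs on Python 3.

```python
def the(pred, seq):
    # Python 3 spelling of the one-liner: map() is lazy here, like
    # itertools.imap in Python 2, so "in" stops at the first result equal to True.
    return True in map(pred, seq)

def the_truthy(pred, seq):
    # Hypothetical variant: coerce each result with bool() before comparing.
    return True in (bool(pred(item)) for item in seq)

suffixes = ('.jpg', '.jpeg', '.gif', '.png')
print(the('icon.gif'.endswith, suffixes))  # endswith returns a real bool
print(the(len, ['', 'ab']))                # len gives 0 and 2; neither equals True
print(the_truthy(len, ['', 'ab']))
```

For endswith the difference does not matter, since it always returns True or False.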

Chris
 
Michele Simionato

How about:

def the(pred,seq): return True in itertools.imap(pred,seq)

if you really want to use the name "the" ("any" makes much more sense to me).

Chris


That's a good idea, indeed. BTW, in this context I feel that

if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    dosomething()

is clearer than

if any(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    dosomething()

which is confusing to me, since it seems that "any" refers to "filename"
whereas it actually refers to the tuple elements.


M.S.
 
Michele Simionato

def the(pred,seq): return True in itertools.imap(pred,seq)

BTW, this suggests to me two short idioms for multiple "or" and multiple "and",
with shortcut behavior:

def imultior(pred,iterable):
    return True in itertools.imap(pred,iterable)

def imultiand(pred,iterable):
    return not(False in itertools.imap(pred,iterable))

Nevertheless, they seem to be slower than the non-iterator-based
implementation :-( (at least in some preliminary profiling I did
using a list and a custom defined predicate function)

def multiand(pred,iterable):
    istrue=True
    for item in iterable:
        istrue=istrue and pred(item)
        if not istrue: return False
    return True

def multior(pred,iterable):
    istrue=False
    for item in iterable:
        istrue=istrue or pred(item)
        if istrue: return True
    return False
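
The two loop-based functions can also be written without the istrue accumulator, since each one returns as soon as the answer is known. A sketch with the same results; is_even and the sample lists are made up for illustration. On interpreters newer than the ones discussed in this thread (Python 2.5 and later), the built-ins any() and all() applied to the mapped predicate give the same truth values, which the last loop checks.

```python
def multior(pred, iterable):
    for item in iterable:
        if pred(item):
            return True    # shortcut on the first hit
    return False

def multiand(pred, iterable):
    for item in iterable:
        if not pred(item):
            return False   # shortcut on the first miss
    return True

def is_even(n):
    # Hypothetical predicate used only for this demonstration.
    return n % 2 == 0

samples = [[], [1, 3], [1, 2], [2, 4]]
results = [(multior(is_even, s), multiand(is_even, s)) for s in samples]
print(results)

# Cross-check against the built-ins (available from Python 2.5 on).
for s in samples:
    assert multior(is_even, s) == any(map(is_even, s))
    assert multiand(is_even, s) == all(map(is_even, s))
```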

M.
 
Bengt Richter

(e-mail address removed) (Bengt Richter) wrote in

'all_false(...)' is simply 'not any_true(...)'
'any_false(...)' is 'not all_true(...)'

So you could get by with just two of these functions, in which case
'any_of', and 'all_of' might be suitable names.

I don't think they're equivalent if they do short-circuiting.

Regards,
Bengt Richter
 
Michele Simionato

I think I'd prefer

if any_true(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    dosomething()

I suspect it will more often make sense read aloud in the general

if any_true(pred, seq):

than

if the(pred, seq):

I guess the full set of functions might be
any_true, any_false, all_true, and all_false.

or maybe someone can think of better short phrase?

Regards,
Bengt Richter

I think in the specific case I was talking about "the" was quite
readable; however I agree that in the general case "any_true" etc.
would be better.
I would not be opposed to adding these convenience functions to
itertools. The advantage is standardization (i.e. I don't have to
invent my own name, different from the name chosen by anybody else);
the disadvantage is more things to learn. However, with such
descriptive names, it would be difficult not to grasp what those
functions are doing, even without looking at the documentation.
Anyway, I am sure many will be opposed, saying that such functions
are so simple that they do not deserve to be in the library. This
would be a sensible opinion, BTW.


Michele
 
Duncan Booth

I don't think they're equivalent if they do short-circuiting.

any_true short circuits as soon as it finds one that is true.
all_false short circuits as soon as it finds one that is true.

all_true short circuits as soon as it finds one that is false.
any_false ditto.

Why aren't they equivalent?
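
This point can be checked with a small experiment. The sketch below assumes the obvious loop implementations of any_true and all_false (the thread never shows them), plus a made-up predicate that records which items it was asked about.

```python
def any_true(pred, seq):
    for item in seq:
        if pred(item):
            return True
    return False

def all_false(pred, seq):
    for item in seq:
        if pred(item):
            return False
    return True

tested = []

def is_negative(n):
    # Hypothetical predicate that records every value it is asked about.
    tested.append(n)
    return n < 0

data = [4, -1, 7, -3]

first = any_true(is_negative, data)
seen_by_any_true = list(tested)

del tested[:]
second = all_false(is_negative, data)
seen_by_all_false = list(tested)

print(first, second, seen_by_any_true, seen_by_all_false)
```

Both functions stop at the same item, and one result is the negation of the other, so short-circuiting does not break the equivalence.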
 
Bengt Richter

(e-mail address removed) (Bengt Richter) wrote in

any_true short circuits as soon as it finds one that is true.
all_false short circuits as soon as it finds one that is true.

all_true short circuits as soon as it finds one that is false.
any_false ditto.

Why aren't they equivalent?

Oops, d'oh ... well, they're not spelled the same ;-)

Regards,
Bengt Richter
 
