Python arrays and string formatting options

Steven D'Aprano

I would weaken that claim a tad... I'd say it is "usual" to write
something like this:

alist = []
for x in some_values:
    alist.append(something_from_x)


but it is not uncommon (at least not in my code) to write something
like this equivalent code instead:

alist = [None]*len(some_values)
for i, x in enumerate(some_values):
    alist[i] = something_from_x


I have never done this, except when I was beginning to use Python, and --
maybe more importantly -- I've never seen this in others' code. It really
looks like a construct from someone who is still programming in some
other language(s).



It occurs at least twice in the 2.5 standard library, once in
sre_parse.py:

groups = []
groupsappend = groups.append
literals = [None] * len(p)
for c, s in p:
    if c is MARK:
        groupsappend((i, s))
        # literal[i] is already None
    else:
        literals[i] = s



and another time in xdrlib.py:

succeedlist = [1] * len(packtest)
count = 0
for method, args in packtest:
    print 'pack test', count,
    try:
        method(*args)
        print 'succeeded'
    except ConversionError, var:
        print 'ConversionError:', var.msg
        succeedlist[count] = 0
    count = count + 1



When will it be more natural to introduce an unnecessary index?

We can agree that the two idioms are functionally equivalent. Appending
is marginally less efficient, because the Python runtime engine has to
periodically resize the list as it grows, and that can in principle take
an arbitrary amount of time if it causes virtual memory paging. But
that's unlikely to be a significant factor for any but the biggest lists.
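The resizing mentioned above can be observed, at least on CPython, by watching the reported size of a list while appending to it. This is only a rough sketch: `count_resizes` is a hypothetical helper, `sys.getsizeof` reflects CPython's over-allocation strategy, and other interpreters may behave differently or not support the call at all.

```python
import sys

def count_resizes(n):
    """Append n items one at a time and count how often the list's
    reported allocation changes (CPython-specific behaviour)."""
    items = []
    resizes = 0
    last_size = sys.getsizeof(items)
    for value in range(n):
        items.append(value)
        size = sys.getsizeof(items)
        if size != last_size:
            # The underlying array was reallocated to make room.
            resizes += 1
            last_size = size
    return resizes

resizes = count_resizes(10_000)
print(resizes, "reallocations for 10,000 appends")
```

The count should come out far smaller than the number of appends, which is why appending is usually cheap enough in practice.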

So in the same way that any while-loop can be rewritten as a recursive
function, and vice versa, so these two idioms can be trivially re-written
from one form to the other. When should you use one or the other?

When the algorithm you have is conceptually about growing a list by
appending to the end, then you should grow the list by appending to the
end. And when the algorithm is conceptually about dropping values into
pre-existing pigeon holes, then you should initialize the list and then
walk it, modifying the values in place.

And if the algorithm is indifferent to which idiom you use, then you
should use whichever idiom you are most comfortable with, and not claim
there's Only One True Way to build a list.
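For reference, here is a small runnable sketch of the two idioms side by side, written for Python 3. The function `square` is a hypothetical stand-in for the thread's `something_from_x`, and the input values are made up.

```python
def square(x):
    # Hypothetical stand-in for "something_from_x" in the thread.
    return x * x

some_values = [3, 1, 4, 1, 5]

# Idiom 1: grow the list by appending to the end.
appended = []
for x in some_values:
    appended.append(square(x))

# Idiom 2: create the pigeon holes first, then fill each one by index.
filled = [None] * len(some_values)
for i, x in enumerate(some_values):
    filled[i] = square(x)

print(appended == filled)
```

Both loops build the same list; which one reads better depends on how the algorithm is described, as argued above.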

Everything acts by magic unless you know what it does. The Fortran

read(*,*)(a(i,j,k),j=1,3)

in the OP's first post looks like magic too.

It sure does. My memories of Fortran aren't good enough to remember what
that does.

But I think you do Python a disservice. One of my Perl coders was writing
some Python code the other day, and he was amazed at how guessable Python
was. You can often guess the right way to do something. He wanted a set
with all the elements of another set removed, so he guessed that s1-s2
would do the job -- and it did. A lot of Python is amazingly readable to
people with no Python experience at all. But not everything.
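The set example can be tried directly. A minimal sketch in Python 3 syntax, with made-up contents for `s1` and `s2`:

```python
s1 = {1, 2, 3, 4}
s2 = {2, 4}

# The minus operator removes every element of s2 from s1.
remaining = s1 - s2

# The method form does the same thing.
same_result = s1.difference(s2)

print(remaining == same_result)
```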

I admit that my code shows
off advanced Python features but I don't think ``with`` is one of them.
It makes it easier to write robust code and maybe even understandable
without documentation by just reading it as "English text".

The first problem with "with" is that it looks like the Pascal "with"
statement, but acts nothing like it. That may confuse anyone with Pascal
experience, and there are a lot of us out there.

The second difficulty is that:

with open('test.txt') as lines:

binds the result of open() to the name "lines". How is that different
from "lines = open('test.txt')"? I know the answer, but we shouldn't
expect newbies coming across it to be anything but perplexed.
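For readers who do not know the answer, the following Python 3 sketch shows the difference. Both forms bind a name to the file object, but only the ``with`` form guarantees that the file is closed when the block is left, even through an exception. The temporary file and the simulated `ValueError` are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on test.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("first\n\nsecond\n")

# Plain assignment: the name is bound, but closing the file is our job.
plain = open(path)
plain_closed_before = plain.closed
plain.close()

# with-statement: the same binding, plus a close on exit,
# even when the block is left because of an exception.
try:
    with open(path) as lines:
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass
closed_after_with = lines.closed

os.remove(path)
print(plain_closed_before, closed_after_with)
```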

Now that the newbie has determined that lines is a file object, the very
next thing you do is assign something completely different to 'lines':

lines = (line for line in lines if line.strip())

So the reader needs to know that brackets aren't just for grouping like
in most other languages, but also that (x) can be equivalent to a for-
loop. They need to know, or guess, that iterating over a file object
returns lines of the file, and they have to keep the two different
bindings of "lines" straight in their head in a piece of code that uses
"lines" twice and "line" three times.
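To make the construct under discussion concrete, here is a sketch of the generator expression next to a spelled-out loop that does roughly the same thing. `io.StringIO` stands in for `open('test.txt')`, and `non_blank` is a hypothetical helper name that does not appear in the thread.

```python
import io

SAMPLE = "alpha\n\n  \nbeta\n"

# io.StringIO stands in for the file object returned by open().
lines = io.StringIO(SAMPLE)

# Rebind `lines` to a lazy filter over the original object;
# nothing is read until the generator is consumed.
lines = (line for line in lines if line.strip())
result = list(lines)

# Roughly the same filter written as an explicit loop.
def non_blank(source):
    for line in source:
        if line.strip():
            yield line

same = list(non_blank(io.StringIO(SAMPLE)))
print(result == same)
```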

And then they hit the next line, which includes a function called
"partial", which has a technical meaning out of functional languages and
I am sure it will mean nothing whatsoever to anyone unfamiliar with it.
It's not something that is guessable, unlike open() or len() or append().
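For anyone who has not met it, `functools.partial` takes a callable and some arguments and returns a new callable with those arguments already filled in. The example below is only an illustration; `parse_binary` is a made-up name and is not the call used in the code being discussed.

```python
from functools import partial

# Pre-fill the `base` argument of int().
parse_binary = partial(int, base=2)

# Equivalent spelled-out version, for comparison.
def parse_binary_explicit(text):
    return int(text, base=2)

value = parse_binary("1010")
print(value == parse_binary_explicit("1010"))
```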


Do you mean `lines`? Then I disagree because the (duck) type is always
"iterable over lines". I just changed the content by filtering.

Nevertheless, for people coming from less dynamic languages than Python
(such as Fortran), it is a common idiom to never use the same variable
for two different things. It's not a bad choice really: imagine reading a
function where the name "lines" started off as an integer number of
lines, then became a template string, then was used for a list of
character positions...

Of course I'm not suggesting that your code was that bad. But rebinding a
name does make code harder to understand.
 
Marc 'BlackJack' Rintsch

I would weaken that claim a tad... I'd say it is "usual" to write
something like this:

alist = []
for x in some_values:
    alist.append(something_from_x)


but it is not uncommon (at least not in my code) to write something
like this equivalent code instead:

alist = [None]*len(some_values)
for i, x in enumerate(some_values):
    alist[i] = something_from_x


I have never done this, except when I was beginning to use Python, and --
maybe more importantly -- I've never seen this in others' code. It
really looks like a construct from someone who is still programming in
some other language(s).



It occurs at least twice in the 2.5 standard library, once in
sre_parse.py:

groups = []
groupsappend = groups.append
literals = [None] * len(p)
for c, s in p:
    if c is MARK:
        groupsappend((i, s))
        # literal[i] is already None
    else:
        literals[i] = s

and another time in xdrlib.py:

succeedlist = [1] * len(packtest)
count = 0
for method, args in packtest:
    print 'pack test', count,
    try:
        method(*args)
        print 'succeeded'
    except ConversionError, var:
        print 'ConversionError:', var.msg
        succeedlist[count] = 0
    count = count + 1


I guess the first falls into the "micro optimization" category because it
binds `groups.append` to a name to spare the attribute look up within the
loop.

Both have in common that not every iteration changes the list, i.e. the
preset values are not just place holders but values that are actually
used sometimes. That is different from creating a list of place holders
that are all overwritten in any case.
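A small Python 3 sketch of that distinction, loosely modelled on the xdrlib loop: the preset `True` values are real results that survive unless a check fails, not placeholders waiting to be overwritten. `run_checks` and its inputs are hypothetical.

```python
def run_checks(checks):
    """Return one success flag per check. Flags start out True and are
    only changed when the corresponding call raises ValueError."""
    succeeded = [True] * len(checks)
    for i, (func, args) in enumerate(checks):
        try:
            func(*args)
        except ValueError:
            succeeded[i] = False
    return succeeded

flags = run_checks([(int, ("12",)), (int, ("twelve",)), (float, ("1.5",))])
print(flags)
```

Here only the middle entry is ever assigned inside the loop; the other two keep their initial value.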
It sure does. My memories of Fortran aren't good enough to remember what
that does.

But I think you do Python a disservice. One of my Perl coders was
writing some Python code the other day, and he was amazed at how
guessable Python was. You can often guess the right way to do something.

I think my code would be as guessable to a Lisp, Scheme, or Haskell
coder. Okay, Lispers and Schemers might object to the ugly syntax. ;-)

The first problem with "with" is that it looks like the Pascal "with"
statement, but acts nothing like it. That may confuse anyone with Pascal
experience, and there are a lot of us out there.

But Python is not Pascal either. Nonetheless a Pascal coder might guess
what the ``with`` does. Not all the gory details but that it opens a
file and introduces `lines` should be more or less obvious to someone who
has programmed before.

The second difficulty is that:

with open('test.txt') as lines:

binds the result of open() to the name "lines". How is that different
from "lines = open('test.txt')"? I know the answer, but we shouldn't
expect newbies coming across it to be anything but perplexed.

Even if newbies don't understand all the details they should be
introduced to ``with`` right away IMHO. Because if you explain all the
details, even if they understand them, they likely will ignore the
knowledge because doing it right is a lot of boiler plate code. So
usually people write less robust code and ``with`` is a simple way to
solve that problem.

Now that the newbie has determined that lines is a file object, the very
next thing you do is assign something completely different to 'lines':

lines = (line for line in lines if line.strip())

So the reader needs to know that brackets aren't just for grouping like
in most other languages, but also that (x) can be equivalent to a for-
loop. They need to know, or guess, that iterating over a file object
returns lines of the file, and they have to keep the two different
bindings of "lines" straight in their head in a piece of code that uses
"lines" twice and "line" three times.

Yes the reader needs to know a basic Python syntax construct to
understand this. And some knowledge from the tutorial about files. So
what?

And then they hit the next line, which includes a function called
"partial", which has a technical meaning out of functional languages and
I am sure it will mean nothing whatsoever to anyone unfamiliar to it.
It's not something that is guessable, unlike open() or len() or
append().

Why on earth has everything to be guessable for someone who doesn't know
Python or even programming at all?

Nevertheless, for people coming from less dynamic languages than Python
(such as Fortran), it is a common idiom to never use the same variable
for two different things. It's not a bad choice really: imagine reading
a function where the name "lines" started off as an integer number of
lines, then became a template string, then was used for a list of
character positions...

Which I'm not doing at all. It has the same duck type all the time:
"iterable of lines".

Of course I'm not suggesting that your code was that bad. But rebinding
a name does make code harder to understand.

Introducing a new name here would be worse IMHO because then the file
object would still be reachable by a name, which it shouldn't be, to
document that it won't be used anymore in the following code.

Again, I don't think I have written something deliberately obfuscated,
but readable, concise, and straight forward code -- for people who know
the language of course.

If someone asks how you would write this code from language X in Python, I
actually write Python, and not something that is a 1:1 almost literal
translation of the code in language X.

*I* think I would do Python a disservice if I encourage people to
continue writing Python code as if it were language X or pretending
Python is all about "readable, executable Pseudocode for anyone". Python
has dynamic typing, first class functions, "functional" syntax
constructs, and it seems the developers like iterators and generators.
Those are the basic building blocks of the language, so I use them, even in
public. :)

Ciao,
Marc 'BlackJack' Rintsch
 
Steven D'Aprano

But Python is not Pascal either. Nonetheless a Pascal coder might guess
what the ``with`` does. Not all the gory details but that it opens a
file and introduces `lines` should be more or less obvious to someone
who has programmed before.

But that's not what the with statement does. It doesn't open a file and
it doesn't introduce lines. That's what open() does. So what you say is
"obvious" is actually wrong. To a newbie who knows nothing about context
managers, the statement

with open(filename) as lines

will look like "syntactic fat" (like syntactic sugar but harder to digest
and more fattening) for the simpler code:

lines = open(filename)



[snip]
Even if newbies don't understand all the details they should be
introduced to ``with`` right away IMHO. Because if you explain all the
details, even if they understand them, they likely will ignore the
knowledge because doing it right is a lot of boiler plate code. So
usually people write less robust code and ``with`` is a simple way to
solve that problem.

So what you're saying is that we should encourage cargo-cult coding.
"Write this boilerplate, because I tell you that if you do, good things
will happen."

Newbies aren't going to be writing robust code anyway. The ability to
write robust code is one of the things which distinguishes experienced
coders from newbies. If they don't understand what the code is actually
doing, they're going to make mistakes like these:

import urllib2
try:
    result = urllib2.open('http://www.python.org')
except IOError, URLError:
    print "Can't reach website"
except HTTPError:
    print "Page not found"



[much more snippage]
Why on earth has everything to be guessable for someone who doesn't
know Python or even programming at all?

Oh please. Don't take my words out of context. I'm not talking about
"everything", and I'm not suggesting that advanced programming features
should be prohibited and we should write to the level my grandmother
would understand.

The context was that a Fortran programmer asked for some help in writing
a piece of code in Python. Your answer was entirely opaque and
undecipherable to the OP. If your intention in answering was to teach the
OP how to write Python code, you failed, because the OP couldn't
understand your code! You can argue with me until Doomsday and it won't
change that basic fact.

Your answer may have solved the OP's *technical* problem, but it didn't
do anything to solve the OP's *actual* problem, which was that he didn't
know enough basic Python techniques to solve a simple problem. And that's
the issue I was commenting on.


[more snippage]
Which I'm not doing at all. It has the same duck type all the time:
"iterable of lines".

It has nothing to do with duck typing and everything to do with re-use of
variables (or in Python, names) for different "things". Just because
"lines" has the same duck-type doesn't mean they are conceptually the
same things. If they were, the assignments would be null-ops.

There is a programming principle that says never re-use variables. It
makes it harder for the programmer to figure out what the variable
represents and for some languages, it can defeat compiler optimizations.

Now, I personally wouldn't treat this principle as a law. I'd treat it as
a guideline with just as many exceptions as examples. But there's no
doubt in my mind that reuse of names can lead to hard to understand code,
particularly if the reader is not used to the language and is already
struggling to understand it.


[snippity snip]
*I* think I would do Python a disservice if I encourage people to
continue writing Python code as if it were language X or pretending
Python is all about "readable, executable Pseudocode for anyone".

There's no "pretending". Python is excellent for writing readable,
executable pseudo-code for anyone. With Python 3.0, GvR had the
opportunity to strip Python of all the features that make Python easy to
learn, and he didn't. Python still has features that are easy for
newbies, and features that are powerful for experienced coders, and that
friendliness for newbies isn't going away. That's a good thing.
 
Marc 'BlackJack' Rintsch

So what you're saying is that we should encourage cargo-cult coding.
"Write this boilerplate, because I tell you that if you do, good things
will happen."

It's not cargo cult programming if you tell people to use the ``with``
statement to make sure the file will be closed after the block is left,
for whatever reason the block was left.

Oh please. Don't take my words out of context. I'm not talking about
"everything", and I'm not suggesting that advanced programming features
should be prohibited and we should write to the level my grandmother
would understand.

The context was that a Fortran programmer asked for some help in writing
a piece of code in Python. Your answer was entirely opaque and
undecipherable to the OP. If your intention in answering was to teach
the OP how to write Python code, you failed, because the OP couldn't
understand your code! You can argue with me until Doomsday and it won't
change that basic fact.

My intention wasn't to teach the OP how to write Python but to give a
concise, easy and straight forward solution in Python. Yes, I really
believe I have written such a thing. I'm well aware that a Fortran
programmer will not understand this without learning Python.

Your answer may have solved the OP's *technical* problem, but it didn't
do anything to solve the OP's *actual* problem, which was that he didn't
know enough basic Python techniques to solve a simple problem. And
that's the issue I was commenting on.

If he doesn't know enough basic Python techniques to solve *a simple
problem* I think this is the wrong forum and he should work through the
tutorial from the documentation to learn the basics first. The tutorial
includes `map()`, list comprehensions, methods in strings, the fact that
files are iterable, and generator expressions.

[more snippage]
Which I'm not doing at all. It has the same duck type all the time:
"iterable of lines".

It has nothing to do with duck typing and everything to do with re-use
of variables (or in Python, names) for different "things". Just because
"lines" has the same duck-type doesn't mean they are conceptually the
same things.

Of course it means they are the same "things", that is what duck typing
is about. In a statically typed language `lines` would be declared as
`Iterable<str>` or similar. Files opened for reading have that interface
and the generator expression has the very same type. A hypothetically
statically typed Python variant with a ``declare`` statement should
compile the following without problems because `generator` objects would
implement `Iterable<A>` and `line` is of type `str`:

declare lines as Iterable<str>
lines = open('test.txt')
lines = (line for line in lines if line.strip())
#...
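Something close to this hypothetical ``declare`` can be written with the optional type annotations of later Python 3 versions. This is a sketch under assumptions: `io.StringIO` stands in for the opened file, and the expectation that a static checker accepts both assignments as `Iterable[str]` is not verified here. At run time the annotation has no effect.

```python
import io
from typing import Iterable

# Annotate the name once; both objects assigned to it below
# can be iterated to yield strings.
lines: Iterable[str] = io.StringIO("one\n\ntwo\n")
lines = (line for line in lines if line.strip())

kept = list(lines)
print(kept)
```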
There's no "pretending". Python is excellent for writing readable,
executable pseudo-code for anyone.

Yes, but that's not what Python is all about. I use it for programming
and not for writing code with the primary goal to be newbie friendly or
pseudo code like.

Ciao,
Marc 'BlackJack' Rintsch
 
bearophileHUGS

Steven D'Aprano:
With Python 3.0, GvR had the
opportunity to strip Python of all the features that make Python easy to
learn, and he didn't. Python still has features that are easy for
newbies, and features that are powerful for experienced coders, and that
friendliness for newbies isn't going away. That's a good thing.

I think that by making range, dict.keys, dict.values, filter, map, etc.,
return lazy iterables, GvR has made the language a little less easy to
understand for newbies.
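What "lazy" means here can be seen in a short Python 3 session. The sketch assumes Python 3.7 or later, where dictionaries keep insertion order, and all the names and data are made up.

```python
ages = {"ann": 31, "bob": 45}

# keys() returns a view, not a list: it tracks later changes.
keys = ages.keys()
ages["cy"] = 27
keys_now = list(keys)

# map() returns a one-shot iterator: a second pass finds nothing left.
doubled = map(lambda n: n * 2, [1, 2, 3])
first_pass = list(doubled)
second_pass = list(doubled)

print(keys_now, first_pass, second_pass)
```

The empty second pass is the kind of surprise a newcomer may run into.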

What's a range(n)? A function that returns a list of n items, from 0
to n-1. This is easy to understand, while xrange(n) is a bit less easy
to understand (a generator or generators).

Python is growing toward being more fit for medium-large programs, and
less fit for being small, simple and easy. Lua for example is now
maybe better than Python if you need something light to script a large
C++ program, so the niche partially left free by Python that has gone
"up" is being partially replaced by Lua.

Bye,
bearophile
 
Marc 'BlackJack' Rintsch

What's a range(n)? A function that returns a list of n items, from 0 to
n-1. This is easy to understand, while xrange(n) is a bit less easy to
understand (a generator or generators).

<nitpick>

`xrange()` doesn't return a generator or iterator but an object that
implements the sequence protocol:

In [159]: a = xrange(0, 10, 2)

In [160]: len(a)
Out[160]: 5

In [161]: a[0]
Out[161]: 0

In [162]: a[2]
Out[162]: 4

</nitpick>
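The same holds for `range()` in Python 3, which took over the role of `xrange()`. A short sketch using the abstract base classes from `collections.abc` to make the distinction explicit:

```python
from collections.abc import Iterator, Sequence

r = range(0, 10, 2)

is_sequence = isinstance(r, Sequence)   # supports len() and indexing
is_iterator = isinstance(r, Iterator)   # has no __next__ method

length = len(r)
third = r[2]

# Unlike a generator, it can be iterated any number of times.
first_pass = list(r)
second_pass = list(r)

print(is_sequence, is_iterator, length, third)
```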

Ciao,
Marc 'BlackJack' Rintsch
 
Steven D'Aprano

I'm not sure if our views are moving closer together or further apart,
but here goes...


It's not cargo cult programming if you tell people to use the ``with``
statement to make sure the file will be closed after the block is left,
for whatever reason the block was left.

You are right. If you explain what "with" blocks do, it isn't cargo cult
programming.


My intention wasn't to teach the OP how to write Python but to give a
concise, easy and straight forward solution in Python. Yes, I really
believe I have written such a thing. I'm well aware that a Fortran
programmer will not understand this without learning Python.

I'm curious what the point of answering the OP's question was if you knew
he wouldn't understand the answer. You might have saved us both a lot of
time if you started your post with "You aren't expected to understand
this".


If he doesn't know enough basic Python techniques to solve *a simple
problem* I think this is the wrong forum and he should work through the
tutorial from the documentation to learn the basics first. The tutorial
includes `map()`, list comprehensions, methods in strings, the fact that
files are iterable, and generator expressions.

Then you should have said so.


[more snippage]
Nevertheless, for people coming from less dynamic languages than
Python (such as Fortran), it is a common idiom to never use the same
variable for two different things. It's not a bad choice really:
imagine reading a function where the name "lines" started off as an
integer number of lines, then became a template string, then was
used for a list of character positions...

Which I'm not doing at all. It has the same duck type all the time:
"iterable of lines".

It has nothing to do with duck typing and everything to do with re-use
of variables (or in Python, names) for different "things". Just because
"lines" has the same duck-type doesn't mean they are conceptually the
same things.

Of course it means they are the same "things", that is what duck typing
is about.

No, you still don't understand me. Let me give you a more extreme example
to help clarify:

average_age = 64.7
width_of_page = 20.2
speed_of_car = 35.2
concentration_of_acid = 1.03
children_per_family = 2.3

All of the above have not just the same duck-type, but the same actual
type (floats), and yet they are COMPLETELY different things. Imagine a
piece of code like this:

def foo():
    x = 64.7 # x is the average age of a person
    # ... more lines of code here
    x = 2.3 # x is now the average number of children per family
    # ...
    return something


Would you defend the above code on the basis that x had the same duck-
type in both places? I hope not.

A decade or so ago, one of the Mars spaceships crashed because a coder
used a variable that was a float in inches when they were supposed to use
a variable that was a float in millimetres (or vice versa, I forget).
Because of this mistake, the retro-rockets fired too late, and the
spaceship flew into the surface of Mars at some thousands of miles an
hour. And yet both variables were not just the same duck-type, but the
same actual type. You cannot conclude that two things are the same kind
of thing just because they have the same type.
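One way to make "same type, different thing" visible to the program itself is to wrap the bare floats in distinct types. This is only an illustrative sketch: the classes `Inches` and `Millimetres` and the function `to_millimetres` are hypothetical and have nothing to do with the software on the spacecraft.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Inches:
    value: float

@dataclass(frozen=True)
class Millimetres:
    value: float

def to_millimetres(length):
    """Accept a tagged length and return it in millimetres;
    refuse bare numbers whose unit is unknown."""
    if isinstance(length, Millimetres):
        return length
    if isinstance(length, Inches):
        return Millimetres(length.value * 25.4)
    raise TypeError("expected Inches or Millimetres, got %s"
                    % type(length).__name__)

converted = to_millimetres(Inches(2.0))
print(converted)
```

Passing a plain float such as `2.0` raises `TypeError`, so the two kinds of length can no longer be mixed up silently.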

The difference between the above and re-using the same variable for
lines_of_text_before_filtering and lines_of_text_after_filtering is one
of degree.

Now, in practice, I personally think that what you did was perfectly
acceptable. I do it myself. I think the coders who refuse to EVER re-use
variables in a code block are being over-strict. But I am aware that the
cost of re-using variables is to increase the risk of confusing the two
different meanings of the variable name.

When I'm reading code within the boundaries of my understanding, that
risk is tiny. But when I'm reading code that is complicated or in a
language I don't understand, then the risk is magnified greatly. That's
all I was trying to get across.

I don't think I'm making an unreasonable claim.


[snip]
Yes, but that's not what Python is all about. I use it for programming
and not for writing code with the primary goal to be newbie friendly or
pseudo code like.

I never suggested that being newbie friendly was the only acceptable use
of Python. But I think that when you are replying to a newbie's direct
question, there are only two ways to answer: newbie-friendly, or newbie-
hostile.
 
