It's ...

Angus Rodgers

... my first Python program! So please be gentle (no fifty ton
weights on the head!), but tell me if it's properly "Pythonic",
or if it's a dead parrot (and if the latter, how to revive it).

I'm working from Beazley's /Python: Essential Reference/ (2nd
ed. 2001), so my first newbie question is how best to find out
what's changed from version 2.1 to version 2.5. (I've recently
installed 2.5.4 on my creaky old Win98SE system.) I expect to
be buying the 4th edition when it comes out, which will be soon,
but before then, is there a quick online way to find this out?

Having only got up to page 84 - where we can actually start to
read stuff from the hard disk - I'm emboldened to try to learn
to do something useful, such as removing all those annoying hard
tab characters from my many old text files (before I cottoned on
to using soft tabs in my text editor).

This sort of thing seems to work, in the interpreter (for an
ASCII text file, named 'h071.txt', in the current directory):

stop = 3 # Tab stops every 3 characters
from types import StringType # Is this awkwardness necessary?
detab = lambda s : StringType.expandtabs(s, stop) # Or use def
f = open('h071.txt') # Do some stuff to f, perhaps, and then:
f.seek(0)
print ''.join(map(detab, f.xreadlines()))
f.close()

Obviously, to turn this into a generally useful program, I need
to learn to write to a new file, and how to parcel up the Python
code, and write a script to apply the "detab" function to all the
files found by searching a Windows directory, and replace the old
files with the new ones; but, for the guts of the program, is this
a reasonable way to write the code to strip tabs from a text file?

For writing the output file, this seems to work in the interpreter:

g = open('temp.txt', 'w')
g.writelines(map(detab, f.xreadlines()))
g.close()

In practice, does this avoid creating the whole string in memory
at one time, as is done by using ''.join()? (I'll have to read up
on "opaque sequence objects", which have only been mentioned once
or twice in passing - another instance perhaps being an xrange()?)
Not that that matters much in practice (in this simple case), but
it seems elegant to avoid creating the whole output file at once.

OK, I'm just getting my feet wet, and I'll try not to ask too many
silly questions!

First impressions are: (1) Python seems both elegant and practical;
and (2) Beazley seems a pleasantly unfussy introduction for someone
with at least a little programming experience in other languages.
 
J. Cliff Dyer

... my first Python program! So please be gentle (no fifty ton
weights on the head!), but tell me if it's properly "Pythonic",
or if it's a dead parrot (and if the latter, how to revive it).

Yay. Welcome to Python.

I'm working from Beazley's /Python: Essential Reference/ (2nd
ed. 2001), so my first newbie question is how best to find out
what's changed from version 2.1 to version 2.5. (I've recently
installed 2.5.4 on my creaky old Win98SE system.) I expect to
be buying the 4th edition when it comes out, which will be soon,
but before then, is there a quick online way to find this out?

Check here: http://docs.python.org/whatsnew/index.html

It's not designed to be newbie friendly, but it's in there.

Having only got up to page 84 - where we can actually start to
read stuff from the hard disk - I'm emboldened to try to learn
to do something useful, such as removing all those annoying hard
tab characters from my many old text files (before I cottoned on
to using soft tabs in my text editor).

This sort of thing seems to work, in the interpreter (for an
ASCII text file, named 'h071.txt', in the current directory):

stop = 3 # Tab stops every 3 characters
from types import StringType # Is this awkwardness necessary?

Not anymore. You can just use str for this.

detab = lambda s : StringType.expandtabs(s, stop) # Or use def

First, use def. lambda is a rarity for use when you'd rather not assign
your function to a variable.

Second, expandtabs is a method on string objects. s is a string object,
so you can just use s.expandtabs(stop)

Third, I'd recommend passing your tabstops into detab with a default
argument, rather than defining it irrevocably in a global variable
(which is brittle and ugly)

def detab(s, stop=3):
    return s.expandtabs(stop)

Then you can do

three_space_version = detab(s)
eight_space_version = detab(s, 8)

f = open('h071.txt') # Do some stuff to f, perhaps, and then:
f.seek(0)

f is not opened for writing, so if you do stuff to the contents of f,
you'll have to put the new version in a different variable, so f.seek(0)
doesn't help. If you don't do stuff to it, then you're at the beginning
of the file anyway, so either way, you shouldn't need to f.seek(0).

print ''.join(map(detab, f.xreadlines()))

Sometime in the history of python, files became iterable, which means
you can do the following:

for line in f:
    print detab(line)

Much prettier than running through join/map shenanigans. This is also
the place to modify the output before passing it to detab:

for line in f:
    # do stuff to line
    print detab(line)

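Also note that you can iterate over a file several times, as long as
you rewind it with f.seek(0) in between (the first loop leaves the
file positioned at its end):

f = open('foo.txt')
for line in f:
    print line[0] # prints the first character of every line
f.seek(0) # rewind, otherwise the second loop sees no lines
for line in f:
    print line[1:2] # prints the second character of every line, if any
f.close()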


Obviously, to turn this into a generally useful program, I need
to learn to write to a new file, and how to parcel up the Python
code, and write a script to apply the "detab" function to all the
files found by searching a Windows directory, and replace the old
files with the new ones; but, for the guts of the program, is this
a reasonable way to write the code to strip tabs from a text file?

For writing the output file, this seems to work in the interpreter:

g = open('temp.txt', 'w')
g.writelines(map(detab, f.xreadlines()))
g.close()

Doesn't help, as map returns a list. You can use itertools.imap, or you
can use a for loop, as above.
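To make the difference concrete, here is a minimal sketch of lazy mapping. It assumes a detab helper like the one discussed above and uses a short list of strings as a stand-in for the lines of a file. The try/except import is only there so the sketch also runs on Python 3, where itertools.imap no longer exists and the built-in map is already lazy; next() as a built-in needs Python 2.6 or later.

```python
try:
    from itertools import imap  # Python 2: lazy counterpart of map()
except ImportError:
    imap = map  # Python 3: the built-in map() is already lazy

def detab(s, stop=3):
    """Return a copy of s with tabs expanded to the given tab stop."""
    return s.expandtabs(stop)

lines = ["a\tb\n", "\tc\n"]  # stand-in for the lines of a file

lazy = imap(detab, lines)  # nothing has been converted yet
first = next(lazy)         # converts only the first line
rest = list(lazy)          # converts whatever is left
```

Passing such a lazy object to g.writelines() should let each line be converted as it is written, rather than building the full list first.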
In practice, does this avoid creating the whole string in memory
at one time, as is done by using ''.join()? (I'll have to read up
on "opaque sequence objects", which have only been mentioned once
or twice in passing - another instance perhaps being an xrange()?)
Not that that matters much in practice (in this simple case), but
it seems elegant to avoid creating the whole output file at once.

The terms to look for, rather than opaque sequence objects are
"iterators" and "generators".
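As a rough illustration of the second term, a generator function is an ordinary def that uses yield, so it hands back one item at a time instead of building a list. The name detab_lines is made up for this sketch, and the list of strings stands in for a file.

```python
def detab_lines(lines, stop=3):
    """Yield each line with its tabs expanded, one line at a time."""
    for line in lines:
        yield line.expandtabs(stop)

gen = detab_lines(["x\ty\n", "\tz\n"])  # no work is done yet
result = list(gen)                      # pulls the lines through one by one
```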
OK, I'm just getting my feet wet, and I'll try not to ask too many
silly questions!

First impressions are: (1) Python seems both elegant and practical;
and (2) Beazley seems a pleasantly unfussy introduction for someone
with at least a little programming experience in other languages.

Glad you're enjoying Beazley. I would look for something more
up-to-date. Python's come a long way since 2.1. I'd hate for you to
miss out on all the iterators, booleans, codecs, subprocess, yield,
unified int/longs, decorators, decimals, sets, context managers and
new-style classes that have come since then.


Cheers,
Cliff
 
MRAB

Angus Rodgers wrote:
[snip]
This sort of thing seems to work, in the interpreter (for an
ASCII text file, named 'h071.txt', in the current directory):

stop = 3 # Tab stops every 3 characters
from types import StringType # Is this awkwardness necessary?
detab = lambda s : StringType.expandtabs(s, stop) # Or use def
f = open('h071.txt') # Do some stuff to f, perhaps, and then:
f.seek(0)
print ''.join(map(detab, f.xreadlines()))
f.close()

stop = 3 # Tab stops every 3 characters
detab = lambda s: s.expandtabs(stop)
f = open('h071.txt') # Do some stuff to f, perhaps, and then:
# f.seek(0) # Not necessary
print ''.join(map(detab, f.xreadlines()))
f.close()
Obviously, to turn this into a generally useful program, I need
to learn to write to a new file, and how to parcel up the Python
code, and write a script to apply the "detab" function to all the
files found by searching a Windows directory, and replace the old
files with the new ones; but, for the guts of the program, is this
a reasonable way to write the code to strip tabs from a text file?

For writing the output file, this seems to work in the interpreter:

g = open('temp.txt', 'w')
g.writelines(map(detab, f.xreadlines()))
g.close()

In practice, does this avoid creating the whole string in memory
at one time, as is done by using ''.join()? (I'll have to read up
on "opaque sequence objects", which have only been mentioned once
or twice in passing - another instance perhaps being an xrange()?)
Not that that matters much in practice (in this simple case), but
it seems elegant to avoid creating the whole output file at once.

OK, I'm just getting my feet wet, and I'll try not to ask too many
silly questions!

First impressions are: (1) Python seems both elegant and practical;
and (2) Beazley seems a pleasantly unfussy introduction for someone
with at least a little programming experience in other languages.

STOP = 3 # Tab stops every 3 characters
in_file = open('h071.txt')
out_file = open('temp.txt', 'w')
for line in in_file: # Iterates one line at a time
    out_file.write(line.expandtabs(STOP))
in_file.close()
out_file.close()
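The same loop can also be written with the with statement, which closes both files even if an error occurs part-way through. This is only a sketch: detab_file and the temporary file names are hypothetical, and on Python 2.5 the with statement needs "from __future__ import with_statement" at the top of the module (it is available by default from 2.6).

```python
import os
import tempfile

STOP = 3  # tab stops every 3 characters

def detab_file(in_path, out_path, stop=STOP):
    """Copy in_path to out_path, expanding tabs one line at a time."""
    with open(in_path) as in_file:
        with open(out_path, 'w') as out_file:
            for line in in_file:
                out_file.write(line.expandtabs(stop))

# Demonstration on a throwaway file (paths are made up for this sketch).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'in.txt')
dst = os.path.join(workdir, 'out.txt')
with open(src, 'w') as f:
    f.write('a\tb\n\tc\n')
detab_file(src, dst)
with open(dst) as f:
    converted = f.read()
```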
 
Angus Rodgers

[...]
from types import StringType # Is this awkwardness necessary?

Not anymore. You can just use str for this.
detab = lambda s : StringType.expandtabs(s, stop) # Or use def

First, use def. lambda is a rarity for use when you'd rather not assign
your function to a variable.

Second, expandtabs is a method on string objects. s is a string object,
so you can just use s.expandtabs(stop)

How exactly do I get detab, as a function from strings to strings
(for a fixed tab size)? (This is aside from the point, which you
make below, that the whole map/join idea is a bit of a no-no - in
some other context, I might want to isolate a method like this.)
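One possible answer, sketched here under the assumption that a plain closure is acceptable: a small factory function can build a one-argument detab for any fixed tab size. The names make_detab, detab3 and detab8 are hypothetical. Nested scopes of this kind are on by default from Python 2.2.

```python
def make_detab(stop):
    """Return a function that expands tabs using a fixed tab stop."""
    def detab(s):
        return s.expandtabs(stop)
    return detab

detab3 = make_detab(3)
detab8 = make_detab(8)

# Each result is a plain string-to-string function, so it can be
# passed to map() or called directly.
converted = list(map(detab3, ["a\tb", "\tc"]))
wide = detab8("a\tb")
```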
Third, I'd recommend passing your tabstops into detab with a default
argument, rather than defining it irrevocably in a global variable
(which is brittle and ugly)

No argument there - I was just messing about in the interpreter,
to see if the main idea worked.
f is not opened for writing, so if you do stuff to the contents of f,
you'll have to put the new version in a different variable, so f.seek(0)
doesn't help. If you don't do stuff to it, then you're at the beginning
of the file anyway, so either way, you shouldn't need to f.seek(0).

I seemed to find that if I executed f.xreadlines() or f.readlines()
once, I was somehow positioned at the end of the file or something,
and had to do the f.seek(0) - but maybe I did something else silly.
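That observation can be reproduced with a short experiment. The sketch below writes a small throwaway file (the name is made up) and reads it three times; it suggests that reading leaves the file positioned at its end, so a second read finds nothing until f.seek(0) rewinds it.

```python
import os
import tempfile

# Create a small throwaway file (the name is made up for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('one\ntwo\n')

f = open(path)
first_pass = f.readlines()   # reads to the end of the file
second_pass = f.readlines()  # nothing left, so this is an empty list
f.seek(0)                    # rewind to the beginning
third_pass = f.readlines()   # the lines are available again
f.close()
```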
print ''.join(map(detab, f.xreadlines()))

Sometime in the history of python, files became iterable, which means
you can do the following:

for line in f:
    print detab(line)

Much prettier than running through join/map shenanigans. This is also
the place to modify the output before passing it to detab:

for line in f:
    # do stuff to line
    print detab(line)

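Also note that you can iterate over a file several times, as long as
you rewind it with f.seek(0) in between (the first loop leaves the
file positioned at its end):

f = open('foo.txt')
for line in f:
    print line[0] # prints the first character of every line
f.seek(0) # rewind, otherwise the second loop sees no lines
for line in f:
    print line[1:2] # prints the second character of every line, if any
f.close()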

This all looks very nice.
Doesn't help, as map returns a list.

Pity. Oh, well.
You can use itertools.imap, or you
can use a for loop, as above.

This is whetting my appetite!
The terms to look for, rather than opaque sequence objects are
"iterators" and "generators".

OK, will do.
Glad you're enjoying Beazley. I would look for something more
up-to-date. Python's come a long way since 2.1. I'd hate for you to
miss out on all the iterators, booleans, codecs, subprocess, yield,
unified int/longs, decorators, decimals, sets, context managers and
new-style classes that have come since then.

I'll get either Beazley's 4th ed. (due next month, IIRC), or Chun,
/Core Python Programming/ (2nd ed.), or both, unless someone has
a better suggestion. (Eventually I'll migrate from Windows 98SE(!),
and will need info on Python later than 2.5, but that's all I need
for now.)
 
Angus Rodgers

How exactly do I get detab, as a function from strings to strings
(for a fixed tab size)?

(It's OK - this has been explained in another reply. I'm still a
little hazy about what exactly objects are in Python, but the haze
will soon clear, I'm sure, especially after I have written more
than one one-line program!)
 
Angus Rodgers


I'm starting to see some of the mental haze that was confusing me.
Also, expandtabs is an instance method, so the roundabout is not needed.

def detab(s):
    return s.expandtabs(stop)

I'd forgotten where Beazley had explained that "methods such as
... s.expandtabs() always return a new string as opposed to
modifying the string s." I must have been hazily thinking of it as
somehow modifying s, even though my awkward code itself depended
on a vague understanding that it didn't. No point in nailing
this polly to the perch any more!
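A two-line check makes the point: strings are immutable, so expandtabs leaves the original alone and returns a new string. The variable names are only for illustration.

```python
s = "a\tb"
t = s.expandtabs(3)  # builds and returns a new string

unchanged = (s == "a\tb")  # the original still contains the tab
```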
I'd simply use:
for line in f:
    print detab(line.rstrip())
or even:
for line in f:
    print line.rstrip().expandtabs(stop)

I'll read up on iterating through files, somewhere online for
the moment, and then get a more up-to-date textbook.

And I'll try not to ask too many silly questions like this, but
I wanted to make sure I wasn't getting into any bad programming
habits right at the start - and it's a good thing I did, because
I was!
Nope. But you could use a generator expression if you wanted:
g.writelines(detab(line) for line in f)

Ah, so that actually does what I was fondly hoping my code would
do. Thanks! I must learn about these "generator" thingies.
 
Aahz

Glad you're enjoying Beazley. I would look for something more
up-to-date. Python's come a long way since 2.1. I'd hate for you to
miss out on all the iterators, booleans, codecs, subprocess, yield,
unified int/longs, decorators, decimals, sets, context managers and
new-style classes that have come since then.

While those are all nice, they certainly aren't essential to learning
Python.
 
Angus Rodgers

No point in nailing this polly to the perch any more!

Indeed not, so please skip what follows (I've surely been enough
of an annoying newbie, already!), but I've just remembered why I
wrote my program in such an awkward way. I wanted to be able to
import the type name t (StringType in this case) so that I could
simply use t.m() as the name of one of its methods [if "method"
is the correct term]; but in this case, where m is expandtabs(),
an additional parameter (the tab size) is needed; so, I used the
lambda expression to get around this, entirely failing to realise
that (as was clearly shown in the replies I got), if I was going
to use "lambda" at all (not recommended!), then it would be a lot
simpler to write the function as lambda s : s.m(), with or without
any additional parameters needed. (It didn't really have anything
to do with a separate confusion as to what exactly "objects" are.)
I wanted to make sure I wasn't getting into any bad programming
habits right at the start

I'm just trying to make sure I really understand how I screwed up.

(In future, I'll try to work through a textbook with exercises.
But I thought I'd better try to get some quick feedback at the
start, because I knew that I was fumbling around, and that it
was unlikely to be necessary to use such circumlocutions.)
 
J. Clifford Dyer

While those are all nice, they certainly aren't essential to learning
Python.

Mostly, no, you are correct. With some exceptions:

1) You have to know iterators at a basic level (not enough to understand
how the iterator protocol works, but enough to know what 'for line in
f:' does).

2) Sets are as essential as any other data structure. If you are
learning both lists and tuples, you should be learning sets as well.

3) If you're learning object-oriented programming, new-style classes
should be the only classes you use.

4) You should know how a decorator works, in case you run across one in
the wild.

5) Booleans are a basic type. You should know them.
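For point 4, here is a minimal sketch of what a decorator does, in case one turns up in the wild. The names shout and greet are invented for the example; the @ syntax itself arrived in Python 2.4.

```python
def shout(func):
    """Decorator: return a wrapper that upper-cases func's result."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return "hello, " + name

# The @shout line above is shorthand for: greet = shout(greet)
greeting = greet("polly")
```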

Codecs, the subprocess module, yield, decimals and context managers can
certainly come later. (All this of course, is assuming the Python 2.x
world, which I think is still the right way to learn, for now)


Cheers,
Cliff
 
Angus Rodgers

An equivalent in modern Pythons:

I guess the code below would also have worked in 2.1?
(It does in 2.5.4.)

print ''.join(line.expandtabs(3) for line in \
file('h071.txt').xreadlines())
 
Angus Rodgers

I guess the code below would also have worked in 2.1?
(It does in 2.5.4.)

print ''.join(line.expandtabs(3) for line in \
file('h071.txt').xreadlines())

Possibly silly question (in for a penny ...): does the new feature,
by which a file becomes iterable, operate by some kind of coercion
of a file object to a list object, via something like x.readlines()?
<runs for cover>
 
Angus Rodgers

[...] does the new feature,
by which a file becomes iterable, operate by some kind of coercion
of a file object to a list object, via something like x.readlines()?

Sorry to follow up my own post yet again (amongst my weapons is
a fanatical attention to detail when it's too late!), but I had
better rephrase that question:

Scratch "list object", and replace it with something like: "some
kind of iterator object, that is at least already implicit in 2.1
(although the term 'iterator' isn't mentioned in the index to the
2nd edition of Beazley's book)". Something like that! 8-P
 
MRAB

Angus said:
I guess the code below would also have worked in 2.1?
(It does in 2.5.4.)

print ''.join(line.expandtabs(3) for line in \
file('h071.txt').xreadlines())
That uses a generator expression, which was introduced in 2.4.
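The two forms look almost identical, which makes them easy to mix up. A small side-by-side sketch, using a list of strings as a stand-in for the file's lines:

```python
lines = ["a\tb\n", "\tc\n"]  # stand-in for the lines of a file

# List comprehension (square brackets): builds the whole list first.
as_list = [line.expandtabs(3) for line in lines]

# Generator expression (parentheses, Python 2.4+): items on demand.
as_gen = (line.expandtabs(3) for line in lines)

joined_from_list = ''.join(as_list)
joined_from_gen = ''.join(as_gen)
```

Both give the same joined string; the difference is only in whether an intermediate list is created.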
 
Angus Rodgers

That uses a generator expression, which was introduced in 2.4.

Sorry, I forgot that list comprehensions need square brackets.

The following code works in 2.1 (I installed version 2.1.3, on
a different machine, to check!):

f = open('h071.txt') # Can't use file('h071.txt') in 2.1
print ''.join([line.expandtabs(3) for line in f.xreadlines()])

(Of course, in practice I'll stick to doing it the more sensible
way that's already been explained to me. I'm ordering a copy of
Wesley Chun, /Core Python Programming/ (2nd ed., 2006), to learn
about version 2.5.)
 
Gabriel Genellina

[...] does the new feature,
by which a file becomes iterable, operate by some kind of coercion
of a file object to a list object, via something like x.readlines()?

Sorry to follow up my own post yet again (amongst my weapons is
a fanatical attention to detail when it's too late!), but I had
better rephrase that question:

Scratch "list object", and replace it with something like: "some
kind of iterator object, that is at least already implicit in 2.1
(although the term 'iterator' isn't mentioned in the index to the
2nd edition of Beazley's book)". Something like that! 8-P

Iterators were added in Python 2.2. An iterator is an object that can be
iterated over; that is, an object for which "for item in some_iterator:
..." works.
Files are their own iterators, yielding one line at a time.
See PEP 234 http://www.python.org/dev/peps/pep-0234/
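A quick way to see "files are their own iterators" in action is sketched below, using a throwaway file whose name is made up. Note that the built-in next() assumes Python 2.6 or later; on 2.5 the equivalent call would be f.next().

```python
import os
import tempfile

# Create a small throwaway file (the name is made up for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('one\ntwo\n')

f = open(path)
same_object = iter(f) is f        # a file object is its own iterator
first = next(f)                   # fetch one line by hand
remaining = [line for line in f]  # the loop resumes after that line
f.close()
```

Because there is no intermediate list, no coercion via readlines() is involved: each step asks the file for its next line.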
 
