basic python questions

nateastle

I have a simple assignment for school but am unsure where to go. The
assignment is to read in a text file, split out the words, and report, in
alphabetical order, which lines each word appears on. I have the basic
outline of the program done, which is:

def Xref(filename):
    try:
        fp = open(filename, "r")
        lines = fp.readlines()
        fp.close()
    except:
        raise "Couldn't read input file \"%s\"" % filename
    dict = {}
    for line_num in xrange(len(lines)):
        if lines[line_num] == "": continue
        words = lines[line_num].split()
        for word in words:
            if not dict.has_key(word):
                dict[word] = []
            if line_num+1 not in dict[word]:
                dict[word].append(line_num+1)
    return dict

My question is: how do I easily strip out punctuation marks, and how do I
sort the list? If there is anything else that I am doing wrong in this
code, pointing it out would be a big help.
 
Paddy

Hi,
on first reading, you have a naked except clause that catches all
exceptions. You might want to try your program on a non-existent file
to find out the actual exception you need to trap for that error
message. Do you want the program to continue if you have no input file?

If you have not covered regular expressions (often called REs), then one
way of getting rid of punctuation is to turn the problem on its head:
create a string of all the characters that you consider valid in
words then go through each input line discarding any character not *in*
the string. Use the doctored line for word extraction.
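
A minimal sketch of this keep-only-valid-characters idea follows. It is written in Python 3 syntax (the thread itself uses Python 2.5), and the names `VALID_CHARS` and `keep_valid` are made up for illustration. Which characters count as valid is an assumption you would adjust for your own definition of a word.

```python
import string

# Characters treated as valid (an assumption: letters, digits, apostrophe,
# hyphen). Whitespace is kept so that split() can still separate the words.
VALID_CHARS = string.ascii_letters + string.digits + "'-" + string.whitespace

def keep_valid(line):
    """Return line with every character not in VALID_CHARS discarded."""
    return "".join(ch for ch in line if ch in VALID_CHARS)

doctored = keep_valid("Hello, world! Isn't this cross-referenced?\n")
words = doctored.split()
print(words)
```

Because characters are discarded rather than replaced, text such as "a,b" with no space after the comma would be joined into one word; replacing invalid characters with a space instead is one way around that.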

help(sorted) will start you off on sorting in Python. Other
documentation sources have a lot more.

P.S. I have not run the code myself
P.P.S. Where is the function's docstring?
P.P.P.S. You might want to read up on enumerate. It gives another way
to do things when you want an index as well as each item from an
iterable but remember, the index given starts from zero.

Oh, and welcome to comp.lang.python :)

- Paddy.
 
Marc 'BlackJack' Rintsch

nateastle wrote:
> def Xref(filename):
>     try:
>         fp = open(filename, "r")
>         lines = fp.readlines()
>         fp.close()
>     except:
>         raise "Couldn't read input file \"%s\"" % filename
>     dict = {}
>     for line_num in xrange(len(lines)):

Instead of reading the file completely into a list you can iterate over
the (open) file object and the `enumerate()` function can be used to get
an index number for each line.
>         if lines[line_num] == "": continue

Take a look at the lines you've read and you'll see why the ``continue``
is never executed.
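
To see why, it may help to look at what a line read from a file actually contains. The sketch below uses Python 3 syntax, with `io.StringIO` standing in for a real file; the sample text is invented.

```python
import io

# io.StringIO stands in for an open text file here.
fp = io.StringIO("first line\n\nthird line\n")
lines = fp.readlines()

# Each line keeps its trailing newline, so an "empty" line is "\n", not "".
print(repr(lines[1]))

# Stripping whitespace first makes the blank-line test behave as intended.
blank = [n for n, line in enumerate(lines, 1) if line.strip() == ""]
print(blank)
```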
>         words = lines[line_num].split()
>         for word in words:
>             if not dict.has_key(word):
>                 dict[word] = []
>             if line_num+1 not in dict[word]:
>                 dict[word].append(line_num+1)

Instead of dealing with words that appear more than once in a line you may
use a `set()` to remove duplicates before entering the loop.
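
A small sketch of the `set()` suggestion, in Python 3 syntax with an invented sample line:

```python
line = "the cat and the hat and the bat"

# A set keeps one copy of each word, so a word repeated on the same
# line is recorded only once for that line.
unique_words = set(line.split())
print(sorted(unique_words))
```

With the duplicates gone, the `if line_num+1 not in dict[word]` check in the quoted code is no longer needed.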

Ciao,
Marc 'BlackJack' Rintsch
 
Fredrik Lundh

> I have a simple assignment for school but am unsure where to go. The
> assignment is to read in a text file, split out the words and say which
> line each word appears in alphabetical order. I have the basic outline
> of the program done which is:

looks like an excellent start to me.

> My question is, how do I easily parse out punctuation marks

it depends a bit how you define the term "word".

if you're using regular text, with a limited set of punctuation
characters, you can simply do e.g.

    word = word.strip(".,!?:;")
    if not word:
        continue

inside the "for word" loop. this won't handle such characters if they
appear inside words, but that's probably good enough for your task.
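
For illustration, here is that `strip` approach wrapped in a small helper. This is a sketch in Python 3 syntax; the helper name `clean_words` and the sample sentence are made up.

```python
PUNCTUATION = ".,!?:;"

def clean_words(line):
    """Split line on whitespace and strip leading/trailing punctuation."""
    cleaned = []
    for word in line.split():
        word = word.strip(PUNCTUATION)
        if not word:          # the token was punctuation only, e.g. "..."
            continue
        cleaned.append(word)
    return cleaned

print(clean_words("Wait... what?! It's 3.14, really."))
```

Note that the apostrophe in "It's" and the dot inside "3.14" survive, since `strip` only touches the ends of each word.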

another, slightly more advanced approach is to use regular expressions,
such as re.findall(r"\w+", text) to get a list of all alphanumeric "words" in
the text. that'll have other drawbacks (e.g. it'll split up words like
"couldn't" and "cross-reference", unless you tweak the regexp), and is
probably overkill.
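
A sketch of both the plain pattern and one possible tweak, in Python 3 syntax. The tweaked pattern `WORD_RE` is only an assumption about what "word" should mean here, not the only reasonable choice.

```python
import re

text = "I couldn't find the cross-reference."

# \w+ matches runs of letters, digits and underscores only, so the
# apostrophe and the hyphen split the words around them.
plain = re.findall(r"\w+", text)
print(plain)

# One possible tweak: allow an apostrophe or hyphen between word characters.
WORD_RE = re.compile(r"\w+(?:['-]\w+)*")
tweaked = WORD_RE.findall(text)
print(tweaked)
```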

> and how do I sort the list and

how to sort the dictionary when printing the cross-reference, you mean?
just use "sorted" on the dictionary; that'll get you a sorted list
of the keys.

sorted(dict)

to avoid duplicates and simplify sorting, you probably want to normalize
the case of the words you add to the dictionary, e.g. by converting all
words to lowercase.
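
To illustrate why case matters for sorting, here is a small sketch in Python 3 syntax. The sample dictionary and the name `merged` are invented for the example.

```python
# A hand-made index where the same word appears in two different cases.
index = {"The": [1], "cat": [1, 2], "the": [2], "Zebra": [3]}

# sorted() on a dictionary returns its keys; uppercase sorts before lowercase.
print(sorted(index))

# Normalizing to lowercase merges the duplicates and gives a plain
# alphabetical order.
merged = {}
for word, line_nums in index.items():
    merged.setdefault(word.lower(), []).extend(line_nums)

for word in sorted(merged):
    print(word, ":", merged[word])
```

In the actual program it would be simpler to lowercase each word before adding it to the dictionary, so that no merging step is needed afterwards.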
> if there anything else that I am doing wrong in this code

there's plenty of things that can be tweaked and tuned and written in a
slightly shorter way by an experienced Python programmer, but assuming
that this is a general programming assignment, I don't see something
seriously "wrong" in your code (just make sure you test it on a file
that doesn't exist before you hand it in)

</F>
 
Paul McGuire

And in general, this is one of the best "can anyone help me with my
homework?" posts I've ever seen.
A. You told us up front that it was your homework.
B. You made an honest stab at the solution before posting, and posted the
actual code.
C. You ended with some specific questions on things that didn't work or that
you wanted to improve.

Your current program looks like at least A- material. Add use of sorted and
enumerate, and handle that exception a little better, and you're getting
into A+ territory.

Out of curiosity, what school are you attending that is teaching Python, and
under what course of study?

-- Paul
 
nateastle

I am currently going to school at Utah Valley State College; the course
that I am taking is Analysis of Programming Languages. It's an upper
division course, but our teacher wanted to teach us Python as part of
the course. He spent about 2-3 weeks on Python, which has been good. I
currently work with .NET, and it is fun to see what other languages have
and what syntax they use.
 
nateastle

I have taken the comments on board and think I have implemented most of
them. My only question is how to use enumerate. Here is what I did; I
tried a couple of things but was unable to figure out how to get the line
number.

def Xref(filename):
    try:
        fp = open(filename, "r")
    except:
        raise "Couldn't read input file \"%s\"" % filename
    dict = {}
    line_num=0
    for words in iter(fp.readline,""):
        words = set(words.split())
        line_num = line_num+1
        for word in words:
            word = word.strip(".,!?:;")
            if not dict.has_key(word):
                dict[word] = []
            dict[word].append(line_num)
    fp.close()
    keys = sorted(dict);
    for key in keys:
        print key," : ", dict[key]
    return dict
 
tom

> I have taken the comments and think I have implemented most. My only
> question is how to use the enumerator. Here is what I did, I have tried
> a couple of things but was unable to figure out how to get the line
> number.
Try this in the interpreter,

l = [5,4,3,2,1]
for count, i in enumerate(l):
    print count, i

 
tom

you could do it like this.

for count, line in enumerate(fp):
    for word in line.split():
        etc...

filehandles are iterators themselves.
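
Putting that together, a sketch of the indexing loop might look like the following. It is in Python 3 syntax; the function name `build_index` and the sample text are made up, and `io.StringIO` stands in for an open file. The `start=1` argument to `enumerate()` is not available in the Python 2.5 used in this thread, where you would add 1 to the index yourself.

```python
import io

def build_index(fp):
    """Map each word to the list of line numbers it appears on."""
    index = {}
    # enumerate() counts from 0 by default; start=1 gives human line numbers.
    for line_num, line in enumerate(fp, start=1):
        for word in set(line.split()):
            index.setdefault(word, []).append(line_num)
    return index

# io.StringIO stands in for an open file; a real file object works the same way.
sample = io.StringIO("spam eggs\nham\nspam spam ham\n")
index = build_index(sample)
for word in sorted(index):
    print(word, ":", index[word])
```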

Don't take my word for it though, I'm kinda new to all this too :)
 
Diez B. Roggisch

> I have taken the comments and think I have implemented most. My only

Unfortunately, no.
> question is how to use the enumerator. Here is what I did, I have tried
> a couple of things but was unable to figure out how to get the line
> number.
>
> def Xref(filename):
>     try:
>         fp = open(filename, "r")
>     except:
>         raise "Couldn't read input file \"%s\"" % filename

You still got that I-catch-all-except in there.
This will produce subtle bugs when you e.g. misspell a variable name:

filename = '/tmp/foo'
try:
    f = open(fliename, 'r')
except:
    raise "can't open filename"


Please notice the misspelled 'fliename'.

This OTOH will give you more clues on what really goes wrong:



filename = '/tmp/foo'
try:
    f = open(fliename, 'r')
except IOError:
    raise "can't open filename"
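
The difference can be checked with a small runnable sketch. It uses Python 3 syntax, returns a message instead of raising a string, and the helper names `open_bare` and `open_specific` are made up for the comparison.

```python
filename = "/tmp/foo"

def open_bare():
    try:
        return open(fliename)      # deliberate typo: 'fliename' is undefined
    except:                        # bare except also swallows the NameError
        return "can't open file"

def open_specific():
    try:
        return open(fliename)      # same deliberate typo
    except OSError:                # IOError is an alias of OSError in Python 3
        return "can't open file"

print(open_bare())                 # the typo is reported as a file problem

try:
    open_specific()
except NameError as exc:
    print("NameError:", exc)       # the real bug is now visible
```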


Diez
 
Mark Peters

> dict = {}

As a general rule you should avoid variable names which shadow built-in
types (list, dict, etc.). This can cause unexpected behavior later on.

Also, variable names should be more descriptive of their contents.

Try word_dict or some such variant
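
As a sketch of the kind of surprise shadowing can cause (Python 3 syntax; the function names `shadowed` and `descriptive` are invented for this example):

```python
def shadowed():
    dict = {}                       # local name hides the built-in type
    try:
        return dict([("a", 1)])     # meant to call the built-in constructor
    except TypeError as exc:
        return "TypeError: %s" % exc

def descriptive():
    word_lines = {}                 # descriptive name, built-in untouched
    word_lines.update(dict([("a", 1)]))
    return word_lines

print(shadowed())
print(descriptive())
```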
 
nateastle

So I implemented the exception specified, and in testing it returns:

DeprecationWarning: raising a string exception is deprecated

I am not too worried about the deprecation warning; however, out of
curiosity, what would be the better way to handle this? Is there
somewhere (web site, help documentation, etc.) that I would be able to
find this? I am running this in Python 2.5.
 
John Machin

Just try shortening the statement to the bare:
raise

For example:

| >>> try:
| ...     f = open("nonesuch.txt")
| ... except IOError:
| ...     raise
| ...
| Traceback (most recent call last):
|   File "<stdin>", line 2, in <module>
| # Coming from a file you'll get filename, linenumber, function/method above
| IOError: [Errno 2] No such file or directory: 'nonesuch.txt'
| >>>

If you feel that the error message that you get is descriptive enough,
even better than what you'd contemplated writing yourself, you're done.
Otherwise you need to raise an instance of the Exception class, and the
degree of difficulty just went up a notch.
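
For reference, raising an exception instance instead of a string might look like the sketch below. It is in Python 3 syntax (in Python 2.5 the clause would be spelled `except IOError, exc`), and the helper name `read_lines` is made up for the example.

```python
def read_lines(filename):
    """Return the lines of filename, or raise IOError with a custom message."""
    try:
        fp = open(filename, "r")
    except IOError as exc:
        # Raise an exception instance (not a string) carrying our own text.
        raise IOError("Couldn't read input file \"%s\": %s" % (filename, exc))
    try:
        return fp.readlines()
    finally:
        fp.close()

try:
    read_lines("nonesuch.txt")
except IOError as exc:
    print(exc)
```

Including the original exception text in the new message keeps the system's explanation (such as "No such file or directory") rather than hiding it.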

[Aside] How are you going to explain all this to your instructor, who
may be reading all this right now?

Cheers,
John
 
Paddy

John Machin wrote:

> [Aside] How are you going to explain all this to your instructor, who
> may be reading all this right now?

The instructor should be proud!
He has managed to do his very first post to this newsgroup, about a
homework question, and do it in the right way. That is no mean feat.

- Paddy.
 
Hendrik van Rooyen

> I am currently going to school at Utah Valley State College, the course
> that I am taking is analysis of programming languages. It's an upper
> division course but our teacher wanted to teach us python as part of

What does "upper division" mean in this context? I am unfamiliar with the
term.

- Hendrik
 
Paul McGuire

In a 4-year college program in the US, an upper division course is an
advanced course, usually reserved for those in the 3rd or 4th years.

-- Paul
 
John Machin

Paddy said:
> The instructor should be proud!
> He has managed to do his very first post to this newsgroup, about a
> homework question, and do it in the right way. That is no mean feat.

In fact, he may well by now know more than his instructor, and be
explaining the finer points of Python :)
 
nateastle

I normally try to be as resourceful as I can. I find that newsgroups give
a wide range of answers and solutions to problems, and you get a lot of
responses about the right way to do things and different points of
view about the language that you can't find in help manuals. I also
want to thank everyone for being so helpful in this group; it has been
one of the better groups that I have used.


 
Bruno Desthuilliers

Diez B. Roggisch wrote:
> Unfortunately, no.
>
> You still got that I-catch-all-except in there.
> This will produce subtle bugs when you e.g. misspell a variable name:
>
> filename = '/tmp/foo'
> try:
>     f = open(fliename, 'r')
> except:
>     raise "can't open filename"
>
> Please notice the wrong-spelled 'fliename'.
>
> This OTOH will give you more clues on what really goes wrong:
>
> filename = '/tmp/foo'
> try:
>     f = open(fliename, 'r')
> except IOError:
>     raise "can't open filename"

And this would be still more informative (and not deprecated...):

filename = '/tmp/foo'
f = open(fliename)

Catching an exception just to raise a less informative one is somewhat
useless IMHO.
 
