Pattern matching from a text document

Ben

I'm currently trying to develop a demonstrator in Python for an
ontology of a football team. At present, all the fit players are
exported to a text document.

The program reads the document in and splits it into a list of
strings, one per line (each fit player and their attributes are
entered on a separate line of the text document), using
list = target.splitlines()

The program then performs a loop like so:

while foo > 0:
    if len(list) == 0:
        break
    else:
        pat = "([a-z]+)(\s+)([a-z]+)(\s+)([a-z]+)(\s+)(\d{1})(\d{1})(\d{1})(\d{1})(\d{1})([a-z]+)"
        ph = re.compile(pat,re.IGNORECASE)

        match = ph.match(list[1])

        forename = match.group(1)
        surname = match.group(3)
        attacking = match.group(7)
        defending = match.group(8)
        fitness = match.group(9)

        print forename
        print len(list)
        del list[0]

I'm having two main problems. First, the first entry in the list is
not printing. Second, once I have overcome that, I need each player
and their related variables to be stored separately. This is not
happening at present because each time the loop runs it overwrites
the value in each variable.

Any help would be greatly appreciated.

Ben.
 
George Sakkis

Ben, can you post a sample line from the document and indicate the
fields you want to extract? I'm sure it will be easier to help you this way.

George


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"If a slave say to his master: "You are not my master," if they convict
him his master shall cut off his ear."

Hammurabi's Code of Laws
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
infidel

First, if you're going to loop over each line, do it like this:

for line in open('playerlist.txt'):
    # do stuff here
    pass

Second, this statement is referencing the *second* item in the list,
not the first:

match = ph.match(list[1])
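A minimal sketch of the off-by-one, using two of the sample lines Ben posts later in the thread as stand-in data (the variable names here are illustrative, not from Ben's program): list indices start at 0, so [1] is the second line, while a plain for loop visits every element, including the first.

```python
# Hypothetical stand-in for target.splitlines()
lines = ["Freddie Ljungberg Player 02808right",
         "Dennis Bergkamp Player 90705either"]

# Lists are zero-indexed: index 0 is the first line, index 1 the second
first = lines[0]
second = lines[1]

# Iterating directly visits every line in order, so nothing is skipped
# and there is no need to delete items from the list as you go
visited = []
for line in lines:
    visited.append(line.split()[0])  # keep just the forename

print(first)    # Freddie Ljungberg Player 02808right
print(visited)  # ['Freddie', 'Dennis']
```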

Third, simply splitting each line on a delimiter character would be
easier than using regular expressions, but whatever floats your
boat. If you insist on using regexen, then you should compile the
pattern once, before the loop. There's no need to do it over and over again.
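For comparison, a sketch of both approaches on a single line, assuming the layout of the sample lines Ben posts later in the thread (forename, surname, class, then five digits run together with the preferred foot). The shortened pattern is an illustrative simplification of Ben's, not his original.

```python
import re

line = "Thierry Henry Player 90906either"

# Option 1: split on whitespace, then slice the last field apart
forename, surname, player_class, stats = line.split()
digits, foot = stats[:5], stats[5:]

# Option 2: a regular expression, compiled once (e.g. at module level,
# outside any loop) and then reused for every line
PATTERN = re.compile(r"([a-z]+)\s+([a-z]+)\s+[a-z]+\s+(\d{5})([a-z]+)", re.IGNORECASE)
m = PATTERN.match(line)

print(forename, surname, digits, foot)  # Thierry Henry 90906 either
print(m.groups())  # ('Thierry', 'Henry', '90906', 'either')
```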

Fourth, if you want to create a list of players in memory, then you
need either a class or some other structure to represent each player,
and then you need to add them to some kind of list as you go. Like
this:

import re

# Compile the pattern once, before the loop
pat = r"([a-z]+)(\s+)([a-z]+)(\s+)([a-z]+)(\s+)(\d{1})(\d{1})(\d{1})(\d{1})(\d{1})([a-z]+)"
ph = re.compile(pat, re.IGNORECASE)

players = []
for line in open('playerlist.txt'):
    match = ph.match(line)
    if match is None:
        # skip blank or malformed lines instead of crashing on match.group()
        continue
    player = {
        'forename': match.group(1),
        'surname': match.group(3),
        'attacking': match.group(7),
        'defending': match.group(8),
        'fitness': match.group(9),
    }
    players.append(player)
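Since a class is mentioned as an alternative to the dict, here is a sketch of that variant using collections.namedtuple as a lightweight class. It assumes the same digit positions Ben's regex assigns (first three digits: attacking, defending, fitness) and reads from an in-memory string rather than playerlist.txt; Player and parse_players are hypothetical names.

```python
from collections import namedtuple

# A lightweight record type; field names follow Ben's original group mapping
Player = namedtuple("Player", "forename surname attacking defending fitness foot")

def parse_players(text):
    """Build one Player per non-blank line, appending each to a list."""
    players = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        forename, surname, _class, stats = line.split()
        # Assumption: the first three digits are attacking, defending, fitness
        attacking, defending, fitness = (int(c) for c in stats[:3])
        players.append(Player(forename, surname, attacking, defending,
                              fitness, stats[5:]))
    return players

sample = """Freddie Ljungberg Player 02808right
Ashley Cole Player 17705left
"""
players = parse_players(sample)
print(players[0].surname, players[0].defending)  # Ljungberg 2
print(len(players))  # 2
```

Each player is now a separate object in the list, so nothing is overwritten between iterations.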
 
Larry Bates

Ben,

Others have answered your specific questions, but I thought
I'd use this opportunity to make a general statement. Unlike some
other programming languages, Python doesn't make the names of its
built-in functions and types reserved keywords. You should never,
ever, ever name a variable 'list' (the same is true of dict, tuple,
str, ...). When you do, your variable masks the built-in of the same
name. If this hasn't bitten you before, it will at some point.
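A short demonstration of the masking Larry describes: once the name list is bound to a variable, calling list(...) fails until that variable is removed.

```python
# Rebinding the name "list" hides the built-in type in this scope
list = "Freddie Ljungberg Player 02808right".split()

try:
    list("abc")          # now tries to call the variable, which is a list object
except TypeError as exc:
    message = str(exc)   # explains that a 'list' object is not callable

del list                 # remove the shadowing name; the built-in is visible again
restored = list("abc")
print(message)
print(restored)          # ['a', 'b', 'c']
```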

It really doesn't sound like you need the complexity of regular
expressions just to read in some data. You might want to
investigate the csv module (for reading comma-delimited files),
or you might just be able to use the simple string .split() method
(for tab-delimited files).
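A sketch of both suggestions. The comma- and tab-delimited inputs below are hypothetical reformattings of Ben's data; his actual file, as he shows later, is space-separated with the digits and foot run together.

```python
import csv
import io

# Hypothetical comma-delimited export (not Ben's real format)
data = io.StringIO("Freddie,Ljungberg,Player,02808,right\n"
                   "Ashley,Cole,Player,17705,left\n")

rows = list(csv.reader(data))
print(rows[1])  # ['Ashley', 'Cole', 'Player', '17705', 'left']

# For a tab-delimited line, str.split("\t") is enough
tab_line = "Thierry\tHenry\tPlayer\t90906\teither"
print(tab_line.split("\t"))  # ['Thierry', 'Henry', 'Player', '90906', 'either']
```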

Hope info helps.

Regards,
Larry Bates
 
Ben

George said:
Ben, can you post a sample line from the document and indicate the
fields you want to extract? I'm sure it will be easier to help you
this way.

George

Below are a few sample lines. Each has the name, followed by the class
(not important), followed by 5 digits, each of which can range 1-9 and
describes a different ability, such as fitness or attacking ability.
Finally, the preferred foot is stated.

Freddie Ljungberg Player 02808right
Dennis Bergkamp Player 90705either
Thierry Henry Player 90906either
Ashley Cole Player 17705left
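A sketch of a parser for the layout described above, using named groups so the fields don't have to be tracked by number. It assumes names contain no spaces and that the foot is a run of letters; LINE_RE and parse_line are hypothetical names, and the meaning of each digit is left open because only some of the abilities are named.

```python
import re

# Assumed layout: forename, surname, class, then five digits
# run together with the preferred foot
LINE_RE = re.compile(
    r"(?P<forename>\S+)\s+(?P<surname>\S+)\s+(?P<player_class>\S+)\s+"
    r"(?P<digits>\d{5})(?P<foot>[a-z]+)\s*$",
    re.IGNORECASE,
)

def parse_line(line):
    """Return a dict for one player line, or None if it doesn't fit the layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    record = m.groupdict()
    # Turn "90705" into [9, 0, 7, 0, 5]
    record["digits"] = [int(d) for d in record["digits"]]
    return record

rec = parse_line("Dennis Bergkamp Player 90705either")
print(rec["surname"], rec["digits"], rec["foot"])
# Bergkamp [9, 0, 7, 0, 5] either
```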


Thanks for your help

ben
 
F. Petitjean

On 24 Mar 2005 06:16:12 -0800, Ben wrote:
Freddie Ljungberg Player 02808right
Dennis Bergkamp Player 90705either
Thierry Henry Player 90906either
Ashley Cole Player 17705left
filename = 'players'  # to adapt
players = {}  # mapping of (lastname, firstname) to abilities
with open(filename) as fin:
    for line in fin:
        if not line.strip():
            continue  # skip blank lines instead of failing to unpack them
        firstname, lastname, type_, ability = line.split()
        players[(lastname, firstname)] = Ability(ability)

where Ability can be either a simple function that processes the
information in the last word (string) of each line and returns it, or
a class that stores and manages that information:
class Ability(object):
    def __init__(self, ability):
        digits = ability[:5]
        self.details = [int(d) for d in digits]  # list of the five details
        self.preferred_foot = ability[5:]
        # and so on ....
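A self-contained sketch of how the resulting dictionary might be used. It repeats a minimal version of the Ability class so it runs on its own and reads Ben's four sample lines from a string instead of the 'players' file; the either-footed query is only an illustration.

```python
import io

class Ability(object):
    """Minimal version of the Ability class: five digits plus preferred foot."""
    def __init__(self, ability):
        self.details = [int(d) for d in ability[:5]]
        self.preferred_foot = ability[5:]

sample = io.StringIO(
    "Freddie Ljungberg Player 02808right\n"
    "Dennis Bergkamp Player 90705either\n"
    "Thierry Henry Player 90906either\n"
    "Ashley Cole Player 17705left\n"
)

players = {}
for line in sample:
    if not line.strip():
        continue
    firstname, lastname, type_, ability = line.split()
    players[(lastname, firstname)] = Ability(ability)

# Look up one player by (lastname, firstname)
print(players[("Henry", "Thierry")].preferred_foot)  # either

# Keys of players whose preferred foot is "either", sorted by surname
either_footed = sorted(name for name, ab in players.items()
                       if ab.preferred_foot == "either")
print(either_footed)  # [('Bergkamp', 'Dennis'), ('Henry', 'Thierry')]
```

Keying on (lastname, firstname) keeps each player's data separate and makes sorting by surname fall out of ordinary tuple ordering.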
 
