Reading a file into a data structure....

MrPink · Oct 13, 2011

This is a continuation of a post I made in August:
http://groups.google.com/group/comp...d09911e4107?lnk=gst&q=MrPink#ce6d4d09911e4107

I got some free time to work with Python again and have some followup
questions.

For example, I have a list in a text file like this:
Example list of lottery drawings:
date,wb,wb,wb,wb,wb,bb
4/1/2011,5,1,45,23,27,27
5/1/2011,15,23,8,48,22,32
6/1/2011,33,49,21,16,34,1
7/1/2011,9,3,13,22,45,41
8/1/2011,54,1,24,39,35,18
.....

Ticket:
startdate,enddate,wb,wb,wb,wb,wb,bb
4/1/2011,8/1/2011,5,23,32,21,3,27

I am trying to determine the optimal way to organize the data
structure of the drawing list, search the drawing list, and mark the
matches in the drawing list.

f = open(r"C:\temp\drawinglist.txt", "r")
lines = f.readlines()
f.close()
drawing = lines[1].strip().split(",")

The result in drawing is this:
drawing[0] = '4/1/2011'
drawing[1] = '5'
drawing[2] = '1'
drawing[3] = '45'
drawing[4] = '23'
drawing[5] = '27'
drawing[6] = '27'

I need to convert drawing[0] to a date datatype. This works, but I'm
sure there is a better way.
from datetime import date
month, day, year = drawing[0].split('/')
drawing[0] = date(int(year), int(month), int(day))

For searching, I need to determine if the date of the drawing is
within the date range of the ticket. If yes, then mark which numbers
in the drawing match the numbers in the ticket.

ticket[0] = '4/1/2011'
ticket[1] = '8/1/2011'
ticket[2] = '5'
ticket[3] = '23'
ticket[4] = '32'
ticket[5] = '21'
ticket[6] = '3'
ticket[7] = '27'

drawing[0] = '4/1/2011' (match)
drawing[1] = '5' (match)
drawing[2] = '1'
drawing[3] = '45'
drawing[4] = '23' (match)
drawing[5] = '27'
drawing[6] = '27' (match)

I'm debating on structuring the drawing list like this:
drawing[0] = '4/1/2011'
drawing[1][0] = '5'
drawing[1][1] = '1'
drawing[1][2] = '45'
drawing[1][3] = '23'
drawing[1][4] = '27'
drawing[2] = '27'

Sort drawing[1] from low to high
drawing[1][0] = '1'
drawing[1][1] = '5'
drawing[1][2] = '23'
drawing[1][3] = '27'
drawing[1][4] = '45'

I want to keep the drawing list in memory for reuse.

Any guidance would be most helpful and appreciated.
BTW, I want to learn, so be careful not to do too much of the work for
me.
I'm using WingIDE to do my work.

Thanks,

Ian Kelly · Oct 13, 2011

That looks like a CSV file. If the contents are tightly constrained
then it may not matter, but if not then you should consider using the
csv module to read the lines, which will handle inconvenient details
like quoting and escape characters for you.
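
As a rough sketch of that suggestion, the sample rows from the original post can be read with csv.reader. The read_drawings helper and the in-memory SAMPLE string are illustrative assumptions, not part of the original code.

```python
import csv
import io

# Sample rows copied from the original post; io.StringIO stands in for the real file.
SAMPLE = """date,wb,wb,wb,wb,wb,bb
4/1/2011,5,1,45,23,27,27
5/1/2011,15,23,8,48,22,32
"""

def read_drawings(fileobj):
    """Return a list of rows (lists of strings), skipping the header line."""
    reader = csv.reader(fileobj)
    next(reader)  # discard the header row
    return [row for row in reader if row]  # ignore blank lines

drawings = read_drawings(io.StringIO(SAMPLE))
print(drawings[0])
```

With a real file, the csv documentation recommends opening it with newline='' before handing it to csv.reader.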

I need to convert drawing[0] to a date datatype. This works, but I'm
sure there is a better way.
from datetime import date
month, day, year = drawing[0].split('/')
drawing[0] = date(int(year), int(month), int(day))

If you already know the format:

from datetime import datetime
drawing[0] = datetime.strptime(drawing[0], '%m/%d/%Y').date()

If you can't be sure of the format, then I recommend using the
python-dateutil parser.parse() function, which will try to work it out
on the fly.
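
Once the strings are parsed into date objects, the range check asked about in the original post can be a plain chained comparison. This is a minimal sketch that assumes the month/day/year format used above; parse_date and in_range are hypothetical helper names.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumption: the file uses month/day/year

def parse_date(text):
    """Convert a string such as '4/1/2011' into a datetime.date."""
    return datetime.strptime(text, DATE_FORMAT).date()

def in_range(drawing_date, start, end):
    """True when start <= drawing_date <= end (both ends inclusive)."""
    return start <= drawing_date <= end

# Ticket dates taken from the example ticket in the original post.
start = parse_date("4/1/2011")
end = parse_date("8/1/2011")
print(in_range(parse_date("5/1/2011"), start, end))
print(in_range(parse_date("9/1/2011"), start, end))
```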

Jon Clements · Oct 14, 2011

- Use the csv module to read the file
- Use strptime to process the date field
- Use a set for draw numbers (you'd have to do pure equality on the
bb)
- Look at persisting in a sqlite3 DB (maybe with a custom converter)
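
The set idea might look something like the sketch below, which uses the ticket and first drawing from the original post. The match_ticket function is a hypothetical helper; the bonus ball is compared with plain equality, so a white 27 in the drawing does not count against the ticket's bonus 27.

```python
def match_ticket(ticket_wb, ticket_bb, drawing_wb, drawing_bb):
    """Return (set of matching white balls, whether the bonus ball matched)."""
    matched_wb = set(ticket_wb) & set(drawing_wb)  # set intersection
    return matched_wb, ticket_bb == drawing_bb

# Ticket 5,23,32,21,3 + bb 27 against drawing 5,1,45,23,27 + bb 27.
matched, bb_hit = match_ticket([5, 23, 32, 21, 3], 27, [5, 1, 45, 23, 27], 27)
print(sorted(matched), bb_hit)
```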

hth,

Jon.

MrPink · Oct 15, 2011

This is what I have been able to accomplish:

def isInt(s):
    try:
        i = int(s)
        return True
    except ValueError:
        return False

f = open("powerball.txt", "r")
lines = f.readlines()
f.close()

dDrawings = {}
for line in lines:
    if isInt(line[0]):
        t = line.split()
        d = t[0]
        month, day, year = t[0].split("/")
        i = int(year + month + day)  # e.g. 19971101, used as the dictionary key
        wb = t[1:6]
        wb.sort()
        pb = t[6]
        r = {'d': d, 'wb': wb, 'pb': pb}
        dDrawings[i] = r

The dictionary dDrawings contains records like this:
dDrawings[19971101]
{'pb': '20', 'd': '11/01/1997', 'wb': ['22', '25', '28', '33', '37']}

I am now able to search for a ticket in a date range.
keys = dDrawings.keys()
b = [key for key in keys if 20110909 <= key <= 20111212]

How would I search for matching wb (White Balls) in the drawings?

Is there a better way to organize the data so that it will be flexible
enough for different types of searches?
Search by date range, search by pb, search by wb matches, etc.

I hope this all makes sense.

Thanks,

Chris Angelico · Oct 15, 2011

Here's a quick rejig:

dDrawings = {}
for line in open("powerball.txt"):
    try:
        t = line.split()
        d = t[0]
        month, day, year = t[0].split("/")
        i = int(year + month + day)
        wb = t[1:6]
        wb.sort()
        pb = t[6]
        r = {'d': d, 'wb': wb, 'pb': pb}
        dDrawings[i] = r
    except ValueError:
        pass

There are two significant differences. One is that the file is kept
open until processing is complete, rather than reading the file into a
list and then iterating over the list. Your processing is pretty
simple, so it's unlikely to make a difference, but if you're doing a
lengthy operation on the lines of text, or conversely if you're
reading in gigs and gigs of file, you may want to take that into
consideration. The other change is that a ValueError _anywhere_ in
processing will cause the line to be silently ignored. If this isn't
what you want, then shorten the try/except block and make it use
'continue' instead of 'pass' (which will then silently ignore that
line, but leave the rest of processing unguarded by try/except).
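
A sketch of that shortened variant, under some assumptions: parse_drawings is a hypothetical wrapper that takes any iterable of lines, and catching IndexError for blank lines is an addition that the code above does not have.

```python
def parse_drawings(lines):
    """Build {yyyymmdd int: record}; only the date parsing is guarded."""
    drawings = {}
    for line in lines:
        fields = line.split()
        try:
            month, day, year = fields[0].split("/")
            key = int(year + month + day)
        except (ValueError, IndexError):
            # ValueError: not a m/d/y date (e.g. the header row).
            # IndexError: blank line (an assumption added for this sketch).
            continue
        # Unguarded from here on: a malformed data row fails noisily.
        white = sorted(fields[1:6])
        drawings[key] = {'d': fields[0], 'wb': white, 'pb': fields[6]}
    return drawings

result = parse_drawings([
    "Draw Date WB1 WB2 WB3 WB4 WB5 PB PP",
    "10/12/2011 43 10 12 23 47 18 3",
])
print(result)
```

With this shape, the header row is skipped quietly, while a data row that is too short would raise at fields[6] instead of vanishing.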

The most likely cause of another ValueError is this line:
month,day,year = t[0].split("/")
If there are not precisely two slashes, this will raise:
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    a,b,c="asdf/qwer".split("/")
ValueError: need more than 2 values to unpack

Do you want this to cause the line to be ignored, or to noisily abort
the whole script?

ChrisA

Chris Rebert · Oct 15, 2011

from datetime import datetime
from collections import namedtuple, defaultdict
# for efficient searching by date: import bisect

DATE_FORMAT = "%m/%d/%Y"
Ticket = namedtuple('Ticket', "white_balls powerball date".split())

powerball2ticket = defaultdict(set)
whiteball2ticket = defaultdict(set)
tickets_by_date = []

with open("powerball.txt", "r") as f:
    for line in f:
        if not line[0].isdigit():
            # what are these other lines anyway?
            continue # skip such lines

        fields = line.split()

        date = datetime.strptime(fields[0], DATE_FORMAT).date()
        white_balls = frozenset(int(num_str) for num_str in fields[1:6])
        powerball = int(fields[6])
        ticket = Ticket(white_balls, powerball, date)

        powerball2ticket[powerball].add(ticket)
        for ball in white_balls:
            whiteball2ticket[ball].add(ticket)
        tickets_by_date.append(ticket)

tickets_by_date.sort(key=lambda ticket: ticket.date)

print(powerball2ticket[7]) # all tickets with a 7 powerball
print(whiteball2ticket[3]) # all tickets with a non-power 3 ball
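
The bisect comment above hints at how a date-range lookup could work on the sorted list. The following is only a sketch: tickets_between is a hypothetical helper, and the three tickets are hard-coded from the sample file rather than read from disk.

```python
import bisect
from collections import namedtuple
from datetime import date

Ticket = namedtuple('Ticket', "white_balls powerball date".split())

# Three drawings from the sample file, already sorted by date.
tickets_by_date = [
    Ticket(frozenset([41, 51, 30, 50, 53]), 8, date(2011, 9, 28)),
    Ticket(frozenset([27, 43, 12, 23, 1]), 31, date(2011, 10, 1)),
    Ticket(frozenset([46, 7, 43, 54, 20]), 17, date(2011, 10, 5)),
]

def tickets_between(tickets, start, end):
    """Return tickets with start <= date <= end; `tickets` must be sorted by date."""
    dates = [t.date for t in tickets]        # parallel list of sort keys
    lo = bisect.bisect_left(dates, start)    # first index with date >= start
    hi = bisect.bisect_right(dates, end)     # first index with date > end
    return tickets[lo:hi]

found = tickets_between(tickets_by_date, date(2011, 10, 1), date(2011, 10, 31))
print([t.date.isoformat() for t in found])
```

For repeated queries the dates list would be built once and kept next to tickets_by_date, since rebuilding it on every call costs more than the binary search saves.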

Cheers,
Chris

Troy S · Oct 16, 2011

Chris,
Thanks for the help.
I am using the powerball numbers from this text file downloaded from the site.
http://www.powerball.com/powerball/winnums-text.txt
The first row is the header/fieldnames and the file starts off like this:

Draw Date WB1 WB2 WB3 WB4 WB5 PB PP
10/12/2011 43 10 12 23 47 18 3
10/08/2011 35 03 37 27 45 31 5
10/05/2011 46 07 43 54 20 17 4
10/01/2011 27 43 12 23 01 31 3
09/28/2011 41 51 30 50 53 08 2
09/24/2011 27 12 03 04 44 26 5
09/21/2011 47 52 55 48 12 13 4

The testing of a digit was used to skip the first row only.

I'm still dissecting your Python code to better understand the use of
collections, namedtuples, etc.
I have not found many examples or descriptions of collections,
namedtuples, etc. yet, and I don't understand them very well. Do you
know of a reference that breaks this stuff down better for me?
The couple of books that I have on Python do not go into collections,
namedtuples, etc. that much.

Thanks,

MrPink · Oct 16, 2011

I did not understand what a tuple was.
So it was very hard for me to understand what a namedtuple was and
follow the code.
Then I looked up the word at dictionary.com and got this:
http://dictionary.reference.com/browse/tuple

tuple: (computing) a row of values in a relational database

Now I understand a little better what a tuple is and can follow the
code better.

A namedtuple seems like a dictionary type. I'll need to read up on
the difference between the two.

Thanks again.

Ian Kelly · Oct 16, 2011

Python tuples do not have anything to do with relational databases.
You would get a better introduction from Wikipedia:

http://en.wikipedia.org/wiki/Tuple

In Python, tuples are nothing more than immutable sequences, as
opposed to lists, which are mutable sequences. Another way of
characterizing the difference between lists and tuples is in how they
are typically used. A list is typically used for a homogeneous
(meaning all elements are treated in the same way) sequence containing
an arbitrary number of unrelated objects. A tuple is typically used
for a heterogeneous sequence of a certain length. For example, a
tuple might be expected to contain exactly two strings and an int that
are related in some fashion.
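
To make the distinction concrete, here is a small sketch using values from the first drawing. The (date, white balls, bonus ball) layout of the tuple is an assumption chosen for illustration.

```python
numbers = [5, 1, 45, 23, 27]   # list: any number of like items, mutable
numbers.sort()                 # changing a list in place is allowed

# tuple: fixed shape, here (date string, white balls, bonus ball)
drawing = ("4/1/2011", numbers, 27)
try:
    drawing[2] = 99            # tuples do not support item assignment
except TypeError as exc:
    error = str(exc)

date_str, white_balls, bonus = drawing   # unpacking relies on the fixed length
print(white_balls, bonus)
print(error)
```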

A namedtuple seems like a dictionary type. I'll need to read up on
the difference between the two.

A namedtuple is a tuple subclass where each of the elements the tuple
is expected to contain has been given a specific name for ease of
reference. The names are essentially aliases for numerical indices.
It differs from a dictionary in that it is ordered and only contains
elements with specific names (and always contains those elements),
whereas a dictionary contains arbitrary key-value pairs.
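
A short sketch of those differences, using a hypothetical Drawing type with made-up field names and the 11/01/1997 record quoted earlier in the thread:

```python
from collections import namedtuple

# Hypothetical record type; the field names are chosen for illustration.
Drawing = namedtuple('Drawing', ['date', 'white_balls', 'powerball'])

d = Drawing('11/01/1997', (22, 25, 28, 33, 37), 20)
same_field = d.powerball == d[2]   # the name is an alias for index 2
as_dict = d._asdict()              # mapping with the same field names

record = {'d': '11/01/1997', 'pb': 20}
record['anything'] = 'extra'       # a dict accepts arbitrary new keys

try:
    Drawing('11/01/1997', (22, 25, 28, 33, 37))   # one field missing
except TypeError:
    missing_field_rejected = True  # a namedtuple always has all its fields

print(d)
```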
