Reading a file into a data structure....

MrPink

This is a continuation of a post I made in August:
http://groups.google.com/group/comp...d09911e4107?lnk=gst&q=MrPink#ce6d4d09911e4107

I got some free time to work with Python again and have some followup
questions.

For example, I have a list in a text file like this:
Example list of lottery drawings:
date,wb,wb,wb,wb,wb,bb
4/1/2011,5,1,45,23,27,27
5/1/2011,15,23,8,48,22,32
6/1/2011,33,49,21,16,34,1
7/1/2011,9,3,13,22,45,41
8/1/2011,54,1,24,39,35,18
.....

Ticket:
startdate,enddate,wb,wb,wb,wb,wb,bb
4/1/2011,8/1/2011,5,23,32,21,3,27

I am trying to determine the optimal way to organize the data
structure of the drawing list, search the drawing list, and mark the
matches in the drawing list.

f = open(r"C:\temp\drawinglist.txt", "r")  # raw string, so "\t" is not read as a tab
lines = f.readlines()
f.close()
drawing = lines[1].strip().split(',')  # the fields are comma-separated

The result in drawing is this:
drawing[0] = '4/1/2011'
drawing[1] = '5'
drawing[2] = '1'
drawing[3] = '45'
drawing[4] = '23'
drawing[5] = '27'
drawing[6] = '27'

I need to convert drawing[0] to a date datatype. This works, but I'm
sure there is a better way.
from datetime import date
month, day, year = drawing[0].split('/')
drawing[0] = date(int(year), int(month), int(day))

For searching, I need to determine if the date of the drawing is
within the date range of the ticket. If yes, then mark which numbers
in the drawing match the numbers in the ticket.

ticket[0] = '4/1/2011'
ticket[1] = '8/1/2011'
ticket[2] = '5'
ticket[3] = '23'
ticket[4] = '32'
ticket[5] = '21'
ticket[6] = '3'
ticket[7] = '27'

drawing[0] = '4/1/2011' (match)
drawing[1] = '5' (match)
drawing[2] = '1'
drawing[3] = '45'
drawing[4] = '23' (match)
drawing[5] = '27'
drawing[6] = '27' (match)


I'm debating on structuring the drawing list like this:
drawing[0] = '4/1/2011'
drawing[1][0] = '5'
drawing[1][1] = '1'
drawing[1][2] = '45'
drawing[1][3] = '23'
drawing[1][4] = '27'
drawing[2] = '27'

Sort drawing[1] from low to high
drawing[1][0] = '1'
drawing[1][1] = '5'
drawing[1][2] = '23'
drawing[1][3] = '27'
drawing[1][4] = '45'

I want to keep the drawing list in memory for reuse.

Any guidance would be most helpful and appreciated.
BTW, I want to learn, so be careful not to do too much of the work for
me.
I'm using WingIDE to do my work.

Thanks,
 
Ian Kelly

> f = open(r"C:\temp\drawinglist.txt", "r")
> lines = f.readlines()
> f.close()
> drawing = lines[1].strip().split(',')

That looks like a CSV file. If the contents are tightly constrained
then it may not matter, but if not then you should consider using the
csv module to read the lines, which will handle inconvenient details
like quoting and escape characters for you.

> I need to convert drawing[0] to a date datatype. This works, but I'm
> sure there is a better way.
> from datetime import date
> month, day, year = drawing[0].split('/')
> drawing[0] = date(int(year), int(month), int(day))

If you already know the format:

from datetime import datetime
drawing[0] = datetime.strptime(drawing[0], '%m/%d/%Y').date()

If you can't be sure of the format, then I recommend using the
python-dateutil parser.parse() function, which will try to work it out
on the fly.
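
As a rough sketch of how the csv module and strptime fit together (the function name read_drawings and the in-memory sample are made up for illustration; the column layout is the one from the example in the original post):

```python
import csv
import io
from datetime import datetime

# Stand-in for the drawing file, using the layout from the original post.
SAMPLE = """date,wb,wb,wb,wb,wb,bb
4/1/2011,5,1,45,23,27,27
5/1/2011,15,23,8,48,22,32
"""

def read_drawings(fileobj):
    """Return a list of (date, white_balls, bonus_ball) tuples."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    drawings = []
    for row in reader:
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        drawn = datetime.strptime(row[0], "%m/%d/%Y").date()
        white = [int(n) for n in row[1:6]]
        bonus = int(row[6])
        drawings.append((drawn, white, bonus))
    return drawings

drawings = read_drawings(io.StringIO(SAMPLE))
print(drawings[0])
```

With a real file you would pass the object returned by open() (opened with newline="" as the csv documentation recommends) instead of the StringIO stand-in.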
 
Jon Clements


- Use the csv module to read the file
- Use strptime to process the date field
- Use a set for the drawn numbers (the bb needs a plain equality
  check of its own)
- Look at persisting in a sqlite3 DB (maybe with a custom converter)
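
To illustrate the set suggestion, here is a minimal sketch using the ticket and the first drawing from the original post. The variable names are invented for the example, and the numbers are assumed to have been converted to int already:

```python
# Values taken from the ticket and the 4/1/2011 drawing in the original post.
ticket_white = {5, 23, 32, 21, 3}
ticket_bonus = 27

drawing_white = {5, 1, 45, 23, 27}
drawing_bonus = 27

# White balls: order does not matter, so a set intersection finds the matches.
matched_white = ticket_white & drawing_white

# Bonus ball: a single number, so plain equality is enough.
bonus_matched = ticket_bonus == drawing_bonus

print(sorted(matched_white), bonus_matched)
```

Keeping the bonus ball out of the set matters here: the white ball 27 in the drawing must not count as a match for the ticket's bonus ball 27, or the other way round.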

hth,

Jon.
 
MrPink

This is what I have been able to accomplish:

def isInt(s):
    try:
        i = int(s)
        return True
    except ValueError:
        return False

f = open("powerball.txt", "r")
lines = f.readlines()
f.close()

dDrawings = {}
for line in lines:
    if isInt(line[0]):
        t = line.split()
        d = t[0]
        month,day,year = t[0].split("/")
        i = int(year + month + day)
        wb = t[1:6]
        wb.sort()
        pb = t[6]
        r = {'d':d,'wb':wb,'pb':pb}
        dDrawings[i] = r

The dictionary dDrawings contains records like this:
dDrawings[19971101]
{'pb': '20', 'd': '11/01/1997', 'wb': ['22', '25', '28', '33', '37']}

I am now able to search for drawings in a date range.
keys = dDrawings.keys()
b = [key for key in keys if 20110909 <= key <= 20111212]

How would I search for matching wb (White Balls) in the drawings?

Is there a better way to organize the data so that it will be flexible
enough for different types of searches?
Search by date range, search by pb, search by wb matches, etc.

I hope this all makes sense.

Thanks,

 
Chris Angelico

Here's a quick rejig:

dDrawings = {}
for line in open("powerball.txt"):
    try:
        t = line.split()
        d = t[0]
        month,day,year = t[0].split("/")
        i = int(year + month + day)
        wb = t[1:6]
        wb.sort()
        pb = t[6]
        r = {'d':d,'wb':wb,'pb':pb}
        dDrawings[i] = r
    except ValueError:
        pass

There are two significant differences. One is that the file is kept
open until processing is complete, rather than reading the file into a
list and then iterating over the list. Your processing is pretty
simple, so it's unlikely to make a difference, but if you're doing a
lengthy operation on the lines of text, or conversely if you're
reading in gigs and gigs of file, you may want to take that into
consideration. The other change is that a ValueError _anywhere_ in
processing will cause the line to be silently ignored. If this isn't
what you want, then shorten the try/except block and make it use
'continue' instead of 'pass' (which will then silently ignore that
line, but leave the rest of processing unguarded by try/except).
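
As a rough sketch of that narrower variant: the function name load_drawings and the sample lines are made up for illustration, and also catching IndexError (which a blank line would raise at t[0]) is an extra assumption that was not part of the code above.

```python
def load_drawings(lines):
    """Build {yyyymmdd: record} from whitespace-separated drawing lines."""
    drawings = {}
    for line in lines:
        t = line.split()
        try:
            # Only the date parsing is guarded.
            month, day, year = t[0].split("/")
            key = int(year + month + day)
        except (IndexError, ValueError):
            continue  # header, blank or malformed date: skip this line
        # From here on, errors are no longer swallowed.
        wb = sorted(t[1:6])
        pb = t[6]
        drawings[key] = {'d': t[0], 'wb': wb, 'pb': pb}
    return drawings

SAMPLE_LINES = [
    "Draw Date WB1 WB2 WB3 WB4 WB5 PB PP\n",
    "10/12/2011 43 10 12 23 47 18 3\n",
    "\n",
]
drawings = load_drawings(SAMPLE_LINES)
print(drawings)
```

A line with a valid date but too few numbers would now fail loudly at t[6] instead of disappearing, which is the trade-off described above.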

The most likely cause of another ValueError is this line:
month,day,year = t[0].split("/")
If there are not precisely two slashes, this will raise:
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    a,b,c="asdf/qwer".split("/")
ValueError: need more than 2 values to unpack

Do you want this to cause the line to be ignored, or to noisily abort
the whole script?

ChrisA
 
Chris Rebert

> How would I search for matching wb (White Balls) in the drawings?
>
> Is there a better way to organize the data so that it will be flexible
> enough for different types of searches?
> Search by date range, search by pb, search by wb matches, etc.
>
> I hope this all makes sense.


from datetime import datetime
from collections import namedtuple, defaultdict
# for efficient searching by date: import bisect

DATE_FORMAT = "%m/%d/%Y"
Ticket = namedtuple('Ticket', "white_balls powerball date".split())

powerball2ticket = defaultdict(set)
whiteball2ticket = defaultdict(set)
tickets_by_date = []

with open("powerball.txt", "r") as f:
    for line in f:
        if not line[0].isdigit():
            # what are these other lines anyway?
            continue # skip such lines

        fields = line.split()

        date = datetime.strptime(fields[0], DATE_FORMAT).date()
        white_balls = frozenset(int(num_str) for num_str in fields[1:6])
        powerball = int(fields[6])
        ticket = Ticket(white_balls, powerball, date)

        powerball2ticket[powerball].add(ticket)
        for ball in white_balls:
            whiteball2ticket[ball].add(ticket)
        tickets_by_date.append(ticket)

tickets_by_date.sort(key=lambda ticket: ticket.date)

print(powerball2ticket[7]) # all tickets with a 7 powerball
print(whiteball2ticket[3]) # all tickets with a non-power 3 ball
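
The bisect comment near the top hints at date-range searches. One possible sketch follows; the helper name tickets_between and the hard-coded sample list are assumptions for illustration, standing in for the sorted tickets_by_date built above.

```python
import bisect
from collections import namedtuple
from datetime import date

Ticket = namedtuple('Ticket', "white_balls powerball date".split())

# Hypothetical sample, already sorted by date like tickets_by_date.
tickets_by_date = [
    Ticket(frozenset({41, 51, 30, 50, 53}), 8, date(2011, 9, 28)),
    Ticket(frozenset({27, 43, 12, 23, 1}), 31, date(2011, 10, 1)),
    Ticket(frozenset({46, 7, 43, 54, 20}), 17, date(2011, 10, 5)),
    Ticket(frozenset({35, 3, 37, 27, 45}), 31, date(2011, 10, 8)),
]

# bisect works on a sorted sequence of comparable keys, so keep the
# dates in a parallel list (bisect only gained a key= argument in 3.10).
dates = [t.date for t in tickets_by_date]

def tickets_between(start, end):
    """Return the tickets whose date lies in [start, end], inclusive."""
    lo = bisect.bisect_left(dates, start)
    hi = bisect.bisect_right(dates, end)
    return tickets_by_date[lo:hi]

in_range = tickets_between(date(2011, 10, 1), date(2011, 10, 5))
print([t.date.isoformat() for t in in_range])
```

Each lookup is a pair of binary searches rather than a scan of the whole list, which only starts to matter once there are many drawings.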


Cheers,
Chris
 
Troy S

Chris,
Thanks for the help.
I am using the powerball numbers from this text file downloaded from the site.
http://www.powerball.com/powerball/winnums-text.txt
The first row is the header/fieldnames and the file starts off like this:

Draw Date WB1 WB2 WB3 WB4 WB5 PB PP
10/12/2011 43 10 12 23 47 18 3
10/08/2011 35 03 37 27 45 31 5
10/05/2011 46 07 43 54 20 17 4
10/01/2011 27 43 12 23 01 31 3
09/28/2011 41 51 30 50 53 08 2
09/24/2011 27 12 03 04 44 26 5
09/21/2011 47 52 55 48 12 13 4

The digit test was only there to skip the first row.

I'm still dissecting your Python code to better understand the use of
collections, namedtuples, etc.
I have not found many examples or descriptions of collections and
namedtuples yet, and I don't understand them very well. Do you
know of a reference that breaks this stuff down better for me?
The couple of books that I have on Python do not go into collections,
namedtuples, etc. in much depth.

Thanks,

 
MrPink

I did not understand what a tuple was.
So it was very hard for me to understand what a namedtuple was and
follow the code.
Then I looked up the word at dictionary.com and got this:
http://dictionary.reference.com/browse/tuple

tuple: computing a row of values in a relational database

Now I understand a little better what a tuple is and can follow the
code better.

A namedtuple seems like a dictionary type. I'll need to read up on
the difference between the two.

Thanks again.

 
Ian Kelly

> I did not understand what a tuple was.
> So it was very hard for me to understand what a namedtuple was and
> follow the code.
> Then I looked up the word at dictionary.com and got this:
> http://dictionary.reference.com/browse/tuple
>
> tuple: computing a row of values in a relational database
>
> Now I understand a little better what a tuple is and can follow the
> code better.

Python tuples do not have anything to do with relational databases.
You would get a better introduction from Wikipedia:

http://en.wikipedia.org/wiki/Tuple

In Python, tuples are nothing more than immutable sequences, as
opposed to lists, which are mutable sequences. Another way of
characterizing the difference between lists and tuples is in how they
are typically used. A list is typically used for a homogeneous
(meaning all elements are treated in the same way) sequence containing
an arbitrary number of unrelated objects. A tuple is typically used
for a heterogeneous sequence of a certain length. For example, a
tuple might be expected to contain exactly two strings and an int that
are related in some fashion.

> A namedtuple seems like a dictionary type. I'll need to read up on
> the difference between the two.

A namedtuple is a tuple subclass where each of the elements the tuple
is expected to contain has been given a specific name for ease of
reference. The names are essentially aliases for numerical indices.
It differs from a dictionary in that it is ordered and only contains
elements with specific names (and always contains those elements),
whereas a dictionary contains arbitrary key-value pairs.
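
A small sketch may make the comparison concrete. The Drawing type and its two fields are invented for illustration; the values come from the record shown earlier in the thread.

```python
from collections import namedtuple

# A plain tuple: fixed length, fields identified only by position.
drawing = ("11/01/1997", 20)
# drawing[1] = 21 would raise TypeError, because tuples are immutable.

# A namedtuple adds names that act as aliases for the positions.
Drawing = namedtuple("Drawing", ["date", "powerball"])
named = Drawing("11/01/1997", 20)
assert named.powerball == named[1] == 20
assert named == drawing  # still an ordinary tuple underneath

# A dict maps arbitrary keys to values and can grow new keys at any time.
as_dict = {"date": "11/01/1997", "powerball": 20}
as_dict["jackpot"] = None  # fine for a dict
# named.jackpot = None would raise AttributeError on a namedtuple.

print(named)
```

So the namedtuple reads like a dictionary at the call site (named.powerball), but it keeps the fixed shape, ordering and immutability of a tuple.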
 
