Building Time Based Bins

MCD

Hello, I'm new to python and this group and am trying to build some
bins and was wondering if any of you could kindly help me out. I'm a
bit lost on how to begin.

I have some text files that have a time field along with 2 other fields
formatted like this >>

1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94

This goes on throughout a 12hr. period. I'd like to be able to place
the low and high values of the additional fields in a single line
divided into 5min intervals. So it would look something like this >>

1235 22 88
1240 31 94

I hope that makes sense. Should I be using a module like numarray for
this, or is it possible to just use the native functions? Any ideas
would help me very much.

Thank you - Marcus
 
Michael Spencer

This sort of thing would do it:



from itertools import groupby

def splitter(iterable):
    """Takes a line-based iterator, yields a list of values per line
    edit this for more sophisticated line-based parsing if required"""
    for line in iterable:
        yield [int(item) for item in line.split()]

def groupkey(data):
    """Groups times by 5 min resolution. Note this version doesn't work
    exactly like the example - so fix if necessary"""
    time = data[0]
    return time / 100 * 100 + (time % 100) / 5 * 5

def grouper(iterable):
    """Groups and summarizes the lines"""
    for time, data in groupby(iterable, groupkey):
        data_x = zip(*data) #transform the data from cols to rows
        print time, min(data_x[1]), max(data_x[2])


# Exercise it:

source = """1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94
"""

>>> grouper(splitter(source.splitlines()))
1230 23 88
1235 22 94

Note this groups by the time at the beginning of each 5 mins, rather than the
end as in your example. If this needs changing, fix groupkey

HTH

Michael
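
The code above is Python 2 (print statement, integer `/`, a zip that returns a list). For anyone reading this later, here is a rough Python 3 sketch of the same groupby idea. It assumes the times are HHMM integers already in ascending order, and the name `summarize` is my own, not from the post above. It returns the bins instead of printing them, which makes them easier to reuse.

```python
from itertools import groupby

def splitter(lines):
    """Yield a list of ints for each non-blank line."""
    for line in lines:
        if line.strip():
            yield [int(item) for item in line.split()]

def groupkey(row):
    """Round an HHMM time down to the start of its 5-minute interval."""
    hhmm = row[0]
    return hhmm // 100 * 100 + (hhmm % 100) // 5 * 5

def summarize(rows):
    """Return (bin_start, low, high) per bin; rows must be sorted by time."""
    result = []
    for start, group in groupby(rows, groupkey):
        # zip(*group) transposes the rows into columns: times, lows, highs
        _, lows, highs = zip(*group)
        result.append((start, min(lows), max(highs)))
    return result

source = """1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94
"""

for row in summarize(splitter(source.splitlines())):
    print(*row)
# prints:
# 1230 23 88
# 1235 22 94
```

As in the original, each bin is labeled with the start of its interval, so 1235 falls into the second bin rather than the first.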
 
John Machin

MCD said:
Hello, I'm new to python and this group and am trying to build some
bins and was wondering if any of you could kindly help me out. I'm a
bit lost on how to begin.

Are you (extremely) new to computer programming? Is this school
homework? The reason for asking is that the exercise requires no data
structure more complicated than a one-dimensional array of integers
(if one doubts that the times will always be in ascending order), and
*NO* data structures if one is trusting. It can be done easily without
any extra modules or libraries in just about any computer language
ever invented. So, it's not really a Python question. Perhaps you
should be looking at some basic computer programming learning. Python
*is* a really great language for that -- check out the Python website.

Anyway here's one way of doing it -- only the input and output
arrangements are Python-specific. And you don't need iter*.*.* (yet)
:)

HTH,
John
===========================
C:\junk>type mcd.py
# Look, Ma, no imports!
lines = """\
1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94
"""
DUMMY = 9999
bintm = DUMMY
for line in lines.split('\n'): # in practice, open('input_file', 'r'):
    if not line: continue
    ilist = [int(fld) for fld in line.strip().split()]
    print "ilist:", ilist
    klock, lo, hi = ilist
    newbintm = ((klock + 4) // 5 * 5) % 2400
    print "bintm = %d, klock = %d, newbintm = %d" % (bintm, klock,
        newbintm)
    if newbintm != bintm:
        if bintm != DUMMY:
            print "==>> %04d %02d %02d" % (bintm, binlo, binhi)
        bintm, binlo, binhi = newbintm, lo, hi
    else:
        binlo = min(binlo, lo)
        binhi = max(binhi, hi)
print "end of file ..."
if bintm != DUMMY:
    print "==>> %4d %2d %2d" % (bintm, binlo, binhi)

C:\junk>python mcd.py
ilist: [1231, 23, 56]
bintm = 9999, klock = 1231, newbintm = 1235
ilist: [1232, 25, 79]
bintm = 1235, klock = 1232, newbintm = 1235
ilist: [1234, 26, 88]
bintm = 1235, klock = 1234, newbintm = 1235
ilist: [1235, 22, 34]
bintm = 1235, klock = 1235, newbintm = 1235
ilist: [1237, 31, 85]
bintm = 1235, klock = 1237, newbintm = 1240
==>> 1235 22 88
ilist: [1239, 35, 94]
bintm = 1240, klock = 1239, newbintm = 1240
end of file ...
==>> 1240 31 94

C:\junk>
================================
 
MCD

John said:
Are you (extremely) new to computer programming? Is this school
homework?

Lol, yes, I am relatively new to programming... and very new to python.
I have experience working with loops, if thens, and boolean operations,
but I haven't worked with lists or arrays as of yet... so this is my
first foray. This isn't homework, I'm long out of school. I've been
wanting to extend my programming abilities and I chose python as the
means to achieving that goal... so far I really like it :)

Thank you both for the code. I ended up working with John's because
it's a bit easier for me to get through. I very much appreciate the
code... it taught me quite a few things about how python converts
strings to integers and vice versa. I didn't expect to get through it,
but after looking at it a bit, I did, and was able to modify it so that
I could work with my own files. Yeah!

The only question I have is in regards to being able to sum a field in
a bin. Using sum(hi) returns only the last value... I'm uncertain how
to cumulatively add up the values as the script runs through each line.
Any pointers?

Thank you again for all your help.
Marcus
 
MCD

Never mind about the summing... I learned that you can do this:

sumhi = 0
sumhi += hi

Cool!

Thanks again.
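
For completeness, a small sketch of how that running total could sit inside the bin loop. In the loop, `hi` holds a single integer per line, so the total has to be accumulated across iterations and reset whenever a new bin starts. This is Python 3 syntax; the names `bin_end`, `sums` and `current` are mine, and the sample rows are just the time and third column from the first post.

```python
rows = [(1231, 56), (1232, 79), (1234, 88), (1235, 34), (1237, 85), (1239, 94)]

def bin_end(hhmm):
    """Label a time with the end of its 5-minute bin (ignores hour rollover)."""
    return (hhmm + 4) // 5 * 5

sums = []        # finished (bin label, total) pairs
current = None   # label of the bin being filled, None before the first line
sumhi = 0
for klock, hi in rows:
    label = bin_end(klock)
    if label != current:
        if current is not None:
            sums.append((current, sumhi))   # close the previous bin
        current, sumhi = label, 0           # start a new bin with a fresh total
    sumhi += hi
if current is not None:
    sums.append((current, sumhi))           # do not forget the last bin

print(sums)
# prints: [(1235, 257), (1240, 179)]
```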
 
alessandro -oggei- ogier

MCD said:
This goes on throughout a 12hr. period. I'd like to be able to place
the low and high values of the additional fields in a single line
divided into 5min intervals. So it would look something like this >>

1235 22 88
1240 31 94

what about a sane list comprehension madness ? <g>

lines = """\
1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94
"""

input = lines.split('\n') # this is your input

div = lambda x: (x-1)/5

l = dict([
    (div(x), []) for x,y,z in [
        tuple([int(x) for x in x.split()]) for x in input if x
    ]
])

[
    l[x[0]].append(x[1]) for x in
    [
        [div(x), (x,y,z)] for x,y,z in
        [
            tuple([int(x) for x in x.split()]) for x in input if x
        ]
    ]
]

print [
    [max([x[0] for x in l[j]]),
     min([x[1] for x in l[j]]),
     max([x[2] for x in l[j]])
    ] for j in dict([
        (div(x), []) for x,y,z in [
            tuple([int(x) for x in x.split()]) for x in input
            if x
        ]
    ]).keys()
]


i think it's a bit memory hungry, though

cya,
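
The same dictionary-of-lists idea can probably be written as one ordinary pass over the input, which avoids parsing every line three times. A minimal sketch in Python 3 syntax follows; `bin_index`, `bins` and `summary` are names I made up, and the grouping rule is the same `(x-1)/5` as above, written with `//` so it stays an integer.

```python
lines = """\
1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94
"""

def bin_index(hhmm):
    """Same grouping as div above: 1231..1235 share one index, 1236..1240 the next."""
    return (hhmm - 1) // 5

bins = {}
for line in lines.splitlines():
    if not line.strip():
        continue
    klock, lo, hi = (int(field) for field in line.split())
    # setdefault creates the empty list the first time a bin index is seen
    bins.setdefault(bin_index(klock), []).append((klock, lo, hi))

summary = [
    [max(r[0] for r in rows), min(r[1] for r in rows), max(r[2] for r in rows)]
    for _, rows in sorted(bins.items())
]
print(summary)
# prints: [[1235, 22, 88], [1239, 31, 94]]
```

Note that, as in the comprehension version, the first value of each result is the latest time actually seen in the bin (1239), not the bin boundary (1240).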
 
MCD

Thanks Alessandro... I'll have to try that as well.

I have a modified working version of John's code (thanks John!). I'm
able to output the bins by 5min intervals, sum one of the fields, and
get the high and low of each field. So far I'm really happy with how it
works. Thank you to everybody.

The only thing that I'd like to do, which I've been racking my brain on
how to do in python... is how to keep track of the bins, so that I can
refer back to them. For instance, if I wanted to get "binlo" from two
bins back... in the scripting language I was working with (pascal
based) you could create a counting series:

for binlo = binlo - 1 do
begin

2binlosBack = (binlo - 2)

# if it was 12:00, I'd be looking back to 11:50

I would really appreciate it if anyone could explain to me how this could
be accomplished using python grammar... or perhaps some other method of
"look back" which I'm unable to conceive of.

Many thanks,
Marcus
 
Michael Spencer

Just append the results to a list as you go:

bins = []

for bin in ... # whichever method you use to get each new bin
    bins.append(bin)

Then refer to previous bins using negative index (starting at -1 for the most
recent):
e.g., two_binlos_back = bins[-3]
(a Python name can't start with a digit, so 2binlosBack needs renaming)

Michael
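
To make the negative-index idea concrete, here is a small sketch in Python 3 syntax. The helper `look_back` and the sample bins are invented for illustration; the only real point is that `bins[-1]` is the most recent bin, `bins[-3]` is two bins before it, and that you need to check the list is long enough before looking back.

```python
bins = []   # finished bins, oldest first, stored as (bintm, binlo) tuples

def look_back(bins, n):
    """Return the bin n places before the most recent one, or None if too few."""
    if len(bins) <= n:
        return None
    return bins[-1 - n]

# made-up sample bins, only to exercise the lookup
for finished in [(1150, 20), (1155, 24), (1200, 22)]:
    bins.append(finished)

print(look_back(bins, 0))   # most recent bin
print(look_back(bins, 2))   # two bins back: at 12:00 this is the 11:50 bin
print(look_back(bins, 3))   # not enough history yet
# prints:
# (1200, 22)
# (1150, 20)
# None
```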
 
MCD

Hi Michael, thanks for responding. I actually don't use a method to get
each bin... the bin outputs are nested in the loop. Here's my code:

data_file = open('G:\file.txt')
DUMMY = 9999
bintm = DUMMY
for line in data_file:
    fields = line.strip().split()
    if not line: continue
    ilist = [int(time), int(a)]
    # print "ilist:", ilist
    klock, a = ilist
    newbintm = ((klock + 4) // 5 * 5 ) % 2400
    print "bintm = %d, newbintm = %d, a = %d" % (bintm, newbintm, a)
    # the above is the raw data and now the bin loop
    if bintm == 9999:
        bintm = newbintm
        binlo = a
    elif bintm == newbintm:
        binlo = min(binl, t)
    else:
        print " ==>> %04d %2d" % (bintm, binl) ## this is the bin
        bintm = newbintm
        binl = a

#-------------------

the input file is in my first post in this thread, the output looks
like:

bintm = 9999, newbintm = 1235, a = 23
bintm = 1235, newbintm = 1235, a = 25
bintm = 1235, newbintm = 1235, a = 26
bintm = 1235, newbintm = 1240, a = 22
==>> 1235 23
bintm = 1240, newbintm = 1240, a = 31
bintm = 1240, newbintm = 1240, a = 35

#---------------------

I'm not sure where I could create the new list without it getting
overwritten in the bin loop. Confused as to how to add the append
method in a for loop without a defined method for the current bin.
Anyway, I'll keep at it, but I'm not sure how to execute it. Thank you
very much for your suggestion.

Marcus
 
Michael Spencer

MCD said:
Hi Michael, thanks for responding. I actually don't use a method to get
each bin...

That's because you picked the wrong suggestion ;-) No, seriously, you can do it
easily with this approach:

MCD said:
the bin outputs are nested in the loop. Here's my code:
data_file = open('G:\file.txt')
DUMMY = 9999
bintm = DUMMY

bins = []

for line in data_file:
    fields = line.strip().split()
    if not line: continue
    ilist = [int(time), int(a)]

(BTW, there must be more to your code than you have shared for the above line to
execute without raising an exception - where are 'time' and 'a' initially bound?
BTW2, 'time' is the name of a stdlib module, so it's bad practice to use it as
an identifier)

    # print "ilist:", ilist
    klock, a = ilist
    newbintm = ((klock + 4) // 5 * 5 ) % 2400
    print "bintm = %d, newbintm = %d, a = %d" % (bintm, newbintm, a)
    # the above is the raw data and now the bin loop
    if bintm == 9999:
        bintm = newbintm
        binlo = a
    elif bintm == newbintm:
        binlo = min(binl, t)
    else:
        print " ==>> %04d %2d" % (bintm, binl) ## this is the bin

This is where you've declared that you have a bin, so add it to the bins cache:

        bins.append((bintm, binl))
        bintm = newbintm
        binl = a

Michael
 
MCD

Ok, thanks Michael, I got it sorted out now. It was just a question of
placing the append statement and the new list in the right place. I
also added a delete command so the list doesn't become too huge,
especially when there's no need to keep it. Here's the corrected code:

if bintm == 9999:
    bintm = newbintm
    binlo = a
    lastbinlo = [binlo] ## new bin creation
elif bintm == newbintm:
    binlo = min(binl, t)
else:
    if len(lastbinlo) > 1: ## check for append data
        del lastbinlo[0] ## delete extras
    lastbinlo.append(binlo) ## append new data here
    print lastbinlo[-2]
    print " ==>> %04d %2d" % (bintm, binl) ## this is the bin
    bintm = newbintm
    binlo = a

Anyway, many thanks to everyone who helped with this code.

Best regards,
Marcus
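
A side note on trimming the list by hand: the standard library's collections.deque accepts a maxlen argument and then drops the oldest item by itself on each append. The sketch below (Python 3 syntax) is only an illustration of that idea, not a rewrite of the code above; the name `recent` and the sample lows are made up.

```python
from collections import deque

recent = deque(maxlen=3)   # keeps only the three most recent bin lows

# made-up bin lows, appended in the order the bins are finished
for binlo in [23, 22, 31, 35, 28]:
    recent.append(binlo)   # once full, the oldest value falls off the left end
    if len(recent) == 3:
        print("current", recent[-1], "two bins back", recent[-3])
# prints:
# current 31 two bins back 23
# current 35 two bins back 22
# current 28 two bins back 31
```

With maxlen equal to the furthest look-back plus one, the history never grows beyond what is needed.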
 
MCD

Michael said:
(BTW, there must be more to your code than you have shared for the above line to
execute without raising an exception - where are 'time' and 'a' initially bound?
BTW2, 'time' is the name of a stdlib module, so it's bad practice to use it as
an identifier)

Yes there is more, I was copy/pasting a bit haphazardly as I see now.
You're right about the identifier, I changed it in my current code to
"t".

Michael said:
This is where you've declared that you have a bin, so add it to the bins cache:
bins.append((bintm, binl))
Michael

Thanks Michael, I haven't been able to read my mail so I ended up
placing the append a bit differently than the way you described, and
somehow got it working... your way looks much easier :). I'm going to
try that right now.

I've mostly been racking my brain with this bit of code:

newtm = ((klock + 4) // 5 * 5 ) % 2400

It works ok until you get to the last five minutes of the hour. For
instance, 956 will return 960... oops, that's not gonna work :). I
don't completely understand how this code is doing what it's doing...
I've played around with different values, but it's still a bit of a
mystery in coming up with a solution. My only workaround that I've
been able to come up with is to add 40 to newtm when the last 2 digits
are 60, but I'm still working on how to do that.

Anyway, thanks for your help, mentioning the append function... that
really opened up a lot of solutions/possibilities for me.

Take care,
Marcus
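
On why 956 comes out as 960: the expression treats the HHMM value as an ordinary decimal number. `(956 + 4) // 5 * 5` rounds 956 up to the next multiple of 5, which is 960, and nothing in it knows that minutes stop at 59. The `% 2400` only wraps values at midnight. One possible way around it, sketched here in Python 3 syntax, is to convert to minutes since midnight, round up there, and convert back. The function name `bin_end` and the `width` parameter are my own, and the sketch assumes `width` divides 60 evenly.

```python
def bin_end(hhmm, width=5):
    """Round an HHMM time up to the end of its bin, carrying into the next hour."""
    minutes = (hhmm // 100) * 60 + hhmm % 100          # minutes since midnight
    rounded = (minutes + width - 1) // width * width   # round up to a bin boundary
    rounded %= 24 * 60                                 # 2356..2359 wrap to 0000
    return rounded // 60 * 100 + rounded % 60          # back to HHMM

print((956 + 4) // 5 * 5)   # the original arithmetic: 960, not a valid time
for t in (955, 956, 1000, 1231, 2358):
    print(t, "->", bin_end(t))
# prints:
# 960
# 955 -> 955
# 956 -> 1000
# 1000 -> 1000
# 1231 -> 1235
# 2358 -> 0
```

This does the same job as adding 40 whenever the last two digits reach 60, but without a special case.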
 
Michael Spencer

MCD said:
I've mostly been racking my brain with this bit of code:

newtm = ((klock + 4) // 5 * 5 ) % 2400

You might want to take another look at the first reply I sent you: it contains a
function that does this:

def groupkey(data):
    """Groups times by 5 min resolution. Note this version doesn't work
    exactly like the example - so fix if necessary"""
    time = data[0]
    return time / 100 * 100 + (time % 100) / 5 * 5

# test it:
>>> for i in range(900,959): print groupkey([i]),

...
900 900 900 900 900 905 905 905 905 905 910 910 910 910 910 915 915 915 915
915 920 920 920 920 920 925 925 925 925 925 930 930 930 930 930 935 935 935 935
935 940 940 940 940 940 945 945 945 945 945 950 950 950 950 950 955 955 955 955
It rounds down, for the reason you have come across

Michael
 
