Time script help sought!

kpp9c

I am kind of in a bit of a jam (okay, a big jam) and I was hoping that
someone here could give me a quick hand. I had a few pages of time
calculations to do, so I just started in on them, typing them into my
time calculator and writing the results in by hand. Now I realize that I
really need a script to do this because:

1. It turns out there are hundreds of pages of this stuff.
2. I have to do something similar again soon.
3. By doing it by hand I am introducing wonderful new errors!
4. It all has to be typed up anyway (which means weeks of work and even
more typos!)

The input would look like so:

Item_1 TAPE_1 1 00:23 8:23

Item_2 TAPE_1 2 8:23 9:41

Item_3 TAPE_1 3 9:41 10:41
Item_3 TAPE_1 4 10:47 11:19
Item_3 TAPE_1 5 11:21 11:55
Item_3 TAPE_1 6 11:58 12:10
Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound.
Item_3 TAPE_1 8 12:58 24:20 Defect in analog tape sound.

Item_4 TAPE_1 9 24:33
Item_4 TAPE_1 10 25:48
Item_4 TAPE_1 11 29:48
Item_4 TAPE_1 12 31:46
Item_4 TAPE_1 13 34:17 Electronic sounds.
Item_4 TAPE_1 14 35:21
Item_4 TAPE_1 15 36:06
Item_4 TAPE_1 16 37:01 37:38

These are analog tapes that were digitized (onto CD or a digital tape)
and have now been exported as individual files that are meant to be
part of an on-line audio archive. The timings refer to the time display
on the CD or digital tape. They now all have to be adjusted so that each
item starts at 00:00, since they have all been edited out of their
context and are now individual files. So Item_1,
which started at 00:23 on the tape and ended at 8:23, needs to have
23 seconds subtracted from it so that it says:

Item_1 TAPE_1 1 00:00 08:00

Item_2 TAPE_1 2 08:23 09:41

would change to:

Item_2 TAPE_1 2 00:00 01:18

etc.
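
The subtraction described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming every time is written as minutes:seconds; the helper names (to_seconds, to_mmss, rebase) are made up for this example.

```python
def to_seconds(mmss):
    """Convert an 'M:SS' or 'MM:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Format a number of seconds as zero-padded 'MM:SS'."""
    minutes, seconds = divmod(total_seconds, 60)
    return "%02d:%02d" % (minutes, seconds)


def rebase(start, end):
    """Shift a (start, end) pair so that the start becomes 00:00."""
    offset = to_seconds(start)
    return to_mmss(0), to_mmss(to_seconds(end) - offset)


print(rebase("00:23", "8:23"))  # Item_1
print(rebase("8:23", "9:41"))   # Item_2
```

The two calls reproduce the 00:00 08:00 and 00:00 01:18 values given for Item_1 and Item_2.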

but as always you may notice a wrinkle.... some items have many times
(here 6) indicated:

Item_3 TAPE_1 3 9:41 10:41
Item_3 TAPE_1 4 10:47 11:19
Item_3 TAPE_1 5 11:21 11:55
Item_3 TAPE_1 6 11:58 12:10
Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound.
Item_3 TAPE_1 8 12:58 24:20 Defect in analog tape sound.

This is all a single sound file and these separate times mark where
there was a break, defect, or edit in the individual item. These have
to be adjusted as well to show where these events would appear in the
new sound file which now starts at 00:00.

Item_3 TAPE_1 3 00:00 01:00 ----
Item_3 TAPE_1 4 01:00 01:38 ----
Item_3 TAPE_1 5 01:38 02:14 ----
Item_3 TAPE_1 6 02:14 02:29 ----
Item_3 TAPE_1 7 02:29 03:04 Defect in analog tape sound.
Item_3 TAPE_1 8 03:04 14:39 Defect in analog tape sound.
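
For items with several segments, one way to do this is to take the first start time seen for each item and subtract it from every time in that item. The sketch below assumes the rows are already split into (item, start, end) tuples and that rows for the same item are adjacent; rebase_segments is a hypothetical name. Note that plain subtraction keeps the gaps between segments, so segment 4 comes out starting at 01:06 (10:47 minus 9:41) rather than the 01:00 in the listing above, which starts each segment where the previous one ended.

```python
from itertools import groupby


def to_seconds(mmss):
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    return "%02d:%02d" % divmod(total_seconds, 60)


def rebase_segments(rows):
    """rows: (item, start, end) tuples, with each item's rows adjacent."""
    result = []
    for item, segments in groupby(rows, key=lambda row: row[0]):
        segments = list(segments)
        offset = to_seconds(segments[0][1])  # first start time of the item
        for _, start, end in segments:
            result.append((item,
                           to_mmss(to_seconds(start) - offset),
                           to_mmss(to_seconds(end) - offset)))
    return result


rows = [
    ("Item_3", "9:41", "10:41"),
    ("Item_3", "10:47", "11:19"),
    ("Item_3", "12:58", "24:20"),
]
for row in rebase_segments(rows):
    print(row)
```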

Further wrinkles: Some have start and end times indicated, some only
start times. I suppose that the output would ideally have both.... Some
have comments and others don't ... and I need these comments echoed;
since I will probably need to make a database or table eventually, lines
without comments should just get some placeholder.
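
A sketch of how those optional fields might be handled, assuming whitespace-separated input, that a fifth field counts as an end time only if it consists of digits and colons, and that ---- is an acceptable placeholder for a missing comment (as in the listing above). parse_line, TIME_RE and PLACEHOLDER are names invented here. A missing end time is returned as None; how to fill it in is left open.

```python
import re

TIME_RE = re.compile(r"^\d+(:\d+)+$")  # e.g. 9:41 or 1:22:40
PLACEHOLDER = "----"


def parse_line(line):
    """Split one input line into (item, tape, number, start, end, comment)."""
    fields = line.split()
    item, tape, number, start = fields[:4]
    rest = fields[4:]
    # The end time is optional: take it only if the next field looks like a time.
    end = rest.pop(0) if rest and TIME_RE.match(rest[0]) else None
    # Whatever remains is the comment; otherwise use the placeholder.
    comment = " ".join(rest) if rest else PLACEHOLDER
    return item, tape, number, start, end, comment


print(parse_line("Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound."))
print(parse_line("Item_4 TAPE_1 13 34:17 Electronic sounds."))
print(parse_line("Item_4 TAPE_1 9 24:33"))
```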

I'll have a lot of similar calculations to do... I was hoping and
praying that someone here was feeling generous and could show me the way,
and then, of course, I could modify that to do other tasks... Usually I am
happy to take the long road and all, but I'll be honest, I am in a big
jam here and this huge task was just dumped on me. I am frankly a
little desperate for help on this and hoping someone is feeling up to
spoon-feeding me a clear, modifiable example that works. Sorry.....
cheers,

kevin
 
Paul Rubin

kpp9c said:
These are analog tapes that were digitized (on to CD or a digital tape)
that have now been exported as individual files that are meant to be
part of an on-line audio archive. ...
I was hoping and
praying that some one here was feeling generous and show me the way...

Is this online archive going to be accessible by the public for free?
What's in the archive? If you're asking for volunteer labor it's
generally appropriate to say precisely what the labor is for.
 
kpp9c

Yes, ultimately it will be part of a large digital archive available
for researchers on site and eventually probably on-line for the New
York Public Library. It is a huge undertaking and most of the
soundfiles have been made. We are struggling with the sheer size
of the documentation.... Sorry about that.... I should have been more
clear, especially since I am sort of begging for a little help. Sorry, I
am slightly overwhelmed at the moment...
 
Mark McEahern

kpp9c said:
The input would look like so:

[...]

Attached is a first cut at a parser that actually uses the raw content
of your original email. You'll notice that the net effect is that the
parser instance's items attribute contains the source ordered list of
items with attributes for each of the various parts of the line. From
this, it should be pretty easy to adjust the times and what not.

Cheers,

// m

#!/usr/bin/env python

"""usage: %prog
"""

raw = """I am kind of in a bit of a jam (okay a big jam) and i was hoping that
someone here could give me a quick hand. I had a few pages of time
calculations to do. So, i just started in on them typing them in my
time calculator and writing them in by hand. Now i realize, that i
really need a script to do this because:

1. It turns out there are hundreds of pages of this stuff.
2. I have to do something similar in again soon.
3. By doing it by hand i am introducing wonderful new errors!
4. It all has to be typed up anyway (which means weeks of work and even
more typos!)

The input would like so:

Item_1 TAPE_1 1 00:23 8:23

Item_2 TAPE_1 2 8:23 9:41

Item_3 TAPE_1 3 9:41 10:41
Item_3 TAPE_1 4 10:47 11:19
Item_3 TAPE_1 5 11:21 11:55
Item_3 TAPE_1 6 11:58 12:10
Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound.
Item_3 TAPE_1 8 12:58 24:20 Defect in analog tape sound.

Item_4 TAPE_1 9 24:33
Item_4 TAPE_1 10 25:48
Item_4 TAPE_1 11 29:48
Item_4 TAPE_1 12 31:46
Item_4 TAPE_1 13 34:17 Electronic sounds.
Item_4 TAPE_1 14 35:21
Item_4 TAPE_1 15 36:06
Item_4 TAPE_1 16 37:01 37:38

These are analog tapes that were digitized (on to CD or a digital tape)
that have now been exported as individual files that are meant to be
part of an on-line audio archive. The timings refer to the time display
on the CD or digital tape. The now all have to adjusted so that each
item starts at 0.00 since they have all been edited out of their
context and are now all individual items that start at 00:00. So Item_1
which was started at 00:23 on the tape and ended at 8:23 needs to have
23 seconds subtracted to it so that it says:

Item_1 TAPE_1 1 00:00 08:00

Item_2 TAPE_1 2 08:23 09:41

would change to:

Item_2 TAPE_1 2 00:00 01:18

etc.

but as always you may notice a wrinkle.... some items have many times
(here 6) indicated:

Item_3 TAPE_1 3 9:41 10:41
Item_3 TAPE_1 4 10:47 11:19
Item_3 TAPE_1 5 11:21 11:55
Item_3 TAPE_1 6 11:58 12:10
Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound.
Item_3 TAPE_1 8 12:58 24:20 Defect in analog tape sound.

This is all a single sound file and these separate times mark where
there was a break, defect, or edit in the individual item. These have
to be adjusted as well to show where these events would appear in the
new sound file which now starts at 00:00.

Item_3 TAPE_1 3 00:00 01:00 ----
Item_3 TAPE_1 4 01:00 01:38 ----
Item_3 TAPE_1 5 01:38 02:14 ----
Item_3 TAPE_1 6 02:14 02:29 ----
Item_3 TAPE_1 7 02:29 03:04 Defect in analog tape sound.
Item_3 TAPE_1 8 03:04 14:39 Defect in analog tape sound.

Further wrinkles: Some have start and end times indicated, some only
start times. I suppose that the output would ideally have both.... some
have comments and others don't ... and I need these comments echo-ed or
since i probably need to make a database or table eventually non
comments just have some place holder.

I'd have a lot of similar type calculations to do... I was hoping and
praying that some one here was feeling generous and show me the way and
then, of course i could modify that to do other tasks... Usually i am
happy to take the long road and all but i'll be honest, i am in a big
jam here and this huge task was just dumped on me. I am frankly a
little desperate for help on this and hoping someone is feeling up to
spoon feeding me a clear modifiable example that works. Sorry.....
cheers,

kevin

-- http://mail.python.org/mailman/listinfo/python-list """

import optparse
import re

pat = re.compile(r'\s+')
# a time field is digits separated by colons, e.g. 9:41 or 1:22:40
timePat = re.compile(r'^\d+(:\d+)+$')

class Item:

    def __init__(self, line):
        parts = pat.split(line.strip())
        self.name, self.tape, self.number, self.start = parts[:4]
        rest = parts[4:]
        # The end time is optional: it is present only if the fifth
        # field looks like a time.
        if rest and timePat.match(rest[0]):
            self.end = rest.pop(0)
        else:
            self.end = None
        # Anything left over is the optional trailing comment.
        if rest:
            self.comment = ' '.join(rest)
        else:
            self.comment = None

class Parser:

    def __init__(self):
        self.items = []

    def feed(self, line):
        item = Item(line)
        self.items.append(item)

def parseCommandLine(usage, requiredArgCount, argv=None):
    """Parse the command line and return (options, args).

    Raise an error if there are insufficient positional arguments as
    specified by requiredArgCount.
    """
    parser = optparse.OptionParser(usage)
    ## parser.add_option('-x',
    ##                   '--xxx',
    ##                   action='',
    ##                   default='',
    ##                   help='')
    options, args = parser.parse_args(argv)
    if len(args) < requiredArgCount:
        parser.error('Missing parameters.')
    return options, args

def main(argv=None):
    usage = __doc__
    requiredArgCount = 0
    options, args = parseCommandLine(usage, requiredArgCount, argv)
    # No filename argument is read yet: the sample data comes from raw.
    parser = Parser()
    for line in raw.split('\n'):
        if not line.startswith('Item_'):
            continue
        parser.feed(line)
    return parser

if __name__ == '__main__':
    main()
 
sp1d3rx

read below for my sample script....
--------START--------
inp = open("input1.txt", 'r') # input file is opened read-only
x = inp.readline() # read in the first line of the file
inp.close()
x = x.upper().split(None, 5) # convert it to uppercase and split into at most 6 segments
print(x) # show the line as split and converted
print(x[1]) # show the second element
start = x[3].split(":") # split the minutes from the seconds
end = x[4].split(":") # split the minutes from the seconds
print("Start at: %s minutes and %s seconds." % (start[0], start[1]))
start_in_seconds = int(start[0])*60 + int(start[1]) # converts minutes/seconds to seconds
print("%d seconds offset." % start_in_seconds)
print("End at: %s minutes and %s seconds." % (end[0], end[1]))
end_in_seconds = int(end[0])*60 + int(end[1]) # converts minutes/seconds to seconds
print("%d seconds offset." % end_in_seconds)
totaltime = end_in_seconds - start_in_seconds # calculate the length of the segment
print("Total time of segment in seconds: %d" % totaltime)
print("Total time of segment in minutes/seconds: %d minutes and %d seconds."
      % (totaltime // 60, totaltime % 60))
# ^^^ converts seconds back to minutes and seconds.
--------END--------

This should give you an excellent starting point.
 
sp1d3rx

Gosh Mark you must be a friggin genius or something. I can't even begin
to read your code. Anyways, I think mine is easier to understand. My
program has all the functions (save a couple) that you would need for
your project. It's an exercise for you to copy and paste what you want
where you want it and to form a basic program structure out of it. some
of my comments spilled on to the next line (thanks google!) so you will
have to clean it up a bit.
 
kpp9c

Thanks for this Everyone!

Trying to work with all the stuff folks are giving me on this, I have
come across a problem... down the line I notice that some of the times
will also have an hour, as in H:M:S (e.g. 1:22:40)

so in some cases I would need to convert H:M:S to seconds and in some
just M:S

or should there just be a function that takes all the times, converts
them to H:M:S first, and just prepends a 00: (e.g. 00:02:07) if there
is no hour value?
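
One possible sketch of that idea: convert every time to seconds whether or not it has an hour field, and always write hh:mm:ss on the way out, so a missing hour simply shows up as 00. The function names are illustrative only.

```python
def to_seconds(timestr):
    """Accept 'M:S' or 'H:M:S' and return total seconds."""
    total = 0
    for field in timestr.split(":"):
        total = total * 60 + int(field)  # each field is worth 60 of the next
    return total


def to_hms(total_seconds):
    """Format seconds as zero-padded 'HH:MM:SS'."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "%02d:%02d:%02d" % (hours, minutes, seconds)


print(to_hms(to_seconds("2:07")))     # a time with no hour field
print(to_hms(to_seconds("1:22:40")))  # a time with an hour field
```

Doing the arithmetic in plain seconds keeps the subtraction itself trivial; only the two conversion functions need to know about the time format.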
cheers,

-kp
 
sp1d3rx

using my code above...
    start = x[3].split(":") # split the time into its fields
this splits something like 1:22:40 into three parts:
1 (hours), 22 (minutes), 40 (seconds)
so, you must add a special case handler...
    if len(start) == 3:
        start[1] = int(start[0]) * 60 + int(start[1])
        start = start[1], start[2]
and there you go....
what this does is take the hours, multiply them by 60 and add the result
to the minutes.
You will also have to change your output function to convert from
"seconds" to "hours, minutes, seconds".
 
kpp9c

I also notice that there is the 'datetime' module, which is new
to version 2.3, which I now have access to. My feeling is that this
will do much of what I want, but I can't get my head round the standard
library reference stuff

http://www.python.org/doc/lib/module-datetime.html

I don't have any texts with me either and it probably is too new to be
in the Python Standard Library book by Fredrik Lundh or the Python
Essential Reference by David Beazley
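
For what it's worth, a small sketch of how datetime.timedelta could be used here, assuming all times are written as hh:mm:ss. parse_hms is a made-up helper; timedelta itself only stores durations and does not parse strings.

```python
from datetime import timedelta


def parse_hms(timestr):
    """Turn an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in timestr.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


start = parse_hms("00:00:23")
end = parse_hms("00:08:23")

# Subtracting two timedeltas gives another timedelta;
# str() renders it as H:MM:SS.
print(str(end - start))
print((end - start).total_seconds())
```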


-kevin-
 
kpp9c

still working on it and also fixing the input data. I think for
simplicity and consistency's sake i will have *all* time values input
and output as hh:mm:ss maybe that would be easier.... but i have a few
thousand find and replaceeseseses to do now (yes i am doing them by
hand)

grr... this is hard!
 
kpp9c

so all the input will look more like this now... (no comments either)

Item_133, DAT_20, 7, 00:58:25, 01:15:50

Item_134, DAT_20, 8, 01:15:50, 01:32:15

Item_135, DAT_21, 1, 00:01:00, 00:36:15

Item_136, DAT_60, 3, 00:18:30
Item_136, DAT_60, 4, 00:19:30
Item_136, DAT_60, 5, 00:23:00, 00:28:00

Item_137, DAT_21, 4, 00:37:00, 00:47:00

Item_139, DAT_21, 2, 00:36:15
Item_139, DAT_21, 3, 00:42:15, 01:00:50

Item_140, DAT_21, 4, 01:00:50
Item_140, DAT_21, 5, 01:25:10, 01:26:35

...... snip...
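
Input in this comma-separated form can be read with the standard csv module. A minimal sketch in Python 3, assuming the end time is simply absent when unknown (rows of four fields instead of five); read_rows is a hypothetical helper and the sample text is copied from the lines above.

```python
import csv
import io

SAMPLE = """\
Item_135, DAT_21, 1, 00:01:00, 00:36:15

Item_136, DAT_60, 3, 00:18:30
Item_136, DAT_60, 5, 00:23:00, 00:28:00
"""


def read_rows(text):
    """Yield (item, tape, number, start, end) tuples; end may be None."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for fields in reader:
        if not fields:          # skip blank separator lines
            continue
        fields = [field.strip() for field in fields]
        if len(fields) == 4:    # no end time recorded
            fields.append(None)
        yield tuple(fields)


for row in read_rows(SAMPLE):
    print(row)
```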
 
Paul McGuire

kpp9c said:
still working on it and also fixing the input data. I think for
simplicity and consistency's sake i will have *all* time values input
and output as hh:mm:ss maybe that would be easier.... but i have a few
thousand find and replaceeseseses to do now (yes i am doing them by
hand)

grr... this is hard!

Oh, I wasn't going to chime in on this thread, your data looked so
well-formed that I wouldn't recommend pyparsing, but there is enough
variability going on here, I thought I'd give it a try. Here's a pyparsing
treatment of your problem. It will accommodate trailing comments or none,
leading hours or none on timestamps, and missing end times, and normalizes
all times back to the item start time.

Most of your processing logic will end up going into the processVals()
routine. I've put various examples of how to access the parsed tokens by
field name, and some helper methods for converting to and from seconds and
hh:mm:ss or mm:ss times.

-- Paul


from pyparsing import *

data = """
Item_1 TAPE_1 1 00:23 8:23

Item_2 TAPE_1 2 8:23 9:41

Item_3 TAPE_1 3 9:41 10:41
Item_3 TAPE_1 4 10:47 11:19
Item_3 TAPE_1 5 11:21 11:55
Item_3 TAPE_1 6 11:58 12:10
Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound.
Item_3 TAPE_1 8 12:58 24:20 Defect in analog tape sound.

Item_4 TAPE_1 9 24:33
Item_4 TAPE_1 10 25:48
Item_4 TAPE_1 11 29:48
Item_4 TAPE_1 12 31:46
Item_4 TAPE_1 13 34:17 Electronic sounds.
Item_4 TAPE_1 14 35:21
Item_4 TAPE_1 15 36:06
Item_4 TAPE_1 16 37:01 01:37:38
"""

def toSecs(tstr):
    fields = tstr.split(":")
    secs = int(fields[-1])
    secs += int(fields[-2])*60
    if len(fields) > 2:
        secs += int(fields[-3])*60*60
    return secs

def secsToTime(secs):
    # integer division keeps h, m and s whole numbers
    s = secs % 60
    m = (secs // 60) % 60
    h = secs // 3600
    return "%02d:%02d:%02d" % (h,m,s)

# globals for normalizing timestamps
lastItem = ""
itemStart = 0

# put logic here for processing various parse fields
def processVals(s,l,t):
    global lastItem,itemStart
    print("%s %s %s" % (t.item, t.tape, t.recnum))
    # a new item name means a new sound file, so reset the zero point
    if not t.item == lastItem:
        lastItem = t.item
        itemStart = toSecs(t.start)

    startSecs = toSecs(t.start)
    print("%s (%s)" % (secsToTime(startSecs), secsToTime(startSecs-itemStart)))

    if t.end:
        endSecs = toSecs(t.end)
        print("%s (%s)" % (secsToTime(endSecs), secsToTime(endSecs-itemStart)))
        print("%d elapsed seconds" % (endSecs-startSecs))
        print("%s elapsed time" % secsToTime(endSecs-startSecs))
    else:
        print("<no end time>")
    print(t.comment)
    print("")

# define structure of a line of data - sorry about the clunkiness of the
# optional trailing fields
integer = Word(nums)
timestr = Combine(integer + ":" + integer + Optional(":" + integer))
dataline = ( Combine("Item_"+integer).setResultsName("item") +
             Combine("TAPE_"+integer).setResultsName("tape") +
             integer.setResultsName("recnum") +
             timestr.setResultsName("start") +
             Optional(~LineEnd() + timestr, default="").setResultsName("end") +
             Optional(~LineEnd() + empty + restOfLine,
                      default="-").setResultsName("comment") )

# set up parse handler that will process the actual fields
dataline.setParseAction(processVals)

# now parse the little buggers
OneOrMore(dataline).parseString(data)

will print out:

Item_1 TAPE_1 1
00:00:23 (00:00:00)
00:08:23 (00:08:00)
480 elapsed seconds
00:08:00 elapsed time
-

Item_2 TAPE_1 2
00:08:23 (00:00:00)
00:09:41 (00:01:18)
78 elapsed seconds
00:01:18 elapsed time
-

Item_3 TAPE_1 3
00:09:41 (00:00:00)
00:10:41 (00:01:00)
60 elapsed seconds
00:01:00 elapsed time
-

Item_3 TAPE_1 4
00:10:47 (00:01:06)
00:11:19 (00:01:38)
32 elapsed seconds
00:00:32 elapsed time
-

Item_3 TAPE_1 5
00:11:21 (00:01:40)
00:11:55 (00:02:14)
34 elapsed seconds
00:00:34 elapsed time
-
....
 
sp1d3rx

man, now that is beautifully done. Paul, I wish I knew about pyparsing
a while ago. I could have used it in a few projects. :)
 
Paul McGuire

sp1d3rx said:
man, now that is beautifully done. Paul, I wish I knew about pyparsing
a while ago. I could have used it in a few projects. :)

Thanks for the compliment! I'll be the first to admit that pyparsing can be
a bit persnickety in some applications, especially when you have to trap on
end-of-line, or have some columnar-dependent syntax. pyparsing really works
best with data that has some syntactic structure to it, but is not
whitespace sensitive (pyparsing's default behavior is to skip over
whitespace). I don't recommend it when the data is well-formed - regexp's
or even just string.split() will blow it away from a performance (and
simplicity) standpoint. For instance, if the OP knew that all his data was
itemId, tapeId, recordNum, startTime, endTime, comment, he could have most
easily parsed it with

item, tape, recordNum, startTime, endTime, comment = dataline.split(None,5)

It's just those optional and variable-formatted fields that getcha. :)
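
To illustrate the point, a small sketch (Python 3 syntax) of what happens with that one-liner: it works when all six fields are present, and the unpacking raises ValueError when the end time or comment is missing. split_fixed is a name used only for this example.

```python
def split_fixed(dataline):
    """Split a line that is known to contain all six fields."""
    item, tape, recordNum, startTime, endTime, comment = dataline.split(None, 5)
    return item, tape, recordNum, startTime, endTime, comment


full = "Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound."
print(split_fixed(full))

short = "Item_4 TAPE_1 9 24:33"  # no end time, no comment
try:
    split_fixed(short)
except ValueError as exc:
    print("cannot unpack:", exc)
```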

-- Paul

pyparsing can be downloaded at http://pyparsing.sourceforge.net.
 
kpp9c

Paul, that is awesome.... so much better than what I did, which was the lamo
brute force method. I formatted and reformatted my input data and
stuffed it in a HUGE dictionary.... it was stupid and kludgy.... I hope
to study all these approaches and learn something.... here's what I
came up with ... with my pea-sized brain...

#!/usr/bin/env python

# New in version 2.3 is the 'datetime' module (see standard library reference)
# http://www.python.org/doc/lib/module-datetime.html

import datetime

inseqs = { (1) : ['DAT_1', '01', '00:00:23', '00:08:23'],
           (2) : ['DAT_1', '02', '00:08:23', '00:09:41'],
           (513) : ['DAT_75', '10', '00:59:55', '01:11:05'],
           (514) : ['DAT_75', '11', '01:11:05', '01:16:15'],
           (515) : ['DAT_75', '12', '01:16:15', '01:34:15'],
           (516) : ['DAT_75', '13', '01:34:15', '01:45:15'],
           (517) : ['DAT_75', '14', '01:45:15', '01:48:00'] }


mykeys = list(inseqs.keys()) # first make a copy of the keys
mykeys.sort() # now sort that copy in place

TD = datetime.timedelta

for key in mykeys:
    event = inseqs[key]
    print('\nItem # %s %s' % (key, event))
    h, m, s = event[2].split(':')
    zero_adjust = TD(hours=int(h), minutes=int(m), seconds=int(s))
    # subtract the item's start time from each of its timestamps
    adjusted = []
    for item in event[2:]:
        hrs, mins, secs = item.split(':')
        time1 = TD(hours=int(hrs), minutes=int(mins), seconds=int(secs))
        adjusted.append(str(time1 - zero_adjust))
    print(' Q___  %s %s :  %s' % (key, event[:2], ' '.join(adjusted)))
 
