Python new user question - file writeline error

James

Hello,

I'm a newbie to Python & wondering if someone can help me with this...

I have this code:
--------------------------
#! /usr/bin/python

import sys

month ={'JAN':1,'FEB':2,'MAR':3,'APR':4,'MAY':5,'JUN':6,'JUL':7,'AUG':8,'SEP':9,'OCT':10,'NOV':11,'DEC':12}
infile=file('TVA-0316','r')
outfile=file('tmp.out','w')

for line in infile:
    item = line.split(',')
    dob = item[6].split('/')
    dob = dob[2]+'-'+str(month[dob[1]])+'-'+dob[0]
    lbdt = item[8].split('/')
    lbdt = lbdt[2]+'-'+str(month[lbdt[1]])+'-'+lbdt[0]
    lbrc = item[10].split('/')
    lbrc = lbrc[2]+'-'+str(month[lbrc[1]])+'-'+lbrc[0]
    lbrp = item[14].split('/')
    lbrp = lbrp[2]+'-'+str(month[lbrp[1]])+'-'+lbrp[0]
    item[6] = dob
    item[8] = lbdt
    item[10]=lbrc
    item[14]=lbrp
    list = ','.join(item)
    outfile.writelines(list)
infile.close
outfile.close
-----------------------------

And the data file(TVA-0316) looks like this:
-----------------------------
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/NOV/2006,V1,,,21/NOV/2006,AST,19,U/L,5,40,,
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/NOV/2006,V1,,,21/NOV/2006,GGT,34,U/L,11,32,h,
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/NOV/2006,V1,,,21/NOV/2006,ALT,31,U/L,5,29,h,
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/NOV/2006,V1,,,21/NOV/2006,ALKP,61,U/L,40,135,,
-----------------------------

Basically I'm reading in each line and converting all date fields
(05/MAR/1950) to a different format (1950-03-05) in order to load them
into a MySQL table.

I have two issues:
1. The output file is incomplete, yet there is no error message. When I
check the last line in the Python interpreter, it has read and processed
the last line, but the output file stops before that.
2. Is this the best way to do this in Python?
3. (Out of scope) is there a way to load this CSV file directly into
MySQL data field without converting the format?

Thank you.

James
 
Bruno Desthuilliers

James wrote:
Hello,

I'm a newbie to Python & wondering someone can help me with this...

I have this code:
--------------------------
#! /usr/bin/python

import sys

month ={'JAN':1,'FEB':2,'MAR':3,'APR':4,'MAY':5,'JUN':6,'JUL':7,'AUG':
8,'SEP':9,'OCT':10,'NOV':11,'DEC':12}
infile=file('TVA-0316','r')
outfile=file('tmp.out','w')

for line in infile:
item = line.split(',')

CSV format ?
http://docs.python.org/lib/module-csv.html
dob = item[6].split('/')
dob = dob[2]+'-'+str(month[dob[1]])+'-'+dob[0]

Why did you use integers as values in the month dict if it's for using
them as strings ?
lbdt = item[8].split('/')
lbdt = lbdt[2]+'-'+str(month[lbdt[1]])+'-'+lbdt[0]
lbrc = item[10].split('/')
lbrc = lbrc[2]+'-'+str(month[lbrc[1]])+'-'+lbrc[0]
lbrp = item[14].split('/')
lbrp = lbrp[2]+'-'+str(month[lbrp[1]])+'-'+lbrp[0]

This may help too:
http://docs.python.org/lib/module-datetime.html
item[6] = dob
item[8] = lbdt
item[10]=lbrc
item[14]=lbrp
list = ','.join(item)

Better to avoid using builtin types names as identifiers. And FWIW, this
is *not* a list...
outfile.writelines(list)

You want file.write() here: writelines() expects a sequence of strings,
and there is no file.writeline(). Note that write() never adds the
newline for you.
infile.close

You're not actually *calling* infile.close - just getting a reference on
the file.close method. The parens are not optional in Python, they are
the call operator.
outfile.close
Same here.

-----------------------------

And the data file(TVA-0316) looks like this:
-----------------------------
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/
NOV/2006,V1,,,21/NOV/2006,AST,19,U/L,5,40,,
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/
NOV/2006,V1,,,21/NOV/2006,GGT,34,U/L,11,32,h,
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/
NOV/2006,V1,,,21/NOV/2006,ALT,31,U/L,5,29,h,
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/
NOV/2006,V1,,,21/NOV/2006,ALKP,61,U/L,40,135,,
-----------------------------

Basically I'm reading in each line and converting all date fields (05/
MAR/1950) to different format (1950-03-05) in order to load into MySQL
table.

I have two issues:
1. the outfile doesn't complete with no error message. when I check
the last line in the python interpreter, it has read and processed the
last line, but the output file stopped before.

Use the csv module and cleanly close your files, then come back if you
still have problems.
2. Is this the best way to do this in Python?

Err... What to say... Obviously, no.
 
Shawn Milo

Your script worked for me. I'm not sure what the next step is in
troubleshooting it. Is it possible that your whitespace isn't quite
right? I had to reformat it, but I assume it was because of the way
cut & paste worked from Gmail.

I usually use Perl for data stuff like this, but I don't see why
Python wouldn't be a great solution. However, I would re-write it
using regexes, to seek and replace sections that are formatted like a
date, rather than breaking it into a variable for each field, changing
each date individually, then putting them back together.

As for how MySQL likes having dates formatted in CSV input: I can't
help there, but I'm sure someone else can.

I'm pretty new to Python myself, but if you'd like help with a
Perl/regex solution, I'm up for it. For that matter, whipping up a
Python/regex solution would probably be good for me. Let me know.

Shawn
 
James

Thank you very much for your kind offer.
I'm also coming from Perl myself - heard many good things about Python
so I'm trying it out - but it seems harder than I thought :(

James
 
Jerry Hill

I have this code: ....
....
1. the outfile doesn't complete with no error message. when I check
the last line in the python interpreter, it has read and processed the
last line, but the output file stopped before.

You need to call the close methods on your file objects like this:
outfile.close()

If you leave off the parentheses, you get the method object, but don't
do anything with it.
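
To illustrate the point, here is a minimal sketch. It assumes a current Python 3 and a throwaway file in a temporary directory; the path and the variable names are made up for the example. It shows that `outfile.close` on its own leaves the file open, that `outfile.close()` closes it, and that a `with` block takes care of closing for you.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tmp.out")  # hypothetical scratch file

outfile = open(path, "w")
outfile.write("some,data\n")

outfile.close                             # evaluates to the bound method, does not call it
closed_without_parens = outfile.closed    # False: the file is still open

outfile.close()                           # the parentheses perform the call
closed_with_parens = outfile.closed       # True

# A with-block closes the file automatically, even if an error occurs inside it.
with open(path, "a") as outfile:
    outfile.write("more,data\n")
closed_after_with = outfile.closed        # True
```

The `with` form is probably the easiest way to avoid forgetting the call altogether.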
2. Is this the best way to do this in Python?

I would parse your dates using the python time module, like this:

Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
IDLE 1.2
>>> import time
>>> line = r'06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/NOV/2006,V1,,,21/NOV/2006,AST,19,U/L,5,40,,'
>>> item = line.split(',')
>>> dob = item[6]
>>> dob_time = time.strptime(dob, '%d/%b/%Y')
>>> dob_time
(1950, 3, 5, 0, 0, 0, 6, 64, -1)
>>> time.strftime('%a, %d %b %Y', dob_time)
'Sun, 05 Mar 1950'
>>> time.strftime('%Y-%m-%d', dob_time)
'1950-03-05'

See the docs for the time module here:
http://docs.python.org/lib/module-time.html
Using that will probably result in code that's quite a bit easier to
read if you ever have to come back to it.

You also might want to investigate the csv module
(http://docs.python.org/lib/module-csv.html) for a bunch of tools
specifically tailored to working with files full of comma separated
values like your input files.
 
Bruno Desthuilliers

James wrote:
(snip)



Thank you very much for your kind offer.
I'm also coming from Perl myself - heard many good things about Python
so I'm trying it out - but it seems harder than I thought :(

If I may comment, Python is not Perl, and trying to solve things the
Perl way, while still possible, may not be the best idea (I don't mean
Perl is a bad idea in itself - just that it's another language with
another way to do things).

Here, doing the parsing oneself - either manually as James did or with
regexps - is certainly not as easy as with Perl, and IMHO not the
simplest way to go, when the csv module can take care of parsing and
formatting CSV files and the datetime module of parsing and formatting
dates.

Just my 2 cents...
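
As a rough sketch of that combination (not code from this thread): the example below assumes a current Python 3, English month abbreviations in the active locale, and the date columns 6, 8, 10 and 14 from the original script. The function name `convert_dates` and the in-memory sample are made up for illustration.

```python
import csv
import io
from datetime import datetime

DATE_FIELDS = (6, 8, 10, 14)  # column positions taken from the original script


def convert_dates(infile, outfile):
    """Copy CSV records from infile to outfile, rewriting DD/MON/YYYY dates."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for record in reader:
        for i in DATE_FIELDS:
            # %b matches the abbreviated month name (case-insensitively)
            parsed = datetime.strptime(record[i], "%d/%b/%Y")
            record[i] = parsed.strftime("%Y-%m-%d")
        writer.writerow(record)


# In-memory stand-ins for the real files, using one line of the posted data.
sample = ("06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,"
          "08:50,21/NOV/2006,V1,,,21/NOV/2006,AST,19,U/L,5,40,,\n")
out = io.StringIO()
convert_dates(io.StringIO(sample), out)
result = out.getvalue()
```

With real files, one would pass objects opened with `open(name, newline="")` instead of the `io.StringIO` stand-ins.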
 
Dennis Lee Bieber

I have two issues:
1. the outfile doesn't complete with no error message. when I check
the last line in the python interpreter, it has read and processed the
last line, but the output file stopped before.

Well, unlike some languages, you MUST include () on function calls,
otherwise all you get is the reference to the function itself.
2. Is this the best way to do this in Python?

Not really... Investigate the csv reader/writer module for the file
records, and either the time or datetime modules for date parsing and
formatting operations.
3. (Out of scope) is there a way to load this CSV file directly into
MySQL data field without converting the format?
Well, mysqlimport and "LOAD DATA" both can be configured for CSV
(they default to tab separated). But while MySQL does have a
date_format() function for output that can produce alpha-months, they
don't have one for input.

So... If one were to be brutal, one could import the file into a
temporary table, defining the date fields as strings. One could
thereafter formulate a "select .... into ..." which uses some nasty
substringing on the date field to create an acceptable date string
format...

concat(right(strdate, 4), "-", find_in_set(substr(strdate, 4, 3),
"JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"), "-", left(strdate,
2))

Of course, you'll need that for /each/ date field in the select <G>
It also presumes 0-filled day (for the left() invocation -- though I
suspect one could use a locate() on the "/" to determine where the
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
Shawn Milo

To the list:

I have come up with something that's working fine. However, I'm fairly
new to Python, so I'd really appreciate any suggestions on how this
can be made more Pythonic.

Thanks,
Shawn






Okay, here's what I have come up with:


#! /usr/bin/python

import sys
import re

month ={'JAN':1,'FEB':2,'MAR':3,'APR':4,'MAY':5,'JUN':6,'JUL':7,'AUG':8,'SEP':9,'OCT':10,'NOV':11,'DEC':12}
infile=file('TVA-0316','r')
outfile=file('tmp.out','w')

def formatDatePart(x):
    "take a number and transform it into a two-character string, zero padded"
    x = str(x)
    while len(x) < 2:
        x = "0" + x
    return x

regex = re.compile(r",\d{2}/[A-Z]{3}/\d{4},")

for line in infile:
    matches = regex.findall(line)
    for someDate in matches:

        dayNum = formatDatePart(someDate[1:3])
        monthNum = formatDatePart(month[someDate[4:7]])
        yearNum = formatDatePart(someDate[8:12])

        newDate = ",%s-%s-%s," % (yearNum,monthNum,dayNum)
        line = line.replace(someDate, newDate)

    outfile.writelines(line)

infile.close
outfile.close
 
Gabriel Genellina

I have come up with something that's working fine. However, I'm fairly
new to Python, so I'd really appreciate any suggestions on how this
can be made more Pythonic.

A few comments:

You don't need the formatDatePart function; delete it, and replace
newDate = ",%s-%s-%s," % (yearNum,monthNum,dayNum)
with
newDate = ",%04.4d-%02.2d-%02.2d," % (yearNum,monthNum,dayNum)

and before:
dayStr, monthStr, yearStr = someDate[1:-1].split('/')
dayNum, monthNum, yearNum = int(dayStr), month[monthStr], int(yearStr)

And this: outfile.writelines(line)
should be: outfile.write(line)
(writelines works almost by accident here).
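
Why "almost by accident": a string is itself an iterable of one-character strings, so `writelines()` happily writes it character by character. A small sketch, assuming Python 3 and using an in-memory `io.StringIO` buffer as a stand-in for the output file (the sample strings are made up):

```python
import io

buf = io.StringIO()          # in-memory stand-in for the output file
line = "a,b,c\n"

buf.writelines(line)         # a str is an iterable of 1-character strings,
                             # so each character is written in turn
buf.writelines(["d,e\n", "f,g\n"])  # the intended use: a sequence of lines
buf.write("h,i\n")           # write() takes exactly one string

text = buf.getvalue()
```

All three calls end up producing the expected text; only `write(line)` says what is actually meant for a single line.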

You forget again to use () to call the close methods:
infile.close()
outfile.close()

I don't like the final replace, but for a script like this I think
it's OK.
 
Shawn Milo

Gabriel,

Thanks for the comments! The new version is below. I thought it made a
little more sense to format the newDate = ... line the way I have it
below, although I did incorporate your suggestions. Also, the
formatting options you provided seemed to specify not only string
padding, but also decimal places, so I changed it. Please let me know
if there is some other meaning behind the way you did it.

As for not liking the replace line, what would you suggest instead?

Shawn

#! /usr/bin/python

import sys
import re

month ={'JAN':1,'FEB':2,'MAR':3,'APR':4,'MAY':5,'JUN':6,'JUL':7,'AUG':8,'SEP':9,'OCT':10,'NOV':11,'DEC':12}
infile=file('TVA-0316','r')
outfile=file('tmp.out','w')

regex = re.compile(r",\d{2}/[A-Z]{3}/\d{4},")

for line in infile:
    matches = regex.findall(line)
    for someDate in matches:

        dayNum = someDate[1:3]
        monthNum = month[someDate[4:7]]
        yearNum = someDate[8:12]

        newDate = ",%04d-%02d-%02d," % (int(yearNum),int(monthNum),int(dayNum))
        line = line.replace(someDate, newDate)

    outfile.write(line)

infile.close()
outfile.close()
 
Jussi Salmela

Shawn Milo wrote:
To the list:

I have come up with something that's working fine. However, I'm fairly
new to Python, so I'd really appreciate any suggestions on how this
can be made more Pythonic.

Thanks,
Shawn






Okay, here's what I have come up with:

What follows may feel harsh but you asked for it ;)
#! /usr/bin/python

import sys
import re

month ={'JAN':1,'FEB':2,'MAR':3,'APR':4,'MAY':5,'JUN':6,'JUL':7,'AUG':8,'SEP':9,'OCT':10,'NOV':11,'DEC':12}

infile=file('TVA-0316','r')
outfile=file('tmp.out','w')

def formatDatePart(x):
"take a number and transform it into a two-character string,
zero padded"
If a comment or doc string is misleading one would be better off without
it entirely:
"take a number": the function can in fact take (at least)
any base type
"transform it": the function doesn't transform x to anything
although the name of the variable x is the same
as the argument x
"two-character string": to a string of at least 2 chars
"zero padded": where left/right???
x = str(x)
while len(x) < 2:
x = "0" + x
You don't need loops for this kind of thing. One possibility is to
replace the whole body with:
return str(x).zfill(2)
return x

regex = re.compile(r",\d{2}/[A-Z]{3}/\d{4},")

for line in infile:
matches = regex.findall(line)
for someDate in matches:
Empty lines are supposed to make code more readable. The above empty
line does the contrary by separating the block controlled by the for
and the for statement
dayNum = formatDatePart(someDate[1:3])
monthNum = formatDatePart(month[someDate[4:7]])
yearNum = formatDatePart(someDate[8:12])
You don't need the formatDatePart function at all:
newDate = ",%4s-%02d-%2s," % \
(someDate[8:12],month[someDate[4:7]],someDate[1:3])
newDate = ",%s-%s-%s," % (yearNum,monthNum,dayNum)
line = line.replace(someDate, newDate)

outfile.writelines(line)

infile.close
outfile.close
You have not read the answers given to the OP, have you? Because if you
had, your code would be:
infile.close()
outfile.close()
The reason your version seems to be working, is that you probably
execute your code from the command-line and exiting from Python to
command-line closes the files, even if you don't.
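
The effect is easy to observe. A minimal sketch, assuming Python 3 with default buffering and a scratch file in a temporary directory (the path is hypothetical): until `close()` is actually called, the written text usually still sits in a buffer, which is why a file inspected from a live interpreter session can look truncated.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tmp.out")  # hypothetical scratch file

outfile = open(path, "w")
outfile.write("06-0588,03,701\n")        # 15 characters

# The text is usually still sitting in Python's buffer at this point,
# so the file on disk tends to look empty or truncated.
size_while_open = os.path.getsize(path)

outfile.close()                          # flushes the buffer, then closes
size_after_close = os.path.getsize(path)
```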

Cheers,
Jussi
 
Shawn Milo

Jussi,

Thanks for the feedback. I received similar comments on a couple of
those items, and posted a newer version an hour or two ago. I think
the only thing missing there is a friendly blank line after my "for
line in infile:" statement.

Please let me know if there is anything else.

Shawn
 
Bruno Desthuilliers

Shawn Milo wrote:
To the list:

I have come up with something that's working fine. However, I'm fairly
new to Python, so I'd really appreciate any suggestions on how this
can be made more Pythonic.

Thanks,
Shawn






Okay, here's what I have come up with:


#! /usr/bin/python

import sys
import re

month ={'JAN':1,'FEB':2,'MAR':3,'APR':4,'MAY':5,'JUN':6,'JUL':7,'AUG':8,'SEP':9,'OCT':10,'NOV':11,'DEC':12}

infile=file('TVA-0316','r')
outfile=file('tmp.out','w')

def formatDatePart(x):
"take a number and transform it into a two-character string,
zero padded"
x = str(x)
while len(x) < 2:
x = "0" + x
return x

x = "%02d" % x

regex = re.compile(r",\d{2}/[A-Z]{3}/\d{4},")

Regexps are not really pythonic - we tend to use them only when we have
no better option. When it comes to parsing CSV files and/or dates, we do
have better solutions: the csv module and the datetime module.
for line in infile:
matches = regex.findall(line)
for someDate in matches:

dayNum = formatDatePart(someDate[1:3])
monthNum = formatDatePart(month[someDate[4:7]])
yearNum = formatDatePart(someDate[8:12])

newDate = ",%s-%s-%s," % (yearNum,monthNum,dayNum)
line = line.replace(someDate, newDate)


outfile.writelines(line)

infile.close
outfile.close

I wonder why some of us took time to answer your first question. You
obviously forgot to read these answers.
 
James

No offense - but doesn't the fact that the 're' module is available
mean we can use it? (Pythonic or not - I'm not sure what is really
pythonic at this stage of learning...)
Like Perl, I'm sure there is more than one way to solve problems in
Python.

I appreciate everyone's feedback - I definitely got more than
expected, but it feels comforting that people do care about writing
better code! :)
 
Gabriel Genellina

Gabriel,

Thanks for the comments! The new version is below. I thought it made a
little more sense to format the newDate = ... line the way I have it
below, although I did incorporate your suggestions.

Looks pretty good to me!
Just one little thing I would change, the variables monthNum, dayNum etc.;
the suffix might indicate that they're numbers, but they're strings
instead. So I would move the int(...) a few lines above, where the
variables are defined.
But that's just a cosmetic thing and just a matter of taste.
Also, the
formatting options you provided seemed to specify not only string
padding, but also decimal places, so I changed it. Please let me know
if there is some other meaning behind the way you did it.

No, it has no meaning, at least for this range of values.
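
A quick check of that statement, as a sketch assuming Python 3 (the sample values are made up): for non-negative integers the precision in `%04.4d` changes nothing compared to `%04d`; the two only differ in edge cases such as negative numbers.

```python
with_precision = ",%04.4d-%02.2d-%02.2d," % (1950, 3, 5)
width_only = ",%04d-%02d-%02d," % (1950, 3, 5)
same = with_precision == width_only   # identical for this range of values

# The two forms only diverge outside this range, e.g. for negative numbers:
neg_precision = "%04.4d" % -5   # precision counts digits only
neg_width = "%04d" % -5         # width counts the sign too
```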
As for not liking the replace line, what would you suggest instead?

You have already scanned the line to find the matching fragment; the match
object knows exactly where it begins and ends; so one could replace it
with the reformatted value without searching again, which takes some more
time, at least in principle.
But this makes the code a bit more complex, and it would only make sense
if you were to process millions of lines, and even then, the execution
might be I/O-bound so you would gain nothing at the end.
That's why I think it's OK as it is now.
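
For the curious, one way to do the replacement in a single pass is `re.sub()` with a function as the replacement. This is only a sketch under a few assumptions, not code from the thread: Python 3, a helper named `reformat` invented for the example, a shortened sample line, and a slightly different pattern that uses lookarounds so the surrounding commas are checked but not consumed.

```python
import re

MONTH = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
         'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12}

# Lookarounds require a comma on each side without making it part of the match.
DATE_RE = re.compile(r"(?<=,)(\d{2})/([A-Z]{3})/(\d{4})(?=,)")


def reformat(match):
    """Build YYYY-MM-DD from the groups of one DD/MON/YYYY match."""
    day, mon, year = match.groups()
    return "%s-%02d-%s" % (year, MONTH[mon], day)


line = "JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/NOV/2006,V1\n"
converted = DATE_RE.sub(reformat, line)   # each match is replaced in place
```

Here `re.sub()` calls `reformat` once per match and splices the result in where the match was found, so no second search through the line is needed.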
 
Dennis Lee Bieber

I appreciate everyone's feedback - I definitely got more than
expected, but it feels comforting that people do care about writing
better code! :)

For historical completeness, I'm including this here -- the original
went via 1:1 email (as I don't have the ability to post to the group
from work) -- so take the phrasing in that view...

-=-=-=-=-=-=-
Consider the following program (attached so Outlook doesn't mess
up the spacing or capitalization), along with the input data file (based
upon your original post) and the output obtained from running this
program.

<<dateconv.py>>
<<dbsample.csv>> <<dbconverted.csv>>
I think you'll find that this short bit of code does what you
want, quite easily, and without lots of bit-twiddling or cryptic
variables. The csv module handles all the splitting and joining,
strptime parses the DD/MMM/YYYY format date, and passes the parsed
values to strftime which formats it into YYYY-MM-DD.
-=-=-=-=-=-=-=- dateconv.py
import csv
import time

DATA = "dbsample.csv"
CONV = "dbconverted.csv"

DATES = [6, 8, 10, 14] #fields with dates to process

# binary mode is what the Python 2 csv module expects
fin = open(DATA, "rb")
fout = open(CONV, "wb")

rdr = csv.reader(fin)
wtr = csv.writer(fout)

for record in rdr:
    for df in DATES:
        # parse DD/MMM/YYYY, then re-emit as YYYY-MM-DD
        record[df] = time.strftime(
            "%Y-%m-%d",
            time.strptime(record[df],
                          "%d/%b/%Y") )
    wtr.writerow(record)

fin.close()
fout.close()
-=-=-=-=-=-=-=-=- dbsample.csv
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/NOV/2006,V1,,,21/NOV/2006,AST,19,U/L,5,40,,
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/NOV/2006,V1,,,21/NOV/2006,GGT,34,U/L,11,32,h,
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/NOV/2006,V1,,,21/NOV/2006,ALT,31,U/L,5,29,h,
06-0588,03,701,03701,0000046613,JJB,05/MAR/1950,M,20/NOV/2006,08:50,21/NOV/2006,V1,,,21/NOV/2006,ALKP,61,U/L,40,135,,
-=-=-=-=-=-=-=-=- dbconverted.csv
06-0588,03,701,03701,0000046613,JJB,1950-03-05,M,2006-11-20,08:50,2006-11-21,V1,,,2006-11-21,AST,19,U/L,5,40,,
06-0588,03,701,03701,0000046613,JJB,1950-03-05,M,2006-11-20,08:50,2006-11-21,V1,,,2006-11-21,GGT,34,U/L,11,32,h,
06-0588,03,701,03701,0000046613,JJB,1950-03-05,M,2006-11-20,08:50,2006-11-21,V1,,,2006-11-21,ALT,31,U/L,5,29,h,
06-0588,03,701,03701,0000046613,JJB,1950-03-05,M,2006-11-20,08:50,2006-11-21,V1,,,2006-11-21,ALKP,61,U/L,40,135,,

 
