Need help converting text to csv format

C

Chuck Connors

Hey guys. I'm working on a little program to help my wife catalog her/
our coupons. I found a good resource but need help formatting the
text data so that I can import it into a mysql database. Here's the
data format:

409220000003 Life Fitness Products $1 (12-13-08) (CVS)
546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35
each
518000159258 Pillsbury Crescent Dinner Rolls, any .25 (2-14-09)
518000550406 Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40 (2-14-09)

The first value (large number) is the UPC, the next element is the
coupon description, followed by a date in parenthesis. Those are the
only three elements I am concerned with. Can someone help me in
reformatting this:

409220000003 Life Fitness Products $1 (12-13-08) (CVS)
546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35
each
518000159258 Pillsbury Crescent Dinner Rolls, any .25 (2-14-09)
518000550406 Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40 (2-14-09)

into something like this:

"409220000003","Life Fitness Products $1","12-13-08"
"546500181141","Oust Air Sanitizer, any B1G1F up to $3.49","1-17-09"
"518000159258","Pillsbury Crescent Dinner Rolls, any .25","2-14-09"
"518000550406","Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40","2-14-09"

Any help, pseudo code, or whatever push in the right direction would
be most appreciated. I am a novice Python programmer but I do have a
good bit of PHP programming experience.

Thanks for your time....
 
G

George Sakkis

Any help, pseudo code, or whatever push in the right direction would
be most appreciated.  I am a novice Python programmer but I do have a
good bit of PHP programming experience.

I'm wondering if PHP experience precludes the ability to use a search
engine before asking for help...
 
C

Chuck Connors

I'm wondering if PHP experience precludes the ability to use a search
engine before asking for help...

Thanks for the push in the right direction, friend.
 
T

Tim Chase

409220000003 Life Fitness Products $1 (12-13-08) (CVS)
546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35
each
518000159258 Pillsbury Crescent Dinner Rolls, any .25 (2-14-09)
518000550406 Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40 (2-14-09)

into something like this:

"409220000003","Life Fitness Products $1","12-13-08"
"546500181141","Oust Air Sanitizer, any B1G1F up to $3.49","1-17-09"
"518000159258","Pillsbury Crescent Dinner Rolls, any .25","2-14-09"
"518000550406","Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40","2-14-09"

Any help, pseudo code, or whatever push in the right direction would
be most appreciated. I am a novice Python programmer but I do have a
good bit of PHP programming experience.

A regexp should be able to split this fairly neatly:

import re
r = re.compile(r"^(\d+)\s+(.*)\((\d{1,2}-\d{1,2}-\d{2,4})\).*")
out = file('out.csv', 'w')
for i, line in enumerate(file('in.txt')):
m = r.match(line)
if not m:
print "Line %i is malformed" % (i+1)
continue
out.write(','.join(
'"%s"' % item.strip().replace('"', '""')
for item in m.groups()
))
out.write('\n')
out.close()

-tkc
 
J

Joe Strout

The first value (large number) is the UPC, the next element is the
coupon description, followed by a date in parenthesis. Those are the
only three elements I am concerned with. Can someone help me in
reformatting this:

409220000003 Life Fitness Products $1 (12-13-08) (CVS)

Well, you could use regex to parse out those three parts... on the
other hand, str.partition might be a little easier if you're not
already used to regex. Try this:
'12-13-08'

So three lines of code gets you your three fields.
into something like this:

"409220000003","Life Fitness Products $1","12-13-08"

There are lots of ways to do that. Here's one:
>>> qfields = ['"' + fld.strip() + '"' for fld in (num,desc,date)]
>>> out = qfields.join(',')
>>> out
'"409220000003","Life Fitness Products $1 ","12-13-08"'

Cheers,
- Joe

P.S. Pay no attention to the grumpy graybeards who want to drive new
users away from Python. There are quite a few friendly and helpful
people here, too. :)
 
S

Steve Holden

George said:
I'm wondering if PHP experience precludes the ability to use a search
engine before asking for help...

I'm wondering why you bothered to write that. Next time, save yourself
the ten seconds and just skip to the next message. The world will be a
better place.

though-we-all-get-crabby-sometimes-ly y'rs - steve
 
C

Chuck Connors

Wow! What a change in direction from the previous post. Thank you
both for the help and the explanations. This will work great!
 
T

Tim Chase

qfields = ['"' + fld.strip() + '"' for fld in (num,desc,date)]
Just a quick note here to prevent the confusion of the OP...this
should be

','.join(qfields)

-tkc
 
T

Tim Golden

Tim said:
qfields = ['"' + fld.strip() + '"' for fld in (num,desc,date)]
out = qfields.join(',')

Just a quick note here to prevent the confusion of the OP...this should be

','.join(qfields)

To be honest, it's so easy to use the stdlib csv module
that I'd always recommend that, especially as it covers
all those annoying corner cases with embedded commas and
quotes and stuff. And it's tested to hell and back.


In general, for a list of field tuples:

<code>
import csv

fields = [
("Tim Golden", 40, "inf,o1"),
("Fred Smith", 25, "info2"),
("Joe O'Reilly", 55, 'inf,"blah",o3'), ## lots of messiness
]

ofile = open ("data.csv", "wb")
try:
writer = csv.writer (ofile)
writer.writerows ([[str (i) for i in f] for f in fields])
finally:
ofile.close ()

</code>
 
J

Joe Strout

Tim said:
qfields = ['"' + fld.strip() + '"' for fld in (num,desc,date)]
out = qfields.join(',')
Just a quick note here to prevent the confusion of the OP...this
should be
','.join(qfields)

Thanks Tim #1, for pointing out my error (my standard procedure with
join is to [1] do it backwards, [2] curse myself, and [3] do it
correctly, but it looks like I only copied step 1 into my email).
To be honest, it's so easy to use the stdlib csv module
that I'd always recommend that, especially as it covers
all those annoying corner cases with embedded commas and
quotes and stuff. And it's tested to hell and back.

And thanks Tim #2, for pointing that out. I had frankly forgotten
about it, but that's good advice.
finally:
ofile.close ()

A follow-up question here... is it really necessary to close things
like files in Python? I've been slumming it in the REALbasic
community for the last decade, where you generally don't worry about
such things, as any object that represents something "open" will
automatically "close" itself when it dies (and since a closed object
in those cases is useless, I'd rather not have it around after it's
closed anyway). Is the same true in Python, or do we need to
explicitly close things?

Thanks,
- Joe
 
T

Tim Chase

Tim said:
Tim said:
qfields = ['"' + fld.strip() + '"' for fld in (num,desc,date)]
out = qfields.join(',')
Just a quick note here to prevent the confusion of the OP...this should be

','.join(qfields)

To be honest, it's so easy to use the stdlib csv module
that I'd always recommend that, especially as it covers
all those annoying corner cases with embedded commas and
quotes and stuff. And it's tested to hell and back.

[cedes point to TimG]

yes, the CSV module has some wonderful stuff in it, and I
regularly use it for *reading* CSV files. But for writing them,
it's often just as fast (for my purposes) to simply code my 1st
post's quickie as it is to scrounge in the docs/docstrings to
remember how to let the CSV do the creation. But TimG is correct
that using csv is the _right_ way to do it...gotta leave SOME
work for the OP :)

too-many-tims-on-c.l.p'ly yers,

-tkc
 
T

Tim Golden

Joe said:
A follow-up question here... is it really necessary to close things like
files in Python? I've been slumming it in the REALbasic community for
the last decade, where you generally don't worry about such things, as
any object that represents something "open" will automatically "close"
itself when it dies (and since a closed object in those cases is
useless, I'd rather not have it around after it's closed anyway). Is the
same true in Python, or do we need to explicitly close things?


Implementation dependent. (Because it depends on what kind
of garbage collection or object finalisation happens). Like
most people, I imagine, in ad-hoc code I'll just do things
like:

<code
import csv

writer = csv.writer (open ("data.csv", "wb"))
writer.writerow (['blah', blah'])

</code>

If I exit the interpreter here, I'm pretty much safe.
But if I add os.startfile ("data.csv"), I'll likely
get nothing or a file lock.

I believe that in other implementations -- Jython,
for example -- you cannot rely on the file closing
itself.

In general in posting public code, especially to
newcomers, I make the effort to use a try-finally
or a with statement.

TJG
 
T

Tim Golden

Tim said:
yes, the CSV module has some wonderful stuff in it, and I regularly use
it for *reading* CSV files. But for writing them, it's often just as
fast (for my purposes) to simply code my 1st post's quickie as it is to
scrounge in the docs/docstrings to remember how to let the CSV do the
creation. But TimG is correct that using csv is the _right_ way to do
it...gotta leave SOME work for the OP :)

Well I was going to point out that it's not that hard to remember
the functions, when I realised that I needn't have str ()-ed all
those values since the module does that for you. As it happens I
usually find it easy enough to remember.

But I think it's one of those
modules where you tend to think: why bother? it's only a
",".join (str (i) for i in f) away. Until you hit embedded quotes
or irregular delimiters. And you realise you've spawned a
monster genexp. And *then* you reach for "import csv" :)

TJG
 
M

MRAB

Tim said:
Joe said:
A follow-up question here... is it really necessary to close things
like files in Python? I've been slumming it in the REALbasic
community for the last decade, where you generally don't worry about
such things, as any object that represents something "open" will
automatically "close" itself when it dies (and since a closed object
in those cases is useless, I'd rather not have it around after it's
closed anyway). Is the same true in Python, or do we need to
explicitly close things?


Implementation dependent. (Because it depends on what kind
of garbage collection or object finalisation happens). Like
most people, I imagine, in ad-hoc code I'll just do things like:

<code
import csv

writer = csv.writer (open ("data.csv", "wb"))
writer.writerow (['blah', blah'])

</code>

If I exit the interpreter here, I'm pretty much safe.
But if I add os.startfile ("data.csv"), I'll likely
get nothing or a file lock.

I believe that in other implementations -- Jython,
for example -- you cannot rely on the file closing
itself.
The file will be closed automatically when the file object is
garbage-collected.

CPython uses reference-counting, so the file object is garbage-collected
as soon as there are no references to it.

Jython (and IronPython?) are garbage-collected in the background, so the
file object is garbage-collected at some point (and you don't know when
that will be!) when there are no longer any references to it.
 
M

M.-A. Lemburg

Hey guys. I'm working on a little program to help my wife catalog her/
our coupons. I found a good resource but need help formatting the
text data so that I can import it into a mysql database. Here's the
data format:

409220000003 Life Fitness Products $1 (12-13-08) (CVS)
546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35
each
518000159258 Pillsbury Crescent Dinner Rolls, any .25 (2-14-09)
518000550406 Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40 (2-14-09)

The first value (large number) is the UPC, the next element is the
coupon description, followed by a date in parenthesis. Those are the
only three elements I am concerned with. Can someone help me in
reformatting this:

409220000003 Life Fitness Products $1 (12-13-08) (CVS)
546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35
each
518000159258 Pillsbury Crescent Dinner Rolls, any .25 (2-14-09)
518000550406 Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40 (2-14-09)

into something like this:

"409220000003","Life Fitness Products $1","12-13-08"
"546500181141","Oust Air Sanitizer, any B1G1F up to $3.49","1-17-09"
"518000159258","Pillsbury Crescent Dinner Rolls, any .25","2-14-09"
"518000550406","Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40","2-14-09"

Any help, pseudo code, or whatever push in the right direction would
be most appreciated. I am a novice Python programmer but I do have a
good bit of PHP programming experience.

Have a look at the csv module:

http://www.python.org/doc/2.5.2/lib/module-csv.html

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 21 2008)________________________________________________________________________
2008-11-12: Released mxODBC.Connect 0.9.3 http://python.egenix.com/

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
 
J

Joe Strout

The file will be closed automatically when the file object is
garbage-collected.

CPython uses reference-counting, so the file object is garbage-
collected as soon as there are no references to it.

Jython (and IronPython?) are garbage-collected in the background, so
the file object is garbage-collected at some point (and you don't
know when that will be!) when there are no longer any references to
it.

Thanks all for this clarification. That is an important distinction.
(REALbasic uses reference-counting, so you can guarantee that the file
will be closed as soon as it loses its last reference, but I see that
we shouldn't count on that in the Python world.)

Best,
- Joe
 
G

George Sakkis

I'm wondering why you bothered to write that. Next time, save yourself
the ten seconds and just skip to the next message. The world will be a
better place.

RTFM and doing an elementary search is an even better way to making
the world a better place.

George
 
R

Richard Riley

George Sakkis said:
RTFM and doing an elementary search is an even better way to making
the world a better place.

George

So you will be replying in this tone to each and every question which
has an answer discoverable by Google and in depth knowledge of the "FM"
then I assume?
 
R

r0g

Chuck said:
Hey guys. I'm working on a little program to help my wife catalog her/
our coupons. I found a good resource but need help formatting the
text data so that I can import it into a mysql database. Here's the
data format:

409220000003 Life Fitness Products $1 (12-13-08) (CVS)
546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35
each
518000159258 Pillsbury Crescent Dinner Rolls, any .25 (2-14-09)
518000550406 Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40 (2-14-09)

The first value (large number) is the UPC, the next element is the
coupon description, followed by a date in parenthesis. Those are the
only three elements I am concerned with. Can someone help me in
reformatting this:

409220000003 Life Fitness Products $1 (12-13-08) (CVS)
546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35
each
518000159258 Pillsbury Crescent Dinner Rolls, any .25 (2-14-09)
518000550406 Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40 (2-14-09)

into something like this:

"409220000003","Life Fitness Products $1","12-13-08"
"546500181141","Oust Air Sanitizer, any B1G1F up to $3.49","1-17-09"
"518000159258","Pillsbury Crescent Dinner Rolls, any .25","2-14-09"
"518000550406","Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40","2-14-09"

Any help, pseudo code, or whatever push in the right direction would
be most appreciated. I am a novice Python programmer but I do have a
good bit of PHP programming experience.

Thanks for your time....


Hi Chuck,

Don't be put off, not everyone here is rude and what you are asking
isn't so trivial as to justify that flaming RTFM horsecrap. You are
moving from php to python - this is cause for celebration! Welcome :)

Anyway I can see two ways of attacking this, you could either write a
finite state machine or use regular expressions. My preference would be
regular expressions.

The following should do what you want although I haven't tested it much
and you might want to lookout for irregularities in your source text
like linebreaks etc...

import re
regex = re.compile("([0-9]{3,13}) ([^(]*) \(([^)]*)")

for each in regex.findall(source_data):
print '"'+each[0]+'","'+each[1]+'","'+each[2]+'"'


I don't know if you already know regular expressions from some other
language so I won't bother explaining this one other than to say you may
note Pythons regex syntax is generally more liberal than many other
languages implementations. If this is all new to you and you do need
this explaining just let me know and I will happily run you through it.


Also, you say your eventual target is a SQL database. If the database is
local (or reachable via a network) then you are probably going to be
better off using python to connect to it directly using the MySQLdb
module...

import MySQLdb

db = MySQLdb.connect(host="localhost",
user="uname",
passwd="pwd",
db="dbname")

cursor = db.cursor()
cursor.execute("INSERT YOU SQL QUERY HERE")
db.commit ()


Hope this helps,

Roger.
 
G

George Sakkis

So you will be replying in this tone to each and every question which
has an answer discoverable by Google and in depth knowledge of the "FM"
then I assume?

No, only to those which is blatantly obvious that they didn't put any
effort whatsoever to find an answer on their own; "python csv" has
1,420,000 hits in Google, the first one linking to the csv module
docs. Feel free to check out my 4-year posting history in c.l.py to
get a more accurate idea of the usefulness and tone of my posts.

George
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top