Newbie needs help with regex strings

  • Thread starter Catalina Scott A Contr AFCA/EVEO
  • Start date
C

Catalina Scott A Contr AFCA/EVEO

I have a file with lines in the following format.

pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'

I would like to pull out some of the values and write them to a csv
file.

For line in filea
pie = regex
quantity = regex
cooked = regex
ingredients = regex
fileb.write (quantity,pie,cooked,ingredients)

How can I retreive the values and assign them to a name?

Thank you
Scott
 
P

Paul McGuire

This isn't a regex solution, but uses pyparsing instead. Pyparsing
helps you construct recursive-descent parsers, and maintains a code
structure that is easy to compose, read, understand, maintain, and
remember what you did 6-months after you wrote it in the first place.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul


data = """pie=apple,quantity=1,cooked=yes,ingredients='sugar and
cinnamon'
Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and
sugar'"""

from pyparsing import CaselessLiteral, Literal, Word, alphas, nums,
oneOf, quotedString, \
Group, Dict, delimitedList, removeQuotes

# define basic elements for parsing
pieName = Word(alphas)
qty = Word(nums)
yesNo = oneOf("yes no",caseless=True)
EQUALS = Literal("=").suppress()

# define separate pie attributes
pieEntry = CaselessLiteral("pie") + EQUALS + pieName
qtyEntry = CaselessLiteral("quantity") + EQUALS + qty
cookedEntry = CaselessLiteral("cooked") + EQUALS + yesNo
ingredientsEntry = CaselessLiteral("ingredients") + EQUALS +
quotedString.setParseAction(removeQuotes)
priceEntry = CaselessLiteral("price") + EQUALS + qty

# define overall list of alternative attributes
pieAttribute = pieEntry | qtyEntry | cookedEntry | ingredientsEntry |
priceEntry

# define each line as a list of attributes (comma delimiter is the
default), grouping results by attribute
pieDataFormat = delimitedList( Group(pieAttribute) )

# parse each line in the input string, and create a dict of the results
for line in data.split("\n"):
pieData = pieDataFormat.parseString(line)
pieDict = dict( pieData.asList() )
print pieDict

''' prints out:
{'cooked': 'yes', 'ingredients': 'sugar and cinnamon', 'pie': 'apple',
'quantity': '1'}
{'ingredients': 'peaches,powdered sugar', 'pie': 'peach', 'quantity':
'2'}
{'cooked': 'no', 'price': '5', 'ingredients': 'cherries and sugar',
'pie': 'cherry', 'quantity': '3'}
'''
 
C

Christopher Subich

Paul said:
This isn't a regex solution, but uses pyparsing instead. Pyparsing
helps you construct recursive-descent parsers, and maintains a code
structure that is easy to compose, read, understand, maintain, and
remember what you did 6-months after you wrote it in the first place.

Download pyparsing at http://pyparsing.sourceforge.net.


For the example listed, pyparsing is even overkill; the OP should
probably use the csv module.
 
D

Dennis Benzinger

Catalina said:
I have a file with lines in the following format.

pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'

I would like to pull out some of the values and write them to a csv
file.

For line in filea
pie = regex
quantity = regex
cooked = regex
ingredients = regex
fileb.write (quantity,pie,cooked,ingredients)

How can I retreive the values and assign them to a name?

Thank you
Scott

Try this:

import re
import StringIO

filea_string = """pie=apple,quantity=1,cooked=yes,ingredients='sugar and
cinnamon'
pie=peach,quantity=2,ingredients='peaches,powdered sugar'
pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
"""

FIELDS = ("pie", "quantity", "cooked", "ingredients", "price")

field_regexes = {}

for field in FIELDS:
field_regexes[field] = re.compile("%s=([^,\n]*)" % field)

for line in StringIO.StringIO(filea_string):

field_values = {}

for field in FIELDS:
match_object = field_regexes[field].search(line)

if match_object is not None:
field_values[field] = match_object.group(1)

print field_values
#fileb.write (quantity,pie,cooked,ingredients)



Bye,
Dennis
 
D

Dennis Benzinger

Christopher said:
Paul McGuire wrote:

[...]
For the example listed, pyparsing is even overkill; the OP should
probably use the csv module.

But the OP wants to parse lines with key=value pairs, not simply lines
with comma separated values. Using the csv module will just separate the
key=value pairs and you would still have to take them apart.

Bye,
Dennis
 
M

Michael Spencer

Dennis said:
Christopher said:
Paul McGuire wrote:

[...]
For the example listed, pyparsing is even overkill; the OP should
probably use the csv module.

But the OP wants to parse lines with key=value pairs, not simply lines
with comma separated values. Using the csv module will just separate the
key=value pairs and you would still have to take them apart.

Bye,
Dennis
that, and csv.reader has another problem with this task:
>>> csv.reader(["Pie=peach,quantity=2,ingredients='peaches,powdered sugar'"],
quotechar = "'").next()
['Pie=peach', 'quantity=2', "ingredients='peaches", "powdered sugar'"]

i.e., it doesn't allow separators within fields unless either the *whole* field
is quoted:
>>> csv.reader(["Pie=peach,quantity=2,'ingredients=peaches,powdered sugar'"],
quotechar = "'").next()
['Pie=peach', 'quantity=2', 'ingredients=peaches,powdered sugar']
or the separator is escaped:
>>> csv.reader(["Pie=peach,quantity=2,ingredients='peaches\,powdered sugar'"],
quotechar = "'", escapechar = "\\").next()
['Pie=peach', 'quantity=2', "ingredients='peaches,powdered sugar'"]

Michael
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,528
Members
45,000
Latest member
MurrayKeync

Latest Threads

Top