parse a string of parameters and values

bsneddon

I have a problem that I can solve with a brute force approach,
but it occurred to me that there may be
"one-- and preferably only one --obvious way to do it".

I am going to read a text file that is an export from a control
system.
It has lines with information like

base=1 name="first one" color=blue

I would like to put this info into a dictionary for processing.
I have looked at optparse and getopt; maybe they are the answer, but
there could be a very straightforward way to do this task.

Thanks for your help
 
Steven D'Aprano

I have a problem that I can come up with a brute force solution to solve
but it occurred to me that there may be an
"one-- and preferably only one --obvious way to do it".

I'm not sure that "brute force" is the right description here. Generally,
"brute force" is used for situations where you check every single
possible value rather than calculate the answer directly. One classical
example is guessing the password that goes with an account. The brute
force attack is to guess every imaginable password -- eventually you'll
find the matching one. A non-brute force attack is to say "I know the
password is a recent date", which reduces the space of possible passwords
from many trillions to mere millions.

So I'm not sure that brute force is an appropriate description for this
problem. One way or another you have to read every line in the file.
Whether you read them or you farm the job out to some pre-existing
library function, they still have to be read.

I am going to read a text file that is an export from a control system.
It has lines with information like

base=1 name="first one" color=blue

I would like to put this info into a dictionary for processing.

Have you looked at the ConfigParser module?

Assuming that ConfigParser isn't suitable, you can do this if each
key=value pair is on its own line:


d = {}
for line in open(filename, 'r'):
    if not line.strip():
        # skip blank lines
        continue
    key, value = line.split('=', 1)
    d[key.strip()] = value.strip()


If you have multiple keys per line, you need a more sophisticated way of
splitting them. Something like this should work:

d = {}
for line in open(filename, 'r'):
    if not line.strip():
        continue
    terms = line.split('=')
    keys = terms[0::2] # every second item starting from the first
    values = terms[1::2] # every second item starting from the second
    for key, value in zip(keys, values):
        d[key.strip()] = value.strip()
 
John Machin

Steven D'Aprano said:
On Sat, 12 Dec 2009 16:16:32 -0800, bsneddon wrote:
I am going to read a text file that is an export from a control system.
It has lines with information like

base=1 name="first one" color=blue

I would like to put this info into a dictionary for processing.

Have you looked at the ConfigParser module?

Assuming that ConfigParser isn't suitable, you can do this if each
key=value pair is on its own line:
[snip]
If you have multiple keys per line, you need a more sophisticated way of
splitting them. Something like this should work:

d = {}
for line in open(filename, 'r'):
    if not line.strip():
        continue
    terms = line.split('=')
    keys = terms[0::2] # every second item starting from the first
    values = terms[1::2] # every second item starting from the second
    for key, value in zip(keys, values):
        d[key.strip()] = value.strip()

There appears to be a problem with the above snippet, or you have a strange
interpretation of "put this info into a dictionary":

| >>> line = 'a=1 b=2 c=3 d=4'
| >>> d = {}
| >>> terms = line.split('=')
| >>> print terms
| ['a', '1 b', '2 c', '3 d', '4']
| >>> keys = terms[0::2] # every second item starting from the first
| >>> values = terms[1::2] # every second item starting from the second
| >>> for key, value in zip(keys, values):
| ...     d[key.strip()] = value.strip()
| ...
| >>> print d
| {'a': '1 b', '2 c': '3 d'}
| >>>

Perhaps you meant

terms = re.split(r'[= ]', line)

which is an improvement, but this fails on cosmetic spaces, e.g. a = 1 b = 2 ...

Try terms = filter(None, re.split(r'[= ]', line))

Now we get to the really hard part: handling the name="first one" in the OP's
example. The splitting approach has run out of steam.

The OP will need to divulge the protocol for escaping the " character if
it is present in the input. If nobody knows of a packaged solution for his
particular scheme, then he'll need to use something like pyparsing.
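
To make the two cases above concrete, here is a minimal sketch of the filtered re.split approach. It is written for Python 3, where a list comprehension stands in for filter(None, ...), and the helper names split_terms and pairs_to_dict are made up for illustration. It copes with cosmetic spaces but breaks the quoted value into two terms, which is where the splitting approach runs out of steam.

```python
import re

def split_terms(line):
    # Split on '=' or space, then drop the empty strings that
    # adjacent separators (as in "a = 1") leave behind.
    return [t for t in re.split(r'[= ]', line) if t]

def pairs_to_dict(terms):
    # Even positions are treated as keys, odd positions as values.
    return dict(zip(terms[0::2], terms[1::2]))

# Cosmetic spaces around '=' are handled.
simple = pairs_to_dict(split_terms('a = 1 b = 2 c=3'))

# A quoted value containing a space is split in two, so the
# key/value pairing after it is shifted and the result is wrong.
quoted_terms = split_terms('base=1 name="first one" color=blue')
broken = pairs_to_dict(quoted_terms)
```

With the OP's sample line the terms come out as seven items rather than six, so every pair after name is misaligned.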
 
Steven D'Aprano

Steven D'Aprano writes: [snip]
If you have multiple keys per line, you need a more sophisticated way
of splitting them. Something like this should work:
[...]
There appears to be a problem with the above snippet, or you have a
strange interpretation of "put this info into a dictionary":


D'oh!

In my defence, I said it "should" work, not that it did work!
 
Peter Otten

bsneddon said:
I have a problem that I can come up with a brute force solution to
solve but it occurred to me that there may be an
"one-- and preferably only one --obvious way to do it".

I am going to read a text file that is an export from a control
system.
It has lines with information like

base=1 name="first one" color=blue

I would like to put this info into a dictionary for processing.
I have looked at optparse and getopt maybe they are the answer but
there could
be and very straight forward way to do this task.

Thanks for your help

Have a look at shlex:

>>> import shlex
>>> s = 'base=1 name="first one" color=blue equal="alpha=beta" empty'
>>> dict(t.partition("=")[::2] for t in shlex.split(s))
{'color': 'blue', 'base': '1', 'name': 'first one', 'empty': '', 'equal': 'alpha=beta'}

Peter
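
As a rough sketch of how the shlex one-liner might be applied to the whole export file, the snippet below builds one dictionary per non-blank line. The helper names parse_line and parse_lines and the sample lines are assumptions for illustration; nothing in the thread says how the OP wants the records stored.

```python
import shlex

def parse_line(line):
    # shlex.split honours double quotes, so "first one" stays one token.
    # partition("=") returns (key, "=", value); [::2] keeps key and value.
    return dict(t.partition("=")[::2] for t in shlex.split(line))

def parse_lines(lines):
    # One dictionary per non-blank line; 'lines' can be any iterable
    # of strings, including an open file object.
    return [parse_line(line) for line in lines if line.strip()]

# Hypothetical sample data standing in for the control system export.
records = parse_lines([
    'base=1 name="first one" color=blue\n',
    '\n',
    'base=2 name="second one" color="red"\n',
])
```

Passing an open file, e.g. parse_lines(open(filename)), should work the same way. Note that shlex.split treats a backslash as an escape character by default, so values containing backslashes would need checking against the real export format.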
 
bsneddon

bsneddon said:
I have a problem that I can come up with a brute force solution to
solve but it occurred to me that there may be an
 "one-- and preferably only one --obvious way to do it".
I am going to read a text file that is an export from a control
system.
It has lines with information like
base=1 name="first one" color=blue
I would like to put this info into a dictionary for processing.
I have looked at optparse and getopt maybe they are the answer but
there could
be and very straight forward way to do this task.
Thanks for your help

Have a look at shlex:
import shlex
s = 'base=1 name="first one" color=blue equal="alpha=beta" empty'
dict(t.partition("=")[::2] for t in shlex.split(s))

{'color': 'blue', 'base': '1', 'name': 'first one', 'empty': '', 'equal':
'alpha=beta'}

Peter

Thanks to all for your input.

It seems I misstated the problem. Text is always quoted, so blue
above -> "blue".

Peter,

The part I was missing was t.partition("=") and slicing with a step of
two.
It looks like a normal split will work for me to get the arguments I
need.
To my way of thinking yours is very clean and maybe the "--obvious way
to do it",
although it was not obvious to me until seeing your post.

Bill
 
Gabriel Genellina

bsneddon wrote:
I am going to read a text file that is an export from a control
system.
It has lines with information like

base=1 name="first one" color=blue

I would like to put this info into a dictionary for processing.
import shlex
s = 'base=1 name="first one" color=blue equal="alpha=beta" empty'
dict(t.partition("=")[::2] for t in shlex.split(s))
{'color': 'blue', 'base': '1', 'name': 'first one', 'empty': '', 'equal':
'alpha=beta'}

Brilliant!
 
Tim Chase

Gabriel said:
Peter Otten escribió:
bsneddon said:
I am going to read a text file that is an export from a control
system.
It has lines with information like

base=1 name="first one" color=blue

I would like to put this info into a dictionary for processing.
import shlex
s = 'base=1 name="first one" color=blue equal="alpha=beta" empty'
dict(t.partition("=")[::2] for t in shlex.split(s))
{'color': 'blue', 'base': '1', 'name': 'first one', 'empty': '', 'equal':
'alpha=beta'}

Brilliant!

The thing I appreciated about Peter's solution was learning a
purpose for .partition() as I've always just used .split(), so I
would have done something like
shlex.split(s))
{'color': 'blue', 'base': '1', 'name': 'first one', 'empty': '',
'equal': 'alpha=beta'}

Using .partition() makes that a lot cleaner. However, it looks
like .partition() was added in 2.5, so for my code stuck in 2.4
deployments, I'll stick with the uglier .split()
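
The .split() expression quoted above is cut off, so the following is only a guess at the general shape of a .split()-based equivalent, not the code originally posted. It pads a missing value with an empty string so that a bare token such as empty still yields a pair, and it avoids .partition() entirely. The function name parse_with_split is made up for this sketch.

```python
import shlex

def parse_with_split(s):
    result = {}
    for token in shlex.split(s):
        # split("=", 1) gives one or two pieces; appending "" and
        # slicing to two items supplies an empty value when no "="
        # is present in the token.
        key, value = (token.split("=", 1) + [""])[:2]
        result[key] = value
    return result

parsed = parse_with_split(
    'base=1 name="first one" color=blue equal="alpha=beta" empty'
)
```

The maxsplit argument of 1 matters here: without it, equal="alpha=beta" would be split into three pieces and the value would be lost.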

-tkc
 
