parseline (my favorite simple script): does something similar exist?

RickMuller · Oct 12, 2006

One of my all-time favorite scripts is parseline, which is printed
below

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    for i in range(len(format)):
        f = format[i]
        trans = xlat.get(f,'None')
        if trans: result.append(trans(words[i]))
    if len(result) == 0: return None
    if len(result) == 1: return result[0]
    return result

This takes a line of text, splits it, and then applies simple
formatting characters to return different python types. For example,
given the line

H 0.000 0.000 0.000

I can call parseline(line,'sfff') and it will return the string 'H',
and three floats. If I wanted to omit the first, I could just call
parseline(line,'xfff'). If I only wanted the first 0.000, I could call
parseline(line,'xf'). Clearly I don't do all of my parsing this way,
but I find parseline useful in a surprising number of applications.
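A minimal Python 3 sketch of the same idea, assuming each format character applies to the word at the same position (format[i] to words[i]) and using xlat.get(f) so that unknown characters are skipped rather than called. It reproduces the three calls described above:

```python
XLAT = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

def parseline(line, fmt):
    """Convert the words of line according to fmt; 'x' skips a word."""
    result = []
    words = line.split()
    for i, f in enumerate(fmt):
        trans = XLAT.get(f)          # None for 'x' and for unknown characters
        if trans:
            result.append(trans(words[i]))
    if len(result) == 0:
        return None
    if len(result) == 1:
        return result[0]             # a single value comes back unwrapped
    return result

line = "H 0.000 0.000 0.000"
print(parseline(line, 'sfff'))   # ['H', 0.0, 0.0, 0.0]
print(parseline(line, 'xfff'))   # [0.0, 0.0, 0.0]
print(parseline(line, 'xf'))     # 0.0
```

Note the unwrapping rule: 'xf' yields a bare float while 'xff' yields a list, so the caller has to know which shape to expect.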

I'm posting this here because (1) I'm feeling smug at what a bright
little coder I am, and (2) (in a more realistic and humble frame of
mind) I realize that many many people have probably found solutions to
similar needs, and I'd imagine that many are better than the above. I
would love to hear how other people do similar things.

Rick

Paul Rubin · Oct 12, 2006

Untested, but maybe more in current Pythonic style:

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    for f,w in zip(format, words):
        trans = xlat[f]
        if trans is not None:
            result.append(trans(w))
    return result

Differences:
- doesn't ignore improper format characters, raises exception instead
- always returns values in a list, including an empty list if
there are no values
- uses iterator protocol and zip to avoid ugly index variable
and subscripts
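A further consequence of switching to zip: zip stops at the shorter of its two inputs, so a format string longer than the line is silently truncated, where the indexed loop would raise IndexError. A small Python 3 check of Paul's version (parameter renamed to fmt, otherwise the same logic):

```python
XLAT = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

def parseline(line, fmt):
    """zip-based variant: always returns a list."""
    result = []
    for f, w in zip(fmt, line.split()):
        trans = XLAT[f]              # an unknown format character raises KeyError
        if trans is not None:
            result.append(trans(w))
    return result

print(parseline("H 0.000 0.000 0.000", "xf"))   # [0.0]
print(parseline("H 0.000", "sfff"))             # ['H', 0.0]: zip stopped early
try:
    parseline("H 0.000", "sq")
except KeyError as exc:
    print("bad format character:", exc)         # bad format character: 'q'
```

If the two lengths must agree, Python 3.10 and later accept zip(..., strict=True), which raises ValueError on a mismatch.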

Pierre Quentel · Oct 12, 2006

Hi Rick,

Nice little script indeed!

You probably mean

trans = xlat.get(f,None) instead of
trans = xlat.get(f,'None')

in the case where an invalid format character is supplied. The string
'None' evaluates to True, so trans(words[i]) raises an exception
(TypeError: 'str' object is not callable).
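The difference is easy to see interactively. A short Python 3 check, using 'q' as an example of an invalid format character:

```python
xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

trans = xlat.get('q', 'None')    # unknown key: the default is the *string* 'None'
print(bool(trans))               # True, because non-empty strings are truthy
try:
    trans('0.000')               # so "if trans:" passes and the string gets called
except TypeError as exc:
    print(exc)                   # 'str' object is not callable

print(xlat.get('q', None))       # None, which "if trans:" correctly skips
```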

A variant, with a list comprehension instead of the for loop:

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    result = [ xlat[f](w) for f,w in zip(format,words)
               if xlat.get(f,None) ]
    if not result: return None
    if len(result) == 1: return result[0]
    return result

Regards,
Pierre

Istvan Albert · Oct 12, 2006

RickMuller said:
One of my all-time favorite scripts is parseline, which is printed

here is another way to write that:

def parseline(line, format):
    trans = {'x':lambda x:None,'s':str,'f':float,'d':int,'i':int}
    return [ trans[f](w) for f,w in zip(format, line.split() ) ]

>>> parseline( 'A 1 22 3 6', 'sdxf')
['A', 1, None, 3.0]

I.

Pierre Quentel · Oct 12, 2006

>>> parseline( 'A 1 22 3 6', 'sdxf')
['A', 1, None, 3.0]

Yes, but in this case the OP expects to get ['A',1,3.0].

A shorter version:

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = [ xlat[f](w) for f,w in zip(format,line.split())
               if xlat.get(f,None) ]
    if len(result) == 0: return None
    if len(result) == 1: return result[0]
    return result

Pierre
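The two behaviours can be compared side by side. In this Python 3 sketch the function names parseline_albert and parseline_quentel are only labels for Istvan's and Pierre's versions; the logic is copied from the posts above:

```python
trans = {'x': lambda x: None, 's': str, 'f': float, 'd': int, 'i': int}
xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

def parseline_albert(line, fmt):
    # 'x' fields become None placeholders in the result
    return [trans[f](w) for f, w in zip(fmt, line.split())]

def parseline_quentel(line, fmt):
    # 'x' (and unknown) fields are filtered out; a single value is unwrapped
    result = [xlat[f](w) for f, w in zip(fmt, line.split())
              if xlat.get(f, None)]
    if len(result) == 0:
        return None
    if len(result) == 1:
        return result[0]
    return result

print(parseline_albert('A 1 22 3 6', 'sdxf'))    # ['A', 1, None, 3.0]
print(parseline_quentel('A 1 22 3 6', 'sdxf'))   # ['A', 1, 3.0]
```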

Neil Cerutti · Oct 12, 2006

I don't like the name, since it actually seems to be parsing a
string.

skip · Oct 12, 2006

Note that your setting and testing of the trans variable is problematic. If
you're going to use xlat.get(), either spell None correctly or take the
default:

trans = xlat.get(f)
if trans:
    result.append(trans(words[i]))

As Paul indicated, though, it would also be better not to silently let
unrecognized format characters pass. I probably wouldn't let KeyError float
up to the caller, though:

if f not in xlat:
    raise ValueError("unrecognized format character %s" % f)
trans = xlat[f]
if trans:
    result.append(trans(words[i]))
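An alternative to checking membership first is to let the dictionary lookup fail and translate the exception. This is a different technique from the snippet above, sketched in Python 3 (raise ... from None needs 3.3 or later); the helper name lookup is hypothetical:

```python
XLAT = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

def lookup(f):
    """Return the converter for format character f, or None for 'x'."""
    try:
        return XLAT[f]
    except KeyError:
        # report a bad format string as ValueError instead of KeyError
        raise ValueError("unrecognized format character %s" % f) from None

print(lookup('f'))     # <class 'float'>
print(lookup('x'))     # None
try:
    lookup('q')
except ValueError as exc:
    print(exc)         # unrecognized format character q
```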

Finally, you might consider doing the splitting outside of this function and
passing in a list. That way you could (for example) easily pass in a row of
values read by the csv module's reader class (untested):

def format(words, fmt):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    for i in range(len(fmt)):
        f = fmt[i]
        if f not in xlat:
            raise ValueError("unrecognized format character %s" % f)
        trans = xlat[f]
        if trans:
            result.append(trans(words[i]))
    return result
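To illustrate the csv idea: once splitting happens outside the function, each row produced by csv.reader can be passed straight in. A Python 3 sketch; convert_row is a hypothetical name (it avoids shadowing the built-in format), and the sample data is made up:

```python
import csv
import io

XLAT = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

def convert_row(words, fmt):
    """Convert an already-split row; 'x' skips a field."""
    result = []
    for f, w in zip(fmt, words):
        if f not in XLAT:
            raise ValueError("unrecognized format character %s" % f)
        trans = XLAT[f]
        if trans is not None:
            result.append(trans(w))
    return result

data = io.StringIO("H,0.000,1.5,2\nO,3.25,0,-1\n")   # stands in for an open CSV file
rows = [convert_row(row, 'sffd') for row in csv.reader(data)]
print(rows)   # [['H', 0.0, 1.5, 2], ['O', 3.25, 0.0, -1]]
```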

Rick> I'm posting this here because (1) I'm feeling smug at what a
Rick> bright little coder I am, and (2) (in a more realistic and humble
Rick> frame of mind) I realize that many many people have probably found
Rick> solutions to similar needs, and I'd imagine that many are better
Rick> than the above. I would love to hear how other people do similar
Rick> things.

It seems quite clever to me.

Skip

RickMuller · Oct 12, 2006

Wow! 6 responses in just a few minutes. Thanks for all of the great
feedback!

Fredrik Lundh · Oct 12, 2006

RickMuller said:
I'm posting this here because (1) I'm feeling smug at what a bright
little coder I am

if you want to show off, and use a more pythonic interface, you can do
it with a lot fewer lines. here's one example:

def parseline(line, *types):
    result = [c(x) for (x, c) in zip(line.split(), types) if c] or [None]
    return len(result) != 1 and result or result[0]

text = "H 0.000 0.000 0.000"

print(parseline(text, str, float, float, float))
print(parseline(text, None, float, float, float))
print(parseline(text, None, float))

etc. and since you know how many items you'll get back from the
function, you might as well go for the one-liner version, and do
the unpacking on the way out:

def parseline(line, *types):
    return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]

text = "H 0.000 0.000 0.000"

[tag, value] = parseline(text, str, float)
[value] = parseline(text, None, float)

</F>
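The and/or trick in the first version is safe here because `or [None]` guarantees a non-empty, and therefore truthy, list. From Python 2.5 on, the same return can be written as a conditional expression. A Python 3 sketch of both of Fredrik's versions (parseline_list is just a label for the one-liner), including what unpacking does when the counts disagree:

```python
def parseline(line, *types):
    # a None entry in types skips the corresponding field
    result = [c(x) for (x, c) in zip(line.split(), types) if c] or [None]
    # conditional expression in place of "len(result) != 1 and result or result[0]"
    return result if len(result) != 1 else result[0]

def parseline_list(line, *types):
    # one-liner version: always a list, so the caller can unpack it
    return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]

text = "H 0.000 0.000 0.000"
print(parseline(text, str, float, float, float))   # ['H', 0.0, 0.0, 0.0]
print(parseline(text, None, float))                 # 0.0
print(parseline(text, None))                        # None

[tag, value] = parseline_list(text, str, float)
print(tag, value)                                   # H 0.0
try:
    [a, b] = parseline_list(text, str, float, float)   # three values, two targets
except ValueError as exc:
    print("unpacking caught a count mismatch:", exc)
```

The list-returning form makes a wrong field count fail loudly at the call site instead of handing back a differently shaped result.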

Gerard Flanagan · Oct 12, 2006

RickMuller said:
One of my all-time favorite scripts is parseline, which is printed
below [...]
I would love to hear how other people do similar things.

MAP = {'s':str,'f':float,'d':int,'i':int}

def parseline( line, format, separator=' '):
    '''
    >>> parseline('A 1 2 3 4', 'sdxf')
    ['A', 1, 3.0]
    '''
    mapping = [ (i, MAP[f]) for (i,f) in enumerate(format) if f != 'x' ]
    parts = line.split(separator)
    return [f(parts[i]) for (i,f) in mapping]

def parseline2( line, format):
    '''
    >>> parseline2('A 1 2 3 4', 'sdxf')
    ['A', 1, 3.0]
    '''
    return [f(line.split()[i]) for (i,f) in [(i, MAP[f]) for (i,f) in
            enumerate(format) if f != 'x']]

def parselines(lines, format, separator=' '):
    '''
    >>> lines = [ 'A 1 2 3 4', 'B 5 6 7 8', 'C 9 10 11 12']
    >>> list(parselines(lines, 'sdxf'))
    [['A', 1, 3.0], ['B', 5, 7.0], ['C', 9, 11.0]]
    '''
    mapping = [ (i, MAP[f]) for (i,f) in enumerate(format) if f != 'x' ]
    for line in lines:
        parts = line.split(separator)
        yield [f(parts[i]) for (i,f) in mapping]

if __name__ == '__main__':
    import doctest
    doctest.testmod(verbose=True)
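Because parselines is a generator and builds its (index, converter) mapping only once, it can consume an open file line by line. A Python 3 sketch, with io.StringIO standing in for a file; it uses split() with no argument (an assumption that differs from the separator=' ' default above) so that runs of spaces and each line's trailing newline do not produce empty or newline-suffixed fields:

```python
import io

MAP = {'s': str, 'f': float, 'd': int, 'i': int}

def parselines(lines, fmt):
    # build (index, converter) pairs once; 'x' positions are left out
    mapping = [(i, MAP[f]) for (i, f) in enumerate(fmt) if f != 'x']
    for line in lines:
        parts = line.split()         # any run of whitespace, newline included
        yield [conv(parts[i]) for (i, conv) in mapping]

stream = io.StringIO("A 1 2 3 4\nB  5 6 7 8\n")   # note the double space after B
for record in parselines(stream, 'sdxf'):
    print(record)
# ['A', 1, 3.0]
# ['B', 5, 7.0]
```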

RickMuller · Oct 15, 2006

Amazing! There were lots of great suggestions to my original post, but
I think this is my favorite.

Rick

Fredrik said:
if you want to show off, and use a more pythonic interface, you can do
it with a lot fewer lines. [...]

Python point location of intersect between two lines	0	Feb 28, 2018
Generator using item[n-1] + item[n] memory	0	Feb 14, 2014
Over my head with descriptors	5	Dec 14, 2006
Metaclass conflict TypeError exception: problem demonstration script	0	Feb 23, 2009
Personal archive tool, looking for suggestions on improving the code	5	Jul 27, 2010
No-syntax Web-programming-IDE (was: Does turtle graphics have the wrong associations?)	0	Nov 22, 2009
A data transformation framework. A presentation inviting commentary.	0	Aug 21, 2013
ANN: script to visualize python profiling data with kcachegrind	3	Sep 17, 2003
