parseline (my favorite simple script): does something similar exist?

RickMuller

One of my all-time favorite scripts is parseline, which is printed
below

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    for i in range(len(format)):
        f = format[i]
        trans = xlat.get(f,'None')
        if trans: result.append(trans(words[i]))
    if len(result) == 0: return None
    if len(result) == 1: return result[0]
    return result

This takes a line of text, splits it, and then applies simple
formatting characters to return different python types. For example,
given the line

H 0.000 0.000 0.000

I can call parseline(line,'sfff') and it will return the string 'H',
and three floats. If I wanted to omit the first, I could just call
parseline(line,'xfff'). If I only wanted the first 0.000, I could call
parseline(line,'xf'). Clearly I don't do all of my parsing this way,
but I find parseline useful in a surprising number of applications.
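
A minimal sketch of the calls described above, assuming the subscripts that the forum software appears to have dropped (format[i] and words[i]) are restored. It uses xlat.get(f) with no default, which is an editorial choice rather than the posted code; for the valid format characters shown here the results are the same.

```python
def parseline(line, format):
    # 'x' skips a field; every other character names a converter.
    xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}
    result = []
    words = line.split()
    for i in range(len(format)):
        f = format[i]
        trans = xlat.get(f)
        if trans:
            result.append(trans(words[i]))
    if len(result) == 0:
        return None
    if len(result) == 1:
        return result[0]
    return result

line = "H 0.000 0.000 0.000"
print(parseline(line, 'sfff'))  # ['H', 0.0, 0.0, 0.0]
print(parseline(line, 'xfff'))  # [0.0, 0.0, 0.0]
print(parseline(line, 'xf'))    # 0.0 (a single value comes back bare, not in a list)
```

Note that the return type depends on the format string: a list for several values, a bare value for one, and None for none.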

I'm posting this here because (1) I'm feeling smug at what a bright
little coder I am, and (2) (in a more realistic and humble frame of
mind) I realize that many many people have probably found solutions to
similar needs, and I'd imagine that many are better than the above. I
would love to hear how other people do similar things.

Rick
 
Paul Rubin

Untested, but maybe more in current Pythonic style:

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    for f,w in zip(format, words):
        trans = xlat[f]
        if trans is not None:
            result.append(trans(w))
    return result

Differences:
- doesn't ignore improper format characters, raises exception instead
- always returns values in a list, including an empty list if
there are no values
- uses iterator protocol and zip to avoid ugly index variable
and subscripts
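
A short sketch of the first two differences, using the function above with its indentation restored. The format character 'q' is just an arbitrary invalid example.

```python
def parseline(line, format):
    xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}
    result = []
    for f, w in zip(format, line.split()):
        trans = xlat[f]              # KeyError for an unknown format character
        if trans is not None:
            result.append(trans(w))
    return result

text = "H 0.000 0.000 0.000"
print(parseline(text, 'xf'))  # [0.0], a one-element list rather than a bare float
print(parseline(text, 'x'))   # [], an empty list rather than None

try:
    parseline(text, 'sq')
except KeyError as exc:
    print("unknown format character:", exc)
```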
 
Pierre Quentel

Hi Rick,

Nice little script indeed!

You probably mean
    trans = xlat.get(f,None)
instead of
    trans = xlat.get(f,'None')

in the case where an invalid format character is supplied. The string
'None' evaluates to True, so trans(words[i]) tries to call a string and
raises an exception.
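
To make the point concrete, here is a small sketch of what the two defaults do for an unknown format character ('q' is an arbitrary invalid example):

```python
xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

trans = xlat.get('q', 'None')   # 'q' is not a key, so the default is returned
print(repr(trans))              # 'None' (a str, not the None object)
print(bool(trans))              # True, because any non-empty string is truthy

try:
    trans('0.000')              # attempts to call a str object
except TypeError as exc:
    print("TypeError:", exc)

print(xlat.get('q'))            # None, so an `if trans:` test skips the field
```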

A variant, with a list comprehension instead of the for loop:

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    result = [ xlat[f](w) for f,w in zip(format,words)
               if xlat.get(f,None) ]
    if not result: return None
    if len(result) == 1: return result[0]
    return result

Regards,
Pierre
 
Istvan Albert

RickMuller said:
One of my all-time favorite scripts is parseline, which is printed

here is another way to write that:

def parseline(line, format):
    trans = {'x':lambda x:None,'s':str,'f':float,'d':int,'i':int}
    return [ trans[f](w) for f,w in zip(format, line.split() ) ]

>>> parseline( 'A 1 22 3 6', 'sdxf')
['A', 1, None, 3.0]


I.
 
Pierre Quentel

>>> parseline( 'A 1 22 3 6', 'sdxf')
['A', 1, None, 3.0]

Yes, but in this case the OP expects to get ['A',1,3.0]

A shorter version:

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = [ xlat[f](w) for f,w in zip(format,line.split())
               if xlat.get(f,None) ]
    if len(result) == 0: return None
    if len(result) == 1: return result[0]
    return result

Pierre
 
Neil Cerutti

I don't like the name, since it actually seems to be parsing a
string.
 
skip

Rick> def parseline(line,format):
Rick>     xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
Rick>     result = []
Rick>     words = line.split()
Rick>     for i in range(len(format)):
Rick>         f = format[i]
Rick>         trans = xlat.get(f,'None')
Rick>         if trans: result.append(trans(words[i]))
Rick>     if len(result) == 0: return None
Rick>     if len(result) == 1: return result[0]
Rick>     return result

Note that your setting and testing of the trans variable is problematic. If
you're going to use xlat.get(), either spell None correctly or take the
default:

    trans = xlat.get(f)
    if trans:
        result.append(trans(words[i]))

As Paul indicated though, it would also be better not to silently let
unrecognized format characters pass. I probably wouldn't let KeyError float
up to the caller though:

    trans = xlat.get(f)
    if trans:
        result.append(trans(words[i]))
    else:
        raise ValueError("unrecognized format character %s" % f)

Finally, you might consider doing the splitting outside of this function and
pass in a list. That way you could (for example) easily pass in a row of
values read by the csv module's reader class (untested):

def format(words, fmt):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    for i in range(len(fmt)):
        f = fmt[i]
        trans = xlat.get(f)
        if trans:
            result.append(trans(words[i]))
        else:
            raise ValueError("unrecognized format character %s" % f)
    return result
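
One thing to watch in the untested version above: 'x' maps to None, so it also falls into the else branch and raises ValueError. Below is a hedged sketch of one way to keep 'x' as a skip marker while still rejecting unknown characters, fed from csv.reader as suggested. The name convert_fields and the sample data are hypothetical, chosen only for illustration.

```python
import csv
import io

XLAT = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

def convert_fields(words, fmt):
    """Convert already-split fields; 'x' skips a field, unknown characters raise."""
    result = []
    for f, w in zip(fmt, words):
        if f not in XLAT:
            raise ValueError("unrecognized format character %s" % f)
        trans = XLAT[f]
        if trans is not None:
            result.append(trans(w))
    return result

# Stand-in for a real CSV file (hypothetical data).
data = io.StringIO("H,0.000,0.000,0.000\nO,0.000,0.000,1.200\n")
rows = [convert_fields(row, 'sxxf') for row in csv.reader(data)]
print(rows)  # [['H', 0.0], ['O', 1.2]]
```

Testing membership in the table separately from testing for None is what lets 'x' and an invalid character be told apart.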

Rick> I'm posting this here because (1) I'm feeling smug at what a
Rick> bright little coder I am, and (2) (in a more realistic and humble
Rick> frame of mind) I realize that many many people have probably found
Rick> solutions to similar needs, and I'd imagine that many are better
Rick> than the above. I would love to hear how other people do similar
Rick> things.

It seems quite clever to me.

Skip
 
Fredrik Lundh

RickMuller said:
I'm posting this here because (1) I'm feeling smug at what a bright
little coder I am

if you want to show off, and use a more pythonic interface, you can do
it with a lot fewer lines. here's one example:

def parseline(line, *types):
    result = [c(x) for (x, c) in zip(line.split(), types) if c] or [None]
    return len(result) != 1 and result or result[0]

text = "H 0.000 0.000 0.000"

print(parseline(text, str, float, float, float))
print(parseline(text, None, float, float, float))
print(parseline(text, None, float))

etc. and since you know how many items you'll get back from the
function, you might as well go for the one-liner version, and do
the unpacking on the way out:

def parseline(line, *types):
    return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]

text = "H 0.000 0.000 0.000"

[tag, value] = parseline(text, str, float)
[value] = parseline(text, None, float)
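
For readers on a newer Python, the and/or idiom in the first version can probably be written as a conditional expression. This is an editorial sketch of that substitution, not the code as posted; the behavior for the calls shown should be the same.

```python
def parseline(line, *types):
    # A converter of None skips that field; zip() stops at the shorter input.
    result = [c(x) for (x, c) in zip(line.split(), types) if c] or [None]
    # Conditional expression in place of `len(result) != 1 and result or result[0]`.
    return result if len(result) != 1 else result[0]

text = "H 0.000 0.000 0.000"
print(parseline(text, str, float, float, float))   # ['H', 0.0, 0.0, 0.0]
print(parseline(text, None, float, float, float))  # [0.0, 0.0, 0.0]
print(parseline(text, None, float))                # 0.0
print(parseline(text, None))                       # None
```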

</F>
 
Gerard Flanagan

MAP = {'s':str,'f':float,'d':int,'i':int}

def parseline( line, format, separator=' '):
    '''
    >>> parseline('A 1 2 3 4', 'sdxf')
    ['A', 1, 3.0]
    '''
    mapping = [ (i, MAP[f]) for (i,f) in enumerate(format) if f != 'x' ]
    parts = line.split(separator)
    return [f(parts[i]) for (i,f) in mapping]

def parseline2( line, format):
    '''
    >>> parseline2('A 1 2 3 4', 'sdxf')
    ['A', 1, 3.0]
    '''
    return [f(line.split()[i]) for (i,f) in [(i, MAP[f]) for (i,f) in
            enumerate(format) if f != 'x']]

def parselines(lines, format, separator=' '):
    '''
    >>> lines = [ 'A 1 2 3 4', 'B 5 6 7 8', 'C 9 10 11 12']
    >>> list(parselines(lines, 'sdxf'))
    [['A', 1, 3.0], ['B', 5, 7.0], ['C', 9, 11.0]]
    '''
    mapping = [ (i, MAP[f]) for (i,f) in enumerate(format) if f != 'x' ]
    for line in lines:
        parts = line.split(separator)
        yield [f(parts[i]) for (i,f) in mapping]


import doctest
doctest.testmod(verbose=True)
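
One caveat with the separator=' ' default: str.split(' ') treats every single space as a delimiter, so column-aligned input with runs of spaces produces empty fields and the indexes in the mapping no longer line up. A small sketch of the difference, with separator=None as an assumed alternative default (that change is a suggestion, not part of the code as posted):

```python
MAP = {'s': str, 'f': float, 'd': int, 'i': int}

def parseline(line, format, separator=None):
    # separator=None makes split() collapse runs of whitespace.
    mapping = [(i, MAP[f]) for (i, f) in enumerate(format) if f != 'x']
    parts = line.split(separator)
    return [f(parts[i]) for (i, f) in mapping]

line = "H   0.000  0.000 0.000"   # columns padded with extra spaces

print(line.split())     # ['H', '0.000', '0.000', '0.000']
print(line.split(' '))  # ['H', '', '', '0.000', '', '0.000', '0.000']
print(parseline(line, 'sfff'))  # ['H', 0.0, 0.0, 0.0]
```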
 
RickMuller

Amazing! There were lots of great suggestions in reply to my original
post, but I think this is my favorite.

Rick

Fredrik said:
RickMuller said:
I'm posting this here because (1) I'm feeling smug at what a bright
little coder I am

if you want to show off, and use a more pythonic interface, you can do
it with a lot fewer lines. here's one example:

def parseline(line, *types):
    result = [c(x) for (x, c) in zip(line.split(), types) if c] or [None]
    return len(result) != 1 and result or result[0]

text = "H 0.000 0.000 0.000"

print(parseline(text, str, float, float, float))
print(parseline(text, None, float, float, float))
print(parseline(text, None, float))

etc. and since you know how many items you'll get back from the
function, you might as well go for the one-liner version, and do
the unpacking on the way out:

def parseline(line, *types):
    return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]

text = "H 0.000 0.000 0.000"

[tag, value] = parseline(text, str, float)
[value] = parseline(text, None, float)

</F>
 
