input record seperator (equivalent of "$/" of perl)

les_ander

Hi,
I know that I can call readline() on a file object.
However, how can I read till a specific seperator?
For example,
if my file is

name
profession
id
#
name2
profession3
id2

I would like to read this file as a record.
I can do this in perl by defining a record seperator;
is there an equivalent in python?
thanks
 
Fredrik Lundh

not really; you have to do it manually.

if the file isn't too large, consider reading all of it, and splitting on the
separator:

for record in file.read().split(separator):
    print record # process record

if you're using a line-oriented separator, like in your example, tools like
itertools.groupby can be quite handy:

from itertools import groupby

def is_separator(line):
    return line[:1] == "#"

for sep, record in groupby(file, is_separator):
    if not sep:
        print list(record) # process record

or you could just spell things out:

record = []
for line in file:
    if line[0] == "#":
        if record:
            print record # process record
            record = []
    else:
        record.append(line)
if record:
    print record # process the last record, if any

</F>
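
For readers on Python 3, the groupby approach above can be sketched as a generator that hands back one record at a time. The name read_records, the marker parameter, and the in-memory io.StringIO sample are assumptions made for illustration; they are not from the original post.

```python
import io
from itertools import groupby


def read_records(lines, marker="#"):
    """Yield one list of lines per record; lines starting with marker end a record."""
    def is_separator(line):
        return line.startswith(marker)

    for sep, group in groupby(lines, is_separator):
        if not sep:
            # Drop the trailing newline from each line of the record.
            yield [line.rstrip("\n") for line in group]


# Hypothetical sample data mirroring the file in the question.
sample = io.StringIO("name\nprofession\nid\n#\nname2\nprofession3\nid2\n")
records = list(read_records(sample))
for record in records:
    print(record)
```

Any iterable of lines works here, so an open file object can be passed in place of the StringIO sample.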
 
Keith Dart

I don't think so. But in the pyNMS package
(http://sourceforge.net/projects/pynms) there is a module called
"expect", and a class "Expect". With that you can wrap a file object and
use the Expect.read_until() method to do what you want.
 
Doug Holton

To actually answer your question, there is no equivalent to $/ in python.

You need to hand code your own record parser, or else read in the whole
contents of the file and use the string split method to chop it up into
fields.
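
A hand-coded record parser does not have to be line-oriented. As a rough Python 3 sketch, the generator below reads the stream in fixed-size chunks and splits on an arbitrary separator string, so the whole file never has to be in memory. The names iter_records and chunk_size are made up for this example, and the separator is dropped from each record.

```python
import io


def iter_records(stream, separator, chunk_size=4096):
    """Yield the text between occurrences of separator, reading in chunks."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last separator seen so far is complete;
        # the remainder stays buffered in case a separator spans two chunks.
        *complete, buffer = buffer.split(separator)
        yield from complete
    if buffer:
        # Trailing data after the final separator, if any.
        yield buffer


# Hypothetical sample; the tiny chunk_size only exercises the boundary handling.
sample = io.StringIO("name\nprofession\nid\n#\nname2\nprofession3\nid2\n")
records = list(iter_records(sample, "#\n", chunk_size=8))
for record in records:
    print(repr(record))
```

Two separators in a row produce an empty record, which the caller can filter out if that is not wanted.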
 
M.E.Farmer

What about a generator and xreadlines for those really large files:

py> def recordbreaker(recordpath, seperator='#'):
....     rec = open(recordpath, 'r')
....     xrecord = rec.xreadlines()
....     a = []
....     for line in xrecord:
....         sep = line.find(seperator)
....         if sep != -1:
....             a.append(line[:sep])
....             out = ''.join(a)
....             a = []
....             a.append(line[sep+1:])
....             yield out
....         else:
....             a.append(line)
....     if a:
....         yield ''.join(a)
....     rec.close()
....
py> records = recordbreaker('/tmp/myrecords.txt')
py> for item in records:
....     print item

M.E.Farmer
 
Fredrik Lundh

M.E.Farmer said:
What about a generator and xreadlines for those really large files:

when you loop over a file object, Python uses a generator and a xreadlines-
style buffering system to read data as you go. (if you check the on-line help,
you'll notice that xreadlines itself is only provided for compatibility reasons).

or in other words, the examples I posted a couple of hours ago use no more
memory than your version.

</F>
 
M.E.Farmer

Fredrik,
Thanks, I didn't realize that about reading a file in a for loop. Slick!
By the way the code I posted was an attempt at not building a
monolithic memory eating list like you did to return the values in your
second example.
Kinda thought it would be nice to read them as needed instead of all at
once.
I don't have itertools yet. That module looks like it rocks.
thanks for the pointers,
M.E.Farmer
 
Scott David Daniels

M.E.Farmer said:
I don't have itertools yet. That module looks like it rocks.
thanks for the pointers,
M.E.Farmer

If you have python 2.3 or 2.4, you have itertools.


--Scott David Daniels
 
M.E.Farmer

Yea I should have mentioned I am running python 2.2.2.
Can it be ported to python 2.2.2?
Till they get python 2.4 all up and running....I'll wait a bit.
Thanks for the info,
M.E.Farmer
 
Nick Coghlan

I would like to read this file as a record.
I can do this in perl by defining a record seperator;
is there an equivalent in python?

Depending on your exact use case, you may also get some mileage out of using the
csv module with a custom delimiter.

Py> from csv import reader
Py> parsed = reader(demo, delimiter='|')
Py> for line in parsed: print line
....
['a', 'b', 'c', 'd']
['1', '2', '3', '4']

Cheers,
Nick.

P.S. 'demo' was created via:
Py> from tempfile import TemporaryFile
Py> demo = TemporaryFile()
Py> demo.write(txt)
Py> demo.seek(0)
Py> demo.read()
'a|b|c|d\n1|2|3|4'
Py> demo.seek(0)
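
On Python 3 the same csv session can be sketched roughly as follows. Using io.StringIO in place of the temporary file is an assumption made to keep the example self-contained. Note that the csv delimiter is a single character that splits fields within a line; records still end at line breaks.

```python
import csv
import io

# Hypothetical in-memory stand-in for the 'demo' file above.
demo = io.StringIO("a|b|c|d\n1|2|3|4", newline="")

rows = list(csv.reader(demo, delimiter="|"))
for row in rows:
    print(row)
```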
 
Gábor Farkas

Scott said:
If you have python 2.3 or 2.4, you have itertools.
for me it seems that 2.3 does not have itertools.groupby.
it has itertools, but not itertools.groupby.

activepython-2.4 (win):
['__doc__', '__name__', 'chain', 'count', 'cycle', 'dropwhile',
'groupby', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip',
'repeat', 'starmap', 'takewhile', 'tee']


python-2.3 (linux):
['__doc__', '__file__', '__name__', 'chain', 'count', 'cycle',
'dropwhile', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip',
'repeat', 'starmap', 'takewhile']


gabor
 
Scott David Daniels

Gábor Farkas said:
for me it seems that 2.3 does not have itertools.groupby.
it has itertools, but not itertools.groupby.

True. The 2.4 documentation says that itertools.groupby() is equivalent to:

class groupby(object):
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = xrange(0)
    def __iter__(self):
        return self
    def next(self):
        while self.currkey == self.tgtkey:
            self.currvalue = self.it.next() # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey))
    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            self.currvalue = self.it.next() # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)

So you could always just use that code.

--Scott David Daniels
 
Terry Reedy

'separate' (se-parate == take a-part) and its derivatives are perhaps the
most frequently misspelled English word on clp. Seems to be 'par' for the
course. It has 2 e's bracketing 2 a's. It derives from the Latin
'parare', as does pare, so 'par' is the essential root of the word.

My gripe for the day, just to let non-native writers know what not to
imitate.

tjr
 
Fredrik Lundh

Scott said:
True. The 2.4 documentation says that itertools.groupby() is equivalent to:

class groupby(object):
    ...

So you could always just use that code.

the right way to do that is to use the Python version as a fallback:

try:
    from itertools import groupby
except ImportError:
    class groupby(object):
        ...

</F>
 
Peter Otten

Terry said:
'separate' (se-parate == take a-part) and its derivatives are perhaps the
most frequently misspelled English word on clp. Seems to be 'par' for the
course. It has 2 e's bracketing 2 a's. It derives from the Latin
'parare', as does pare, so 'par' is the essential root of the word.

My gripe for the day, just to let non-native writers know what not to
imitate.

I hereby suggest seperate/(separate+seperate) as the hamburger standard (see
http://www.oanda.com/products/bigmac/bigmac.shtml) for technical
communities. Some data points, adjectives only, s.e.e.o.:

the web: 4%
python: 9%
slashdot: 26%
perl: 29% *

Now draw your conclusions...

Peter

(*) Do you really believe I would have posted that if the outcome were in
favour of perl? No, definately not... lest you run out of gripes...
 
Fredrik Lundh

Terry said:
My gripe for the day, just to let non-native writers know what not to imitate.

are there any non-native languages where separate are spelled seperate?

</F>
 
