input record separator (equivalent of "$/" of perl)

les_ander · Dec 19, 2004

Hi,
I know that I can do readline() from a file object.
However, how can I read till a specific separator?
For example,
if my file is

name
profession
id
#
name2
profession3
id2

I would like to read this file one record at a time.
I can do this in Perl by defining a record separator;
is there an equivalent in Python?
Thanks

Fredrik Lundh · Dec 19, 2004

not really; you have to do it manually.

if the file isn't too large, consider reading all of it, and splitting on the
separator:

for record in file.read().split(separator):
    print record # process record
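
For comparison, here is a minimal sketch of the same read-and-split idea in present-day Python 3 (the thread itself uses Python 2 syntax). The helper name `split_records` and the sample text are assumptions made for illustration; `io.StringIO` stands in for an open text file.

```python
import io

def split_records(stream, separator):
    """Read the whole stream and split it on the separator string."""
    # Strip surrounding whitespace from each piece and drop pieces
    # that are empty after stripping.
    pieces = stream.read().split(separator)
    return [piece.strip() for piece in pieces if piece.strip()]

# io.StringIO stands in for a file opened in text mode.
sample = io.StringIO("name\nprofession\nid\n#\nname2\nprofession3\nid2\n")
for record in split_records(sample, "#"):
    print(record.splitlines())  # process record
```

This loads the entire file into memory, so it only suits files that are not too large, as noted above.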

if you're using a line-oriented separator, like in your example, tools like
itertools.groupby can be quite handy:

from itertools import groupby

def is_separator(line):
    return line[:1] == "#"

for sep, record in groupby(file, is_separator):
    if not sep:
        print list(record) # process record
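
A sketch of the groupby approach in Python 3 syntax, run against the sample data from the question. The generator name `iter_records` is an assumption for this example; `itertools.groupby` and `str.startswith` are standard.

```python
import io
from itertools import groupby

def is_separator(line):
    # A line counts as a separator when it starts with "#".
    return line.startswith("#")

def iter_records(lines):
    """Yield each run of consecutive non-separator lines as a list."""
    for sep, group in groupby(lines, is_separator):
        if not sep:
            yield [line.rstrip("\n") for line in group]

# io.StringIO stands in for a file opened in text mode.
sample = io.StringIO("name\nprofession\nid\n#\nname2\nprofession3\nid2\n")
records = list(iter_records(sample))
print(records)
```

Because groupby consumes the file lazily, only the lines of the current record are held in memory at any time.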

or you could just spell things out:

record = []
for line in file:
    if line[0] == "#":
        if record:
            print record # process record
        record = []
    else:
        record.append(line)
if record:
    print record # process the last record, if any
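
The spelled-out loop can also be wrapped in a generator so that callers receive one record at a time. This is a Python 3 sketch; the function name `read_records` and its `marker` parameter are assumptions, not something from the post.

```python
import io

def read_records(lines, marker="#"):
    """Yield lists of lines, starting a new record at each marker line."""
    record = []
    for line in lines:
        if line.startswith(marker):
            if record:
                yield record
            record = []
        else:
            record.append(line.rstrip("\n"))
    if record:
        yield record  # the last record has no trailing separator line

# io.StringIO stands in for a file opened in text mode.
sample = io.StringIO("name\nprofession\nid\n#\nname2\nprofession3\nid2\n")
for record in read_records(sample):
    print(record)  # process record
```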

</F>

Keith Dart · Dec 19, 2004

I don't think so. But in the pyNMS package
(http://sourceforge.net/projects/pynms) there is a module called
"expect", and a class "Expect". With that you can wrap a file object and
use the Expect.read_until() method to do what you want.

Doug Holton · Dec 19, 2004

To actually answer your question, there is no equivalent to $/ in Python.

You need to hand code your own record parser, or else read in the whole
contents of the file and use the string split method to chop it up into
fields.
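
One way such a hand-coded parser might look, as a Python 3 sketch: it reads the stream in fixed-size chunks and yields the text between occurrences of an arbitrary separator string, so the whole file never has to be in memory. The name `split_stream` and the `chunk_size` parameter are assumptions for illustration. Unlike Perl's `$/`, this sketch drops the separator from each record instead of leaving it attached.

```python
import io

def split_stream(stream, separator, chunk_size=4096):
    """Yield pieces of a text stream delimited by separator, reading in chunks."""
    if not separator:
        raise ValueError("separator must be a non-empty string")
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last separator is complete. The tail may
        # still grow, or may end in a partial separator, so keep it buffered.
        *complete, buffer = buffer.split(separator)
        yield from complete
    if buffer:
        yield buffer  # trailing data after the final separator

# io.StringIO stands in for a file opened in text mode.
sample = io.StringIO("name\nprofession\nid\n#\nname2\nprofession3\nid2\n")
for record in split_stream(sample, "#"):
    print(record.split())  # process record
```

Two adjacent separators produce an empty record here; whether to skip those depends on the file format.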

M.E.Farmer · Dec 20, 2004

What about a generator and xreadlines for those really large files:

py>def recordbreaker(recordpath, seperator='#'):
....     rec = open(recordpath ,'r')
....     xrecord = rec.xreadlines()
....     a =[]
....     for line in xrecord:
....         sep = line.find(seperator)
....         if sep != -1:
....             a.append(line[:sep])
....             out = ''.join(a)
....             a =[]
....             a.append(line[sep+1:])
....             yield out
....         else:
....             a.append(line)
....     if a:
....         yield ''.join(a)
....     rec.close()
....
py>records = recordbreaker('/tmp/myrecords.txt')
py>for item in records:
....     print item

M.E.Farmer

Fredrik Lundh · Dec 20, 2004

M.E.Farmer said:
What about a generator and xreadlines for those really large files:

when you loop over a file object, Python uses a generator and a xreadlines-
style buffering system to read data as you go. (if you check the on-line help,
you'll notice that xreadlines itself is only provided for compatibility reasons).

or in other words, the examples I posted a couple of hours ago use no more
memory than your version.

</F>

M.E.Farmer · Dec 20, 2004

Fredrik,
Thanks, I didn't realize that about reading a file in a for loop. Slick!
By the way the code I posted was an attempt at not building a
monolithic memory eating list like you did to return the values in your
second example.
Kinda thought it would be nice to read them as needed instead of all at
once.
I don't have itertools yet. That module looks like it rocks.
thanks for the pointers,
M.E.Farmer

Scott David Daniels · Dec 20, 2004

M.E.Farmer said:
I don't have itertools yet. That module looks like it rocks.
thanks for the pointers,
M.E.Farmer

If you have python 2.3 or 2.4, you have itertools.

--Scott David Daniels

M.E.Farmer · Dec 20, 2004

Yea I should have mentioned I am running python 2.2.2.
Can it be ported to python 2.2.2?
Till they get python 2.4 all up and running....I'll wait a bit.
Thanks for the info,
M.E.Farmer

Nick Coghlan · Dec 20, 2004

I would like to read this file as a record.
I can do this in perl by defining a record seperator;
is there an equivalent in python?

Depending on your exact use case, you may also get some mileage out of using the
csv module with a custom delimeter.

Py> from csv import reader
Py> parsed = reader(demo, delimiter='|')
Py> for line in parsed: print line
....
['a', 'b', 'c', 'd']
['1', '2', '3', '4']

Cheers,
Nick.

P.S. 'demo' was created via:
Py> from tempfile import TemporaryFile
Py> demo = TemporaryFile()
Py> demo.write(txt)
Py> demo.seek(0)
Py> demo.read()
'a|b|c|d\n1|2|3|4'
Py> demo.seek(0)
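
The same csv demonstration as a Python 3 sketch, with `io.StringIO` standing in for the temporary file. Note that the `delimiter` argument separates fields within a line; the reader still treats each line as one record, since it is hard-coded to recognise '\r' or '\n' as end-of-line.

```python
import csv
import io

# io.StringIO stands in for the temporary file used in the post.
demo = io.StringIO("a|b|c|d\n1|2|3|4")

# delimiter sets the field separator; rows are still split on newlines.
rows = list(csv.reader(demo, delimiter="|"))
for row in rows:
    print(row)
```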

Gábor Farkas · Dec 20, 2004

Scott said:
If you have python 2.3 or 2.4, you have itertools.

for me it seems that 2.3 does not have itertools.groupby.
it has itertools, but not itertools.groupby.

activepython-2.4 (win):

['__doc__', '__name__', 'chain', 'count', 'cycle', 'dropwhile',
'groupby', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip',
'repeat', 'starmap', 'takewhile', 'tee']

python-2.3 (linux):

['__doc__', '__file__', '__name__', 'chain', 'count', 'cycle',
'dropwhile', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip',
'repeat', 'starmap', 'takewhile']

gabor

Scott David Daniels · Dec 20, 2004

Gábor Farkas said:
for me it seems that 2.3 does not have itertools.groupby.
it has itertools, but not itertools.groupby.

True. The 2.4 document says that itertools.groupby() is equivalent to:

class groupby(object):
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = xrange(0)
    def __iter__(self):
        return self
    def next(self):
        while self.currkey == self.tgtkey:
            self.currvalue = self.it.next() # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey))
    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            self.currvalue = self.it.next() # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)

So you could always just use that code.
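
For readers who only need the grouping behaviour, a much simpler stand-in can be sketched as a plain generator. This is not the documented equivalent quoted above: the hypothetical `group_runs` below builds each group as a list instead of a lazy sub-iterator, which is enough for the record-splitting use in this thread. It is written in Python 3 syntax and compared against the real `itertools.groupby`.

```python
from itertools import groupby

def group_runs(iterable, key=None):
    """Yield (key, list_of_items) for each run of consecutive equal keys."""
    if key is None:
        key = lambda x: x
    run_key = None
    run = []
    for item in iterable:
        k = key(item)
        if run and k != run_key:
            yield run_key, run  # the key changed, so the previous run is done
            run = []
        run_key = k
        run.append(item)
    if run:
        yield run_key, run

lines = ["name\n", "id\n", "#\n", "name2\n"]
is_separator = lambda line: line.startswith("#")

mine = list(group_runs(lines, is_separator))
reference = [(k, list(g)) for k, g in groupby(lines, is_separator)]
print(mine == reference)
```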

--Scott David Daniels

Terry Reedy · Dec 20, 2004

'separate' (se-parate == take a-part) and its derivatives are perhaps the
most frequently misspelled English word on clp. Seems to be 'par' for the
course. It has 2 e's bracketing 2 a's. It derives from the Latin
'parare', as does pare, so 'par' is the essential root of the word.

My gripe for the day, just to let non-native writers know what not to
imitate.

tjr

Fredrik Lundh · Dec 20, 2004

Scott said:
True. The 2.4 document says that itertools.groupby() is equivalent to:

class groupby(object):

So you could always just use that code.

the right way to do that is to use the Python version as a fallback:

try:
    from itertools import groupby
except ImportError:
    class groupby(object):
        ...

</F>

Peter Otten · Dec 20, 2004

Terry said:
'separate' (se-parate == take a-part) and its derivatives are perhaps the
most frequently misspelled English word on clp. Seems to be 'par' for the
course. It has 2 e's bracketing 2 a's. It derives from the Latin
'parare', as does pare, so 'par' is the essential root of the word.

My gripe for the day, just to let non-native writers know what not to
imitate.

I hereby suggest seperate/(separate+seperate) as the hamburger standard (see
http://www.oanda.com/products/bigmac/bigmac.shtml) for technical
communities. Some data points, adjectives only, s.e.e.o.:

the web: 4%
python: 9%
slashdot: 26%
perl: 29% *

Now draw your conclusions...

Peter

(*) Do you really believe I would have posted that if the outcome were in
favour of perl? No, definately not... lest you run out of gripes...

Reinhold Birkenfeld · Dec 21, 2004

Peter said:
I hereby suggest seperate/(separate+seperate) as the hamburger standard (see
http://www.oanda.com/products/bigmac/bigmac.shtml) for technical
communities. Some data points, adjectives only, s.e.e.o.:

the web: 4%
python: 9%
slashdot: 26%
perl: 29% *

How did you get these data points?

Reinhold

Fredrik Lundh · Dec 21, 2004

Terry said:
My gripe for the day, just to let non-native writers know what not to imitate.

are there any non-native languages where separate is spelled seperate?

</F>

John Machin · Dec 21, 2004

Nick Coghlan wrote:
[snip]

delimeter.

Hey, Terry, another varmint over here!

Steven Bethard · Dec 21, 2004

John said:
Nick Coghlan wrote:
[snip]

delimeter.

Hey, Terry, another varmint over here!

No, no. He's talking about a deli-meter. It's the metric standard for
measuring subs and sandwiches.

Steve

Peter Otten · Dec 21, 2004

Reinhold said:
How did you get these data points?

I copied the numbers from these pages:

http://www.google.com/search?q=separate
http://groups-beta.google.com/group/comp.lang.python/search?group=comp.lang.python&q=separate
http://www.google.com/search?q=site:slashdot.org+separate
http://groups-beta.google.com/group/comp.lang.perl.misc/search?group=comp.lang.perl.misc&q=separate

Same thing for the "alternative" spelling.

Peter
