input record seperator (equivalent of "$|" of perl)

Discussion in 'Python' started by les_ander@yahoo.com, Dec 19, 2004.

  1. Guest

    Hi,
    I know that i can do readline() from a file object.
    However, how can I read till a specific seperator?
    for exmple,
    if my files are

    name
    profession
    id
    #
    name2
    profession3
    id2

    I would like to read this file as a record.
    I can do this in perl by defining a record seperator;
    is there an equivalent in python?
    thanks
    , Dec 19, 2004
    #1

  2. wrote:

    > I know that i can do readline() from a file object.
    > However, how can I read till a specific seperator?
    >
    > for exmple,
    > if my files are
    >
    > name
    > profession
    > id
    > #
    > name2
    > profession3
    > id2
    >
    > I would like to read this file as a record.
    > I can do this in perl by defining a record seperator;
    > is there an equivalent in python?


    not really; you have to do it manually.

    if the file isn't too large, consider reading all of it, and splitting on the
    separator:

    for record in file.read().split(separator):
        print record # process record
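
For readers on Python 3, the read-and-split approach above might look like the following sketch. The sample text, the "#\n" separator string, and the use of io.StringIO in place of a real file are assumptions made for illustration.

```python
import io

# Stand-in for an open file; the contents mirror the example in the question.
sample = io.StringIO("name\nprofession\nid\n#\nname2\nprofession3\nid2\n")
separator = "#\n"  # assumed: the separator line including its newline

# Read everything at once, then split on the separator; each piece is one record.
records = [chunk.splitlines() for chunk in sample.read().split(separator)]

for record in records:
    print(record)
```

Note that a plain string split would also match "#\n" at the end of an ordinary data line, so this only suits files where that cannot happen.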

    if you're using a line-oriented separator, like in your example, tools like
    itertools.groupby can be quite handy:

    from itertools import groupby

    def is_separator(line):
        return line[:1] == "#"

    for sep, record in groupby(file, is_separator):
        if not sep:
            print list(record) # process record
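
A Python 3 rendering of the same groupby idea could look roughly like this. The sample data and the io.StringIO stand-in for a file are assumptions; the grouping logic follows the snippet above.

```python
import io
from itertools import groupby

def is_separator(line):
    # A separator line is any line that starts with "#".
    return line.startswith("#")

# Stand-in for an open file (assumed sample data).
sample = io.StringIO("name\nprofession\nid\n#\nname2\nprofession3\nid2\n")

records = []
for sep, group in groupby(sample, is_separator):
    if not sep:
        # Strip the trailing newline from each line of the record.
        records.append([line.rstrip("\n") for line in group])

print(records)
```

Each group is itself a lazy iterator, so a record is only materialized when it is consumed, as the list comprehension does here.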

    or you could just spell things out:

    record = []
    for line in file:
        if line[0] == "#":
            if record:
                print record # process record
            record = []
        else:
            record.append(line)
    if record:
        print record # process the last record, if any
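
The spelled-out loop can also be wrapped in a generator so that callers receive one record at a time. This is a minimal Python 3 sketch; the function name iter_records and its marker parameter are hypothetical, not part of any library.

```python
def iter_records(lines, marker="#"):
    """Yield each record as a list of lines, one record at a time."""
    record = []
    for line in lines:
        if line.startswith(marker):
            if record:
                yield record
            record = []
        else:
            record.append(line.rstrip("\n"))
    if record:
        yield record  # the last record has no trailing separator line


# Any iterable of lines works, including an open file object.
lines = ["name\n", "profession\n", "id\n", "#\n",
         "name2\n", "profession3\n", "id2\n"]
for record in iter_records(lines):
    print(record)
```

Using startswith instead of line[0] avoids an IndexError if the iterable ever produces an empty string.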

    </F>
    Fredrik Lundh, Dec 19, 2004
    #2

  3. Keith Dart Guest

    wrote:
    > Hi,
    > I know that i can do readline() from a file object.
    > However, how can I read till a specific seperator?
    > for exmple,
    > if my files are
    >
    > name
    > profession
    > id
    > #
    > name2
    > profession3
    > id2
    >
    > I would like to read this file as a record.
    > I can do this in perl by defining a record seperator;
    > is there an equivalent in python?
    > thanks
    >


    I don't think so. But in the pyNMS package
    (http://sourceforge.net/projects/pynms) there is a module called
    "expect", and a class "Expect". With that you can wrap a file object and
    use the Expect.read_until() method to do what you want.



    --
    -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Keith Dart <>
    public key: ID: F3D288E4
    =====================================================================
    Keith Dart, Dec 19, 2004
    #3
  4. Doug Holton Guest

    wrote:

    > Hi,
    > I know that i can do readline() from a file object.
    > However, how can I read till a specific seperator?
    > for exmple,
    > if my files are
    >
    > name
    > profession
    > id
    > #
    > name2
    > profession3
    > id2
    >
    > I would like to read this file as a record.
    > I can do this in perl by defining a record seperator;
    > is there an equivalent in python?
    > thanks
    >


    To actually answer your question, there is no equivalent to $| in python.

    You need to hand code your own record parser, or else read in the whole
    contents of the file and use the string split method to chop it up into
    fields.
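
A hand-coded record parser of the kind described above might be sketched as follows in Python 3. It reads fixed-size chunks so the whole file never has to be in memory, and it accepts an arbitrary separator string. The name read_records and its parameters are hypothetical, and io.StringIO stands in for a real file.

```python
import io

def read_records(stream, separator, chunk_size=4096):
    """Yield the text between occurrences of `separator`, reading in chunks."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last separator is complete; keep the tail,
        # which may hold a partial record or a partial separator.
        *complete, buffer = buffer.split(separator)
        yield from complete
    if buffer:
        yield buffer  # final record without a trailing separator


stream = io.StringIO("name\nprofession\nid\n#\nname2\nprofession3\nid2\n")
# A tiny chunk size is used here only to exercise the buffering logic.
records = list(read_records(stream, "#\n", chunk_size=5))
print(records)
```

Keeping the unsplit tail in the buffer is what lets a separator that straddles two chunks still be recognized on the next pass.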
    Doug Holton, Dec 19, 2004
    #4
  5. M.E.Farmer Guest

    What about a generator and xreadlines for those really large files:

    py>def recordbreaker(recordpath, seperator='#'):
    ....    rec = open(recordpath ,'r')
    ....    xrecord = rec.xreadlines()
    ....    a =[]
    ....    for line in xrecord:
    ....        sep = line.find(seperator)
    ....        if sep != -1:
    ....            a.append(line[:sep])
    ....            out = ''.join(a)
    ....            a =[]
    ....            a.append(line[sep+1:])
    ....            yield out
    ....        else:
    ....            a.append(line)
    ....    if a:
    ....        yield ''.join(a)
    ....    rec.close()
    ....
    py>records = recordbreaker('/tmp/myrecords.txt')
    py>for item in records:
    ....    print item

    M.E.Farmer

    wrote:
    > Hi,
    > I know that i can do readline() from a file object.
    > However, how can I read till a specific seperator?
    > for exmple,
    > if my files are
    >
    > name
    > profession
    > id
    > #
    > name2
    > profession3
    > id2
    >
    > I would like to read this file as a record.
    > I can do this in perl by defining a record seperator;
    > is there an equivalent in python?
    > thanks
    M.E.Farmer, Dec 20, 2004
    #5
  6. "M.E.Farmer" wrote:

    > What about a generator and xreadlines for those really large files:


    when you loop over a file object, Python uses a generator and an xreadlines-
    style buffering system to read data as you go. (if you check the on-line help,
    you'll notice that xreadlines itself is only provided for compatibility reasons).

    or in other words, the examples I posted a couple of hours ago use no more
    memory than your version.

    </F>
    Fredrik Lundh, Dec 20, 2004
    #6
  7. M.E.Farmer Guest

    Fredrik,
    Thanks didn't realize that about reading a file on a for loop. Slick!
    By the way the code I posted was an attempt at not building a
    monolithic memory eating list like you did to return the values in your
    second example.
    Kinda thought it would be nice to read them as needed instead of all at
    once.
    I dont have itertools yet. That module looks like it rocks.
    thanks for the pointers,
    M.E.Farmer
    M.E.Farmer, Dec 20, 2004
    #7
  8. M.E.Farmer wrote:
    > I dont have itertools yet. That module looks like it rocks.
    > thanks for the pointers,
    > M.E.Farmer
    >


    If you have python 2.3 or 2.4, you have itertools.


    --Scott David Daniels
    Scott David Daniels, Dec 20, 2004
    #8
  9. M.E.Farmer Guest

    Yea I should have mentioned I am running python 2.2.2.
    Can it be ported to python 2.2.2?
    Till they get python 2.4 all up and running....I'll wait a bit.
    Thanks for the info,
    M.E.Farmer

    Scott David Daniels wrote:
    > M.E.Farmer wrote:
    > > I dont have itertools yet. That module looks like it rocks.
    > > thanks for the pointers,
    > > M.E.Farmer
    > >

    >
    > If you have python 2.3 or 2.4, you have itertools.
    >
    >
    > --Scott David Daniels
    >
    M.E.Farmer, Dec 20, 2004
    #9
  10. Nick Coghlan Guest

    wrote:
    > I would like to read this file as a record.
    > I can do this in perl by defining a record seperator;
    > is there an equivalent in python?


    Depending on your exact use case, you may also get some mileage out of using the
    csv module with a custom delimeter.

    Py> from csv import reader
    Py> parsed = reader(demo, delimiter='|')
    Py> for line in parsed: print line
    ....
    ['a', 'b', 'c', 'd']
    ['1', '2', '3', '4']

    Cheers,
    Nick.

    P.S. 'demo' was created via:
    Py> from tempfile import TemporaryFile
    Py> demo = TemporaryFile()
    Py> demo.write(txt)
    Py> demo.seek(0)
    Py> demo.read()
    'a|b|c|d\n1|2|3|4'
    Py> demo.seek(0)
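
On Python 3 the same csv demonstration can be reproduced without a temporary file. This sketch assumes the sample contents shown above and uses io.StringIO as the file stand-in.

```python
import csv
import io

# Assumed sample data, matching the demo contents shown above.
demo = io.StringIO("a|b|c|d\n1|2|3|4", newline="")

# delimiter splits the fields within each line; lines remain the records.
rows = list(csv.reader(demo, delimiter="|"))
for row in rows:
    print(row)
```

The delimiter here separates fields inside a line, so this helps when each record sits on one line rather than spanning several.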

    --
    Nick Coghlan | | Brisbane, Australia
    ---------------------------------------------------------------
    http://boredomandlaziness.skystorm.net
    Nick Coghlan, Dec 20, 2004
    #10
  11. Scott David Daniels wrote:
    > M.E.Farmer wrote:
    >
    >> I dont have itertools yet. That module looks like it rocks.
    >> thanks for the pointers,
    >> M.E.Farmer
    >>

    >
    > If you have python 2.3 or 2.4, you have itertools.
    >

    for me it seems that 2.3 does not have itertools.groupby.
    it has itertools, but not itertools.groupby.

    activepython-2.4:(win)
    >>> import itertools
    >>> dir(itertools)

    ['__doc__', '__name__', 'chain', 'count', 'cycle', 'dropwhile',
    'groupby', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip',
    'repeat', 'starmap', 'takewhile', 'tee']


    python-2.3:(linux)
    >>> import itertools
    >>> dir(itertools)

    ['__doc__', '__file__', '__name__', 'chain', 'count', 'cycle',
    'dropwhile', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip',
    'repeat', 'starmap', 'takewhile']


    gabor
    Gábor Farkas, Dec 20, 2004
    #11
  12. Gábor Farkas wrote:
    > Scott David Daniels wrote:
    >> If you have python 2.3 or 2.4, you have itertools.

    > for me it seems that 2.3 does not have itertools.groupby.
    > it has itertools, but not itertools.groupby.


    True. The 2.4 document says that itertools.groupby() is equivalent to:

    class groupby(object):
        def __init__(self, iterable, key=None):
            if key is None:
                key = lambda x: x
            self.keyfunc = key
            self.it = iter(iterable)
            self.tgtkey = self.currkey = self.currvalue = xrange(0)
        def __iter__(self):
            return self
        def next(self):
            while self.currkey == self.tgtkey:
                self.currvalue = self.it.next() # Exit on StopIteration
                self.currkey = self.keyfunc(self.currvalue)
            self.tgtkey = self.currkey
            return (self.currkey, self._grouper(self.tgtkey))
        def _grouper(self, tgtkey):
            while self.currkey == tgtkey:
                yield self.currvalue
                self.currvalue = self.it.next() # Exit on StopIteration
                self.currkey = self.keyfunc(self.currvalue)

    So you could always just use that code.

    --Scott David Daniels
    Scott David Daniels, Dec 20, 2004
    #12
  13. Terry Reedy Guest

    Re: input record sepArator (not sepErator)

    'separate' (se-parate == take a-part) and its derivatives are perhaps the
    most frequently misspelled English word on clp. Seems to be 'par' for the
    course. It has 2 e's bracketing 2 a's. It derives from the Latin
    'parare', as does pare, so 'par' is the essential root of the word.

    My gripe for the day, just to let non-native writers know what not to
    imitate.

    tjr
    Terry Reedy, Dec 20, 2004
    #13
  14. Scott David Daniels wrote:

    > True. The 2.4 document says that itertools.groupby() is equivalent to:
    >
    > class groupby(object):


    > So you could always just use that code.


    the right way to do that is to use the Python version as a fallback:

    try:
        from itertools import groupby
    except ImportError:
        class groupby(object):
            ...
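
To make the fallback pattern concrete, here is a sketch in Python 3 syntax with a deliberately simplified stand-in. The name simple_groupby is hypothetical, and unlike the real itertools.groupby it builds each group eagerly as a list, so it is not a drop-in replacement where laziness matters.

```python
def simple_groupby(iterable, key=None):
    """Simplified stand-in for itertools.groupby: yields (key, list_of_items)."""
    if key is None:
        key = lambda x: x
    sentinel = object()       # marks "no group started yet"
    current_key = sentinel
    group = []
    for item in iterable:
        k = key(item)
        if current_key is not sentinel and k != current_key:
            yield current_key, group  # key changed: emit the finished group
            group = []
        current_key = k
        group.append(item)
    if current_key is not sentinel:
        yield current_key, group      # emit the final group


try:
    from itertools import groupby  # preferred: the built-in implementation
except ImportError:
    groupby = simple_groupby       # fallback for interpreters without it

for k, g in groupby("aabbbc"):
    print(k, list(g))
```

Wrapping each group in list() at the call site keeps the calling code the same whichever implementation ends up bound to the name.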

    </F>
    Fredrik Lundh, Dec 20, 2004
    #14
  15. Peter Otten Guest

    Re: input record sepArator (not sepErator)

    Terry Reedy wrote:

    > 'separate' (se-parate == take a-part) and its derivatives are perhaps the
    > most frequently misspelled English word on clp. Seems to be 'par' for the
    > course. It has 2 e's bracketing 2 a's. It derives from the Latin
    > 'parare', as does pare, so 'par' is the essential root of the word.
    >
    > My gripe for the day, just to let non-native writers know what not to
    > imitate.


    I hereby suggest seperate/(separate+seperate) as the hamburger standard (see
    http://www.oanda.com/products/bigmac/bigmac.shtml) for technical
    communities. Some data points, adjectives only, s.e.e.o.:

    the web: 4%
    python: 9%
    slashdot: 26%
    perl: 29% *

    Now draw your conclusions...

    Peter

    (*) Do you really believe I would have posted that if the outcome were in
    favour of perl? No, definately not... lest you run out of gripes...
    Peter Otten, Dec 20, 2004
    #15
  16. Re: input record sepArator (not sepErator)

    Peter Otten wrote:
    > Terry Reedy wrote:
    >
    >> 'separate' (se-parate == take a-part) and its derivatives are perhaps the
    >> most frequently misspelled English word on clp. Seems to be 'par' for the
    >> course. It has 2 e's bracketing 2 a's. It derives from the Latin
    >> 'parare', as does pare, so 'par' is the essential root of the word.
    >>
    >> My gripe for the day, just to let non-native writers know what not to
    >> imitate.

    >
    > I hereby suggest seperate/(separate+seperate) as the hamburger standard (see
    > http://www.oanda.com/products/bigmac/bigmac.shtml) for technical
    > communities. Some data points, adjectives only, s.e.e.o.:
    >
    > the web: 4%
    > python: 9%
    > slashdot: 26%
    > perl: 29% *


    How did you get these data points?

    Reinhold

    --
    [Windows is like] the railway: you don't have to take care of anything, you
    pay a surcharge for every little thing, the service is lousy, strangers can
    board at any time, it is inflexible and incompatible with all other means
    of transport. -- Florian Diesch in dcoulm
    Reinhold Birkenfeld, Dec 21, 2004
    #16
  17. Re: input record sepArator (not sepErator)

    Terry Reedy wrote:

    > My gripe for the day, just to let non-native writers know what not to imitate.


    are there any non-native languages where separate is spelled seperate?

    </F>
    Fredrik Lundh, Dec 21, 2004
    #17
  18. John Machin Guest

    Re: input record sepArator (equivalent of "$|" of perl)

    Nick Coghlan wrote:
    [snip]
    > delimeter.


    Hey, Terry, another varmint over here!
    John Machin, Dec 21, 2004
    #18
  19. Re: input record sepArator (equivalent of "$|" of perl)

    John Machin wrote:
    > Nick Coghlan wrote:
    > [snip]
    >
    >>delimeter.

    >
    > Hey, Terry, another varmint over here!


    No, no. He's talking about a deli-meter. It's the metric standard for
    measuring subs and sandwiches. ;)

    Steve
    Steven Bethard, Dec 21, 2004
    #19
  20. Peter Otten Guest

    Re: input record sepArator (not sepErator)

    Reinhold Birkenfeld wrote:

    >> the web: 4%
    >> python: 9%
    >> slashdot: 26%
    >> perl: 29% *

    >
    > How did you get these data points?


    I copied the numbers from these pages:

    http://www.google.com/search?q=separate
    http://groups-beta.google.com/group/comp.lang.python/search?group=comp.lang.python&q=separate
    http://www.google.com/search?q=site:slashdot.org separate
    http://groups-beta.google.com/group/comp.lang.perl.misc/search?group=comp.lang.perl.misc&q=separate

    Same thing for the "alternative" spelling.

    Peter
    Peter Otten, Dec 21, 2004
    #20