Making a copy (not reference) of a file handle, or starting stdin over at line 0

Discussion in 'Python' started by Shawn Milochik, Aug 17, 2007.

  1. I wrote a script which will convert a tab-delimited file to a
    fixed-width file, or a fixed-width file into a tab-delimited. It reads
    a config file which defines the field lengths, and uses it to convert
    either way.

    Here's an example of the config file:

    1:6,7:1,8:9,17:15,32:10

    This converts a fixed-width file to a tab-delimited one where the first
    field is the first six characters of each line, the second is the
    seventh character, and so on. Conversely, it converts a tab-delimited
    file to one where the first six characters of each line are the first
    tab field, right-padded with spaces, and so on.
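To make the config format concrete, here is a minimal sketch of how such a line might be parsed and applied. It assumes each pair is `start:length` with a 1-based start column, which is consistent with the example above but is not confirmed by the original script. The helper names `parse_config`, `fixed_to_tab`, and `tab_to_fixed` are hypothetical, and trimming or truncating values is an assumption as well.

```python
def parse_config(text):
    """Parse 'start:length' pairs into (zero_based_offset, length) tuples.

    Assumption: 'start' is a 1-based column number, as the example suggests.
    """
    fields = []
    for pair in text.strip().split(","):
        start, length = pair.split(":")
        fields.append((int(start) - 1, int(length)))
    return fields


def fixed_to_tab(line, fields):
    """Slice each field out of a fixed-width line and join with tabs.

    Assumption: trailing padding spaces are not part of the value.
    """
    return "\t".join(line[offset:offset + length].rstrip()
                     for offset, length in fields)


def tab_to_fixed(line, fields):
    """Right-pad each tab field with spaces to its configured width.

    Assumption: values longer than the field width are truncated.
    """
    values = line.rstrip("\n").split("\t")
    return "".join(value.ljust(length)[:length]
                   for value, (_, length) in zip(values, fields))


fields = parse_config("1:6,7:1,8:9,17:15,32:10")
fixed_row = tab_to_fixed("abc\tX\t123\tname\tend", fields)
tab_row = fixed_to_tab(fixed_row, fields)
```

With these assumptions the two conversions round-trip: `tab_row` equals the original tab-delimited string, and `fixed_row` is 41 characters wide.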

    What I want to do is look at the file and decide whether to run the
    function to convert the file to tab or FW. Here is what works
    (mostly):

    x = inputFile.readline().split("\t")
    inputFile.seek(0)

    if len(x) > 1:
        toFW(inputFile)
    else:
        toTab(inputFile)


    The problem is that my script accepts the input file either via stdin
    (a pipe) or as an argument to the script. If I pass the filename as an
    argument, everything works perfectly.

    If I pipe the input file into the script, the script cannot seek() on
    it. I tried making a copy of inputFile and doing a readline() from the
    copy, but because the copy is only another reference to the same
    object, it makes no difference.
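Both symptoms can be reproduced in a few lines. This is only an illustrative sketch for Python 3, where file objects expose `seekable()`. The helper `can_rewind` is a hypothetical name, and the pipe is created with `os.pipe()` to stand in for piped stdin.

```python
import io
import os


def can_rewind(stream):
    """Return True if the stream reports that it supports seek()."""
    seekable = getattr(stream, "seekable", None)
    return bool(seekable and seekable())


# Assigning a file object to a second name does not copy it:
# both names share one object and therefore one read position.
memory_file = io.StringIO("a\tb\nc\td\n")
alias = memory_file
alias.readline()
position_after_alias_read = memory_file.tell()  # moved past the first line

# An in-memory or on-disk file can be rewound, the read end of a pipe cannot.
memory_is_seekable = can_rewind(memory_file)
read_fd, write_fd = os.pipe()
os.close(write_fd)
pipe_reader = os.fdopen(read_fd)
pipe_is_seekable = can_rewind(pipe_reader)
pipe_reader.close()
```

Checking `seekable()` first tells the script whether `seek(0)` is an option at all. For a pipe it is not, so the lines already read have to be kept somewhere.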

    How can I check a line (or two) from my input file (or stdin stream)
    and still be able to process all the records with my function?

    Thanks,
    Shawn
    Shawn Milochik, Aug 17, 2007

  2. Peter Otten

    Shawn Milochik wrote:

    > How can I check a line (or two) from my input file (or stdin stream)
    > and still be able to process all the records with my function?


    One way:

    from itertools import chain

    # read the first line; next() works on any iterator, including stdin
    firstline = next(instream)
    head = [firstline]

    # loop over entire file
    for line in chain(head, instream):
        process(line)
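Applied to the original question, the same idea might look like the sketch below. The function name `sniff_and_rewind` and the rule "a tab in the first line means tab-delimited" are assumptions for illustration. `io.StringIO` stands in for stdin, which cannot be rewound when it is a pipe.

```python
import io
from itertools import chain


def sniff_and_rewind(instream):
    """Peek at the first line, then return (is_tab_delimited, all_lines).

    all_lines is an iterator that yields the peeked line again,
    followed by the rest of the stream, so no seek() is needed.
    """
    first_line = next(instream, None)
    if first_line is None:          # empty input
        return False, iter(())
    head = [first_line]
    return "\t" in first_line, chain(head, instream)


stream = io.StringIO("a\tb\nc\td\n")
is_tab, lines = sniff_and_rewind(stream)
all_lines = list(lines)
```

The conversion function then receives `lines` instead of the raw file object, so it only needs to iterate and never calls `seek()`.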


    You can of course read more than one line as long as you append it to the
    head list. Here's an alternative:

    from itertools import tee

    # a and b are independent iterators over the same stream
    a, b = tee(instream)

    for line in a:
        # determine file format,
        # break when done
        break

    # this is crucial for memory efficiency
    # but may have no effect in implementations
    # other than CPython
    del a

    # loop over entire file
    for line in b:
        process(line)

    Peter
    Peter Otten, Aug 17, 2007