Question about file objects...

Discussion in 'Python' started by J, Dec 2, 2009.

  1. J

    J Guest

    Something that came up in class...

    When you are pulling data from a file using f.next(), the file is read
    one line at a time.

    What was explained to us is that Python iterates the file based on a
    carriage return as the delimiter.
    But what if you have a file that has one line of text, but that one
    line has 16,000 items that are comma delimited?

    Is there a way to read the file, one item at a time, delimited by
    commas WITHOUT having to read all 16,000 items from that one line,
    then split them out into a list or dictionary??

    Cheers
    Jeff

    --

    Ogden Nash - "The trouble with a kitten is that when it grows up,
    it's always a cat." -
    http://www.brainyquote.com/quotes/authors/o/ogden_nash.html
     
    J, Dec 2, 2009
    #1

  2. nn

    nn Guest

    On Dec 2, 9:14 am, J <> wrote:
    > Something that came up in class...
    >
    > when you are pulling data from a file using f.next(), the file is read
    > one line at a time.
    >
    > What was explained to us is that Python iterates the file based on a
    > carriage return as the delimiter.
    > But what if you have a file that has one line of text, but that one
    > line has 16,000 items that are comma delimited?
    >
    > Is there a way to read the file, one item at a time, delimited by
    > commas WITHOUT having to read all 16,000 items from that one line,
    > then split them out into a list or dictionary??
    >
    > Cheers
    > Jeff
    >
    > --
    >
    > Ogden Nash  - "The trouble with a kitten is that when it grows up,
    > it's always a cat." - http://www.brainyquote.com/quotes/authors/o/ogden_nash.html


    File iteration is a convenience since it is the most common case. If
    everything is on one line, you will have to handle record separators
    manually by using the .read(<number_of_bytes>) method on the file
    object and searching for the comma. If everything fits in memory the
    straightforward way would be to read the whole file with .read() and
    use .split(",") on the returned string. That should give you a nice
    list of everything.
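    A minimal sketch of that whole-file approach (using io.StringIO as a
    stand-in for a real file object, so the snippet runs on its own):

    ```python
    import io

    # Stand-in for a real file: one line holding comma-separated items.
    f = io.StringIO("alpha,beta,gamma,delta")

    # Read everything at once and split on commas --
    # fine as long as the whole file fits in memory.
    items = f.read().split(",")
    print(items)  # → ['alpha', 'beta', 'gamma', 'delta']
    ```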
     
    nn, Dec 2, 2009
    #2

  3. J

    J Guest

    On Wed, Dec 2, 2009 at 09:27, nn <> wrote:
    >> Is there a way to read the file, one item at a time, delimited by
    >> commas WITHOUT having to read all 16,000 items from that one line,
    >> then split them out into a list or dictionary??


    > File iteration is a convenience since it is the most common case. If
    > everything is on one line, you will have to handle record separators
    > manually by using the .read(<number_of_bytes>) method on the file
    > object and searching for the comma. If everything fits in memory the
    > straightforward way would be to read the whole file with .read() and
    > use .split(",") on the returned string. That should give you a nice
    > list of everything.


    Agreed. The confusion came because the guy teaching said that
    iterating the file is delimited by a carriage return character...
    which to me sounds like it's an arbitrary thing that can be changed...

    I was already thinking that I'd have to read it in small chunks and
    search for the delimiter I want... and reading the whole file into a
    string and then splitting it would be nice, until the file is so
    large that it starts taking up significant amounts of memory.

    Anyway, thanks both of you for the explanations... I appreciate the help!

    Cheers
    Jeff



    --

    Charles de Gaulle - "The better I get to know men, the more I find
    myself loving dogs." -
    http://www.brainyquote.com/quotes/authors/c/charles_de_gaulle.html
     
    J, Dec 2, 2009
    #3
  4. Terry Reedy

    Terry Reedy Guest

    J wrote:
    > On Wed, Dec 2, 2009 at 09:27, nn <> wrote:
    >>> Is there a way to read the file, one item at a time, delimited by
    >>> commas WITHOUT having to read all 16,000 items from that one line,
    >>> then split them out into a list or dictionary??

    >
    >> File iteration is a convenience since it is the most common case. If
    >> everything is on one line, you will have to handle record separators
    >> manually by using the .read(<number_of_bytes>) method on the file
    >> object and searching for the comma. If everything fits in memory the
    >> straightforward way would be to read the whole file with .read() and
    >> use .split(",") on the returned string. That should give you a nice
    >> list of everything.

    >
    > Agreed. The confusion came because the guy teaching said that
    > iterating the file is delimited by a carriage return character...


    If he said exactly that, he is not exactly correct. File iteration looks
    for line ending character(s), which depends on the system or universal
    newline setting.
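    For illustration, in a current Python 3 interpreter (not what the 2009
    interpreter did, but the same idea): text-mode open() defaults to
    newline=None, which enables universal newlines, so '\r', '\r\n', and
    '\n' all end a line.

    ```python
    import os
    import tempfile

    # Write a file whose "lines" end three different ways.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"one\rtwo\r\nthree\n")

    # Text mode with the default newline=None recognizes all three
    # endings as line boundaries and translates them to '\n'.
    with open(path) as f:
        lines = f.readlines()
    os.remove(path)

    print(lines)  # → ['one\n', 'two\n', 'three\n']
    ```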

    > which to me sounds like it's an arbitrary thing that can be changed...
    >
    > I was already thinking that I'd have to read it in small chunks and
    > search for the delimiter I want... and reading the whole file into a
    > string and then splitting it would be nice, until the file is so
    > large that it starts taking up significant amounts of memory.
    >
    > Anyway, thanks both of you for the explanations... I appreciate the help!


    I would not be surprised if a generic file chunk generator were posted
    somewhere. It would be a good entry for the Python Cookbook, if not
    there already.

    tjr
     
    Terry Reedy, Dec 2, 2009
    #4
  5. r0g

    r0g Guest

    J wrote:
    > Something that came up in class...
    >
    > when you are pulling data from a file using f.next(), the file is read
    > one line at a time.
    >
    > What was explained to us is that Python iterates the file based on a
    > carriage return as the delimiter.
    > But what if you have a file that has one line of text, but that one
    > line has 16,000 items that are comma delimited?
    >
    > Is there a way to read the file, one item at a time, delimited by
    > commas WITHOUT having to read all 16,000 items from that one line,
    > then split them out into a list or dictionary??
    >
    > Cheers
    > Jeff
    >



    Generators are a good way of dealing with that sort of thing...

    http://dalkescientific.com/writings/NBN/generators.html

    Have the generator read large chunks from the file in binary mode,
    then use string searching/splitting to dole out records one at a
    time, topping up the buffer when needed.

    Roger.
     
    r0g, Dec 3, 2009
    #5
  6. nn

    nn Guest

    On Dec 2, 6:56 pm, Terry Reedy <> wrote:
    > J wrote:
    > > On Wed, Dec 2, 2009 at 09:27, nn <> wrote:
    > >>> Is there a way to read the file, one item at a time, delimited by
    > >>> commas WITHOUT having to read all 16,000 items from that one line,
    > >>> then split them out into a list or dictionary??

    >
    > >> File iteration is a convenience since it is the most common case. If
    > >> everything is on one line, you will have to handle record separators
    > >> manually by using the .read(<number_of_bytes>) method on the file
    > >> object and searching for the comma. If everything fits in memory the
    > >> straightforward way would be to read the whole file with .read() and
    > >> use .split(",") on the returned string. That should give you a nice
    > >> list of everything.

    >
    > > Agreed. The confusion came because the guy teaching said that
    > > iterating the file is delimited by a carriage return character...

    >
    > If he said exactly that, he is not exactly correct. File iteration looks
    > for line ending character(s), which depends on the system or universal
    > newline setting.
    >
    > > which to me sounds like it's an arbitrary thing that can be changed...

    >
    > > I was already thinking that I'd have to read it in small chunks and
    > > search for the delimiter I want... and reading the whole file into a
    > > string and then splitting it would be nice, until the file is so
    > > large that it starts taking up significant amounts of memory.

    >
    > > Anyway, thanks both of you for the explanations... I appreciate the help!

    >
    > I would not be surprised if a generic file chunk generator were posted
    > somewhere. It would be a good entry for the Python Cookbook, if not
    > there already.
    >
    > tjr


    There should be, but writing one isn't too difficult:

    def chunker(file_obj):
        # Yield comma-separated fields from file_obj one at a time,
        # reading the file in fixed-size chunks.
        parts = ['']
        while True:
            fdata = file_obj.read(8192)
            if not fdata:
                break
            # The last element may be a partial field cut off by the
            # chunk boundary; carry it over into the next chunk.
            parts = (parts[-1] + fdata).split(',')
            for col in parts[:-1]:
                yield col
        yield parts[-1]
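    A quick way to exercise that generator against an in-memory file (the
    chunker is restated here, with a size parameter added just for the
    demo, so the snippet runs on its own; the tiny size=2 forces fields
    to straddle chunk boundaries):

    ```python
    import io

    def chunker(file_obj, size=8192):
        # Yield comma-separated fields one at a time, reading in chunks.
        parts = ['']
        while True:
            fdata = file_obj.read(size)
            if not fdata:
                break
            # Carry any partial trailing field into the next chunk.
            parts = (parts[-1] + fdata).split(',')
            for col in parts[:-1]:
                yield col
        yield parts[-1]

    f = io.StringIO("a,b,c,d")
    print(list(chunker(f, size=2)))  # → ['a', 'b', 'c', 'd']
    ```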
     
    nn, Dec 3, 2009
    #6
