Reading from text files

Discussion in 'Python' started by Thomas Philips, Feb 22, 2004.

  1. In the course of playing around with file input and output, I came
    across some behavior that is not quite intuitive. I created a simple
    text file, test.txt, which contains only 3 lines, and which I expect
    will have 5 characters (the digits 1, 2, and 3, and two newline
    characters, the first after 1 and the second after 2). Here it is in
    all its glory:
    1
    2
    3

    However, when I read it using open() and then view it using
    >>> file.seek(0); file.read(); file.tell()

    I get:
    '1\n2\n3'
    7L

    Python thinks there are 7 characters in the file! If I type either
    >>> file.seek(1); file.read()
    or
    >>> file.seek(2); file.read()

    I get
    '\n2\n3'

    but
    >>> file.seek(3); file.read()

    gives me what I expected to get with file.seek(2); file.read()
    '2\n3'

    It appears that Python sometimes counts each of the newline escape
    sequences as 2 separate characters and at other times as 1 indivisible
    character. What is the appropriate way to think about these
    characters?

    Thomas Philips
     
    Thomas Philips, Feb 22, 2004

  2. Jeff Epler

    There are three solutions to this problem:
    1. Don't use Windows
    2. Only use offsets with file.seek() that were returned by file.tell()
    3. Open the file in binary mode
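    As a rough sketch of solution 3 in modern Python 3 (the thread predates it),
    the code below writes the bytes "1\r\n2\r\n3" explicitly, standing in for
    what a Windows text-mode write of "1\n2\n3" leaves on disk, so it behaves
    the same on any OS. The temporary path and file name are assumptions for
    the demo. In binary mode every seek() offset is a plain byte count, so
    offset 3 lands exactly on the "2".

```python
import os
import tempfile

# Hypothetical demo file: the bytes a Windows text-mode write of "1\n2\n3"
# would leave on disk, with every "\n" stored as "\r\n".
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(b"1\r\n2\r\n3")

# Binary mode ("rb"): no newline translation, so each seek() offset is a
# plain byte count and the "\r" bytes stay visible.
chunks = {}
with open(path, "rb") as f:
    for offset in (1, 2, 3):
        f.seek(offset)
        chunks[offset] = f.read()
        print(offset, chunks[offset])

# Newline handling is now up to you, e.g. normalize CRLF to LF, then decode.
with open(path, "rb") as f:
    f.seek(3)
    tail = f.read().replace(b"\r\n", b"\n").decode("ascii")
print(repr(tail))
```

    The trade-off is that you get bytes back and must deal with "\r\n"
    yourself, as the replace() call does here.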

    Windows stores "\n" as the two-byte sequence "\r\n" (carriage return,
    line feed) when writing a file opened in text mode, and transforms
    "\r\n" back into "\n" when reading a file opened in text mode.
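    The translation can be observed with a short Python 3 sketch. One caveat:
    in Python 3 it is done by Python's own io layer on every platform (on
    read, the default newline=None turns "\r\n" into "\n"), not by the
    Windows C library as in 2004. The demo file is hypothetical and written
    with explicit CRLF bytes so the result does not depend on the OS.

```python
import os
import tempfile

# Hypothetical demo file with Windows-style CRLF line endings.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(b"1\r\n2\r\n3")

with open(path, "rb") as f:
    raw = f.read()            # untranslated bytes, exactly as stored on disk

# newline=None (the default) enables universal newlines: "\r\n" -> "\n".
with open(path, "r", encoding="ascii", newline=None) as f:
    text = f.read()

# newline="" returns line endings untranslated, even in text mode.
with open(path, "r", encoding="ascii", newline="") as f:
    untranslated = f.read()

print(len(raw), len(text), len(untranslated))  # 7 5 7
```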

    file.seek() on Windows only knows about raw byte offsets, though, so
    if you know the first line of a file is "a\n", you can't seek to 2 to
    get to the second line, because that line actually starts at byte 3
    (the value .tell() would return after you read the first line).
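    Solution 2 might look like this in Python 3, sketched under the
    assumption of the same hypothetical CRLF demo file. In Python 3,
    text-mode tell() returns an opaque cookie, and the io documentation only
    guarantees text-mode seek() for 0 or a value previously returned by
    tell(). tell() is also disabled while iterating with "for line in f",
    so the sketch uses readline().

```python
import os
import tempfile

# Hypothetical demo file with Windows-style CRLF line endings.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(b"1\r\n2\r\n3")

with open(path, "r", encoding="ascii") as f:
    first = f.readline()      # "1\n": CRLF already translated on read
    line2_pos = f.tell()      # opaque cookie marking the start of line 2
    rest = f.read()           # everything after the first line
    f.seek(line2_pos)         # safe: this offset came from tell()
    again = f.read()          # the same text as rest

print(repr(first), repr(rest), again == rest)
```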


    Jeff
     
    Jeff Epler, Feb 22, 2004

  3. Paul Watson

    If you want to actually "see" what is in the file, do a directory listing
    and dump the file in hex.

    On DOS/Windows, do a 'dir test.txt' command and inspect the size of the
    file. Then do a 'debug test.txt' command. At the prompt, enter the 'r'
    command and press Enter. Examine the CX register: for a small file like
    this one, it will hold the same value as the size of the file. Then do a
    'd' command to dump the bytes, and you can see exactly what is in the file.

    On UNIX/Linux, use 'ls -l test.txt' to see the directory listing containing
    the size of the file. Use something like 'od -Ax -x test.txt' to see the
    contents of the file. If that command does not produce output you like,
    use 'man od' to find the options you are more comfortable with.
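    A portable Python 3 version of the same check is sketched below; it is
    not from the thread. hex_rows is a hypothetical helper that prints each
    byte in hex, roughly like 'od -A x -t x1', and the demo file is written
    with explicit CRLF bytes to mimic the Windows file. The 0d 0a pairs
    account for the 7 bytes that tell() reported.

```python
import os
import tempfile


def hex_rows(data, width=16):
    """Format bytes as 'offset hex-bytes' rows, roughly like od -A x -t x1."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(f"{offset:06x} " + " ".join(f"{b:02x}" for b in chunk))
    return rows


# Hypothetical demo file with Windows-style CRLF line endings.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(b"1\r\n2\r\n3")

size = os.path.getsize(path)   # the size 'dir' or 'ls -l' would report
print(size)
with open(path, "rb") as f:    # binary mode: bytes exactly as stored
    for row in hex_rows(f.read()):
        print(row)             # 000000 31 0d 0a 32 0d 0a 33
```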
     
    Paul Watson, Feb 23, 2004
