file.read problem

Discussion in 'Python' started by wscrsurfdude, Feb 17, 2006.

  1. wscrsurfdude

    wscrsurfdude Guest

    f = open('myfile', 'r')
    a = f.read(5000)

    When I do this I get only the first 634 bytes. I tried using the:
    f = open('myfile', 'rb')
    option, but now there are a few extra 0x0D bytes in myfile (0x0D =
    carriage return). How can I write a program that does not put in the
    0x0D bytes on Windows?

    On Linux the first 2 lines work perfectly.
     
    wscrsurfdude, Feb 17, 2006
    #1

  2. > When I do this I get the first 634 bytes. I tried using the:
    > f = open('myfile,'rb')
    > option, but now there are a few 0x0D bytes extra in myfile. 0x0D =
    > Carriage return. How can I make a program that not puts in the 0x0D
    > bytes in windows.



    Try opening the file in 'rbU' mode. This will use universal newline mode
    and convert all carriage returns to line feeds.

    -Farshid
     
    Farshid Lashkari, Feb 17, 2006
    #2

  3. wscrsurfdude

    wscrsurfdude Guest

    >Try opening the file in 'rbU' mode. This will use universal newline mode
    >and convert all carriage returns to line feeds.


    I tried this, but as you say, now there are extra 0x0A bytes in my
    files. Is it also possible to leave all these out and just get the
    file as it is?

    I am working on a script to get parts of raw data out of a file, and
    the data I read has to be the data written in the file without CR or
    LF.
     
    wscrsurfdude, Feb 17, 2006
    #3
  4. > I am working on a script to get parts of raw data out of a file, and
    > the data I read has to be the data written in the file without CR or
    > LF.


    So you just want to remove all the linefeeds? This should work then:

    data = data.replace('\n','')

    -Farshid
     
    Farshid Lashkari, Feb 17, 2006
    #4
  5. wscrsurfdude

    wscrsurfdude Guest

    Farshid Lashkari wrote:
    > > I am working on a script to get parts of raw data out of a file, and
    > > the data I read has to be the data written in the file without CR or
    > > LF.

    >
    > So you just want to remove all the linefeeds? This should work then:
    >
    > data = data.replace('\n','')
    >
    > -Farshid


    The problem is that if I remove the linefeeds, I also delete any data
    byte that happens to be 0x0A, and I don't want this, because what I
    read out has to be an exact part of the original data. Any other idea?

    But my question still is, why does:
    f = open('myfile', 'r')
    a = f.read(5000)
    work on Linux?
     
    wscrsurfdude, Feb 17, 2006
    #5
  6. "wscrsurfdude" wrote:

    > >Try opening the file in 'rbU' mode. This will use universal newline mode
    > >and convert all carriage returns to line feeds.

    >
    > I tried this, but as you say, now there are 0x0A bytes extra in my
    > files, is there also a possibility to let all these things out, and
    > just get the file.
    >
    > I am working on a script to get parts of raw data out of a file, and
    > the data I read has to be the data written in the file without CR or
    > LF.


    what kind of file are you reading? if it's a text file, it's supposed to have
    LF in it (or CR LF if you read it in binary mode); the LF:s are there to tell
    you where each line ends.

    if it's a binary file, open with mode "rb".
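
    A minimal Python 3 sketch of the difference, assuming a made-up payload
    and a temporary file (neither is from the thread): reading in "rb"
    returns the stored bytes unchanged, while the default text mode
    translates CR LF to LF on read.

```python
import os
import tempfile

# Hypothetical sample: binary data that happens to contain CR LF (0x0D 0x0A).
payload = b"\x01\x02\r\n\x03\x04"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile")
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode: the bytes come back exactly as stored.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with Python 3's default newline handling: CR LF becomes LF.
    # latin-1 is used only because it can decode every byte value.
    with open(path, "r", encoding="latin-1") as f:
        text = f.read()

print(raw == payload)        # binary read matches the stored bytes
print(len(raw), len(text))   # the text read is one character shorter
```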

    </F>
     
    Fredrik Lundh, Feb 17, 2006
    #6
  7. wscrsurfdude

    wscrsurfdude Guest

    >if it's a binary file, open with mode "rb".
    You are right about opening it in rb mode (a flaw in my first post),
    but even when I do this, on Windows a 0x0D is put in front of every
    0x0A. I found an explanation of why it works on Linux; it is below in
    my post.

    What I take from it is that on Windows a 0x0D is put in front of every
    0x0A as part of the line ending. I have to get rid of these. But if my
    original file already contains the binary data 0x0D0A, that 0x0D is
    deleted as well. Does anyone have an idea?

    ############################################
    The whole subject of newlines and text files is a murky area of non
    standard implementation by different operating systems. These
    differences have their roots in the early days of data communications
    and the control of mechanical teleprinters. Basically there are 3
    different ways to indicate a new line:

    1. Carriage Return (CR) character ('\r')
    2. Line Feed (LF) character ('\n')
    3. CR/LF pair ('\r\n')
    All three techniques are used in different operating systems. MS DOS
    (and therefore Windows) uses method 3. Unix (including Linux) uses
    method 2. Apple in its original MacOS used method 1, but now uses
    method 2 since MacOS X is really a variant of Unix.

    So how can the poor programmer cope with this multiplicity of line
    endings? In many languages she just has to do lots of tests and take
    different action per OS. In more modern languages, including Python,
    the language provides facilities for dealing with the mess for you. In
    the case of Python the assistance comes in the form of the os module
    which defines a variable called linesep which is set to whatever the
    newline character is on the current operating system. This makes adding
    newlines easy, and rstrip() takes account of the OS when it does its
    work of removing them, so really the simple way to stay sane, so far as
    newlines are concerned is: always use rstrip() to remove newlines from
    lines read from a file and always add os.linesep to strings being
    written to a file.

    That still leaves the awkward situation where a file is created on one
    OS and then processed on another, incompatible, OS and sadly, there
    isn't much we can do about that except to compare the end of the line
    with os.linesep to determine what the difference is.
    ######################################
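
    As a hedged illustration of the rstrip() advice quoted above: the
    sketch below passes the characters to strip explicitly, so the result
    does not depend on the platform it runs on. The sample strings are made
    up for the example.

```python
import os

# Hypothetical lines, one for each of the three conventions listed above.
lines = ["unix line\n", "dos line\r\n", "old mac line\r"]

# rstrip("\r\n") removes any trailing run of CR and LF characters.
stripped = [line.rstrip("\r\n") for line in lines]
print(stripped)

# os.linesep holds the line separator of the platform running the script.
print(repr(os.linesep))
```

    Note that this only applies to text; for raw binary data, stripping or
    adding line endings changes the data.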
     
    wscrsurfdude, Feb 17, 2006
    #7
  8. wscrsurfdude

    wscrsurfdude Guest

    I have the solution: the flaw was not in the opening of the file, but
    in the writing of the file. Stupid me, I opened it with mode rb, but
    wrote it with w instead of wb.

    Thanks, everybody, for helping me.
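
    The effect can be reproduced with a small Python 3 sketch. The payload
    and file names are invented for the example, and newline="\r\n" is an
    assumption used to imitate Windows text-mode translation on any
    platform; it is not the code from the original script.

```python
import os
import tempfile

# Hypothetical raw data containing a 0x0A byte that is data, not a line ending.
payload = b"\x10\x0a\x20"

with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.bin")
    bad = os.path.join(tmp, "bad.bin")

    # Binary write ("wb"): the bytes are stored unchanged.
    with open(good, "wb") as f:
        f.write(payload)

    # Text write ("w"): every "\n" written is translated to "\r\n".
    with open(bad, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(payload.decode("latin-1"))

    with open(good, "rb") as f:
        good_bytes = f.read()
    with open(bad, "rb") as f:
        bad_bytes = f.read()

print(good_bytes == payload)              # binary write round-trips
print(len(good_bytes), len(bad_bytes))    # text write added one 0x0D byte
```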
     
    wscrsurfdude, Feb 17, 2006
    #8
  9. On Fri, 17 Feb 2006 00:15:31 -0800, wscrsurfdude wrote:

    >
    > Farshid Lashkari wrote:
    >> > I am working on a script to get parts of raw data out of a file, and
    >> > the data I read has to be the data written in the file without CR or
    >> > LF.

    >>
    >> So you just want to remove all the linefeeds? This should work then:
    >>
    >> data = data.replace('\n','')
    >>
    >> -Farshid

    >
    > The problem is if I remove the linefeeds, I also delete readout data if
    > it is 0x0A, and I don't want this, because the files I readout has to
    > be a part of the original data. Another idea??


    Er, have I understood you correctly? You seem to be saying that some
    linefeeds are significant data, and some are not, and you want somebody to
    tell you how to remove the insignificant "linefeed = end of line"
    characters without removing the significant "linefeed = important data"
    characters.

    That's easy:

    from blackmagic import readmymind, dowhatiwant
    fp = file("data", "rb")
    readmymind()
    data = dowhatiwant(fp.read())

    You'll need Python 3.0 for the blackmagic module.

    *wink*

    Seriously, if this is your problem, then you will have no choice but to
    carefully analyse the file yourself, looking at each linefeed and tossing
    it away if it is insignificant. We can't tell you how to do that, because
    we don't know which linefeeds are data and which are not.



    > But still my question is why is the:
    > f = open('myfile,'r')
    > a = f.read(5000)
    > working in linux??


    Why shouldn't it work in Linux? The question should be, why is it not
    working in Windows? (When did "my code is broken" become the expected
    state of affairs, and "my code works" the mystery that needs solving?)

    I wonder whether there is a ctrl-Z in your data, and Windows is
    interpreting that as end of file.
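
    One way to check this guess is to read the file in binary mode and look
    for the first 0x1A (Ctrl-Z) byte. The sketch below uses made-up data
    and a hypothetical helper name; if the real file had a 0x1A at offset
    634, that would be consistent with only 634 bytes being returned.

```python
def first_ctrl_z(data: bytes) -> int:
    """Return the offset of the first 0x1A (Ctrl-Z) byte, or -1 if none."""
    return data.find(b"\x1a")


# Hypothetical data: 634 ordinary bytes, then a Ctrl-Z, then more data.
sample = b"A" * 634 + b"\x1a" + b"B" * 100

offset = first_ctrl_z(sample)
print(offset)
```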


    --
    Steven.
     
    Steven D'Aprano, Feb 17, 2006
    #9
  10. "wscrsurfdude" wrote:

    > >if it's a binary file, open with mode "rb".


    > You are right about opening it in the rb mode (flaw in the start post),
    > but also when I do this in windows in front of every 0x0A is put a
    > 0x0D. I found a explanation why it is working in linux it is below in
    > my post.
    >
    > But what i get of this that in windows in front of every 0x0A is put a
    > 0x0D as a line feed. II have to get rid of these.


    if you open a file in binary mode ("rb"), you get the data that's in the
    file. no more, no less. if someone's adding CR to the files, that happens
    before you opened them in Python.

    have you, perhaps, copied binary files between the systems using FTP
    in text mode? if so, you've damaged the files, and there's no way to fix
    them, in general.
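
    A short sketch of why such damage cannot be undone in general, assuming
    the transfer replaced every CR LF with LF (one possible text-mode
    conversion; the helper name and sample bytes are hypothetical): two
    different originals end up as the same file, so the original cannot be
    recovered from the result.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Imitate a text-mode transfer that turns every CR LF into LF."""
    return data.replace(b"\r\n", b"\n")


# Two different hypothetical originals.
original_a = b"\x01\r\n\x02"
original_b = b"\x01\n\x02"

# After the conversion both files are identical.
same_after_transfer = crlf_to_lf(original_a) == crlf_to_lf(original_b)
print(same_after_transfer)
```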

    </F>
     
    Fredrik Lundh, Feb 17, 2006
    #10