Pickled text file causing ValueError (dos/unix issue)

Discussion in 'Python' started by Aki Niimura, Jan 14, 2005.

  1. Aki Niimura

    Aki Niimura Guest

    Hello everyone,

    I started to use pickle to store the latest user settings for the tool
    I wrote. It writes out a pickled text file when it terminates and it
    restores the settings when it starts.

    It worked very nicely.

    However, I got a ValueError when I started the tool from Unix when I
    previously used the tool from Windows.

    File "/usr/local/lib/python2.3/pickle.py", line 980, in load_string
    raise ValueError, "insecure string pickle"
    ValueError: insecure string pickle

    If I do 'dos2unix <my.cfg> <my.cfg>' to convert the file, then
    everything becomes fine.

    I found this in the Python release notes:
    "pickle: Now raises ValueError when an invalid pickle that contains a
    non-string repr where a string repr was expected. This behavior matches
    cPickle."

    I guess DOS text format is creating this problem.
    My question is "Is there any elegant way to deal with this?".

    I certainly can catch ValueError and run 'dos2unix' explicitly.
    But I don't like such a crude solution.
    Any suggestions would be highly appreciated.

    Best regards,
    Aki Niimura
    Aki Niimura, Jan 14, 2005
    #1
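
The failure described in this post can be approximated without a Windows machine. The sketch below assumes Python 3 (the thread itself is about Python 2.3) and emulates a Windows text-mode write by turning every LF in a protocol 0 pickle into CR LF. The `settings` dictionary and the `survives` helper are made up for illustration. On Python 3 the symptom may differ from the "insecure string pickle" traceback above: the load may raise a different exception or quietly return strings with a stray carriage return, so the helper only checks whether the original value comes back.

```python
import pickle

settings = {"name": "tool", "width": 80}  # hypothetical user settings

# Protocol 0 is the old "text" protocol; its output is still a bytes object.
data = pickle.dumps(settings, protocol=0)

# Emulate what a Windows text-mode write does: every LF becomes CR LF.
mangled = data.replace(b"\n", b"\r\n")


def survives(raw):
    """Return True if raw unpickles back to the original settings."""
    try:
        return pickle.loads(raw) == settings
    except Exception:
        # Any unpickling error counts as "did not survive".
        return False


print(survives(data))     # the untouched bytes
print(survives(mangled))  # the bytes after newline translation
```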

  2. Aki Niimura

    Paul Rubin Guest

    Open the file on windows for writing with "wb" mode, the b is for binary.
    Paul Rubin, Jan 14, 2005
    #2

  3. Aki Niimura

    Tim Peters Guest

    [Aki Niimura]
    > I started to use pickle to store the latest user settings for the tool
    > I wrote. It writes out a pickled text file when it terminates and it
    > restores the settings when it starts.

    ....
    > I guess DOS text format is creating this problem.


    Yes.

    > My question is "Is there any elegant way to deal with this?".


    Yes: regardless of platform, always open files used for pickles in
    binary mode. That is, pass "rb" to open() when reading a pickle file,
    and "wb" to open() when writing a pickle file. Then your pickle files
    will work unchanged on all platforms. The same is true of files
    containing binary data of any kind (and despite that pickle protocol 0
    was called "text mode" for years, it's still binary data).
    Tim Peters, Jan 14, 2005
    #3
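
The advice in this post can be sketched as a pair of helpers. This is a minimal example under stated assumptions, not the original poster's code: it targets Python 3, the names `save_settings` and `load_settings` are invented here, and the file name `my.cfg` is borrowed from the question. Note that Python 3's pickle.dump() fails with a TypeError when given a file opened in text mode, so the mistake is harder to make there than it was in Python 2.3.

```python
import os
import pickle
import tempfile


def save_settings(path, settings):
    # "wb": binary mode, so no newline translation happens on any platform.
    with open(path, "wb") as f:
        pickle.dump(settings, f, protocol=0)


def load_settings(path):
    # "rb": read back exactly the bytes that were written.
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "my.cfg")
    save_settings(cfg, {"name": "tool", "width": 80})
    restored = load_settings(cfg)
    with open(cfg, "rb") as f:
        raw = f.read()  # kept only to inspect the stored bytes

print(restored)
```

A file written this way contains no carriage returns regardless of the platform that wrote it, so it can be copied between Windows and Unix without a dos2unix step.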
  4. Why 'r' mode anyway? (was: Re: Pickled text file causing ValueError(dos/unix issue))

    Tim Peters wrote:

    > Yes: regardless of platform, always open files used for pickles in
    > binary mode. That is, pass "rb" to open() when reading a pickle file,
    > and "wb" to open() when writing a pickle file. Then your pickle files
    > will work unchanged on all platforms. The same is true of files
    > containing binary data of any kind (and despite that pickle protocol 0
    > was called "text mode" for years, it's still binary data).


    I've been wondering why there even is the choice between binary mode
    and text mode. Why can't we just do away with the 'text mode' ?
    What does it do, anyways? At least, if it does something, I'm sure
    that it isn't something that can be done in Python itself if
    really required to do so...

    --Irmen
    Irmen de Jong, Jan 14, 2005
    #4
  5. Aki Niimura

    Tim Peters Guest

    Re: Why 'r' mode anyway? (was: Re: Pickled text file causing ValueError (dos/unix issue))

    [Irmen de Jong]
    > I've been wondering why there even is the choice between binary mode
    > and text mode. Why can't we just do away with the 'text mode' ?
    > What does it do, anyways? At least, if it does something, I'm sure
    > that it isn't something that can be done in Python itself if
    > really required to do so...


    It's not Python's decision, it's the operating system's. Whether
    there's an actual difference between text mode and binary mode is up
    to the operating system, and, if there is an actual difference, every
    detail about what the difference(s) consists of is also up to the
    operating system. That differences may exist is reflected in the C
    standard, and the rules for text-mode files are more restrictive than
    most people would believe.

    On Unixish systems, there's no difference. On Windows boxes, there
    are conceptually small differences with huge consequences, and the
    distinction appears to be kept just for backward-compatibility
    reasons. On some other systems, text and binary files are entirely
    different kinds of beasts.

    If Python didn't offer text mode then it would be clumsy at best to
    use Python to write ordinary human-readable text files in the format
    that native software on Windows, and Mac Classic, and VAX (and ...)
    expects (and the native format for text mode differs across all of
    them). If Python didn't offer binary mode then it wouldn't be
    possible to use Python to process data in binary files on Windows and
    Mac Classic and VAX (and ...). If Python used its own
    platform-independent file format, then it would end up creating files
    that other programs wouldn't be able to deal with.

    Live with it <wink>.
    Tim Peters, Jan 14, 2005
    #5
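
The Windows-style line-end translation discussed in this post can be observed on any platform with a small experiment. The sketch assumes Python 3, which postdates this thread and performs the translation in its own io layer rather than leaving it to the C library. Passing newline="\r\n" forces the Windows convention on write; the file name is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")

    # newline="\r\n" makes Python write each "\n" as "\r\n",
    # which is what text mode does by default on Windows.
    with open(path, "w", newline="\r\n") as f:
        f.write("one\ntwo\n")

    with open(path, "rb") as f:  # binary mode: the bytes as stored
        raw = f.read()
    with open(path, "r") as f:   # text mode: line ends translated back
        text = f.read()

print(repr(raw))   # b'one\r\ntwo\r\n'
print(repr(text))  # 'one\ntwo\n'
```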
  6. Aki Niimura

    Serge Orlov Guest

    Re: Why 'r' mode anyway? (was: Re: Pickled text file causing ValueError (dos/unix issue))

    Irmen de Jong wrote:
    > Tim Peters wrote:
    >
    > > Yes: regardless of platform, always open files used for pickles in
    > > binary mode. That is, pass "rb" to open() when reading a pickle file,
    > > and "wb" to open() when writing a pickle file. Then your pickle files
    > > will work unchanged on all platforms. The same is true of files
    > > containing binary data of any kind (and despite that pickle protocol 0
    > > was called "text mode" for years, it's still binary data).
    >
    > I've been wondering why there even is the choice between binary mode
    > and text mode. Why can't we just do away with the 'text mode' ?


    We can't because characters and bytes are not the same things. But I
    believe what you're really complaining about is that "t" mode sometimes
    mysteriously corrupts data if processed by the code that expects binary
    files. In Python 3.0 it will be fixed because file.read will have to
    return different objects: bytes for "b" mode, str for "t" mode. It
    would be great if file type was split into binfile and textfile,
    removing need for cryptic "b" and "t" modes but I'm afraid that's too
    much of a change even for Python 3.0

    Serge.
    Serge Orlov, Jan 14, 2005
    #6
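
For readers on Python 3, which was released after this thread, the distinction predicted in this post can be checked directly. The sketch below is only an illustration: it opens the same file in both modes and records what kind of file object and what kind of data each mode produces. The file name and contents are arbitrary.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(b"hello\n")

    with open(path, "rb") as f:
        binary_kind = type(f)    # class of the binary-mode file object
        binary_data = f.read()   # bytes
    with open(path, "r", encoding="ascii") as f:
        text_kind = type(f)      # class of the text-mode file object
        text_data = f.read()     # str

print(binary_kind.__name__, type(binary_data).__name__)
print(text_kind.__name__, type(text_data).__name__)
```

The mode letters survive in Python 3, but the two modes hand back different classes from the io module, so mixing bytes and text fails loudly instead of silently corrupting data.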
  7. Re: Why 'r' mode anyway?

    Tim Peters wrote:
    > That differences may exist is reflected in the C
    > standard, and the rules for text-mode files are more restrictive than
    > most people would believe.


    Apparently. Because I know only about the Unix <-> Windows difference
    (windows converts \r\n <--> \n when using 'r' mode, right).
    So it's in the line endings.

    Is there more obscure stuff going on on the other systems you
    mentioned (Mac OS, VAX) ?

    (That means that the bug in SimpleHTTPServer that my patch
    839496 addressed, also occurred on those systems? Or that
    the patch may be incorrect after all??)

    While your argument about why Python doesn't use its own
    platform-independent file format is sound of course, I find it often a
    nuisance that platform specific things trickle through into Python itself and
    ultimately in the programs you write. I sometimes feel that some
    parts of Python expose the underlying C/os implementation
    a bit too much. Python never claimed write once run anywhere (as
    that other language does) but it would have been nice nevertheless ;-)
    In practice it's just not possible I guess.

    Thanks,
    --Irmen
    Irmen de Jong, Jan 14, 2005
    #7
  8. Aki Niimura

    John Machin Guest

    On Fri, 14 Jan 2005 09:12:49 -0500, Tim Peters wrote:

    >[Aki Niimura]
    >> I started to use pickle to store the latest user settings for the tool
    >> I wrote. It writes out a pickled text file when it terminates and it
    >> restores the settings when it starts.

    >...
    >> I guess DOS text format is creating this problem.

    >
    >Yes.
    >
    >> My question is "Is there any elegant way to deal with this?".

    >
    >Yes: regardless of platform, always open files used for pickles in
    >binary mode. That is, pass "rb" to open() when reading a pickle file,
    >and "wb" to open() when writing a pickle file. Then your pickle files
    >will work unchanged on all platforms. The same is true of files
    >containing binary data of any kind (and despite that pickle protocol 0
    >was called "text mode" for years, it's still binary data).


    Tim, the manual as of version 2.4 does _not_ mention the need to use
    'b' on OSes where it makes a difference, not even in the examples at
    the end of the chapter. Further, it still refers to protocol 0 as
    'text' in several places. There is also a reference to protocol 0
    files being viewable in a text editor.

    In other words, enough to lead even the most careful Reader of TFM up
    the garden path :)

    Cheers,
    John
    John Machin, Jan 14, 2005
    #8
  9. Aki Niimura

    Tim Peters Guest

    [Tim Peters]
    >>Yes: regardless of platform, always open files used for pickles
    >> in binary mode. ...


    [John Machin]
    > Tim, the manual as of version 2.4 does _not_ mention the need
    > to use 'b' on OSes where it makes a difference, not even in the
    > examples at the end of the chapter. Further, it still refers to
    > protocol 0 as 'text' in several places. There is also a reference to
    > protocol 0 files being viewable in a text editor.
    >
    > In other words, enough to lead even the most careful Reader of
    > TFM up the garden path :)


    Take the next step: submit a patch with corrected text. I'm not paid
    to work on the Python docs either <0.5 wink>. (BTW, protocol 0 files
    are viewable in a text editor regardless, although the line ends may
    "look funny")
    Tim Peters, Jan 14, 2005
    #9
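
Both points in this exchange, that protocol 0 output is readable and that it is binary data all the same, can be seen in a short sketch. It assumes Python 3 and an invented settings dictionary. The decode call succeeds here only because this particular input is ASCII-only.

```python
import pickle

data = pickle.dumps({"name": "tool", "width": 80}, protocol=0)

print(type(data).__name__)   # the pickle is a bytes object, not a str
print(data.decode("ascii"))  # readable, because this input is ASCII-only
```

The LF bytes that terminate each line of the output are part of the pickle format itself, which is why translating them on write or read can break loading.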
  10. Aki Niimura

    Tim Peters Guest

    Re: Why 'r' mode anyway?

    [Tim Peters]
    >> That differences may exist is reflected in the C
    >> standard, and the rules for text-mode files are more restrictive
    >> than most people would believe.


    [Irmen de Jong]
    > Apparently. Because I know only about the Unix <-> Windows
    > difference (windows converts \r\n <--> \n when using 'r' mode,
    > right). So it's in the line endings.


    That's one difference. The worse difference is that, in text mode on
    Windows, the first instance of chr(26) in a file is taken as meaning
    "that's the end of the file", no matter how many bytes may follow it.
    That's fine by the C standard, because everything about a text-mode
    file containing a chr(26) character is undefined.

    > Is there more obscure stuff going on on the other systems you
    > mentioned (Mac OS, VAX) ?


    I think on Mac Classic it was *just* line end differences. Native VAX
    has many file formats. "Record-based" file formats used to be very
    popular. There the OS saves meta-information in the file, such as
    each record contains an offset to the start of the next record, and
    may even contain an index structure to support random access to
    records quickly (for example, "a line" may be a record, and "read the
    last line" may go quickly). Read that in binary mode, and you'll be
    reading up the bits in the index and offsets too, etc. IIRC, Unix was
    actually quite novel at the time in insisting that all files were just
    raw byte streams to the OS.

    > (That means that the bug in SimpleHTTPServer that my patch
    > 839496 addressed, also occurred on those systems? Or that
    > the patch may be incorrect after all??)


    Don't know, and (sorry) no time to dig.

    > While your argument about why Python doesn't use its own
    > platform-independent file format is sound of course, I find it often
    > a nuisance that platform specific things trickle through into Python
    > itself and ultimately in the programs you write. I sometimes feel
    > that some parts of Python expose the underlying C/os
    > implementation a bit too much. Python never claimed write once
    > run anywhere (as that other language does) but it would have
    > been nice nevertheless ;-)
    > In practice it's just not possible I guess.


    It would be difficult at best. Python hides a lot of platform crap,
    but generally where it's reasonably easy to hide. It's not easy to
    hide native file conventions, partly because Python wouldn't play well
    with *other* platform software if it did.

    Remember that Guido worked on ABC before Python, and Python is in
    (small) part a reaction against the extremes of ABC. ABC was 100%
    platform-independent. You could read and write files from ABC.
    However, the only files you could read from ABC were files that were
    written by ABC -- and files written by ABC were essentially unusable
    by other software. Socket semantics were also 100% portable in ABC:
    it didn't have sockets, nor any way to extend the language to add
    them. Etc -- ABC was a self-contained universe. "Plays well with
    others" was a strong motivator for Python's design, and that often
    means playing by others' rules.
    Tim Peters, Jan 15, 2005
    #10
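
The two Windows text-mode effects described in this post (CR LF translation, and chr(26) treated as end of file) can be emulated in a few lines. This is a simulation for illustration only: `legacy_windows_text_read` is a hypothetical helper, not an API of Python or Windows, and the sample bytes are invented. Python 3 text mode does not itself stop at chr(26).

```python
def legacy_windows_text_read(raw):
    """Emulate the text-mode read described above (hypothetical helper)."""
    eof = raw.find(b"\x1a")  # chr(26), also known as Ctrl-Z
    if eof != -1:
        raw = raw[:eof]      # everything after the first chr(26) is dropped
    return raw.replace(b"\r\n", b"\n")


stored = b"width=80\r\nblob=\x1a\x00\x01\r\nheight=24\r\n"
print(legacy_windows_text_read(stored))  # b'width=80\nblob='
```

The last line of the stored data never reaches the reader, which is the "huge consequence" for anything that is not plain text.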
  11. Re: Why 'r' mode anyway?

    Tim> "Plays well with others" was a strong motivator for Python's
    Tim> design, and that often means playing by others' rules. --

    My vote for QOTW... Is it too late to slip it into the Zen of Python?

    Skip
    Skip Montanaro, Jan 15, 2005
    #11
  12. Re: Why 'r' mode anyway?

    Tim Peters wrote:
    .
    .
    .
    >reading up the bits in the index and offsets too, etc. IIRC, Unix was
    >actually quite novel at the time in insisting that all files were just
    >raw byte streams to the OS.

    Not just "novel", but "puzzling" and even "controversial".
    It was far from clear that the Unix way could be successful.
    .
    .
    .
    >but generally where it's reasonably easy to hide. It's not easy to
    >hide native file conventions, partly because Python wouldn't play well
    >with *other* platform software if it did.
    >
    >Remember that Guido worked on ABC before Python, and Python is in
    >(small) part a reaction against the extremes of ABC. ABC was 100%
    >platform-independent. You could read and write files from ABC.
    >However, the only files you could read from ABC were files that were
    >written by ABC -- and files written by ABC were essentially unusable
    >by other software. Socket semantics were also 100% portable in ABC:
    >it didn't have sockets, nor any way to extend the language to add
    >them. Etc -- ABC was a self-contained universe. "Plays well with
    >others" was a strong motivator for Python's design, and that often
    >means playing by others' rules.


    At a slightly different level, that--not playing well enough
    with others--is what held Smalltalk back. Again, a lot of
    this stuff wasn't obvious at the time, even as late as 1990.
    I think we understand better now that languages are secondary,
    in that good developers can be productive with all sorts of
    syntaxes and semantics; as a practical matter, daily struggles
    have to do with the libraries or how the languages access what
    is outside themselves.
    Cameron Laird, Jan 15, 2005
    #12
  13. Aki Niimura

    Nick Coghlan Guest

    Re: Why 'r' mode anyway?

    Skip Montanaro wrote:
    > Tim> "Plays well with others" was a strong motivator for Python's
    > Tim> design, and that often means playing by others' rules. --
    >
    > My vote for QOTW... Is it too late to slip it into the Zen of Python?


    It would certainly fit, and the existing koans don't really cover the concept.

    Its addition also seems fitting in light of the current PEP 246 discussion which
    is *all* about playing well with others :)

    Cheers,
    Nick.

    --
    Nick Coghlan | Brisbane, Australia
    ---------------------------------------------------------------
    http://boredomandlaziness.skystorm.net
    Nick Coghlan, Jan 15, 2005
    #13
