Why does StringIO discard its initial value?

Leif K-Brooks · Apr 10, 2005

When StringIO gets an initial value passed to its constructor, it seems
to discard it after the first call to .write(). For instance:
'barbaz'
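
The session that produced this output is not preserved here. A minimal sketch that yields the same result, assuming an initial value of "foo" (a guess, not taken from the original post) and using Python 3's io.StringIO as a stand-in for the StringIO module under discussion:

```python
import io

# Assumed initial value "foo"; the original session is not preserved.
buffer = io.StringIO("foo")   # the position starts at 0, not at the end
buffer.write("bar")           # overwrites "foo" in place
buffer.write("baz")           # the position is now at the end, so this appends
result = buffer.getvalue()
print(repr(result))           # 'barbaz'
```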

The obvious workaround is to call buffer.write() with the initial value
instead of passing it to StringIO's constructor, so this issue doesn't
bother me very much, but I'm still curious about it. Is this the
expected behavior, and if so, why isn't it mentioned in the docs?

jepler · Apr 10, 2005

Maybe this short interactive session can give you an idea why.
3

StringIO seems to operate like a file opened with "r+" (if I've got my modes
right): it is opened for reading and writing, and positioned at the beginning.
In my example, the write of 3 bytes overwrites the first 3 bytes of the file
and leaves the rest intact. In your example your first write overwrote the
whole initial contents of the file, so you couldn't notice this effect.
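
The comparison with "r+" can be sketched as follows. This is an illustration only: it uses Python 3's io.StringIO rather than the StringIO module from the thread, and the contents "abcdefgh" and the file name are made up for the example.

```python
import io
import os
import tempfile

initial = "abcdefgh"  # illustrative contents, not from the original session

# A StringIO created with an initial value is positioned at offset 0,
# so a 3-character write replaces only the first 3 characters.
buf = io.StringIO(initial)
buf.write("XYZ")
stringio_result = buf.getvalue()

# A real file opened with "r+" behaves the same way.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")
    with open(path, "w") as f:
        f.write(initial)
    with open(path, "r+") as f:
        f.write("XYZ")
        f.seek(0)
        file_result = f.read()

print(stringio_result)  # XYZdefgh
print(file_result)      # XYZdefgh
```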

Jeff

Raymond Hettinger · Apr 11, 2005

[Leif K-Brooks]
> The obvious workaround is to call buffer.write() with the initial value
> instead of passing it to StringIO's constructor,

More than just a workaround, it is the preferred approach.
That makes it easier to switch to cStringIO where initialized objects are
read-only.
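
A minimal sketch of that approach, assuming Python 3's io.StringIO as a stand-in (cStringIO exists only in Python 2). The second half shows an alternative that is not from this thread: keep the constructor argument and seek to the end before writing.

```python
import io

initial = "foo"  # illustrative value

# Preferred in this thread: start empty and write the initial contents.
buf = io.StringIO()
buf.write(initial)
buf.write("bar")
written_first = buf.getvalue()

# Alternative (not from the thread): pass the initial value to the
# constructor, then move to the end so later writes append.
buf2 = io.StringIO(initial)
buf2.seek(0, io.SEEK_END)
buf2.write("bar")
seeked_first = buf2.getvalue()

print(written_first)  # foobar
print(seeked_first)   # foobar
```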

> Is this the expected behavior

Yes.

> , and why it isn't mentioned in the docs if so?

Per your request, the docs have been updated.

Raymond Hettinger

David Fraser · Apr 12, 2005

Raymond said:
> [Leif K-Brooks]
>> The obvious workaround is to call buffer.write() with the initial value
>> instead of passing it to StringIO's constructor,
>
> More than just a workaround, it is the preferred approach.
> That makes it easier to switch to cStringIO where initialized objects are
> read-only.

Others may find this helpful: it's a pure Python wrapper for cStringIO
that makes it behave like StringIO, in that objects created with an
initial value are not read-only. Would it be an idea to extend cStringIO
like this in the standard library? It shouldn't lose performance if used
like a standard cStringIO, but it prevents frustration.

David

import cStringIO  # Python 2 only


class StringIO:
    """Writable wrapper around cStringIO that accepts an initial value."""

    def __init__(self, buf=''):
        if not isinstance(buf, (str, unicode)):
            buf = str(buf)
        self.len = len(buf)
        # Start from an empty (writable) cStringIO and write the initial
        # value into it, then rewind so reads and writes begin at offset 0.
        self.buf = cStringIO.StringIO()
        self.buf.write(buf)
        self.buf.seek(0)
        self.pos = 0
        self.closed = 0

    def __iter__(self):
        return self

    def next(self):
        if self.closed:
            raise StopIteration
        r = self.readline()
        if not r:
            raise StopIteration
        return r

    def close(self):
        """Free the memory buffer.
        """
        if not self.closed:
            self.closed = 1
            del self.buf, self.pos

    def isatty(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return False

    def seek(self, pos, mode=0):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        self.buf.seek(pos, mode)
        self.pos = self.buf.tell()

    def tell(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return self.pos

    def read(self, n=None):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        if n is None:
            r = self.buf.read()
        else:
            r = self.buf.read(n)
        self.pos = self.buf.tell()
        return r

    def readline(self, length=None):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        if length is not None:
            r = self.buf.readline(length)
        else:
            # The original passed length (None) here as well.
            r = self.buf.readline()
        self.pos = self.buf.tell()
        return r

    def readlines(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        lines = self.buf.readlines()
        self.pos = self.buf.tell()
        return lines

    def truncate(self, size=None):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        # cStringIO's truncate() expects an integer, so do not pass None.
        if size is None:
            self.buf.truncate()
        else:
            self.buf.truncate(size)
        # Re-measure the buffer, then restore the current position.
        self.pos = self.buf.tell()
        self.buf.seek(0, 2)
        self.len = self.buf.tell()
        self.buf.seek(self.pos)

    def write(self, s):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        origpos = self.buf.tell()
        self.buf.write(s)
        self.pos = self.buf.tell()
        # Only re-measure the buffer if the write extended it.
        if origpos + len(s) > self.len:
            self.buf.seek(0, 2)
            self.len = self.buf.tell()
            self.buf.seek(self.pos)

    def writelines(self, lines):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        self.buf.writelines(lines)
        self.pos = self.buf.tell()
        self.buf.seek(0, 2)
        self.len = self.buf.tell()
        self.buf.seek(self.pos)

    def flush(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        self.buf.flush()

    def getvalue(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return self.buf.getvalue()

Raymond Hettinger · Apr 13, 2005

[David Fraser]
> Others may find this helpful; it's a pure Python wrapper for cStringIO
> that makes it behave like StringIO in not having initialized objects
> readonly. Would it be an idea to extend cStringIO like this in the
> standard library? It shouldn't lose performance if used like a standard
> cStringIO, but it prevents frustration

IMO, that would be a step backwards. Initializing the object and then
writing to it is not a good practice. The cStringIO API needs to be as
file-like as possible. With files, we create an empty object and then
start writing (the append mode for existing files is a different story).
Good code ought to maintain that parallelism so that it is easier to
substitute a real file for a writeable cStringIO object.
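
The parallelism described above can be sketched like this. The helper write_report and its data are hypothetical, and io.StringIO from Python 3 stands in for a writable cStringIO object; the point is only that code which starts from an empty object and writes to it works unchanged with a real file.

```python
import io
import os
import tempfile


def write_report(out, rows):
    """Write rows to any writable file-like object (hypothetical helper)."""
    for name, value in rows:
        out.write("%s=%s\n" % (name, value))


rows = [("a", 1), ("b", 2)]  # illustrative data

# In-memory target: create an empty object, then write.
mem = io.StringIO()
write_report(mem, rows)
in_memory = mem.getvalue()

# Real file target: the same call, with no changes to write_report.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "report.txt")
    with open(path, "w", newline="") as f:
        write_report(f, rows)
    with open(path, newline="") as f:
        on_disk = f.read()

print(in_memory == on_disk)  # True
```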

This whole thread (except for the documentation issue which has been
fixed) is about fighting the API rather than letting it be a guide to good
code.

If there were something wrong with the API, Guido would have long
since fired up the time machine and changed the timeline so that all
would be as right as rain ;-)

Raymond Hettinger

David Fraser · Apr 15, 2005

Raymond said:
> [David Fraser]
>> Others may find this helpful; it's a pure Python wrapper for cStringIO
>> that makes it behave like StringIO in not having initialized objects
>> readonly. Would it be an idea to extend cStringIO like this in the
>> standard library? It shouldn't lose performance if used like a standard
>> cStringIO, but it prevents frustration
>
> IMO, that would be a step backwards. Initializing the object and then
> writing to it is not a good practice. The cStringIO API needs to be as
> file-like as possible. With files, we create an empty object and then
> start writing (the append mode for existing files is a different story).
> Good code ought to maintain that parallelism so that it is easier to
> substitute a real file for a writeable cStringIO object.
>
> This whole thread (except for the documentation issue which has been
> fixed) is about fighting the API rather than letting it be a guide to good
> code.
>
> If there were something wrong with the API, Guido would have long
> since fired up the time machine and changed the timeline so that all
> would be as right as rain ;-)

But surely the whole point of files is that you can do more than either
creating a new file or appending to an existing one (seek, write?)
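
As an illustration of the seek-then-write pattern, here is a small sketch using Python 3's io.BytesIO. That choice is an assumption for the example: the thread itself concerns Python 2's StringIO and cStringIO, and the buffer contents are made up.

```python
import io

# io.BytesIO stays writable even when given an initial value.
buf = io.BytesIO(b"hello world")
buf.seek(6)             # move to the start of b"world"
buf.write(b"there")     # overwrite 5 bytes in place
patched = buf.getvalue()

print(patched)  # b'hello there'
```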

The reason I wrote this was to enable manipulating zip files inside zip
files, in memory. This is on translate.sourceforge.net - I wanted to
manipulate Mozilla XPI files, and replace file contents etc. within the
XPI. The XPI files are zip format that contains jars inside (also zip
format). I needed to alter the contents of files within the inner zip files.

The zip classes in Python can handle adding files but not replacing
them, and cStringIO objects created with an initial value are read-only,
as described above.

So I created extensions to the zipfile.ZipFile class that allow it to
delete existing files, and add them again with new contents (thus
replacing them).

And I created wStringIO (the wrapper posted earlier) so that I could do
this all in place on the existing zip files.

This all required some extra hacking because of the dual-layer zip files.
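
For readers who want to experiment with the dual-layer case, here is a rough sketch in Python 3. It is not the code described in this post: instead of deleting and re-adding a member in place, it rebuilds each archive in memory, which is simpler but drops per-member metadata such as timestamps. The helpers make_zip and replace_member and all file names are hypothetical.

```python
import io
import zipfile


def make_zip(entries):
    """Build a zip archive in memory from a {name: bytes} dict."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return out.getvalue()


def replace_member(zip_bytes, name, new_data):
    """Return new archive bytes with one member's contents replaced.

    Copies every member into a fresh archive, because zipfile cannot
    overwrite a member in place. Timestamps and similar metadata are lost.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.filename == name:
                data = new_data
            else:
                data = src.read(info.filename)
            dst.writestr(info.filename, data)
    return out.getvalue()


# Illustrative outer archive containing an inner archive.
inner = make_zip({"locale/en.dtd": b"old text", "other.txt": b"keep"})
outer = make_zip({"chrome/inner.jar": inner, "install.rdf": b"<rdf/>"})

# Replace a file inside the inner archive, then put the inner archive back.
with zipfile.ZipFile(io.BytesIO(outer)) as zf:
    inner_bytes = zf.read("chrome/inner.jar")
new_inner = replace_member(inner_bytes, "locale/en.dtd", b"new text")
new_outer = replace_member(outer, "chrome/inner.jar", new_inner)

# Read the result back to check both layers.
with zipfile.ZipFile(io.BytesIO(new_outer)) as zf:
    with zipfile.ZipFile(io.BytesIO(zf.read("chrome/inner.jar"))) as inner_zf:
        replaced = inner_zf.read("locale/en.dtd")
        kept = inner_zf.read("other.txt")

print(replaced)  # b'new text'
print(kept)      # b'keep'
```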

But all this as far as I see would have been really tricky using the
existing zipfile and cStringIO classes, which both assume (conceptually)
that files are either readable or new or merely appendable (for zipfile).

The problem for me was not that cStringIO classes are too similar to
files, it was that they are too dissimilar. All of this would work with
either StringIO (but too slow) or real files (but I needed it in memory
because of the zipfiles being inside other zip files).

Am I missing something?

David
