Mutable strings

Gordon Airport

Has anyone suggested introducing a mutable string type (yes, of course)
and distinguishing them from standard strings by the quote type - single
or double? As far as I know ' and " are currently interchangeable in all
circumstances (as long as they're paired) so there's no overloading to
muddy the language. Of course there could be some interesting problems
with current code that doesn't make a distinction, but it would be dead
easy to fix with a search-and-replace. And which would be the default
return type for functions returning strings...
It looks like there are ways of handling this by digging around in the
modules for more basic types, but it would be much nicer to have it
available at 'user level'.
 
Andy Jewell

Mutable strings are one thing that I missed, initially, when I first started
using Python. After a while, as the "Pythonic" way of doing things sank in,
I realised that Python doesn't *need* mutable strings.

Python strings (and integers and floats) are all immutable for a very good
reason: dictionaries can't reliably use mutable objects as keys. At first,
this seemed rather like "the tail wagging the dog"... however, once I fully
understood the % (percent) string operator, and the ability to efficiently
convert strings into lists and back, my anxiety went away. These cover most
usage of strings that might convince you you need mutability.
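
A minimal sketch of the two techniques mentioned above, the list round trip and the % operator. The variable names and sample text are illustrative only, not from the original post:

```python
# Strings are immutable, so edit a list of characters and join it back.
text = "mutible"
chars = list(text)        # ['m', 'u', 't', 'i', 'b', 'l', 'e']
chars[3] = "a"            # in-place change on the list, not on the string
fixed = "".join(chars)    # a new string built from the edited list

# The % operator builds a new string from a template instead of patching one.
report = "%s has %d characters" % (fixed, len(fixed))
print(report)
```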

As for the suggestion that the kind of quote used should determine whether or
not a string is mutable, I sort of /half/ agree. On one hand, making (say)
the apostrophe mean mutable and the double quote mean immutable would break
thousands of existing applications - for end users, "a simple search and
replace" is simply not feasible! Furthermore, the meaning of the following
snippet would be subtly (and possibly dangerously) changed:

----8<-----
s1="this is an 'immutable' string"
s2='this is a "mutable" string'

s3=s1.replace("'",'"')+" and "+s2.replace('"',"'") # replace quotes with
# apostrophes and vice-versa

d1={s3:(s1,s2)}
----8<-----

Q1) What type will s3 be?
Q2) What happens to s2? As it's mutable, shouldn't it do the replacement
"in-line"?
Q3) Will the assignment of d1 succeed? If it fails, wouldn't that be
confusing?

On the other hand, Python already has this type distinction for raw and
unicode strings (r"..." and u"...", respectively). If it were to be adopted,
I would be ok with an m"..." type of string, which could be barred from being
a dictionary key. This would open up a can of worms wrt the other immutable
types, too: would we end up with:

----8<-----
a=1234567m # a mutable integer
b=1234567.89m # a mutable float
c=123456789012345678901234567890Lm # a mutable long integer
d=123+456jm # a mutable complex number
e=m(1,2,3,4,5,6,"a","b","c") # a mutable tuple !!! :p
----8<-----

This could start a flame-war/heated debate on the scale of the ternary
operator PEP!

Maybe there's a project for you (and a good introduction to a practical
application for new-style Python classes to boot)!

I'm sure people have written this type of thing in the past - and in some
situations, it's bound to be useful, but I think it should be kept as a
separate module, so that you have to *declare* your usage of this /strange/
behaviour to the reader; "explicit is better than implicit".
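
As a rough idea of what such a separate module might contain, here is a hedged sketch of a list-backed class. The name MutableString and its methods are assumptions made for illustration; this is not the standard library's implementation:

```python
class MutableString(object):
    """Hypothetical list-backed mutable string, for illustration only."""

    __hash__ = None  # unhashable, so it cannot be used as a dictionary key

    def __init__(self, text=""):
        self._chars = list(text)

    def __getitem__(self, index):
        return self._chars[index]

    def __setitem__(self, index, value):
        # A single index takes one character; a slice takes any iterable
        # of characters and may change the length.
        self._chars[index] = value

    def __len__(self):
        return len(self._chars)

    def __str__(self):
        return "".join(self._chars)


s = MutableString("this is a mutable string")
s[0] = "T"
s[10:17] = "changed"
print(str(s))
```

Because the reader has to import and name the class, the mutability is declared explicitly, and trying to use an instance as a dictionary key raises TypeError instead of silently misbehaving.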

Remember, Mohammed had to go *to* the mountain, not the other way round!

hth,
-andyj
 
Peter Hansen

Gordon said:
Has anyone suggested introducing a mutable string type (yes, of course)
and distinguishing them from standard strings by the quote type - single
or double? As far as I know ' and " are currently interchangeable in all
circumstances (as long as they're paired) so there's no overloading to
muddy the language. Of course there could be some interesting problems
with current code that doesn't make a distinction,
but it would be dead easy to fix with a search-and-replace.
^^^^^^^^^^^^^^^^^^^^^^^^^
No, it definitely would not. You would also have to account for
embedded quotation marks that are not escaped already, and I'm
certain there are other complications.

It might be worth your writing a PEP, however, if only so that the
idea could be killed and buried for good. ;-)

-Peter
 
Gordon Airport

Peter Hansen wrote:

[snip good points that I suspect could be handled with a fairly simple
regex]

It might be worth your writing a PEP, however, if only so that the
idea could be killed and buried for good. ;-)

Yeah, I didn't see one already but I kind of expected this response.
Still, it didn't put a stake in the heart of the ternary operator
issue.
 
Gordon Airport

Andy said:
Mutable strings are one thing that I missed, initially, when I first started
using Python. After a while, as the "Pythonic" way of doing things sank in,
I realised that Python doesn't *need* mutable strings.

Well...it doesn't /need/ the simple expressions that were given to a lot
of things.

Python strings (and integers and floats) are all immutable for a very good
reason: dictionaries can't reliably use mutable objects as keys.

And I'm not suggesting doing away with immutable strings.
At first,
this seemed rather like "the tail wagging the dog"... however, once I fully
understood the % (percent) string operator, and the ability to efficiently
convert strings into lists and back, my anxiety went away. These cover most
usage of strings that might convince you you need mutability.

Yeah, you /can/ do everything, it's a question of clarity. You see how
often ' '.join( blah ) is the answer to people's questions here, it's
not obvious and it looks like a hack, IMO. Plus you can't do
somestring = '%s %s %s' % [ 'nine', 'bladed', 'sword' ]
The extra steps in list(somestring) ... ''.join( somestring ) are what
could be removed I guess.
As for the suggestion that the kind of quote used should determine whether or
not a string is mutable, I sort of /half/ agree. On one hand, making (say)
the apostrophe mean mutable and the double quote mean immutable would break
thousands of existing applications - for end users, "a simple search and
replace" is simply not feasible!

I'm less sure about that now, but the important point is that you would
know that all old string delimiters would be changed to the immutable
one. I'll try to come up with a regex.

Furthermore, the meaning of the following
snippet would be subtly (and possibly dangerously) changed:

----8<-----
s1="this is an 'immutable' string"
s2='this is a "mutable" string'

s3=s1.replace("'",'"')+" and "+s2.replace('"',"'") # replace quotes with
# apostrophes and vice-versa

d1={s3:(s1,s2)}
----8<-----

Q1) What type will s3 be?
Q2) What happens to s2? As it's mutable, shouldn't it do the replacement
"in-line"?
Q3) Will the assignment of d1 succeed? If it fails, wouldn't that be
confusing?

I think these problems can be avoided if you just escape both symbols
within both types of string. This complicates the code conversion, of
course.
On the other hand, Python already has this type distinction for raw and
unicode strings (r"..." and u"...", respectively). If it were to be adopted,
I would be ok with an m"..." type of string, which could be barred from being
a dictionary key. This would open up a can of worms wrt the other immutable
types, too: would we end up with:

----8<-----
a=1234567m # a mutable integer
b=1234567.89m # a mutable float
c=123456789012345678901234567890Lm # a mutable long integer
d=123+456jm # a mutable complex number
e=m(1,2,3,4,5,6,"a","b","c") # a mutable tuple !!! :p
----8<-----

I don't understand what a mutable numeric type would be. I just want a
string type that I can directly treat as an array of characters; numeric
types aren't indexable.
This could start a flame-war/heated debate on the scale of the ternary
operator PEP!

Viva la ?:! ;-)
Maybe there's a project for you (and a good introduction to a practical
application for new-style Python classes to boot)!

I'm sure people have written this type of thing in the past - and in some
situations, it's bound to be useful, but I think it should be kept as a
separate module, so that you have to *declare* your usage of this /strange/
behaviour to the reader; "explicit is better than implicit".

Think of it as a symmetry with the mutable and immutable list types we
already have. It is kind of strange, but we learn their applications and
deal with it. What's the balance of what shows up in code? I suspect
that in gross terms there's more (mutable) list use than (immutable)
tuple; mutable strings would have their place likewise.
 
Dennis Lee Bieber

Gordon Airport fed this fish to the penguins on Sunday 21 September
2003 02:10 pm:

Yeah, you /can/ do everything, it's a question of clarity. You see how
often ' '.join( blah ) is the answer to people's questions here, it's

Prior to the creation of string methods, you'd have done

import string

.... string.join(blah, ' ')

not obvious and it looks like a hack, IMO. Plus you can't do
somestring = '%s %s %s' % [ 'nine', 'bladed', 'sword' ]

If you know both sides have equal numbers of terms (the %s matches the
number of entries in the list) you /can/ do a minor modification to
that line:

somestring = "%s %s %s" % tuple(["nine", "bladed", "sword"])
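
A short sketch of the difference, written so that it also runs on Python 3. The words list is just the example from the post:

```python
words = ["nine", "bladed", "sword"]

# A list on the right of % counts as ONE value, so three %s are not satisfied.
try:
    "%s %s %s" % words
except TypeError as exc:
    print("list failed:", exc)

# A tuple is spread across the format slots.
somestring = "%s %s %s" % tuple(words)
print(somestring)
```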

Of course, you could also create a dictionary and store those as
attributes (though to my mind, you have a sword with one modifier
"nine-bladed"; as is it could be interpreted to mean nine
bladed-sword(s) -- though all swords are bladed...).
'nine bladed Sword'
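
The interpreter input that produced this output did not survive in the archive. A sketch of what dictionary-based formatting looks like follows; the variable names are assumptions, and the keys mirror the dictionary output quoted later in the thread:

```python
# Hypothetical reconstruction: named slots are filled from a dictionary.
sword = {"modifier": "nine", "attribute": "bladed", "type": "Sword"}
description = "%(modifier)s %(attribute)s %(type)s" % sword
print(description)
```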



 
Peter Hansen

Gordon said:
Peter Hansen wrote:

snip good points that I suspect could be handled with a fairly simple
regex

I'd argue the point, but I guess until you try it, we'll never know. <wink>

Gordon said:
Yeah, I didn't see one already but I kind of expected this response.
Still, it didn't put a stake in the heart of the ternary operator
issue.

Apparently it served its purpose quite well. The main problem before
the PEP and vote was that there was no PEP to point to when somebody
asked about it, so you could say "asked and answered... will not happen".

Now there is, and the few times the issue has come up since, someone
has fairly quickly pointed to the PEP each time, avoiding lengthier
discussion.

-Peter
 
Hans-Joachim Widmaier

Andy Jewell said:
Mutable strings are one thing that I missed, initially, when I first started
using Python. After a while, as the "Pythonic" way of doing things sank in,
I realised that Python doesn't *need* mutable strings.

Mutable strings come to *my* mind whenever I have to play with huge
binary data. Working with tens of megabytes is inherently somewhat
slow.
Python strings (and integers and floats) are all immutable for a very good
reason: dictionaries can't reliably use mutable objects as keys.

All understood. But then, I don't want to use my 32-MB binary blob as
a key.
however, once I fully
understood the % (percent) string operator, and the ability to efficiently
convert strings into lists and back, my anxiety went away. These cover
most usage of strings that might convince you you need mutability.

Converting said blob 'efficiently' to a list is something that I
certainly would not call 'efficiently' - if not for the conversion
itself, then for the memory consumption as list.

I don't think strings are immutable because they ought to be that way
(e.g. some CS guru teaches that "mutable strings are the root of all
evil"). They're immutable because they allow them to be used as
dictionary keys. And it was found that this doesn't affect the
usefulness of the language too much.

Still, I can see a use for mutable strings. Or better, mutable binary
data, made up of bytes. (where 'byte' is the smallest individually
addressable memory unit blabla, ... you get the meaning. Just to not
invite nit-pickers on that term.)
"explicit is better than implicit".

Yes, definitely: Let there be another type.

Byte-twiddlingly yours,
Hans-J.
 
Rob Tillotson

Still, I can see a use for mutable strings. Or better, mutable binary
data, made up of bytes. (where 'byte' is the smallest individually
addressable memory unit blabla, ... you get the meaning. Just to not
invite nit-pickers on that term.)


Yes, definitely: Let there be another type.

There already is one: array. Mutable blocks of bytes (or shorts,
longs, floats, etc.), usable in many places where you might otherwise
use a string (struct.unpack, writing to a file, etc.). It is not
quite a mutable string, but it does fit the bill for manipulating raw
bytes. For example, off the top of my head:
>>> import array
>>> a = array.array('B','abcdefg')
>>> a
array('B', [97, 98, 99, 100, 101, 102, 103])
>>> a[2:4] = array.array('B','12345')
>>> a
array('B', [97, 98, 49, 50, 51, 52, 53, 101, 102, 103])
>>> a.tostring()
'ab12345efg'
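
The session above is Python 2. As a sketch for readers on Python 3 (an assumption about the reader's interpreter, not part of the original post), the same steps need bytes initialisers and the newer method name:

```python
import array

# Python 3 needs bytes (not str) to initialise a 'B' (unsigned byte) array.
a = array.array("B", b"abcdefg")

# Slice assignment takes another array of the same typecode and may
# change the length.
a[2:4] = array.array("B", b"12345")

# tostring() was later renamed tobytes().
print(a.tobytes())
```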

For times when you really need a mutable string, there is always
UserString.MutableString (not quite sure what version this first
appeared in) -- it isn't terribly efficient since it uses a regular
string internally to hold the data, but it gets the job done and if
you really need something faster it would be a fairly simple exercise
to rewrite it using an array instead.

--Rob
 
Alex Martelli

Hans-Joachim Widmaier wrote:
...
Mutable strings come to *my* mind whenever I have to play with huge
binary data. Working with tens of megabytes is inherently somewhat
slow.

But mutable strings are not the best place to keep "huge binary
data". Lists of smaller blocks, arrays of bytes, and lists of
arrays can be much more appropriate data structures.

All understood. But then, I don't want to use my 32-MB binary blob as
a key.

Since you don't in fact need to use it in any of the ways typically
applicable only to strings, it doesn't need to be a string.

Converting said blob 'efficiently' to a list is something that I
certainly would not call 'efficiently' - if not for the conversion
itself, then for the memory consumption as list.

A typical case might be one where the blob is, e.g., in fact made
up of 65K sectors of 512 bytes each. In this case, the extra memory
consumption due to keeping the blob in memory as a list of 65K small
strings rather than one big string is, I would guess, about 1%. So,
who cares? And similarly if the "substrings" are of different sizes,
just as long as you only have a few tens of thousands of such
substrings. It's quite unusual that the "intrinsic structure" of
the blob is in fact one big undifferentiated 32MB thingy -- when it
is, you're unlikely to need it in memory, or if you do you're
unlikely to be able to apply any processing mutation to it sensibly;
and for those unusual and unlikely cases, arrays of bytes are often
just fine (after all, C has nothing BUT arrays of bytes [or of other
fixed entities], yet it's quite suitable for some such processing).
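
The list-of-sectors layout described above might be sketched like this. The 512-byte sector size comes from the post; the blob contents, its reduced size, and the variable names are made up for the example, and the code assumes Python 3 bytes:

```python
SECTOR = 512  # sector size from the post

blob = bytes(SECTOR * 1024)   # zero-filled stand-in for a larger binary blob

# Split once into a list of sector-sized immutable strings.
sectors = [blob[i:i + SECTOR] for i in range(0, len(blob), SECTOR)]

# "Mutating" the blob means rebinding one list slot; other sectors are untouched.
sectors[3] = b"\xff" * SECTOR

# Join only when the whole blob is needed again, e.g. to write it to a file.
patched = b"".join(sectors)
print(len(sectors), len(patched))
```

Replacing one sector touches a single list slot, so the cost of an edit does not grow with the size of the blob.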

I don't think strings are immutable because they ought to be that way
(e.g. some CS guru teaches that "mutable strings are the root of all
evil"). They're immutable because they allow them to be used as
dictionary keys. And it was found that this doesn't affect the
usefulness of the language too much.

Wrong. Consider Java, even back from the very first version: it had
no dictionaries on which strings might be keys, yet it still decided
to make its strings immutable. This should make it obvious that the
interest of using strings as dict keys cannot possibly be the sole
motivation for the decision to make strings immutable in a language.
Rather, the deeper motivation is connected to wanting strings to be
ATOMIC, ELEMENTARY types, just like numbers; and to lots of useful
practical returns of that choice. All you lose is the "ability" to
"confuse" (type-pun) between strings and arrays of bytes in many
situations, but that's an ability best lost in many cases. It's not
an issue of "evil" -- a close-to-the-hardware low-level language
like C has excellent reasons to choose a different, close-to-HW
semantics -- but in a higher-level language I think Python's and
Java's choice to have strings immutable works better than (e.g.)
Perl's and Ruby's to have them mutable.

Still, I can see a use for mutable strings. Or better, mutable binary
data, made up of bytes. (where 'byte' is the smallest individually
addressable memory unit blabla, ... you get the meaning. Just to not
invite nit-pickers on that term.)

Just "import array" and you have your "mutable binary data made up
of bytes". So, what's the problem? Type-punning between THAT type,
and strings, is just not all that useful.

Yes, definitely: Let there be another type.

But, there IS one! So, what's wrong with it...?!


Alex
 
logistix at cathoderaymission.net

Mutable strings come to *my* mind whenever I have to play with huge
binary data. Working with tens of megabytes is inherently somewhat
slow.

import array
x = array.array('c')

Pretty much creates a mutable string for these cases, although the
interface is a little different.
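
The 'c' typecode exists only in Python 2's array module. As a sketch under the assumption of a later interpreter (Python 2.6 or any Python 3), the built-in bytearray type gives a similar mutable byte string; the sample text is illustrative:

```python
buf = bytearray(b"mutable strings")
buf[0:1] = b"M"            # slice assignment takes bytes
buf[8] = ord("S")          # a single index takes an integer byte value
buf.extend(b"!")           # grows in place
print(buf.decode("ascii"))
```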
 
Alex Martelli

People seem to love to have literals for things. Otherwise, they feel
that a type is second-class.

Sure. I have no problem deeming "mutable strings" (array of bytes)
to be "second-class" in some vague sense, since their use is so rare
and the need for literals of that type even rarer; lacking literals for,
e.g., sets.Set "troubles" me far more;-).

I do keep daydreaming of some "user-defined semiliteral syntax"
such as, e.g. <identifier>{<balanced-parentheses tokens>} to
result in a call to (e.g.) <identifier>.__literal__ with a list (or other
sequence) of tokens as the argument, returning whatever that
call returns. But perhaps it isn't that good an idea after all (it
does imply the __literal__ classmethod or staticmethod doing
some sort of runtime compilation and execution of those tokens,
and opens the doors to the risk of some seriously nonPythonic
syntax for such "literals-initializers").


Alex
 
John Roth

Look at PEPs 296 and 298.

John Roth

Gordon Airport said:
Peter Hansen wrote:

snip good points that I suspect could be handled with a fairly simple
regex


Yeah, I didn't see one already but I kind of expected this response.
Still, it didn't put a stake in the heart of the ternary operator
issue.
 
Gordon Airport

Peter said:
I'd argue the point, but I guess until you try it, we'll never know. <wink>

Okay, I've tried it and I'll chalk it up to my inexperience with regular
expressions and sed, but I don't have anything to show. The general
strategy, though, is to make several passes; first escape all inner
strings, then convert all outer string delimiters (now the only ones not
escaped) to the immutable symbol. I feel like I'll wake up at 2 a.m.
with the answer, but I'll post now anyway.
(Assuming you /can/ say "every instance of A between B's" in regex...you
could always do it with a python script)
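
For the "do it with a python script" route, one possible starting point is the standard tokenize module, which already knows where string literals begin and end, so quotes inside comments or inside other strings are not mistaken for delimiters. This is only a sketch of the first pass (finding the literals), assuming Python 3; the function name string_literals and the sample source are invented for the example:

```python
import io
import tokenize


def string_literals(source):
    """Yield (line, column, text) for every string literal token in source."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.STRING:
            yield tok.start[0], tok.start[1], tok.string


sample = '''s1 = "it's"  # a 'comment' with quotes
s2 = 'say "hi"'
'''

for line, col, text in string_literals(sample):
    print(line, col, text)
```

A converter would then rewrite each reported literal in place, which still leaves the harder question of which delimiter every existing string should end up with.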

Apparently it served its purpose quite well. The main problem before
the PEP and vote was that there was no PEP to point to when somebody
asked about it, so you could say "asked and answered... will not happen".

Now there is, and the few times the issue has come up since, someone
has fairly quickly pointed to the PEP each time, avoiding lengthier
discussion.

-Peter

Fair enough. Maybe I will submit a PEP for this, I've never looked into
what's involved.
 
Gordon Airport

Dennis said:
Prior to the creation of string methods, you'd have done

import string

... string.join(blah, ' ')

Yes, it looks even worse that way. I guess that it's just rare to use a
literal in the code as an object...I'm having trouble thinking of other
situations where you use the ability, but I won't pretend to be an
expert in the language.
not obvious and it looks like a hack, IMO. Plus you can't do
somestring = '%s %s %s' % [ 'nine', 'bladed', 'sword' ]


If you know both sides have equal numbers of terms (the %s matches the
number of entries in the list) you /can/ do a minor modification to
that line:

somestring = "%s %s %s" % tuple(["nine", "bladed", "sword"])

I just found it strange that you couldn't do it directly without
'casting'...Probably doesn't come up much anyway. Now that I think about
it it's an assignment so it's not really relevant to the discussion of
mutable strings.
Of course, you could also create a dictionary and store those as
attributes (though to my mind, you have a sword with one modifier
"nine-bladed"; as is it could be interpreted to mean nine
bladed-sword(s) -- though all swords are bladed...).



{'attribute': 'bladed', 'modifier': 'nine', 'type': 'Sword'}


'nine bladed Sword'

All very handy, but I don't see how it could be done better with mutable
strings. I need to come up with some examples of applications.
 
Andrew Dalke

Rob Tillotson:
There already is one: array. Mutable blocks of bytes (or shorts,
longs, floats, etc.), usable in many places where you might otherwise
use a string (struct.unpack, writing to a file, etc.).

Even regular expressions:
>>> import array, re
>>> t = "When in the course of human events"
>>> s = array.array("c", t)
>>> pat = re.compile(r"([aeiou]{2,})")
>>> m = pat.search(s)
>>> m.group(1)
array('c', 'ou')
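
The example above relies on Python 2's 'c' typecode. A rough Python 3 counterpart, assuming a bytearray as the mutable buffer and a bytes pattern, could look like this:

```python
import re

# Python 3 sketch: a bytes pattern can search a mutable bytes-like object.
s = bytearray(b"When in the course of human events")
pat = re.compile(br"([aeiou]{2,})")
m = pat.search(s)

# bytes() normalises the result regardless of the type the match returns.
print(bytes(m.group(1)))
```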

Andrew
 
Hans-Joachim Widmaier

Am Mon, 22 Sep 2003 15:26:56 +0200 schrieb Alex Martelli:
Sure. I have no problem deeming "mutable strings" (array of bytes)
to be "second-class" in some vague sense, since their use is so rare
and the need for literals of that type even rarer; lacking literals for,
e.g., sets.Set "troubles" me far more;-).

I do keep daydreaming of some "user-defined semiliteral syntax"
such as, e.g. <identifier>{<balanced-parentheses tokens>} to
result in a call to (e.g.) <identifier>.__literal__ with a list (or other
sequence) of tokens as the argument, returning whatever that
call returns. But perhaps it isn't that good an idea after all (it
does imply the __literal__ classmethod or staticmethod doing
some sort of runtime compilation and execution of those tokens,
and opens the doors to the risk of some seriously nonPythonic
syntax for such "literals-initializers").

[Sorry for replying so late]

After writing what I did, I kept thinking about the issue. I finally
realized the same thing - it wasn't so much the missing datatype, as you
can use array, but the missing literals. Having to construct constant
values at runtime doesn't strike me as nice. I'm coming from the embedded
world (hmm, that's not entirely true, as I'm not leaving it), and doing
something efficiently is a big concern there, so doing something at
runtime that you could do up front is considered a bad thing.

Python doesn't have to be, and cannot be, the perfect language for just
everything. But even without "mutable strings", why does it have to be so
handy even for manipulating binaries then?

I'll get over it and give array a try.

Thanks to Jeff for finding the gist of it and Alex for his analysis.
It helps.

Hans-J.
 
