Symbol vs String

Sebestyén Gábor

Hi,

Just a dumb question: what is the real difference between { :aKey =>
"aValue" } and { "aKey" => "aValue" }? I know the first key is a
symbol and the latter is a string. I like string keys, so why should I use
symbols? Why are symbols worth using as keys?
Thanks,

Gábor
 
Nikolai Weibull

* Sebestyén Gábor (Mar 16, 2005 21:40):
Just a dumb question: what is the real difference between { :aKey =>
"aValue" } and { "aKey" => "aValue" }? I know the first key is a
symbol and the latter is a string. I like string keys, so why should I use
symbols? Why are symbols worth using as keys? Thanks,

Always use symbols for situations like these. The reason is that a
symbol is immutable and also that no new string needs to be created for
it if used more than once. Also, using strings as symbols and then
having the string altered will force a rehash of the table. It's all
about memory savings and execution speed,
nikolai
 
Eric Hodel

Hi,

Just a dumb question: what is the real difference between { :aKey =>
"aValue" } and { "aKey" => "aValue" }? I know the first key is a
symbol and the latter is a string. I like string keys, so why should I use
symbols? Why are symbols worth using as keys?

Symbols take up less memory space (only allocated once for the same
Symbol) and have a faster #hash function (#object_id, not computed).

'x' == 'x' # => true
'x'.object_id == 'x'.object_id # => false

:x == :x # => true
:x.object_id == :x.object_id # => true
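A minimal sketch extending the comparison above, using only core Ruby: because a Symbol and a String are different classes, a hash keyed one way is not readable the other way. You either pick one convention per hash or convert explicitly with `to_sym` / `to_s`. The variable names are illustrative only.

```ruby
sym_hash = { :aKey => "aValue" }
str_hash = { "aKey" => "aValue" }

# Lookups only succeed with the same kind of key that was stored:
# a Symbol is never eql? to a String with the same characters.
p sym_hash[:aKey]    # => "aValue"
p sym_hash["aKey"]   # => nil
p str_hash["aKey"]   # => "aValue"
p str_hash[:aKey]    # => nil

# Explicit conversion bridges the two representations.
p str_hash[:aKey.to_s]     # => "aValue"
p sym_hash["aKey".to_sym]  # => "aValue"

# Every occurrence of :aKey is the same object; each String.new is a new one.
p :aKey.equal?(:aKey)                            # => true
p String.new("aKey").equal?(String.new("aKey"))  # => false
```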

--
Eric Hodel - (e-mail address removed) - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

 
Eric Hodel

Also, using strings as symbols and then having the string altered will
force a rehash of the table.

You mean this?

key = 'foo'
hash = {}
hash[key] = 5
key.gsub!(/foo/, 'bar')

In this case, hash.rehash does not need to be called because Ruby
copies String hash keys:

hash.keys.first.object_id == key.object_id # => false

Also, String keys are frozen, so you can't modify them:

hash.keys.first.gsub!(/foo/, 'bar') # => raises TypeError
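A runnable version of the point above, as a sketch: Hash#[]= stores a frozen copy of an unfrozen String key, so later changes to the caller's string leave the table alone. The exception class raised when touching the frozen key depends on the Ruby version, so the sketch rescues both candidates and only prints what it got.

```ruby
# String.new guarantees an ordinary, mutable (unfrozen) String on any Ruby version.
key = String.new('foo')
hash = {}
hash[key] = 5

stored = hash.keys.first
p stored.equal?(key)  # => false: the Hash kept its own copy of the key
p stored.frozen?      # => true: and froze that copy

# Mutating the caller's string does not disturb the table, so no rehash is needed.
key.gsub!(/foo/, 'bar')
p hash['foo']         # => 5
p hash['bar']         # => nil

# The stored key itself cannot be modified.
begin
  stored.gsub!(/foo/, 'bar')
rescue TypeError, RuntimeError => e
  # Ruby 1.8 raised TypeError; 1.9-2.4 raise RuntimeError; 2.5+ raise
  # FrozenError, which is a RuntimeError subclass.
  puts "#{e.class}: #{e.message}"
end
```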

 
Nikolai Weibull

* Eric Hodel (Mar 16, 2005 22:20):
[basically saying that this isn't so]

OK, so this strengthens the argument for using symbols even further, as
keys will be copied. Thanks for pointing this out,
nikolai
 
Robert Klemme

Nikolai Weibull said:
* Sebestyén Gábor (Mar 16, 2005 21:40):

Always use symbols for situations like these. The reason is that a
symbol is immutable and also that no new string needs to be created for
it if used more than once. Also, using strings as symbols and then
having the string altered will force a rehash of the table. It's all
about memory savings and execution speed,

I'd rather make the distinction on a semantic level. For example, if you
write an initializer for a class that accepts a hash to initialize any number
of instance fields, I'd prefer to use symbols there. The same goes for cases
where only a certain fixed set of values is allowed. I use strings when the
values are read from some source and I don't know beforehand what they might be.

Incidentally, key-like things typically occur rather often, which fits
nicely with the memory and speed savings offered by symbols.
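A small sketch of the distinction described above. `Server` and `parse_settings` are hypothetical names made up for illustration: the option names of an initializer are a fixed set chosen by the programmer (Symbols), while names read from outside are not known in advance (Strings).

```ruby
# Hypothetical class: its option names are fixed by the programmer,
# so Symbol keys fit.
class Server
  attr_reader :host, :port

  def initialize(options = {})
    @host = options.fetch(:host, 'localhost')
    @port = options.fetch(:port, 8080)
  end
end

# Hypothetical parser: keys read from "name=value" text are not known
# beforehand, so they stay Strings.
def parse_settings(text)
  settings = {}
  text.each_line do |line|
    name, value = line.chomp.split('=', 2)
    settings[name] = value if name && value
  end
  settings
end

server = Server.new(:port => 9000)
p server.host  # => "localhost"
p server.port  # => 9000
p parse_settings("color=blue\nx-mailer=irb\n")  # String keys "color" and "x-mailer"
```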

Kind regards

robert
 
Peter C. Verhage

But why do Strings not behave like Symbols? I mean, why aren't all
Strings immutable? Is this because Symbols will never get garbage
collected (to make sure they can be used over and over again) and normal
Strings will? Which might mean that in some cases (lots of text
processing) immutable Strings would fill up memory?

Regards,

Peter
 
Nikolai Weibull

* Peter C. Verhage (Mar 16, 2005 23:40):
But why do Strings not behave like Symbols? I mean, why aren't all
Strings immutable? Is this because Symbols will never get garbage
collected (to make sure they can be used over and over again) and
normal Strings will? Which might mean that in some cases (lots of text
processing) immutable Strings would fill up memory?

Oh, no...not immutable vs. mutable strings again...

Well, if strings were immutable, then that would mean that strings could
share contents, and thus immutable strings wouldn't fill up memory. I
have suggested on the ruby-core list that Ruby should provide a second
data structure that acts like a string, namely the _rope_, and that it
be implemented in a way that allows for it to be used for tasks where
immutable "strings" are desired.

A rope is basically a string represented by a tree. Leaves of the tree
point to the subsequences of the whole string. These subsequences can
be shared with other ropes and can be generated lazily, i.e., from IO or
other generators. All that is needed is the length of the subsequence.
Every internal node keeps track of its own size and the size of its left
child. Thus, the offset of a node in the whole string is the sum of the
left-child sizes of those ancestors where the path from the root turns
right. Ropes can be used to represent long strings efficiently, and many
operations on ropes (concatenation, for example) are O(1) where they are
O(n) on a string. This is offset by the fact that lookup in a rope is
O(lg n) versus O(1) for a string, but in many cases this isn't a problem.
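A minimal rope sketch in Ruby, to make the description above concrete. It is not Boehm's implementation and not anything that exists in Ruby: the class names `RopeLeaf` and `RopeNode` are hypothetical, there is no rebalancing (so lookup is O(depth), which is O(lg n) only for a balanced tree), and leaves hold ordinary strings rather than lazily generated IO.

```ruby
# A leaf holds one (frozen, hence shareable) piece of the whole string.
class RopeLeaf
  def initialize(str)
    @str = str.dup.freeze
  end

  def length
    @str.length
  end

  def char_at(i)
    @str[i]
  end

  def to_s
    @str
  end
end

# An internal node remembers the size of its left child ("weight") and
# caches its own total size, so concatenation is O(1).
class RopeNode
  attr_reader :length

  def initialize(left, right)
    @left = left
    @right = right
    @weight = left.length
    @length = @weight + right.length
  end

  # Descend left if i falls inside the left child; otherwise subtract the
  # left child's size and descend right.
  def char_at(i)
    i < @weight ? @left.char_at(i) : @right.char_at(i - @weight)
  end

  def to_s
    @left.to_s + @right.to_s
  end
end

hello = RopeLeaf.new("Hello, ")
greeting = RopeNode.new(hello, RopeLeaf.new("world"))
shout = RopeNode.new(greeting, RopeLeaf.new("!"))
question = RopeNode.new(hello, RopeLeaf.new("anyone?"))  # shares the "Hello, " leaf

p shout.to_s        # => "Hello, world!"
p shout.length      # => 13
p shout.char_at(7)  # => "w"
p question.to_s     # => "Hello, anyone?"
```

Both `shout` and `question` point at the same `hello` leaf, which is the content sharing the rope structure makes possible without copying.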

Anyway, the rope data structure is further described in [1]. Boehm has
actually implemented this in C for his garbage collector, so see that
package for an example implementation (note, though, that it uses a lot of
C hacks, which make it undesirable to use as-is). There's also a rope
data structure in STL, but it's limited to only using ropes and strings,
not IO,
nikolai (the rope and piece table lover)

[1] Hans-J Boehm, "Ropes: an Alternative to Strings", Software--Practice
and Experience, vol. 25(12), 1315--1330, Dec. 1995. Available at
http://rubyurl.com/2FRbO.
 
Hal Fulton

Peter said:
But why do Strings not behave like Symbols? I mean, why aren't all
Strings immutable? Is this because Symbols will never get garbage
collected (to make sure they can be used over and over again) and normal
Strings will? Which might mean that in some cases (lots of text
processing) immutable Strings would fill up memory?

Some people (such as Guido) dislike mutable strings.
Others (such as Matz, and incidentally me) like them.

Personally, my limited Java experience juggling String and
StringBuffer was enough to convince me that strings should
be mutable.


Hal
 
Douglas Livingstone

I like string keys

Why?

Personally I think :symbols are great, makes it much clearer when you
are reading code that you are representing something else, rather than
storing a piece of data. And you can use them without having to define
them as constants beforehand. Great :)

Faster to type too.

Douglas
 
Jim Weirich

Just a dumb question: what is the real difference between { :aKey =>
"aValue" } and { "aKey" => "aValue" }? I know the first key is a
symbol and the latter is a string. I like string keys, so why should I use
symbols? Why are symbols worth using as keys?

Use Strings for their content. Use Symbols for their arbitrary uniqueness.
 
Sam Roberts

Quoting (e-mail address removed), on Thu, Mar 17, 2005 at 11:08:59AM +0900:
Use Strings for their content. Use Symbols for their arbitrary uniqueness.

I used to do this, but ran into problems.

Symbols are great for things related to ruby because the :bar form for symbol
literals accepts the same kind of characters as ruby identifiers. I use them by
preference when interacting with ruby's meta-programming APIs.

They start to fall down outside of this. For example, I tried to use them
with MIME types:

:text
==>:text
:video
==>:video
:octet-stream
NameError: undefined local variable or method `stream' for main:Object
from (irb):3
'octet-stream'.intern
==>:"octet-stream"

You CAN use them for things outside of the domain of ruby names, but it gets
painful if the names of those things are arbitrarily unique but have "-"
characters in them: you first have to create a String!

You can get around this by creating constants:

OCTETSTREAM = 'octet-stream'.intern
TEXT = :text

etc., but that might not fit your API goals very well.

Anyhow, I moved back to using strings instead of symbols. The need to create a
string and intern it for things that are logically symbols but have a "-" in
them was too painful.

That was my experience, anyhow.

Cheers,
Sam
 
Hal Fulton

Sam said:
[...]
Anyhow, I moved back to using strings instead of symbols. The need to create a
string and intern it for things that are logically symbols but have a "-" in
them was too painful.

That was my experience, anyhow.

I believe you can do things like :"octet-stream" -- but I grant
that is not much better.


Hal
 
Jim Weirich

Quoting (e-mail address removed), on Thu, Mar 17, 2005 at 11:08:59AM +0900:
Use Strings for their content. Use Symbols for their arbitrary
uniqueness.

I used to do this, but ran into problems. [...]
:octet-stream

NameError: undefined local variable or method `stream' for main:Object
from (irb):3
'octet-stream'.intern
==>:"octet-stream"

Why couldn't you do :octet_stream? If your answer is because the dash comes
from outside ruby, then I would suggest that the content ("_" vs "-") is
important ... indicating that you should use strings.
 
Florian Gross

Hal said:
Sam said:
[...]
Anyhow, I moved back to using strings instead of symbols. The need to
create a string and intern it for things that are logically symbols but
have a "-" in them was too painful.

That was my experience, anyhow.

I believe you can do things like :"octet-stream" -- but I grant
that is not much better.

And there's also the %s(octet-stream) family.
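For reference, a short sketch of the spellings mentioned in this sub-thread. Writing `:octet-stream` bare fails because Ruby parses it as `:octet - stream`, which is why irb complains about an undefined `stream`. The quoted, `%s`, `intern`, and `to_sym` forms all yield the same Symbol object.

```ruby
# All four expressions produce the one Symbol :"octet-stream".
a = :"octet-stream"
b = %s(octet-stream)
c = 'octet-stream'.intern
d = 'octet-stream'.to_sym

p a                            # => :"octet-stream"
p [a, b, c, d].uniq.size       # => 1
p a.equal?(b) && b.equal?(c)   # => true
p a.to_s                       # => "octet-stream"
```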
 
Sam Roberts

Quoting (e-mail address removed), on Thu, Mar 17, 2005 at 12:02:19PM +0900:
I believe you can do things like :"octet-stream" -- but I grant
that is not much better.

But a little better, I didn't know that, thanks.

Sam
 
Sam Roberts

Quoting (e-mail address removed), on Thu, Mar 17, 2005 at 01:36:39PM +0900:
Quoting (e-mail address removed), on Thu, Mar 17, 2005 at 11:08:59AM +0900:
Use Strings for their content. Use Symbols for their arbitrary
uniqueness.

I used to do this, but ran into problems. [...]
:octet-stream

NameError: undefined local variable or method `stream' for main:Object
from (irb):3
'octet-stream'.intern
==>:"octet-stream"

Why couldn't you do :octet_stream? If your answer is because the dash comes
from outside ruby, then I would suggest that the content ("_" vs "-") is
important ... indicating that you should use strings.

Maybe I don't know what you mean by "arbitrarily unique".

"_" vs "-" is no more (or less) important than "a" vs. "z".

Cheers,
Sam
 
Jim Weirich

Sam Roberts said:
[...]

Maybe I don't know what you mean by "arbitrarily unique".

"_" vs "-" is no more (or less) important than "a" vs. "z".

If the choice of symbol names is arbitrary, then I can change the name of
the symbol everywhere that references it without changing the semantics of
the program.

For example, if any of the following choices are equally valid:
:octetstream, :OctetStream, :octet_stream, :stream_of_octets, :octets,
:fido, then the choice of name is arbitrary. Of course, some choices are
more transparent and convey meaning better, but the program will still
work even if we call the symbol :xyzzy. That's what it means to be
arbitrary.

If the choice of letters is constrained by some outside force, then it is
not arbitrary. For example, it might come to you as an attribute in an
XML message. Or perhaps you need to write it to a file, and other
programs expect that exact sequence of characters. In all these cases, the
content (sequence of letters) is important and cannot be changed without
breaking the program. When the content of the item is important, use a
string.
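A sketch of the arbitrary-versus-constrained test described above. All names (`NEXT_STATE`, `advance`, `binary_body?`) are hypothetical, and the headers hash is assumed to already use the canonical `'Content-Type'` spelling. Renaming the state Symbols consistently would not change behavior; the MIME type String must match what another program sent, character for character.

```ruby
# Internal states: the names are arbitrary, so Symbols fit. Renaming :idle
# to :xyzzy everywhere it appears would not change what the program does.
NEXT_STATE = { :idle => :running, :running => :finished }

def advance(state)
  # States without an entry (here :finished) stay where they are.
  NEXT_STATE.fetch(state, state)
end

# External data: another program chose these exact characters, so they are
# compared as Strings and cannot be renamed freely.
def binary_body?(headers)
  headers['Content-Type'] == 'application/octet-stream'
end

p advance(:idle)      # => :running
p advance(:finished)  # => :finished
p binary_body?({ 'Content-Type' => 'application/octet-stream' })  # => true
p binary_body?({ 'Content-Type' => 'application/octet_stream' })  # => false
```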
 
Sam Roberts

Quoting (e-mail address removed), on Sat, Mar 19, 2005 at 12:50:38AM +0900:
[...]

If the choice of symbol names is arbitrary, then I can change the name of
the symbol everywhere that references it without changing the semantics of
the program.

For example, if any of the following choices are equally valid:
:octetstream, :OctetStream, :octet_stream, :stream_of_octets, :octets,
:fido, then the choice of name is arbitrary. Of course, some choices are
more transparent and convey meaning better, but the program will still
work even if we call the symbol :xyzzy. That's what it means to be
arbitrary.

Ah. Then, no, it's not really arbitrary. More specifically, I can make it
arbitrary, but then I might be forced to make it more and more
arbitrary! If I map:

x-mailer => :xmailer

Then somebody decides to make a header

xmailer

I have to map:

xmailer => :zz_xmailer

etc. I guess I could make a mapping table, hashing strings to
symbols, but at this point symbols aren't making my code clearer or
easier to use.
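A sketch of the kind of mapping table described above, to show the extra indirection it introduces. `HEADER_SYMBOLS` and `header_symbol` are hypothetical names; the entries reuse the x-mailer example from this message, and the lookup lowercases names because mail and HTTP header field names are case-insensitive.

```ruby
# Hypothetical table from external header names (Strings) to internal Symbols.
HEADER_SYMBOLS = {
  'x-mailer' => :xmailer,
  'xmailer'  => :zz_xmailer,  # a later, colliding header forces a new name
}

def header_symbol(name)
  # Normalize case before looking up; unknown names return nil.
  HEADER_SYMBOLS[name.downcase]
end

p header_symbol('X-Mailer')  # => :xmailer
p header_symbol('xmailer')   # => :zz_xmailer
p header_symbol('Subject')   # => nil
```

Every new external name needs a table entry, which is the maintenance cost weighed against the benefit of Symbol keys here.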

In the example of MIME types, I probably could use arbitrary symbols.
Anybody who decides to make a new MIME type called application/octet_stream,
given that application/octet-stream is a standard name, deserves to be
publicly humiliated. So I could use :octetstream, arbitrarily.

I just wanted to use symbols for the efficiency, and to emphasize their
uniqueness in terms of case-sensitivity; it seemed to fit, but for
several reasons I discovered it didn't.

Cheers,
Sam
 
