Py-dea: Streamline string literals now!

Rick Johnson

Hello folks,

In a recent thread i stumbled upon an epiphany of sorts concerning
Python string literals, with implications that trickle down to all
forms of string literals used in general programming, since, for the
most part, the syntax is virtually the same!

For all our lives we have been accepting a standard for string
literals that is, quite literally, overkill. It seems all along the
syntax has been flawed; however, we have been totally unaware... until
now!

*[A brief history lesson]*

Python offers two main styles of delimiting string literals:
* a single leading and trailing double or single quote char.
* a triple leading and trailing double or single quote char.

The single leading group is intended for one line literals whereas the
triple is intended for multi-line literals.

Now, in my initial days of Python-ing, i thought this was a great
system. Unlike most languages that require a programmer to concatenate
strings over and over again, python "seemed" to solve the multi-line
issue -- but there is a greater issue!
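For reference, here is a minimal sketch of the multi-line options the current syntax already offers. The variable names are only illustrative and are not from the original post.

```python
import textwrap

# Triple-quoted literal: line breaks in the source become "\n" in the value.
triple = """first line
second line"""

# Adjacent literals inside parentheses are joined at compile time,
# so no "+" is needed; the newline has to be written explicitly.
adjacent = (
    "first line\n"
    "second line"
)

# An indented triple-quoted block keeps its indentation unless it is
# stripped, for example with textwrap.dedent.
indented = textwrap.dedent("""\
    first line
    second line""")

print(triple == adjacent == indented)
```

All three spellings produce the same value; they differ only in how the source is laid out.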

*[Current Issues]*

The fact is...even with the multi-line issue solved, we still have two
forms of literal delimiters that encompass two characters resulting in
*four* possible legal combinations of the exact same string! I don't
know about you guys, but i am not a big fan of Tim Towtdi.

But the problems just keep adding up. There is also the issue of
escape sequences, which i have always found to be atrociously noisy.
Everyone who has had to deal with double escapes in regexps raise your
hand. The frowns are deafening.

The simple fact is: The current syntax for string literals is not only
deficient, it is bloated, noisy, and confusing!

*[Solution]*

I believe that with the ubiquity of syntax highlighting, string
literals only need one delimiter. In the old days (before syntax
highlighting was invented) i could understand how a programmer "might"
miss a single (or even a triple!) closing delimiter; but those days
died with the king.

My proposal is to introduce a single delimiter for string literals. A
new string literal that is just as good at spanning single lines as it
is spanning multiple lines. A new literal that uses widely known
markup tag syntax instead of cryptic and noisy escape sequences. And
finally, a *literal* that is worthy of the 21st century.

Thank You.
 
Chris Angelico

My proposal is to introduce a single delimiter for string literals. A
new string literal that is just as good at spanning single lines as it
is spanning multiple lines. A new literal that uses widely known
markup tag syntax instead of cryptic and noisy escape sequences. And
finally, a *literal* that is worthy of the 21st century.

So all you're doing is introducing a different form of escaping. You
can already achieve this with regular expressions in several engines -
declare a different delimiter and/or escape character - in order to
dodge the issue of multiple escapes. Python already has raw strings
for the same reason (although the rules governing raw strings are
oddly complex in edge cases).
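To make the raw-string point concrete, here is a small sketch of the double-escape problem and of one of the edge cases. The helper name `compiles` is hypothetical and exists only for this illustration.

```python
import re

# Without a raw string, every backslash meant for the regex engine is doubled.
escaped = "\\d+\\.\\d+"
# With a raw string, backslashes reach the regex engine unchanged.
raw = r"\d+\.\d+"

print(escaped == raw)
print(re.findall(raw, "pi is 3.14, e is 2.72"))


def compiles(source):
    """Return True if the given source text is a valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Edge case: a raw string cannot end in an odd number of backslashes,
# because the backslash still stops the quote from closing the literal.
print(compiles('r"C:\\dir\\"'))        # source text is r"C:\dir\"
# One workaround is to append the backslash from a normal literal.
print(compiles('r"C:\\dir" "\\\\"'))   # source text is r"C:\dir" "\\"
```

So raw strings remove most of the escaping noise in regexps, but they are not a fully "raw" literal.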

Your proposal sounds like a good idea for a specific-situation config
file, but a very bad idea for Python. If you want elaboration on that,
ask me for my arguments on CSV vs /etc/passwd. Or search the web for
the same topic, I'm sure many others have made the same points.

ChrisA
 
Steven D'Aprano

I believe that with the ubiquity of syntax highlighting, string
literals only need one delimiter. In the old days (before syntax
highlighting was invented) i could understand how a programmer "might" miss
a single (or even a triple!) closing delimiter; but those days died with
the king.

http://en.wikipedia.org/wiki/List_of_current_monarchs

Thank you Rick for another non-solution to a non-problem.

How is that fork of Python going? You promised us you would revitalise
the community with a new fork of Python that throws out all the
accumulated cruft of the language. I'm sure that the silent majority who
agree with you are hanging out for your first release. I know I am.
 
Rick Johnson

So all you're doing is introducing a different form of escaping.

Well that is but one of the proposals. The others include reducing
four combinations of string literals into one, and by default,
spanning multiple lines with one syntax.

You
can already achieve this with regular expressions in several engines -
declare a different delimiter and/or escape character - in order to
dodge the issue of multiple escapes.

I am specifically referring to the Python language and how the
interpreter "interprets" string literals. Wait, you do share the same
definition of "string literal" as i do, i hope?

http://en.wikipedia.org/wiki/String_literal

Python already has raw strings
for the same reason (although the rules governing raw strings are
oddly complex in edge cases).

Raw strings would be history with my proposal. No more need for raw
strings. This is the string literal to rule all string literals!

Your proposal sounds like a good idea for a specific-situation config
file, but a very bad idea for Python.

Can you give specific reasons why?
 
Rick Johnson

The fact is...even with the multi-line issue solved, we still have two
forms of literal delimiters that encompass two characters resulting in
*four* possible legal combinations of the exact same string! I don't
know about you guys, but i am not a big fan of Tim Towtdi.

Actually i was a bit hasty with that statement and underestimated the
actual number of possibilities.

1) "this is a string"
2) 'this is a string'
3) r"this is a string"
4) r'this is a string'
5) '''this is a string'''
6) """this is a string"""
7) r'''this is a string'''
8) r"""this is a string"""

A) "this is difficult to \"eyeball parse\""
B) """this is "overkill""""
C) "that's just plain \"nuts\"!"
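As a quick check of the list above, the sketch below evaluates the forms as written. The helper `is_valid` is a hypothetical name used only here.

```python
# The eight spellings from the list all produce the same value.
forms = [
    "this is a string",
    'this is a string',
    r"this is a string",
    r'this is a string',
    '''this is a string''',
    """this is a string""",
    r'''this is a string''',
    r"""this is a string""",
]
print(len(set(forms)))  # number of distinct values

# A and C can be written without backslashes by picking another delimiter.
a = "this is difficult to \"eyeball parse\""
c = "that's just plain \"nuts\"!"
print(a == 'this is difficult to "eyeball parse"')
print(c == '''that's just plain "nuts"!''')


def is_valid(source):
    """Return True if the given source text is a valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# B, exactly as written, does not parse: the first three of the four
# closing quotes end the literal and the fourth opens a new, unterminated one.
print(is_valid('"""this is "overkill""""'))
# Switching the delimiter to triple single quotes makes it valid.
print(is_valid("'''this is \"overkill\"'''"))
```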

Now. If anyone can look at that mess and not admit it is a disaster,
well then...

I am also thinking that ANY quote char is a bad choice for string
literal delimiters. Why? Well because it is often necessary to embed
single or double quotes into a string literal. We need to use a
delimiter that is not a current delimiter elsewhere in Python, and
also, is a very rare char. I believe Mr Ewing found that perfect char
in his "Multi-line uber raw string literals!" (Just scroll down a bit
at this link)...

http://www.cosc.canterbury.ac.nz/greg.ewing/python/ideas.html

....however, requiring a programmer to start EVERY line with a marker
does not seem like fun to me. And just think what a nightmare it will
be to modify copy/pasted data with line markers! Although it does
solve the "indentation" issue with doc-strings! I think just for foreign
language compatibility reasons we should stick with one of either " or
' (or maybe both), but allowing [r]""" and [r]''' is just WAY too
much! We need to trim this fat.
 
Steven D'Aprano

Now. If anyone can look at that mess and not admit it is a disaster,
well then...

It isn't a disaster. A disaster is when people die, lose their houses,
get tossed out into the street to starve, radioactive contamination
everywhere, floods, fire, the first born of every family being struck
down suddenly, that sort of thing. Not having eight ways to write string
literals. That's merely a convenience.

I am also thinking that ANY quote char is a bad choice for string
literal delimiters. Why? Well because it is often necessary to embed
single or double quotes into a string literal. We need to use a
delimiter that is not a current delimiter elsewhere in Python, and also,
is a very rare char. I believe Mr Ewing found that perfect char in his
"Multi-line uber raw string literals!" (Just scroll down a bit at this
link)...

http://www.cosc.canterbury.ac.nz/greg.ewing/python/ideas.html

Not surprisingly, you haven't thought this through. I'm sure Greg Ewing
has, which is why he hasn't proposed *replacing* string delimiters with
his multi-line format. That's why he proposed it as a *statement* and not
string-builder syntax.

Without an end-delimiter, how do you embed string literals in expressions?
 
Chris Angelico

I am also thinking that ANY quote char is a bad choice for string
literal delimiters. Why? Well because it is often necessary to embed
single or double quotes into a string literal.

Postgres allows dollar-delimited strings, which get around this issue somewhat.

http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

But for most strings, it simply makes sense to use a quote character.
Most strings don't need both ' and " in them.

You cannot pick one character to be your ultimate delimiter, because
there will always be occasion to embed it. (If nothing else, what
happens when you emit code?) You want the delimiter to be easily typed
and recognized, and that guarantees that it'll be something that's
going to want to be emitted. It's necessary to have multiple options,
or escaping.
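A small illustration of the "multiple options, or escaping" point when emitting code: Python's own `repr()` picks whichever quote avoids escaping and falls back to a backslash only when the string contains both. The sample strings are made up for this sketch.

```python
import ast

samples = ["plain", "it's", 'say "hi"', "it's \"both\""]

for s in samples:
    emitted = repr(s)  # source text for a literal with this value
    # Reading the emitted literal back yields the original string.
    assert ast.literal_eval(emitted) == s
    print(emitted)
```

Only the last sample needs an escape, which is the case where a single-delimiter scheme would need one too.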

ChrisA
 
Rick Johnson

Not surprisingly, you haven't thought this through. I'm sure Greg Ewing
has, which is why he hasn't proposed *replacing* string delimiters with
his multi-line format. That's why he proposed it as a *statement* and not
string-builder syntax.

Without an end-delimiter, how do you embed string literals in expressions?

Did you even read what i wrote? And if you did, you missed the point!

My point was... while Greg's idea is nice, it is not the answer.
HOWEVER, he did find the perfect char, and that char is the pipe! -->
|

mlstr = |||
this is a
multi line string that is
delimited by "triple pipes". Or we
could just use 'single pipes' if we like, however, i think
the "triple pipe" is easier to see. Since the pipe char
is so rare in Python source, it becomes the obvious
choice. And, best of all, no more worries about
"embedded quotes". YAY!
|||

slstr = |this is a single line string|

The point is people, we should be using string delimiters that are
ANYTHING besides " and '. Stop being a sheep and use your brain!
 
Dominic Binks

Did you even read what i wrote? And if you did, you missed the point!

My point was... while Greg's idea is nice, it is not the answer.
HOWEVER, he did find the perfect char, and that char is the pipe! -->
|

mlstr = |||
this is a
multi line string that is
delimited by "triple pipes". Or we
could just use 'single pipes' if we like, however, i think
the "triple pipe" is easier to see. Since the pipe char
is so rare in Python source, it becomes the obvious
choice. And, best of all, no more worries about
"embedded quotes". YAY!
|||

slstr = |this is a single line string|

The point is people, we should be using string delimiters that are
ANYTHING besides " and '. Stop being a sheep and use your brain!

I disagree that quotes are a bad thing. Most programming languages use
+, -, / and * for arithmetic operations (* is closest character on a
keyboard to x that is not the letter ex). Why do they choose to do
this? It's actually a lot more work for the language implementer(s) to
do this rather than simply providing functions add(x,y), subtract(x,y),
divide(x,y) and multiply(x,y).

The answer is very simple - convention. We use these symbols in
mathematics and so it's much more convenient to follow through the
mathematical convention. The learning curve is much lower (and it's
less typing). After all, experienced programmers know this is what is
going on under the hood in pretty much all programming languages.
Compilers are just making it easy for us.
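In Python specifically, the operator-versus-function point can be seen directly. This is only a sketch; the `Vec` class is a made-up example type.

```python
import operator

# The infix operators have function equivalents in the operator module.
print(2 + 3 == operator.add(2, 3))
print(6 / 4 == operator.truediv(6, 4))


class Vec:
    """Tiny example type that opts in to the "+" convention."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # "a + b" is dispatched to a.__add__(b).
        return Vec(self.x + other.x, self.y + other.y)


v = Vec(1, 2) + Vec(3, 4)
print((v.x, v.y))
```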

While it may indeed make sense to use a different character for
delimiting strings, in English we use " to delimit spoken words in a
narrative. Since strings most closely resemble spoken words from the
point of view of a programming language, using double quotes is a sensible
choice since it eases learning.

Convention is a very strong thing to argue against (not to mention huge
code breakage as a result of such a change).

Personally, I try to ensure I use consistency in the way I use the
different quoting mechanisms, but that's just a personal choice.
Sometimes that leads me to less pleasant looking strings, but I believe
the consistency of style makes it easier to read.

I think this proposal falls into the "let's make Python case-insensitive"
kind of idea - it's never going to happen because it offers very little
benefit at huge cost.

And I'm not going to contribute to this thread any further because it's a
pointless waste of my time to write it and others' time to read it.
 
Lie Ryan

mlstr = |||
this is a
multi line string that is
delimited by "triple pipes". Or we
could just use 'single pipes' if we like, however, i think
the "triple pipe" is easier to see. Since the pipe char
is so rare in Python source, it becomes the obvious
choice. And, best of all, no more worries about
"embedded quotes". YAY!
|||

slstr = |this is a single line string|

The point is people, we should be using string delimiters that are
ANYTHING besides " and '. Stop being a sheep and use your brain!

This will incur the wrath of all Linux/Unix sysadmins all over the
world. You are just replacing one problem with another; in your
obliviousness, you have missed the very obvious fact that once you
replace quotes with pipes, quotes become extremely rare in Python (in fact
you won't be seeing any quotes except inside string literals).
 
Ian Kelly

My point was... while Greg's idea is nice, it is not the answer.
HOWEVER, he did find the perfect char, and that char is the pipe! -->
|

mlstr = |||
this is a
multi line string that is
delimited by "triple pipes". Or we
could just use 'single pipes' if we like, however, i think
the "triple pipe" is easier to see. Since the pipe char
is so rare in Python source, it becomes the obvious
choice. And, best of all, no more worries about
"embedded quotes". YAY!
|||

slstr = |this is a single line string|

The point is people, we should be using string delimiters that are
ANYTHING besides " and '. Stop being a sheep and use your brain!

So those who do shell scripting from Python might replace this:

subprocess.call("cat {0} | awk '/foo|bar/ {print $3;}' | sort | uniq
>{1}".format(infile, outfile), shell=True)

with this:

subprocess.call(|cat {0} \| awk '/foo\|bar/ {print $3;}' \| sort \|
uniq >{1}|.format(infile, outfile), shell=True)

or if we combine Rick's string escaping proposal:

subprocess.call(|cat {0} <PIPE> awk '/foo<PIPE>bar/ {print $3;}'
<PIPE> sort <PIPE> uniq <GT>{1}|.format(infile, outfile), shell=True)

Yeah, that's so much better. I especially like how nice and readable
the regex is now.
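For anyone trying the current-syntax version, a hedged sketch follows. It assumes the redirect into the output file that the pipe-delimited variants show, and the function name `build_pipeline` is made up. Note that `str.format` would read the awk braces as a replacement field, so they are doubled here, and `shlex.quote` guards against file names containing spaces.

```python
import shlex


def build_pipeline(infile, outfile):
    """Build the shell command line as one ordinary quoted string."""
    # Doubled braces are literal braces for str.format; the pipes and the
    # single quotes need no escaping inside a double-quoted literal.
    template = "cat {0} | awk '/foo|bar/ {{print $3;}}' | sort | uniq >{1}"
    return template.format(shlex.quote(infile), shlex.quote(outfile))


print(build_pipeline("in.txt", "out.txt"))
print(build_pipeline("my file.txt", "out.txt"))
```

The resulting string could then be passed to `subprocess.call(..., shell=True)` as in the post.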
 
Lie Ryan

actually i was a bit hasty with that statment and underestimated the
actual number of possiblities.

1) "this is a string"
2) 'this is a string'
3) r"this is a string"
4) r'this is a string'
5) '''this is a string'''
6) """this is a string"""
7) r'''this is a string'''
8) r"""this is a string"""

You missed u"nicode" string and b"yte" string, each of them available in
both single and double quote flavor and single and triple quote flavor.
Also, it's possible to mix them together: ur"unicode raw string" or
br"byte raw string"; they are also in single and double quote flavor and
single and triple quote flavor. And of course, I can't believe you
forgot Guido's favourite version, g"", available in musical and sirloin
cloth flavor.
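A short sketch of how the real prefixes behave. It is run against Python 3, which is an assumption on my part; there the `ur` combination mentioned above is no longer accepted. The helper `is_valid` is a hypothetical name.

```python
# u"" is accepted in Python 3.3+ and makes no difference to the value.
print(u"text" == "text")

# b"" produces bytes rather than str.
print(type(b"bytes").__name__, type("text").__name__)

# r can be combined with b in either order; the backslash stays literal.
print(br"\n" == b"\\n")
print(rb"\n" == br"\n")


def is_valid(source):
    """Return True if the given source text is a valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# ur"" was a Python 2 spelling; Python 3 rejects it.
print(is_valid('ur"unicode raw string"'))
```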
 
python

Lie,
And of course, I can't believe you forgot Guido's favourite version, g"", available in musical and sirloin cloth flavor.

LMAO! That was brilliant! :)

Cheers!
Malcolm
 
Nathan Rice

Quotes are obnoxious in the nesting sense because everyone uses quotes
for string delimiters. By the same token, quotes are wonderful
because not only are they intuitive to programmers, but they are
intuitive in general. Parentheses are pretty much in the same boat...
I *HATE* them nested, but they are so intuitive that replacing them is
a non-starter; just write code that doesn't nest parentheses.

Nathan
 
Chris Angelico

Quotes are obnoxious in the nesting sense because everyone uses quotes
for string delimiters. By the same token, quotes are wonderful
because not only are they intuitive to programmers, but they are
intuitive in general. Parentheses are pretty much in the same boat...
I *HATE* them nested, but they are so intuitive that replacing them is
a non-starter; just write code that doesn't nest parentheses.

Parentheses have different starting and ending delimiters and must be
'properly nested' (ie there must be exactly-matching inner parens
inside any given set of outer parens (note that English has similar
rules - you can't mis-nest parentheses (at any depth) in either
language)). You can't guarantee the same about quoted strings -
suppose the starting delimiter were ' and the ending " (or vice
versa), it still wouldn't deal with the issue of coming across an
apostrophe inside a quoted string.

In actual fact, the real problem is that quoted strings need to be
able to contain _anything_. The only true solution to that is
length-provided strings:

s = "4spam
q = "14Hello, world!\n

This works beautifully in interchange formats, but rather poorly in
source code (or, for that matter, anything editable).
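As a sketch of why length-provided strings suit interchange formats, here is a minimal encoder and decoder. It uses the netstring-style layout (`<length>:<bytes>,`) rather than the exact notation above, since a separator after the length avoids ambiguity when the payload itself starts with a digit. The function names are made up for this illustration.

```python
def encode(payload):
    """Prefix the payload with its byte length; nothing inside is escaped."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_all(data):
    """Split a concatenation of encoded items back into their payloads."""
    items = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        end = start + length
        # The trailing comma is only a sanity check, not a delimiter.
        if data[end:end + 1] != b",":
            raise ValueError("length prefix does not match payload")
        items.append(data[start:end])
        pos = end + 1
    return items


# Payloads may contain quotes, colons, commas and newlines unchanged.
payloads = [b"spam", b"Hello, world!\n", b"it's \"any:thing\","]
wire = b"".join(encode(p) for p in payloads)
print(wire)
print(decode_all(wire) == payloads)
```

The cost is exactly the one named above: editing the payload by hand means recounting and updating the length.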

ChrisA
 
Nathan Rice

Parentheses have different starting and ending delimiters and must be
'properly nested' (ie there must be exactly-matching inner parens
inside any given set of outer parens (note that English has similar
rules - you can't mis-nest parentheses (at any depth) in either
language)). You can't guarantee the same about quoted strings -
suppose the starting delimiter were ' and the ending " (or vice
versa), it still wouldn't deal with the issue of coming across an
apostrophe inside a quoted string.

I think you read more into my statement than was intended. Parens are
bad like nested quotes are bad in the sense that they make statements
difficult to read and confusing.

While it is entirely possible to parse nested strings automatically in
a probabilistic manner with nearly flawless accuracy by examining
everything between the start and end of the line, generally I feel
that people are uncomfortable with probabilistic techniques in the
realm of programming :) Best just to make the user be explicit.

Nathan
 
Steven D'Aprano

The point is people, we should be using string delimiters that are
ANYTHING besides " and '. Stop being a sheep and use your brain!

"ANYTHING", hey?

I propose we use ئ and ร as the opening and closing string delimiters.
Problem solved!

Thank you Rick for yet another brilliant, well-thought-out idea. I look
forward to seeing your fork of Python with this change. How is it going?
I hope you aren't going to disappoint the legions of your fans who are
relying on you to save the Python community from neglect.
 
Chris Angelico

Thank you Rick for yet another brilliant, well-thought-out idea. I look
forward to seeing your fork of Python with this change. How is it going?
I hope you aren't going to disappoint the legions of your fans who are
relying on you to save the Python community from neglect.

But Steven, the shocking state of IDLE has utterly destroyed any
reputation Python may have had. There's not going to be any community
left to neglect, soon!

That said, though, the new string delimiters will solve many problems.
We should adopt a strict policy: all syntactic elements MUST be
comprised of non-ASCII characters, thus allowing all ASCII characters
to represent themselves literally. I'm not sure what "representing
themselves literally" would mean, since strings already have perfect
delimiters, but it seems like a good policy.

By the way, doubling the delimiter works for nesting strings. ئئ and
รร look just fine.

ChrisA
 
Dan Sommers

"ANYTHING", hey?

I propose we use ئ and ร as the opening and closing string delimiters.
Problem solved!

Why stop at pre-determined, literal delimiters? That's way too easy on
the parser, whether that parser is a computer or a person.

__string_delimiter__ = Ω # magic syntax; no quotes needed

x = ΩhelloΩ

__string_delimiter__ = # # that first # isn't a comment marker

x = # this isn't a comment; it's part of the string
this is a multi-line string
#

__string_delimiter__ = '' # the delimiter is delimited by whitespace
# now I can have comments, again, too

x = ''"hello"'' # now x contains two double-quote characters

I'm sure that the implementation is trivial, and it's so much easier to
write strings that contain quotes (not to mention how easy it is to read
those strings back later).
 
Dennis Lee Bieber

In actual fact, the real problem is that quoted strings need to be
able to contain _anything_. The only true solution to that is
length-provided strings:

s = "4spam
q = "14Hello, world!\n

This works beautifully in interchange formats, but rather poorly in
source code (or, for that matter, anything editable).

Worked in FORTRAN for decades...

4Hspam

No line feeds however, and one did have to count spaces for long strings
when part of a complex format statement...
10Hfive ,rest of statement
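To show what "counting spaces" means in practice, here is a rough Python sketch of reading such a count-prefixed constant. It is an illustration only, not how any FORTRAN compiler does it, and `read_hollerith` is a made-up name.

```python
def read_hollerith(text):
    """Return (string, remainder) for a leading nH... constant."""
    h = text.index("H")          # the count is everything before the first H
    count = int(text[:h])
    start = h + 1
    # Exactly `count` characters belong to the string, whatever they are.
    return text[start:start + count], text[start + count:]


print(read_hollerith("4Hspam"))
# With the count right, padding spaces are part of the string.
print(read_hollerith("10Hfive      ,rest of statement"))
# With too few spaces, the string swallows part of the statement.
print(read_hollerith("10Hfive ,rest of statement"))
```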
 
