Unrecognized escape sequences in string literals

Douglas Alan · Aug 13, 2009

You are making an unjustified assumption: \y is not an error.

You are making an unjustified assumption that I ever made such an
assumption!

My claim is and has always been NOT that \y is innately an error, but
rather that treating unrecognized escape sequences as legal escape
sequences is error PRONE.
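A minimal Python 3 sketch of the hazard described above. It assumes a Windows-style path typed without the r"" prefix; the path and the helper name `eval_literal_quietly` are illustrative only. The unrecognized `\d` survives as two characters, while the recognized `\n` silently becomes a newline, so the string looks right in one place and is wrong in another.

```python
import ast
import warnings


def eval_literal_quietly(source: str):
    """Evaluate Python literal source text, hiding escape-sequence warnings.

    Python 3.6+ warns about unrecognized escapes such as \\d (a
    DeprecationWarning, changed to SyntaxWarning in 3.12). The warning is
    suppressed here so only the resulting value is shown.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


# Source text typed by someone who forgot the r"" prefix.
path = eval_literal_quietly(r'"C:\data\new"')

print(repr(path))            # 'C:\\data\new'
print(len(r"C:\data\new"))   # 11: what the author probably meant
print(len(path))             # 10: \d kept as two chars, \n became one newline
```

Writing the literal as `r"C:\data\new"` avoids both surprises.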

While I'm amused that you've made my own point for me, I'm less
amused that you seem to be totally incapable of seeing past your
parochial language assumptions,

Where do you get the notion that my assumptions are in any sense
"parochial"? They come from (1) a great deal of experience programming
very reliable software, and (2) having learned at least two dozen
different programming languages in my life.

I disagree with nearly everything you say in this post. I think
that a few points you make have some validity, but the vast
majority are based on a superficial and confused understanding
of language design principles.

Whatever. I've taken two graduate level classes at MIT on programming
language design, and got an A in both classes, and designed my own
programming language as a final project, and received an A+. But I
guess I don't really know anything about the topic at all.

But it's not the only reasonable design choice, and Bash has
made a different choice, and Python has made yet a third
reasonable choice, and Pascal made yet a fourth reasonable choice.

And so did Perl and PHP, and whatever other programming language you
happen to mention. In fact, all programming languages are equally
good, so we might as well just freeze all language design as it is
now. Clearly we can do no better.

One party insisting that red is the only logical colour for a
car, and that anybody who prefers white or black or blue is
illogical, is unacceptable.

If having all cars be red saved a lot of lives, or increased gas
mileage significantly, then it might very well be the best color for a
car. But of course, that is not the case. With programming languages,
there is much more likely to be an actual fact of the matter on which
sorts of language design decisions make programmers more productive on
average, and which ones result in more reliable software.

I will certainly admit that obtaining objective data on such things is
very difficult, but it's a completely different thing than one's color
preference for their car.

|>ouglas

Aahz · Aug 14, 2009

My friend begs to differ with the above. It would be much better for
debugging if Python generated a parsing error for unrecognized escape
sequences, rather than leaving them unchanged. g++ outputs a warning
for such escape sequences, for instance. This is what I would consider
to be the correct behavior. (Actually, I think it should just generate
a fatal parsing error, but a warning is okay too.)

Well, then, the usual response applies: create a patch, discuss it on
python-ideas, and see what happens.

(That is, nobody has previously complained so vociferously IIRC, and
adding a warning is certainly within the bounds of what's theoretically
acceptable.)
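Later versions of Python did add such a warning: since Python 3.6 the compiler warns about unrecognized escapes (a DeprecationWarning, and a SyntaxWarning from 3.12). The following is a minimal sketch, assuming Python 3.6 or newer, of getting the "fatal parsing error" behaviour by escalating that warning to an error. The helper name `parse_literal_strictly` is hypothetical. Passing `-W error` to the interpreter escalates warnings in a similar way, though modules loaded from cached bytecode are not recompiled and so are not re-checked.

```python
import ast
import warnings


def parse_literal_strictly(source: str):
    """Evaluate Python literal source text, rejecting unrecognized escapes.

    Assumes Python 3.6+, where the compiler warns about escapes such as
    \\y. Turning warnings into errors makes that warning fatal, and the
    compiler then reports it as a SyntaxError.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            return ast.literal_eval(source)
        except (SyntaxError, Warning) as exc:
            # Warning is caught too, in case a version surfaces it directly.
            raise ValueError(f"rejected {source!r}: {exc}") from exc


print(repr(parse_literal_strictly(r'"a\tb"')))   # 'a\tb' (recognized escape)
print(repr(parse_literal_strictly(r'"a\\yb"')))  # 'a\\yb' (escaped backslash)

try:
    parse_literal_strictly(r'"a\yb"')            # unrecognized \y
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```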

Steven D'Aprano · Aug 14, 2009

"I saw `cout' being shifted "Hello world" times to the left and stopped
right there." --Steve Gonedes

Assuming that's something real, and not invented for humour, I presume
that's describing something possible in C++. Am I correct? What the hell
would it actually do???

MRAB · Aug 14, 2009

Grant said:
Yes. In C++, the "<<" operator is overloaded. Judging by the
context in which I've seen it used, it does something like
write strings to a stream.

IIRC in C++,

cout << "Hello world";

It also returns cout, so you can chain them:

cout << "Hello, " << name << '\n';
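The chaining works because each `<<` returns the stream itself. The same idea can be sketched in Python by overloading `__lshift__`. The `OutStream` class below is a hypothetical stand-in for `std::ostream`, not part of any library.

```python
import io


class OutStream:
    """Hypothetical stand-in for C++'s std::ostream, for illustration only."""

    def __init__(self) -> None:
        self._buffer = io.StringIO()

    def __lshift__(self, value):
        # Write the value's text form, then return the stream itself so the
        # next "<<" in the chain still has a stream on its left-hand side.
        self._buffer.write(str(value))
        return self

    def getvalue(self) -> str:
        return self._buffer.getvalue()


cout = OutStream()
name = "world"
cout << "Hello, " << name << "\n"   # evaluated as ((cout << "Hello, ") << name) << "\n"
print(repr(cout.getvalue()))        # 'Hello, world\n'
```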

Douglas Alan · Aug 14, 2009

Yes. In C++, the "<<" operator is overloaded. Judging by the
context in which I've seen it used, it does something like
write strings to a stream.

There's a persistent rumor that it is *this* very "abuse" of
overloading that caused Java to avoid operator overloading
altogether.

But then Java went and used "+" as the string concatenation
operator. Go figure!

|>ouglas

P.S. Overloading "left shift" to mean "output" does indeed seem a bit
sketchy, but in 15 years of C++ programming, I've never seen it cause
any confusion or bugs.

Steven D'Aprano · Aug 14, 2009

I think I've spent enough time on this discussion, so I won't be directly
responding to any of your recent points -- it's clear that I'm not
persuading you that there's any justification for any behaviour for
escape sequences other than the way C++ deals with them. That's your
prerogative, of course, but I've done enough tilting at windmills for
this week, so I'll just make one final comment and then withdraw from an
unproductive argument. (I will make an effort to read any final comments
you wish to make, so feel free to reply. Just don't expect an answer to
any questions.)

Douglas, you and I clearly have a difference of opinion on this. Neither
of us have provided even the tiniest amount of objective, replicable,
reliable data on the error-proneness of the C++ approach versus that of
Python. The supposed superiority of the C++ approach is entirely
subjective and based on personal opinion instead of quantitative facts.

I prefer languages that permit anything that isn't explicitly forbidden,
so I'm happy that Python treats non-special escape sequences as valid,
and your attempts to convince me that this goes against the Zen have
entirely failed to convince me. As I've done before, I will admit that
one consequence of this design is that it makes it hard to introduce new
escape sequences to Python. Given that it's vanishingly rare to want to
do so, and that wanting to add backslashes to strings is common, I think
that's a reasonable tradeoff. Other languages may make different
tradeoffs, and that's fine by me.
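Python 3 itself shows the tradeoff described above, because `str` literals recognize `\u` but `bytes` literals do not. A minimal sketch, assuming Python 3.6 or newer, with the warning from the bytes literal suppressed: the same six source characters produce different values. Giving an existing literal type a new escape would change old strings in the same silent way.

```python
import ast
import warnings

with warnings.catch_warnings():
    # b"\u0041" contains an escape that bytes literals do not recognize,
    # which recent Python versions warn about; silence that here.
    warnings.simplefilter("ignore")
    as_str = ast.literal_eval(r'"\u0041"')     # \u is a known escape in str
    as_bytes = ast.literal_eval(r'b"\u0041"')  # ...but not in bytes

print(repr(as_str))    # 'A'
print(repr(as_bytes))  # b'\\u0041'
print(len(as_bytes))   # 6: the backslash and "u0041" are kept verbatim
```

An ordinary (byte) string literal in Python 2 behaved like the `bytes` case here.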

Dave Angel · Aug 15, 2009

Benjamin said:
The only reason it hasn't is because people use it in "Hello World". I bet
some newbie C++ programmers get confused the first time they see << used to
shift.

Actually, I've seen it cause confusion, because of operator precedence.
The logical shift operators have a fairly high level priority, so
sometimes you need parentheses that aren't obvious. Fortunately, most
of those cases make compile errors.

C++ has about 17 levels of precedence, plus some confusing associative
rules. And operator overloading does *NOT* change precedence.

DaveA

Hendrik van Rooyen · Aug 15, 2009

Assuming that's something real, and not invented for humour, I presume
that's describing something possible in C++. Am I correct? What the hell
would it actually do???

It would shift "cout" left "Hello World" times.
It is unclear if the shift wraps around or not.

It is similar to a banana *holding his hands apart about a foot* this colour.

- Hendrik

Chris Rebert · Aug 15, 2009

It would shift "cout" left "Hello World" times.
It is unclear if the shift wraps around or not.

It is similar to a banana *holding his hands apart about a foot* this colour.

- Hendrik

I think you managed to successfully dereference the null pointer there...

Cheers,
Chris

Douglas Alan · Aug 15, 2009

Benjamin Kaplan wrote:

People typically get confused by a *lot* of things when they learn a
new language. I think the better metric is how people fare with a
language feature once they've grown accustomed to the language, and
how long it takes them to acquire this familiarity.

Actually, I've seen it cause confusion, because of operator precedence.
The logical shift operators have a fairly high level priority, so
sometimes you need parentheses that aren't obvious. Fortunately, most
of those cases make compile errors.

I've been programming in C++ so long that for me, if there's any
confusion, it's the other way around. I see "<<" or ">>" and I think I/O.
I don't immediately think shifting. Fortunately, shifting is a
pretty rare operation to actually use, which is perhaps why C++
reclaimed it for I/O.

On the other hand, you are right that the precedence of "<<" is messed
up for I/O. I've never seen a real-world case where this causes a bug
in C++ code, because the static type-checker always seems to catch the
error. In a dynamically typed language, this would be a much more
serious problem.
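The precedence problem, and the point about dynamic typing, can be illustrated in Python. As in C++, Python's `<<` binds more tightly than `&` and less tightly than `+`. The sketch below uses a hypothetical `OutStream`-style class (illustrative only). The misparsed expression is caught only at run time, after part of the output has already been written. In C++ the analogous `cout << 6 & 3` is rejected by the compiler, since a stream has no `&` operator taking an `int`.

```python
class OutStream:
    """Hypothetical stream: "<<" appends str(value) and returns the stream."""

    def __init__(self) -> None:
        self.parts = []

    def __lshift__(self, value):
        self.parts.append(str(value))
        return self


out = OutStream()

# "+" binds tighter than "<<", so this is out << (1 + 2) and writes "3".
out << 1 + 2

# "<<" binds tighter than "&", so this is (out << 6) & 3. The "6" is
# written first; only then does OutStream & int fail, at run time.
try:
    out << 6 & 3
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

print(out.parts)  # ['3', '6']
```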

|>ouglas

P.S. I find it strange, however, that anyone who is not okay with
"abusing" operator overloading in this manner wouldn't also take
umbrage at Python's overloading of "+" to work with strings and lists,
etc. Numerical addition and sequence concatenation have entirely
different semantics.

Douglas Alan · Aug 16, 2009

Douglas, you and I clearly have a difference of opinion on
this. Neither of us have provided even the tiniest amount
of objective, replicable, reliable data on the
error-proneness of the C++ approach versus that of
Python. The supposed superiority of the C++ approach is
entirely subjective and based on personal opinion instead
of quantitative facts.

Alas, this is true for nearly any engineering methodology or
philosophy, which is why, I suppose, Perl, for instance,
still has its proponents. It's virtually impossible to prove
any thesis, and these things only get decided by endless
debate that rages across decades.

I prefer languages that permit anything that isn't
explicitly forbidden, so I'm happy that Python treats
non-special escape sequences as valid,

I don't really understand what you mean by this. If Python
were to declare that "unrecognized escape sequences" were
forbidden, then they would be "explicitly forbidden". Would
you then be happy?

If not, why are you not upset that Python won't let me do

[3, 4, 5] + 2

Some other programming languages I've used certainly do.
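Here is a short sketch of what Python does with that expression, followed by the explicit forms it expects instead.

```python
values = [3, 4, 5]

error = None
try:
    values + 2  # list + int: Python refuses rather than guessing a meaning
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

elementwise = [v + 2 for v in values]  # [5, 6, 7]: add 2 to each element
appended = values + [2]                 # [3, 4, 5, 2]: concatenate two lists

print(elementwise, appended)
```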

and your attempts to convince me that this goes against
the Zen have entirely failed to convince me. As I've done
before, I will admit that one consequence of this design
is that it makes it hard to introduce new escape sequences
to Python. Given that it's vanishingly rare to want to do
so,

I'm not so convinced of that in the days of Unicode. If I
see a backslash followed by some Kanji character, what am I
supposed to make of that? For all I know, that Kanji
character might mean newline, and I'm seeing code for a
version of Python that was tweaked to be friendly to the
Japanese. And in the days where smart hand-held devices are
proliferating like crazy, there might be ever-more demand
for easy-to-use i/o that lets you control various aspects of
those devices.
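What Python 3 currently does with the case described above can be checked directly. The following is a minimal sketch, with 漢 as an arbitrary example character. A backslash followed by a non-ASCII character is kept as two characters, so giving such a sequence a meaning later would change strings that already exist.

```python
import ast
import warnings


def eval_literal(source: str):
    """Evaluate literal source text with any escape warnings silenced."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


# The source text being evaluated is: "\漢"
value = eval_literal('"\\漢"')

print(len(value))         # 2
print(value[0] == "\\")   # True: the backslash survives as-is
print(value[1] == "漢")   # True
```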

|>ouglas

Steven D'Aprano · Aug 16, 2009

P.S. I find it strange, however, that anyone who is not okay with
"abusing" operator overloading in this manner wouldn't also take
umbrage at Python's overloading of "+" to work with strings and lists,
etc. Numerical addition and sequence concatenation have entirely
different semantics.

Not to English speakers, where we frequently use 'add' to mean
concatenate, append, insert, etc.:

"add this to the end of the list"
"add the prefix 'un-' to the beginning of the word to negate it"
"add your voice to the list of those calling for change"
"add your name and address to the visitor's book"

and even in-place modifications:

"after test audiences' lukewarm response, the studio added a completely
different ending to the movie".

Personally, I would have preferred & for string and list concatenation,
but that's entirely for subjective reasons.

Douglas Alan · Aug 16, 2009

Not to English speakers, where we frequently use 'add' to mean
concatenate, append, insert, etc.:

That is certainly true, but the "+" symbol (pronounced "plus" not
"add") isn't exactly synonymous with the English word "add" and is
usually used, in technical circles, to refer to a function that at
least meets the properties of an abelian group operator.
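The algebraic point can be checked quickly in Python. Sequence `+` is associative and has an identity (the empty string or list), so it forms a monoid. Unlike numeric addition, however, it is neither commutative nor invertible, and both properties are required of an abelian group.

```python
# Integer addition: commutative, and every element has an inverse.
assert 2 + 3 == 3 + 2
assert (2 + 3) + (-3) == 2

# Concatenation: associative, with "" as the identity element...
assert ("ab" + "cd") + "ef" == "ab" + ("cd" + "ef")
assert "ab" + "" == "ab"

# ...but not commutative, and no string y makes "ab" + y == "".
print("ab" + "cd", "cd" + "ab")  # abcd cdab
print([1] + [2], [2] + [1])      # [1, 2] [2, 1]
```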

Also, programming languages (other than Perl) should be more precise
than English. English words often have many, many meanings, but when
we are talking about types and operations on types, the operations
should generally have more specific semantics.

In any case, let's say we grant that operators should be allowed to be
as sloppy as English. Then we should have no problem with C++'s use of
"<<" for i/o. Pseudo-code has a long heritage of using "<-" to
indicate assignment, and there are a number of programming languages
(e.g., APL) that use assignment to the output terminal to indicate
writing to the terminal. C++'s usage of "<<" for output is clearly
designed to be reminiscent of this, and therefore intuitive.

And intuitive it is, given the aforementioned background, at least.

So, as far as I can tell, Python has no real authority to throw stones
at C++ on this little tiny particular issue.

|>ouglas

Steven D'Aprano · Aug 16, 2009

So, as far as I can tell, Python has no real authority to throw stones
at C++ on this little tiny particular issue.

I think you're being a tad over-defensive. I asked a genuine question
about a quote in somebody's signature. That's a quote which can be found
all over the Internet, and the poster using it has (as far as I know) no
official capacity to speak for "Python" -- while Aahz is a high-profile,
well-respected Pythonista, he's not Guido.

Now that I understand what the semantics of cout << "Hello world" are, I
don't have any problem with it either. It is a bit weird, "Hello world" >> cout
would probably be better, but it's hardly the strangest design in
any programming language, and it's probably influenced by input
redirection using < in various shells.

Douglas Alan · Aug 16, 2009

I think you're being a tad over-defensive.

Defensive? Personally, I prefer Python over C++ by about a factor of
100X. I just find it a bit amusing when someone claims that some
programming language has a particular fatal flaw, when their own
apparently favorite language has the very same issue in an only
slightly different form.

the poster using it has (as far as I know) no official capacity to speak
for "Python"

I never thought he did. I wasn't speaking literally, as I'm not under
the opinion that any programming language has any literal authority or
any literal ability to throw stones.

Now that I understand what the semantics of cout << "Hello world" are, I
don't have any problem with it either. It is a bit weird, "Hello world" >> cout
would probably be better, but it's hardly the strangest design in
any programming language, and it's probably influenced by input
redirection using < in various shells.

C++ also allows for reading from stdin like so:

cin >> myVar;

I think the direction of the arrows probably derives from languages
like APL, which had notation something like so:

myVar <- 3
[] <- myVar

"<-" was really a little arrow symbol (APL didn't use ascii), and the
first line above would assign the value 3 to myVar. In the second
line, the "[]" was really a little box symbol and represented the
terminal. Assigning to the box would cause the output to be printed
on the terminal, so the above would output "3". If you did this:

[] -> myVar

It would read a value into myVar from the terminal.

APL predates Unix by quite a few years.

|>ouglas

Hendrik van Rooyen · Aug 16, 2009

"Steven D'Aprano" wrote:

Now that I understand what the semantics of cout << "Hello world" are, I
don't have any problem with it either. It is a bit weird, "Hello world" >> cout
would probably be better, but it's hardly the strangest design in
any programming language, and it's probably influenced by input
redirection using < in various shells.

I find it strange that you would prefer:

"Hello world" >> cout
over:
cout << "Hello world"

The latter seems to me to be more in line with normal assignment: -
Take what is on the right and make the left the same.
I suppose it is because we read from left to right that the first one seems
better to you.
Another instance of how different we all are.

It goes down to the assembler - there are two schools:

mov a,b - for Intel-like languages, this means move b to a
mov a,b - for Motorola-like languages, this means move a to b

Gets confusing sometimes.

- Hendrik

Steven D'Aprano · Aug 16, 2009

I find it strange that you would prefer:

"Hello world" >> cout
over:
cout << "Hello world"

The latter seems to me to be more in line with normal assignment: - Take
what is on the right and make the left the same.

I don't like normal assignment. After nearly four decades of mathematics
and programming, I'm used to it, but I don't think it is especially good.
It confuses beginners to programming: they get one set of behaviour
drilled into them in maths class, and then in programming class we use
the same notation for something which is almost, but not quite, the same.
Consider the difference between:

y = 3 + x
x = z

as a pair of mathematics expressions versus as a pair of assignments.
What conclusion can you draw about y and z?
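The question can be answered by running the two lines as Python assignments, with arbitrary starting values assumed for illustration (x = 1, z = 10). Read as equations, the pair implies y = 3 + z. Read as assignments, it implies nothing of the kind:

```python
x = 1
z = 10

y = 3 + x   # y is bound to the value 4; it keeps no link to x
x = z       # rebinding x afterwards does not change y

print(x, y, z)     # 10 4 10
print(y == 3 + z)  # False, although the equations would imply y = 3 + z
```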

Even though it looks funny due to unfamiliarity, I'd love to see the
results of a teaching language that used notation like:

3 + x -> y
len(alist) -> n
Widget(1, 2, 3).magic -> obj
etc.

for assignment. My prediction is that it would be easier to learn, and
just as good for experienced coders. The only downside (apart from
unfamiliarity) is that it would be a little bit harder to find the
definition of a variable by visually skimming lines of code: your eyes
have to zig-zag back and forth to find the end of the line, instead of
running straight down the left margin looking for "myvar = ...". But it
should be easy enough to search for "-> myvar".

I suppose it is because
we read from left to right that the first one seems better to you.

Probably.

Douglas Alan · Aug 16, 2009

I don't like normal assignment. After nearly four decades of mathematics
and programming, I'm used to it, but I don't think it is especially good.
It confuses beginners to programming: they get one set of behaviour
drilled into them in maths class, and then in programming class we use
the same notation for something which is almost, but not quite, the same.
Consider the difference between:

y = 3 + x
x = z

as a pair of mathematics expressions versus as a pair of assignments.
What conclusion can you draw about y and z?

Yeah, the syntax most commonly used for assignment today sucks. In the
past, it was common to see languages with syntaxes like

y <- y + 1

or

y := y + 1

or

let y = y + 1

But these languages have mostly fallen out of favor. The popular
statistical programming language R still uses the

y <- y + 1

syntax, though.

Personally, my favorite is Lisp, which looks like

(set! y (+ y 1))

or

(let ((x 3)
      (y 4))
  (foo x y))

I like to be able to read everything from left to right, and Lisp does
that more than any other programming language.

I would definitely not like a language that obscures assignment by
moving it over to the right side of lines.

|>ouglas

Douglas Alan · Aug 16, 2009

For varying values of "Lisp." `set!` is Scheme.

Yes, I'm well aware!

There are probably as many different dialects of Lisp as all other
programming languages put together.

|>ouglas

Steven D'Aprano · Aug 16, 2009

I like to be able to read everything from left to right, and Lisp does
that more than any other programming language.

I would definitely not like a language that obscures assignment by
moving it over to the right side of lines.

One could argue that left-assigned-from-right assignment obscures the
most important part of the assignment, namely *what* you're assigning, in
favour of what you're assigning *to*.

In any case, after half a century of left-from-right assignment, I think
it's worth the experiment in a teaching language or three to try it the
other way. The closest to this I know of is the family of languages
derived from Apple's HyperTalk, where you do assignment with:

put somevalue into name

(Doesn't COBOL do something similar?)

Beginners found that *very* easy to understand, and it didn't seem to
make coding harder for experienced HyperCard developers.
