raw string tail escape revisited

Bengt Richter · Aug 9, 2003

Why wouldn't quote-stuffing solve the problem, and let you treat \ as
an ordinary character? In a raw string, it's no good for preventing
end-of-quoting anyway, unless you want the literal \ in front of the quote
you are escaping.
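
For reference, a small check of how today's raw strings treat a backslash before a quote. This is only a sketch assuming Python 3; the `compiles` helper and the `BACKSLASH` name are made up for the illustration.

```python
BACKSLASH = chr(92)  # one backslash, spelled this way to keep the demo readable


def compiles(source: str) -> bool:
    """Return True if `source` parses as a Python expression (hypothetical helper)."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


# In an existing raw string, a backslash keeps the following quote from
# ending the literal, but the backslash itself stays in the value.
value = r'\''
print(len(value), value == BACKSLASH + "'")   # prints: 2 True

# So an existing raw string cannot end in a single backslash.
print(compiles("r'" + BACKSLASH + "'"))       # source text r'\'   -> prints: False
print(compiles("r'C:" + BACKSLASH + "'"))     # source text r'C:\' -> prints: False
```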

Quote-stuffing is a variation on the old quote-doubling, extended to
deal with triple quotes as well (which makes it a little like HDLC bit stuffing).

IOW, treat \ as an ordinary character, and then if you don't want the
string to end, just stuff one quote character of the starting kind after
the otherwise terminating sequence. You could do this with single quoting
or triple quoting, where of course you'd need it less for triple quotes.
E.g., using uppercase R as a prefix for this kind of raw string syntax,

R'\' # just fine
R'C:\' # one of the motivations
R'''' # dumb way to do "'"
R""" <just about anything> ->[""""]<-makes 3 quotes, and we end with \"""
R""" ->[""""""""]<-two stuffing-extended triple quotes make 6 quotes."""

The tokenizer would recognize a stuffed quote mark and just discard it if present,
otherwise recognize end of string.
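
The rule as described could be sketched roughly like this. It is a toy decoder in Python, not tokenizer code; the function name `decode_stuffed` is invented here, and it takes the literal without the proposed prefix letter.

```python
def decode_stuffed(literal: str) -> str:
    """Decode a quote-stuffed literal, delimiters included, prefix letter omitted."""
    if not literal or literal[0] not in "'\"":
        raise ValueError("literal must start with a quote character")
    quote = literal[0]
    # As in the existing tokenizer, three quotes in a row open a triple-quoted string.
    width = 3 if literal.startswith(quote * 3) else 1
    terminator = quote * width
    out = []
    i = width
    while i < len(literal):
        if literal.startswith(terminator, i):
            if literal[i + width:i + width + 1] == quote:
                # Terminating sequence followed by a stuffed quote: keep the
                # sequence as data and discard the stuffed quote.
                out.append(terminator)
                i += width + 1
                continue
            if i + width != len(literal):
                raise ValueError("text after end of string")
            return "".join(out)
        out.append(literal[i])  # a backslash lands here like any other character
        i += 1
    raise ValueError("unterminated string")


print(decode_stuffed("'C:" + chr(92) + "'"))   # prints: C:\
print(decode_stuffed('"""a""""b"""'))          # prints: a"""b
```

The scan is greedy from left to right, which is why data ending in the delimiter's own quote character needs the opposite kind of delimiter.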

Just had this idea. Do I need more coffee? What did I forget?

Regards,
Bengt Richter

Jeff Epler · Aug 9, 2003

Well, one problem is that this is incompatible with all existing
R-strings, which have been in Python for comparative ages. So we'd be
forced to implement them as B'' strings (for Bengt). 16 ways to declare
string literals (single and triple, ' and ", standard, r, u, and ur)
are bad enough; I don't want to add another 8 (single and triple, ' and
", b and ub) to the mix.

$ python -c 'import this' | grep "only one"

Secondly, the price in the tokenizer for an R-string vs a regular string is
essentially zero, since after the leading r, u or ur is parsed, the
regular rule for parsing any string is used. Your rule will require
near-duplication of a 60-line segment of Parser/tokenizer.c and a new
function similar to PyString_DecodeEscape, probably another 60 lines of
C.

Finally, I'm not convinced by your description that triple-quotes and
quote-stuffing work well together. Right now, if the parser sees
R'''' # dumb way to do "'"
it'll still be in the midst of parsing a triple-quoted raw string. How
will you be able to write a B''' string that begins with a ' if this
rule is followed? So there must be strings that you can't write with
B-quoting, just like there are strings you can't write with R-quoting
(but this time the problem is with strings that start with quotes
instead of ending with backslashes).
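
The existing behaviour is easy to confirm. A quick check, assuming Python 3; the `parses` helper is just a name picked for this sketch.

```python
import ast


def parses(source: str) -> bool:
    """Return True if `source` is a valid Python literal (hypothetical helper)."""
    try:
        ast.literal_eval(source)
        return True
    except SyntaxError:
        return False


# Four single quotes after the prefix: an opened triple-quoted raw string
# holding one quote character, with no terminator in sight.
print(parses("r''''"))                      # prints: False

# The usual way out is the opposite kind of quote.
print(ast.literal_eval('r"' + "'" + '"'))   # source text r"'" -> prints: '
```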

Jeff

Bengt Richter · Aug 9, 2003

> Why wouldn't quote-stuffing solve the problem, and let you treat \ as
> an ordinary character? In a raw string, it's no good for preventing
> end-of-quoting anyway, unless you want the literal \ in front of the quote
> you are escaping.
>
> Quote-stuffing is a variation on the old quote-doubling, extended to
> deal with triple quotes as well (which makes it a little like HDLC bit stuffing).
>
> IOW, treat \ as an ordinary character, and then if you don't want the
> string to end, just stuff one quote character of the starting kind after
> the otherwise terminating sequence. You could do this with single quoting
> or triple quoting, where of course you'd need it less for triple quotes.
> E.g., using uppercase R as a prefix for this kind of raw string syntax,
>
> R'\' # just fine
> R'C:\' # one of the motivations
> R'''' # dumb way to do "'"

Really dumb ;-/ That makes an unterminated triple-quoted string
starting with one quote. D'oh. The stuffing logic doesn't start until the
beginning delimiter (single or triple) has been passed and established.
So if you perversely wanted to use only single quotes to quote one single
quote, you couldn't.

Is there a string you couldn't do at all? I don't think so, since you could
always do single-quote doubling and choose the quote opposite to a leading
quote in the data. E.g., R'"""''''''' would be equivalent to a painful
R'"""'+R"'''". Actually, that could be triple quoted as R"""""""'''""", but
a trailing '"' in that data would make a problem. Nope, R'''"""''''"'''
would handle that. But what if we add another "'"? Then the data would be
["""'''"']. Still ok: it looks like we can always start with a triple quote
opposite to the end of the data, and R"""""""'''"'""" would do it. Is there
an impossible case I'm missing that would have to be split into two
adjacent (thus concatenated) string representations?
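
The "double the quotes and pick the delimiter opposite to a leading quote" recipe can be sketched as a toy encoder. The name `encode_stuffed` is invented for this illustration, the prefix letter is left off, and it assumes data without newlines, since a single-quoted literal cannot span lines.

```python
def encode_stuffed(data: str) -> str:
    """Spell `data` as a single-quoted, quote-stuffed literal (prefix letter omitted)."""
    # Opening with the same quote the data starts with would read as a
    # triple-quote opener, so choose the opposite kind in that case.
    quote = '"' if data.startswith("'") else "'"
    # Every occurrence of the delimiter is a terminating sequence for a
    # single-quoted literal, so each one gets a stuffed quote after it.
    return quote + data.replace(quote, quote * 2) + quote


print(encode_stuffed("C:" + chr(92)))   # prints: 'C:\'
print(encode_stuffed("'"))              # prints: "'"
```

Because every quote run inside the body comes out with even length and never touches the opening delimiter, this spelling seems to cover any single-line data, which suggests there is no impossible case for it.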

Is there a reasonable use case that is messed up as the price of getting R'\' ?

Otherwise I guess it should be ok. Woke up too early and not enough ;-)

> R""" <just about anything> ->[""""]<-makes 3 quotes, and we end with \"""
> R""" ->[""""""""]<-two stuffing-extended triple quotes make 6 quotes."""
>
> The tokenizer would recognize a stuffed quote mark and just discard it if present,
> otherwise recognize end of string.
>
> Just had this idea. Do I need more coffee? What did I forget?
>
> Regards,
> Bengt Richter

Regards,
Bengt Richter
