raw string tail escape revisited

Bengt Richter

Why wouldn't quote-stuffing solve the problem, and let you treat \ as
an ordinary character? In a raw string, it's no good for preventing
end-of-quoting anyway, unless you want the literal \ in front of the quote
you are escaping.
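Here is the existing behaviour Bengt is describing, checked against current Python 3. Python 2 raw strings follow the same rule for a backslash before a quote. `compiles` is a hypothetical helper written only for this demo.

```python
def compiles(src):
    """Hypothetical helper: True if src compiles as a Python expression."""
    try:
        compile(src, "<demo>", "eval")
        return True
    except SyntaxError:
        return False

# In a raw string the backslash still keeps the next quote from ending the
# literal, and the backslash itself stays in the value.
s = r'it\'s'
print(len(s), s)              # 5 it\'s

# So a raw string cannot end in a single backslash ...
print(compiles(r"r'C:\'"))    # False
# ... and doubling it keeps both backslashes in the value.
print(eval(r"r'C:\\'"))       # C:\\
# The usual workaround is concatenation with an ordinary string.
print(r'C:' + '\\')           # C:\ (one backslash)
```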

Quote-stuffing is a variation on the old quote-doubling, extended to
deal with triple quotes as well (which makes it a little like HDLC bit stuffing).

IOW, treat \ as an ordinary character, and then if you don't want the
string to end, just stuff one quote character of the starting kind after
the otherwise terminating sequence. You could do this with single quoting
or triple quoting, where of course you'd need it less for triple quotes.
E.g., using uppercase R as a prefix for this kind of raw string syntax,

R'\' # just fine
R'C:\' # one of the motivations
R'''' # dumb way to do "'"
R""" <just about anything> ->[""""]<-makes 3 quotes, and we end with \"""
R""" ->[""""""""]<-two stuffing-extended triple quotes make 6 quotes."""

The tokenizer would recognize a stuffed quote mark and just discard it if present,
otherwise recognize end of string.
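A minimal sketch of the proposed decoding rule, to make it concrete. This is not Python's tokenizer; `decode_stuffed` is a hypothetical name. It assumes the R prefix has already been consumed and that, as in today's tokenizer, three identical quotes open a triple-quoted string.

```python
def decode_stuffed(src):
    """Decode a hypothetical quote-stuffed raw string.

    src must begin with the opening quote (the R prefix already consumed).
    Returns (value, index just past the closing delimiter).
    """
    q = src[0]
    if q not in "'\"":
        raise ValueError("expected an opening quote")
    # Assumption: three identical quotes open a triple-quoted string;
    # otherwise the delimiter is a single quote character.
    delim = q * 3 if src.startswith(q * 3) else q
    i = len(delim)
    out = []
    while i < len(src):
        if src.startswith(delim, i):
            j = i + len(delim)
            if j < len(src) and src[j] == q:
                # Stuffed quote: keep the delimiter as data, discard the extra quote.
                out.append(delim)
                i = j + 1
                continue
            return "".join(out), j        # a real closing delimiter
        out.append(src[i])                # backslash is an ordinary character here
        i += 1
    raise SyntaxError("unterminated quote-stuffed string")

print(decode_stuffed(r"'C:\'"))        # ('C:\\', 5)
print(decode_stuffed("'it''s'"))       # ("it's", 7)
src = '"""' + ' ->[' + '"' * 8 + ']<-' + '"""'
print(decode_stuffed(src)[0])          # six double quotes between the brackets
```

Backslashes never need special handling here. The only state is "am I looking at the delimiter, and is it followed by one more quote of the same kind?"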

Just had this idea. Do I need more coffee? What did I forget?

Regards,
Bengt Richter
 
Jeff Epler

Well, one problem is that this is incompatible with all existing
R-strings, which have been in Python for comparative ages. So we'd be
forced to implement them as B'' strings (for Bengt). 16 ways to declare
string literals (single and triple, ' and ", standard, r, u, and ur)
are bad enough; I don't want to add another 8 (single and triple, ' and
", b and ub) to the mix.
$ python -c 'import this' | grep "only one"
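The 16 and 8 are simply products of the choices listed above. A tiny sketch reproduces them. The prefix lists are copied from the post (Python 2.x era). Note that "ur" is not accepted by Python 3, and "b"/"ub" here stand for the hypothetical B-strings, not Python 3's bytes literals.

```python
from itertools import product

QUOTE_STYLES = ["'", '"', "'''", '"""']     # single and triple, ' and "
EXISTING_PREFIXES = ["", "r", "u", "ur"]     # standard, r, u, ur (Python 2 era)
PROPOSED_PREFIXES = ["b", "ub"]              # hypothetical B-strings from the thread

existing = [p + q for p, q in product(EXISTING_PREFIXES, QUOTE_STYLES)]
proposed = [p + q for p, q in product(PROPOSED_PREFIXES, QUOTE_STYLES)]
print(len(existing), len(proposed))          # 16 8
```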

Secondly, the price in the tokenizer for an R-string vs a regular string is
essentially zero, since after the leading r, u or ur is parsed, the
regular rule for parsing any string is used. Your rule will require
near-duplication of a 60-line segment of Parser/tokenizer.c and a new
function similar to PyString_DecodeEscape, probably another 60 lines of
C.
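Parser/tokenizer.c can't be called from Python code. The standard-library `tokenize` module is a separate, pure-Python tokenizer, but it likewise treats the prefix as part of one ordinary STRING token. `string_tokens` is a hypothetical helper for this illustration.

```python
import io
import token
import tokenize

def string_tokens(source):
    """Return (token type name, token text) for each STRING token in source."""
    readline = io.StringIO(source).readline
    return [(token.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type == token.STRING]

# Plain, raw and u-prefixed literals each come back as a single STRING token,
# with the prefix simply included in the token text.
for name, text in string_tokens("x = 'a\\n' + r'a\\n' + u'a\\n'\n"):
    print(name, text)
```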

Finally, I'm not convinced by your claim that triple quotes and
quote-stuffing work well together. Right now, if the parser sees
R'''' # dumb way to do "'"
it'll still be in the midst of parsing a triple-quoted raw string. How
will you be able to write a B''' string that begins with a ' if this
rule is followed? So there must be strings that you can't write with
B-quoting, just like there are strings you can't write with R-quoting
(but this time the problem is with strings that start with quotes
instead of ending with backslashes).
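This check shows how the current tokenizer handles the four-quote case. `compiles` is again a hypothetical helper for the demo.

```python
def compiles(src):
    """Hypothetical helper: True if src compiles as a Python expression."""
    try:
        compile(src, "<demo>", "eval")
        return True
    except SyntaxError:
        return False

print(compiles("r''"))     # True: two quotes are an empty raw string
print(compiles("r''''"))   # False: ''' opens a triple-quoted string that never closes
print(compiles("''''"))    # False: the same happens without the r prefix
```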

Jeff
 
Bengt Richter

R'''' # dumb way to do "'"
Really dumb ;-/ That makes an unterminated triple-quoted string
starting with one quote. D'oh. The logic doesn't start until the opening
delimiter, single or triple, has been passed and established. So if you
perversely wanted to use only single quotes to quote one single quote,
you couldn't. Is there a string you couldn't write at all? I don't think so, since
you could always do single-quote doubling and choose the quote opposite to a leading
quote in the data. E.g., R'"""''''''' would be a painful R'"""'+R"'''".
Actually, that could be triple quoted as R"""""""'''""", but putting an ending '"'
in that data would make a problem. Nope, R'''"""''''"''' would handle that.
But what if we add another "'"? Then the data would be ["""'''"']. Still ok;
it looks like we can always start with a triple quote opposite to the end of the data:
R"""""""'''"'""" would do it. Is there an impossible case I'm missing that would have
to be split into two adjacent (thus concatenated) string representations?
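The heuristic above can be checked mechanically: open with a triple quote of the kind opposite to the data's last character, then stuff every embedded delimiter. In this sketch, `encode_stuffed` and `decode_stuffed` are hypothetical names. The decoder assumes Python's existing rule that three identical quotes open a triple-quoted string. The exhaustive check covers only short strings over a four-character alphabet, so it is evidence, not proof. It does reproduce the three literals from the paragraph above.

```python
from itertools import product

def decode_stuffed(src):
    """Decode a hypothetical R-prefixed quote-stuffed string.

    Returns (value, index just past the closing delimiter).
    """
    i = 1 if src.startswith("R") else 0
    q = src[i]
    delim = q * 3 if src.startswith(q * 3, i) else q
    i += len(delim)
    out = []
    while i < len(src):
        if src.startswith(delim, i):
            j = i + len(delim)
            if j < len(src) and src[j] == q:
                out.append(delim)      # stuffed: drop the extra quote
                i = j + 1
                continue
            return "".join(out), j     # real closing delimiter
        out.append(src[i])
        i += 1
    raise SyntaxError("unterminated quote-stuffed string")


def encode_stuffed(data):
    """Hypothetical encoder: use a triple-quote delimiter whose quote character
    differs from the last character of data, stuffing every embedded delimiter."""
    q = "'" if data.endswith('"') else '"'
    delim = q * 3
    parts, i = [], 0
    while i < len(data):
        if data.startswith(delim, i):
            parts.append(delim + q)    # one stuffed quote after the delimiter
            i += 3
        else:
            parts.append(data[i])
            i += 1
    return "R" + delim + "".join(parts) + delim


# The three data strings discussed above.
for data in ['"""' + "'''", '"""' + "'''" + '"', '"""' + "'''" + '"' + "'"]:
    print(encode_stuffed(data))
# R"""""""'''"""
# R'''"""''''"'''
# R"""""""'''"'"""

# Exhaustive round-trip check over all strings of length <= 6 built from ' " \ x.
print(all(decode_stuffed(encode_stuffed(s))[0] == s
          for n in range(7)
          for s in map("".join, product("'\"\\x", repeat=n))))   # True
```

Choosing the opposite quote matters because the data then never ends with the delimiter's quote character. Otherwise the decoder would mistake the closing delimiter for a stuffed one.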

Is there a reasonable use case that is messed up as the price of getting R'\' ?

Otherwise I guess it should be ok. Woke up too early and not enough ;-)

Regards,
Bengt Richter
