backslash plague

Luis P. Mendes

hi,

I've already read many pages on this but I'm not able to separate the
string 'R0\1.2646\1.2649\D' in four elements, using the \ as the separator.

a='R0\1.2644\1.2344\D'
re.sub(r'\'','ff',a) does nothing
and why must I write two '' after the \? If I hadn't used r I would
understand...

how should I do it?

Luis
 
Alex Martelli

Luis P. Mendes said:
I've already read many pages on this but I'm not able to separate the
string 'R0\1.2646\1.2649\D' in four elements, using the \ as the separator.

x = r'R0\1.2646\1.2649\D'
elements = x.split('\\')

and why must I write two '' after the \? If I hadn't used r I would
understand...

A raw literal can't end with an odd number of backslashes (_some_ way
has to be there to escape the quote char, after all).

how should I do it?

I think the string's split method, as above, is the simplest, fastest
way.
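
A minimal sketch of the two points above, using only the string from the question. The variable name trailing_backslash_ok is made up for this illustration.

```python
x = r'R0\1.2646\1.2649\D'
elements = x.split('\\')   # '\\' is a one-character string: a single backslash
print(elements)

# Doubling each backslash in an ordinary literal yields the same string.
print(x == 'R0\\1.2646\\1.2649\\D')

# The source text r'\' (a raw literal ending in one backslash) does not
# compile, because the backslash still protects the closing quote.
try:
    compile("r'\\'", '<example>', 'eval')
    trailing_backslash_ok = True
except SyntaxError:
    trailing_backslash_ok = False
print(trailing_backslash_ok)
```

So the separator has to be written as '\\' (or sliced out of a longer raw literal); r'\' is not an option.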


Alex
 
Mike Rovner

Luis said:
I've already read many pages on this but I'm not able to separate the
string 'R0\1.2646\1.2649\D' in four elements, using the \ as the separator.

a='R0\1.2644\1.2344\D'
a=r'R0\1.2644\1.2344\D'

re.sub(r'\'','ff',a) does nothing
'R0ff1.2644ff1.2344ffD'

and also
'R0ff1.2644ff1.2344ffD'

You said, you wanna _split_ them:
['R0', '1.2644', '1.2344', 'D']
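
The commands that produced the three results above did not survive in this copy of the post. The following is one plausible reconstruction, not necessarily what was originally typed; it assumes the raw-string assignment to a shown earlier.

```python
import re

a = r'R0\1.2644\1.2344\D'

# Regex substitution: the pattern r'\\' matches one literal backslash.
subbed = re.sub(r'\\', 'ff', a)
print(subbed)

# Plain string replacement, no regex involved.
replaced = a.replace('\\', 'ff')
print(replaced)

# Splitting on the backslash.
parts = a.split('\\')
print(parts)
```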

HTH,
Mike
 
Paul McGuire

Luis P. Mendes said:
hi,

I've already read many pages on this but I'm not able to separate the
string 'R0\1.2646\1.2649\D' in four elements, using the \ as the separator.

a='R0\1.2644\1.2344\D'
re.sub(r'\'','ff',a) does nothing
and why must I write two '' after the \? If I hadn't used r I would
understand...

how should I do it?

Luis

Problem 1:
Did you perhaps intend:
a = r'R0\1.2644\1.2344\D'
Otherwise, your string contains this: "R0?.2644?.2344\D" - the \1 sequence
is interpreted to mean "ascii character 1", which is the ASCII <SOH>.
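
A short sketch of what the escape does, assuming nothing beyond the string in the question (the trailing \D is left out here because it is not a recognized escape and newer Python versions warn about it). The names plain and raw are illustrative.

```python
# In an ordinary literal, \1 is an octal escape: one character with code 1.
plain = 'R0\1.2646\1.2649'
raw = r'R0\1.2646\1.2649'

print(repr(plain))                   # the escapes display as \x01
print(len(plain), len(raw))          # the raw version is two characters longer
print(ord(plain[2]))                 # code of the third character
print('\\' in plain, '\\' in raw)    # only the raw string contains a backslash
```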

Problem 2: "why must I write two '' after the \?"
Even raw strings cannot handle a backslash as the final character, as this
is interpreted as being an escaped quotation character. Your assignment to
a above is a good candidate for raw strings, but futile for '\\' etc. (See
next.)
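
To make Problem 2 concrete, here is a small sketch of what the pattern r'\'' actually contains and why the substitution leaves the string unchanged. The sample text "it's" is invented for the illustration.

```python
import re

pattern = r'\''        # two characters: a backslash followed by a quote
print(len(pattern))

a = r'R0\1.2644\1.2344\D'

# To the regex engine, backslash-quote just means "a literal quote".
# There is no quote in a, so nothing is replaced.
unchanged = re.sub(pattern, 'ff', a)
print(unchanged == a)

# The same pattern does match in a string that contains a quote.
with_quote = re.sub(pattern, 'ff', "it's")
print(with_quote)
```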

Problem 3:
If you are trying to re.sub the delimiting backslashes, then you need to
double-double them for re to process them correctly.

Try this:

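import re

a = r'R0\1.2644\1.2344\D'
bslash = '\\'        # one backslash, written as an ordinary string literal
re_bslash = '\\\\'   # two backslashes: the regex that matches one backslash
print(a.split(bslash))
print(re.sub(re_bslash, 'ff', a))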

gives:
['R0', '1.2644', '1.2344', 'D']
R0ff1.2644ff1.2344ffD
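
As an alternative to counting backslashes by hand, the standard library's re.escape can build the pattern from the plain separator. This is only a sketch of that option, not something proposed in the thread.

```python
import re

a = r'R0\1.2644\1.2344\D'
bslash = '\\'

# re.escape turns the one-character separator into a regex that matches it.
re_bslash = re.escape(bslash)
print(re_bslash == '\\\\' == r'\\')

split_result = re.split(re_bslash, a)
sub_result = re.sub(re_bslash, 'ff', a)
print(split_result)
print(sub_result)
```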


-- Paul
 
Luis P. Mendes

So, the trick is to put the r in front of the string?!

with the r in front:

>>> x = r'R0\1.2646\1.2649\D'
>>> elements = x.split('\\')
>>> elements
['R0', '1.2646', '1.2649', 'D']   <-- what I want


without it:

>>> y = 'R0\1.2646\1.2649\D'
>>> elements = y.split('\\')
>>> elements
['R0\x01.2646\x01.2649', 'D']   <-- not good


 
Russell Blau

Luis P. Mendes said:
hi,

I've already read many pages on this but I'm not able to separate the
string 'R0\1.2646\1.2649\D' in four elements, using the \ as the separator.

a='R0\1.2644\1.2344\D'

Here's your first problem: the name "a" is not mapped to the same string you
think it is. Python interprets the backslashes in a quoted string as escape
sequences, unless you specify a raw string by putting an "r" before the
first quotation mark. Specifically, it interprets the sequence \1 as the
character with an ASCII value of 1. So what you've done is:
R0?.2646?.2649\D

(The preceding line contains non-ASCII characters that may display oddly...)

I think what you really want is:
R0\1.2646\1.2649\D

Once you have stored your string properly, what's wrong with this?
['R0', '1.2646', '1.2649', 'D']
 
Luis P. Mendes

thanks!

I was just considering the effect of backslashes in the search/replace
criterion and not in the string itself. Thank you again!

Luis
 
Alex Martelli

Luis P. Mendes said:
So, the trick is to put the r in front of the string?!

If you want a literal string with backslashes in it, you either double
each backslash or use a raw literal (r in front).

y='R0\1.2646\1.2649\D'

the third character of y, \1, is in this case the byte with value 1 in
the ASCII code, etc. Apparently that's not what you want.

But is this string going to be a literal in your code, rather than, say,
read from a file? Sounds unlikely. When you read from a file there's
no escape-sequence interpretation, so the issue of how to write
literals, raw or otherwise, is irrelevant there.
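
A small sketch of that point. io.StringIO stands in for a real file here, which is an assumption made only to keep the example self-contained; the name fake_file is likewise illustrative.

```python
import io

# The "file" contains the characters R0\1.2646\1.2649\D followed by a newline.
# The raw literal is only how this example puts those characters there.
fake_file = io.StringIO(r'R0\1.2646\1.2649\D' + '\n')

# Reading performs no escape-sequence interpretation: the backslashes
# arrive as ordinary characters and can be split on directly.
line = fake_file.readline().rstrip('\n')
fields = line.split('\\')
print(fields)
```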


Alex
 
Bengt Richter

x = r'R0\1.2646\1.2649\D'
elements = x.split('\\')


A raw literal can't end with an odd number of backslashes (_some_ way
has to be there to escape the quote char, after all).
Hm, just had the thought that something analogous to HDLC bit-stuffing
could be used. IIRC bitstreams had escape flags composed of 5 successive bits,
and if you wanted to transmit 5 successive data bits, you just added an extra bit
at the end to make 6 to show that the five did not comprise a flag. The extra bits
would get dropped on decoding when a 6th 1 followed 11111 and would be recognized
as a flag otherwise.

Translating this to quoted character sequences, we could have an alternate triple
quoted raw string format, with quote-stuffing instead of escapes. I.e., to quote
three successive quote characters, we stuff a 4th quote, which the tokenizer drops
as it creates the internal byte sequence string representation, so we don't need
escapes in the usual sense.

Thus (using f prefix to indicate flagged quote-stuffing syntax) you could write:

x = f'''c:\whatever\'''

and to quote the line above (without taking advantage of alternate quotes):

q = f''' x = f''''c:\whatever\'''''''
     ^^^      ^^^|            ^^^|^^^

where ^^^ is flag and | indicates a stuffed quote that
makes the previous otherwise-flag into three quotes in the data.
You could quote again (using same type quote for illustrative purposes
again, since obviously you could do better using both ' and "):

r = f'''f'''' x = f'''''c:\whatever\'''''''''''
^^^| ^^^| ^^^|^^^|^^^

(I think ;-)

I guess the worst-case data to quote would be a repeating pattern of
'''""" or """''' since neither type of quote character would give an
advantage, but 1-in-6 overhead is still not too bad, and it would be rare.

Is there a hole in this raw string quoting syntax?

Regards,
Bengt Richter
 
Bengt Richter

Hm, just had the thought that something analogous to HDLC bit-stuffing
could be used. IIRC bitstreams had escape flags composed of 5 successive bits,
and if you wanted to transmit 5 successive data bits, you just added an extra bit
at the end to make 6 to show that the five did not comprise a flag. The extra bits
would get dropped on decoding when a 6th 1 followed 11111 and would be recognized
as a flag otherwise.
BZZT! wrong ;-(
The flag is 01111110 and I believe 0 gets stuffed after the 5th 1 to make sure
the flag is not part of data between real flags.
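
For reference, the corrected rule (flag 01111110, a 0 stuffed after every run of five 1 bits) can be sketched as follows. Bits are modelled as a string of '0' and '1' characters purely for readability; the function names stuff and unstuff are made up for this illustration.

```python
FLAG = '01111110'

def stuff(bits):
    """Insert a 0 after every run of five consecutive 1 bits."""
    out = []
    run = 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == '1' else 0
        if run == 5:
            out.append('0')   # the stuffed bit
            run = 0
    return ''.join(out)

def unstuff(bits):
    """Drop the bit that follows every run of five consecutive 1 bits."""
    out = []
    run = 0
    skip = False
    for b in bits:
        if skip:              # this is the stuffed 0: discard it
            skip = False
            run = 0
            continue
        out.append(b)
        run = run + 1 if b == '1' else 0
        if run == 5:
            skip = True
    return ''.join(out)

data = '0111111011111'        # contains six 1 bits in a row
sent = stuff(data)
print(sent)
print(FLAG in sent)           # stuffed data can never contain the flag
print(unstuff(sent) == data)
```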
Translating this to quoted character sequences, we could have an alternate triple
quoted raw string format, with quote-stuffing instead of escapes. I.e., to quote
three successive quote characters, we stuff a 4th quote, which the tokenizer drops
as it creates the internal byte sequence string representation, so we don't need
escapes in the usual sense.
This does not work for, e.g., quoting a single quote, so it's not general at all :-(
Thus (using f prefix to indicate flagged quote-stuffing syntax) you could write:

x = f'''c:\whatever\'''

and to quote the line above (without taking advantage of alternate quotes):

q = f''' x = f''''c:\whatever\'''''''
     ^^^      ^^^|            ^^^|^^^

where ^^^ is flag and | indicates a stuffed quote that
makes the previous otherwise-flag into three quotes in the data.
You could quote again (using same type quote for illustrative purposes
again, since obviously you could do better using both ' and "):

r = f'''f'''' x = f'''''c:\whatever\'''''''''''
^^^| ^^^| ^^^|^^^|^^^

(I think ;-)

I guess the worst-case data to quote would be a repeating pattern of
'''""" or """''' since neither type of quote character would give an
advantage, but 1-in-6 overhead is still not too bad, and it would be rare.

Is there a hole in this raw string quoting syntax?
Unfortunately, yes.

I thought of another format, but it doesn't quote previously quoted arbitrary text
without modifying at least the last character, so phooey. Might as well go to the
previously suggested mime-style delimiting, which an editor macro could do for
arbitrary selected text. It could use str(time.time()) as delimiter text without
much risk, IWT.

Regards,
Bengt Richter
 