more on unescaping escapes

bvdp

So, we think something is working and send off a bug fix to our client :)

I'm not sure I understand this at all and wonder if there is a bug?
'c:\\Program Files\\test'

so far, so good.
'c:\\Program Files\test'

Umm, not so good? The \\ before the P is okay, but the \\t is changed to \t

and
c:\Program Files est

Now, decode() converted the \\t to a \t and print expanded the \t to a tab.

I would have thought that the \\t would have the same result as the \\P ???

Obviously my brain is missing something (hopefully obvious).
 
MRAB

bvdp said:
So, we think something is working and send off a bug fix to our client :)

I'm not sure I understand this at all and wonder if there is a bug?

'c:\\Program Files\\test'

so far, so good.

'c:\\Program Files\test'

Umm, not so good? The \\ before the P is okay, but the \\t is changed to \t
Decoding changes "\\x20" to "\x20", which is the same as " ", a space.

Decoding changes "\\t" to "\t", which is a tab.

Decoding _doesn't_ change "\\P" to "\P" because that's not a valid
escape sequence.
and

c:\Program Files est

Now, decode() converted the \\t to a \t and print expanded the \t to a tab.
\t is already a tab.
I would have thought that the \\t would have the same result as the \\P ???

Obviously my brain is missing something (hopefully obvious).
Before storing the string (writing it to the file), encode it and then
replace " " with "\\x20":

C:\Program Files\test

becomes:

C:\\Program Files\\test

and then:

C:\\Program\x20Files\\test

After fetching the string (reading it from the file), decode it:

C:\\Program\x20Files\\test

becomes:

C:\Program Files\test
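
A minimal sketch of the round trip described above, written for Python 3, where the "string-escape" codec used in this thread no longer exists. The "unicode_escape" codec stands in for it here, and the helper names encode_token and decode_token are made up for illustration. Stored tokens are assumed to be plain ASCII.

```python
def encode_token(path):
    # Double every backslash, then hide each space as the 4 characters \x20.
    escaped = path.encode("unicode_escape").decode("ascii")
    return escaped.replace(" ", "\\x20")


def decode_token(token):
    # Undo encode_token: \\ becomes \ and \x20 becomes a real space.
    return token.encode("ascii").decode("unicode_escape")


original = "C:\\Program Files\\test"   # real contents: C:\Program Files\test
stored = encode_token(original)        # safe to write to the file as one token
restored = decode_token(stored)        # what the program uses after reading

print(stored)
print(restored)
```

Because every backslash is doubled before the string is stored, the decoder never sees a bare \t, so the path comes back unchanged.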
 
bvdp

I'm getting hopelessly lost in a series of \\\\\\\ s :)

Let's see if this makes sense:
'c:\\Program Files\test'

In this case there are still 2 '\'s before the P; but only 1 before the
't'. Now, when it comes time to open the file windows accepts double
'\'s in the filename. So, life is fine. But, the essential part here is
that we're lucky we can use '\\' or '\' in a path. What if this wasn't true?

The following shows a bit of difference:
'c:\\Program Files\test'

In this case the interpreter has changed the '\P' to '\\P'. And if one
lists the string the '\t' really is a tab. No decode() at all in any of
this.

I guess the general rule would be to double up '\'s in filenames and (in
my program's case) to use the \x20 for spaces.

Thanks.
 
Tim Wintle

Let's see if this makes sense:

'c:\\Program Files\test'

Hint: try running

and see what's written - I think that the interpreter adds extra "\"
characters to escape things and make things easier to read.

i.e.

so when it displays strings in the interpreter it includes escape
characters, when it is printed though the output is straight to stdout
and isn't escaped.
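
The difference between the echoed form and the printed form can be sketched like this (Python 3 syntax; the variable names are only for illustration, and this is not necessarily the snippet Tim originally posted):

```python
a = "c:\\Program Files\test"   # in the literal, \\ is one backslash, \t is one tab

echoed = repr(a)    # what the interactive prompt shows for a bare `a`
printed = str(a)    # what print(a) sends to stdout

print(echoed)       # escaped form, with quotes
print(printed)      # raw characters, so the tab is expanded by the terminal
print(len(a))       # counts real characters, not the escaped spelling
```

The escaped form is only a display convention. The string itself holds a single backslash and a single tab character.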

Hope that helps,

Tim Wintle
 
bvdp

andrew said:
do you know that a string with the letter "r" in front doesn't escape
slashes? it's intended for regular expressions, but would simplify things
for you here too.

just do

a=r'c:\\Program Files\test'

Yes, I knew that. Unfortunately in my program loop I really don't have
the opportunity to use a raw string.

But, good reminder. Thanks.
 
Rhodri James

So, we think something is working and send off a bug fix to our client :)

I'm not sure I understand this at all and wonder if there is a bug?

'c:\\Program Files\\test'

so far, so good.

'c:\\Program Files\test'

Umm, not so good? The \\ before the P is okay, but the \\t is changed to
\t

Well yes, that's what you asked it to do. The "string-escape" decoder
reads the string and replaces escape sequences with the corresponding
characters. Bear in mind that it's the string as it really is that is
being operated on, not the representation of it that you displayed
above. In other words:

b = a.decode("string-escape")

is equivalent to:

b = "C:\Program Files\test"

"\P" isn't a valid escape sequence, so it doesn't get replaced. "\t"
represents a tab, so it does.
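
A rough Python 3 illustration of the same point, assuming the "unicode_escape" codec as a stand-in for the Python 2 "string-escape" codec discussed here. Recent Python 3 releases may warn about unknown escapes such as \P, so the sketch silences warnings around the call.

```python
import warnings

a = "c:\\Program Files\\test"   # real contents: c:\Program Files\test

with warnings.catch_warnings():
    # Unknown escapes like \P are kept as-is but may trigger a warning.
    warnings.simplefilter("ignore")
    b = a.encode("ascii").decode("unicode_escape")

print(repr(a))   # two backslashes in the string, shown doubled by repr
print(repr(b))   # \P survived, but \t has been replaced by a real tab
```

The decoder works on the 21 characters actually stored in a, not on the doubled-up spelling that the interpreter displays.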
 
Rhodri James

So, in this case I'm assuming that the interpreter is converting the
escapes on assignment.

The compiler converts the escapes on creating its internal
representation of the string, before assignment ever gets
involved.
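
One way to see that escape processing belongs to the step that turns source text into a value, not to assignment, is to run that step by hand. This sketch uses ast.literal_eval from the standard library; the variable names are invented for the example.

```python
import ast

typed_in_source = r'"c:\\temp\tdata"'   # the characters of a quoted literal
read_from_file = r"c:\\temp\tdata"      # the same characters arriving as data

# Evaluating the literal text is where \\ and \t get converted.
value = ast.literal_eval(typed_in_source)

print(len(value))            # backslash and tab are now single characters
print(len(read_from_file))   # data read from a file is left untouched
```

Text that never passes through the compiler, such as a line read from a file, keeps every backslash exactly as written.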
 
Chris Rebert

[problem with Python and Windows paths using backslashes]

Is there any particular reason you can't just internally use regular
forward-slashes for the paths? They work in Windows from Python in
nearly all cases and you can easily interconvert using os.sep if
you want the path to be pretty when you show it to (or get it from)
the user or whatever.
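
A small sketch of that interconversion. It uses pathlib.PureWindowsPath, which was added in Python 3.4 and so postdates this thread, because it applies Windows rules even when the code runs on Linux. The variable names are only illustrative.

```python
from pathlib import PureWindowsPath

internal = "c:/Program Files/test"    # form used inside the program

# Pretty form for showing to a Windows user.
for_display = str(PureWindowsPath(internal))

# Back to forward slashes when the user types a backslash path.
back_again = PureWindowsPath(for_display).as_posix()

print(for_display)
print(back_again)
```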

Cheers,
Chris
 
bvdp

Just because I never really thought too much about it :) I'm doing my
work on a linux box and my user is on windows ... and he's used to using
'\' ... but, you are absolutely right! Just use '/' on both systems and
be done with it. Of course I still need to use \x20 for spaces, but that
is easy.

Thanks for the suggestion!
 
bvdp

Bear in mind that it's the string as it really is that is
being operated on, not the representation of it that you displayed

Yes, that is the confusion ... what is displayed and what's actually in
the string.

I think I understand it all now :)

Thanks.
 
Rhodri James

Just because I never really thought too much about it :) I'm doing my
work on a linux box and my user is on windows ... and he's used to using
'\' ... but, you are absolutely right! Just use '/' on both systems and
be done with it. Of course I still need to use \x20 for spaces, but that
is easy.

Erm, no. "\x20" is exactly the same as " " in a string literal.
 
Mel

bvdp said:
Not sure if it's more clear or not :)

c:\Program Files\test

Which is all fine. And I didn't need to use decode().

So, in this case I'm assuming that the interpreter is converting the
escapes on assignment. And, in this case the string has single \s in it.

Strictly speaking, the compiler is converting the escapes when it uses the
literal to create a string value.

Mel.
 
Gabriel Genellina

Chris said:
[problem with Python and Windows paths using backslashes]
Is there any particular reason you can't just internally use regular
forward-slashes for the paths? They work in Windows from Python in
nearly all cases and you can easily interconvert using os.sep if
you want the path to be pretty when you show it to (or get it from)
the user or whatever.

Just because I never really thought too much about it :) I'm doing my
work on a linux box and my user is on windows ... and he's used to using
'\' ... but, you are absolutely right! Just use '/' on both systems and
be done with it. Of course I still need to use \x20 for spaces, but that
is easy.

Why is that? "\x20" is exactly the same as " ". It's not like %20 in URLs,
that becomes a space only after decoding.

py> '\x20' == ' '
True
py> '\x20' is ' '
True

(ok, the last line might show False, but being True means that both are
the very same object)
 
bvdp

I need to use the \x20 because of my parser. I'm reading unquoted lines
from a file. The file creator needs to use the form "foo\x20bar" without
the quotes in the file so my parser can read it as a single token.
Later, the string/token needs to be decoded with the \x20 converted to a
space.

So, in my file "foo bar" (no quotes) is read as 2 tokens; "foo\x20bar"
is one.

So, it's not really a problem of what happens when you assign a string
in the form "foo bar", rather how to convert the \x20 in a string to a
space. I think the \\ just complicates the entire issue.
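
The parsing scheme described above might look roughly like this in Python 3. The function name parse_line and the SetPath keyword are hypothetical, the "unicode_escape" codec is an assumed replacement for the Python 2 "string-escape" codec, and lines are assumed to be plain ASCII.

```python
def parse_line(line):
    # Split on whitespace first, so an escaped \x20 stays inside its token.
    tokens = line.split()
    # Only afterwards turn \x20 (and other escapes) into real characters.
    return [tok.encode("ascii").decode("unicode_escape") for tok in tokens]


two_tokens = parse_line("foo bar")
one_token = parse_line(r"foo\x20bar")
path_line = parse_line(r"SetPath c:/program\x20files/test")

print(two_tokens)
print(one_token)
print(path_line)
```

Splitting before decoding is what lets the space survive: by the time \x20 becomes a space, the token boundaries are already fixed.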
 
Gabriel Genellina

Just thinking, if you were reading the string from a file, why were you
worried about \\ and \ in the first place? (Ok, you moved to use / so this
is moot now).
 
bvdp

Gabriel said:
On Mon, 23 Feb 2009 22:46:34 -0200, bvdp wrote:
[...]
So, it's not really a problem of what happens when you assign a string
in the form "foo bar", rather how to convert the \x20 in a string to a
space. I think the \\ just complicates the entire issue.

Just thinking, if you were reading the string from a file, why were you
worried about \\ and \ in the first place? (Ok, you moved to use / so
this is moot now).

Just cruft introduced while I was trying to figure it all out. Having to
figure out the \\ and \x20 at the same time with file and keyboard input
just confused the entire issue :) Having the user set a line like
c:\\Program\x20File ... works just fine. I'll suggest he use
c:/program\x20files to make it a bit simpler for HIM, not my parser.
Unfortunately, due to some bad design decisions on my part about 5 years
ago I'm afraid I'm stuck with the \x20.

Thanks.
 
Adam Olsen

You're confusing the python source with the actual contents of the
string. We already do one pass at decoding, which is why \x20 is
quite literally no different from a space:
' '

However, the interactive interpreter uses repr(x), so various
characters that are considered formatting, such as a tab, get
reescaped when printing:
1

It really is a tab that gets stored there, not the escape for one.

Finally, if you give python an unknown escape it leaves it
as an escape. Then, when the interactive interpreter uses repr(x), it
is the backslash itself that gets reescaped:
['\\', 'P']

What does this all mean? If you want to test your parser with python
literals you need to escape them twice, like so:
['c', ':', '\\', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', '\\', 'x',
'2', '0', 'F', 'i', 'l', 'e', 's', '\\', '\\', 't', 'e', 's', 't']
['c', ':', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', ' ', 'F', 'i',
'l', 'e', 's', '\\', 't', 'e', 's', 't']

However, there's an easier way: use raw strings, which prevent python
from unescaping anything:
['c', ':', '\\', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', '\\', 'x',
'2', '0', 'F', 'i', 'l', 'e', 's', '\\', '\\', 't', 'e', 's', 't']
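
The points above can be checked with a short script. This is a Python 3 sketch, not the original interactive session, whose input lines are not preserved here. It assumes "unicode_escape" in place of the Python 2 "string-escape" codec, and it spells the unknown escape with a doubled backslash because newer Python versions warn about a bare \P in a literal.

```python
space = "\x20"          # the compiler has already turned this into a space
tab = "\t"              # a single tab character is stored
unknown = "\\P"         # backslash followed by P, two characters

print(space == " ")     # same string as a plain space
print(len(tab))         # one character
print(repr(tab))        # repr re-escapes it for display
print(list(unknown))    # the backslash is shown doubled by repr

# Test input for the parser: escape twice, or use a raw string.
doubled = "c:\\\\Program\\x20Files\\\\test"
raw = r"c:\\Program\x20Files\\test"
print(doubled == raw)

# What the parser's decoding step would produce from that input.
parsed = raw.encode("ascii").decode("unicode_escape")
print(list(parsed))
```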
 
bvdp

Adam said:
Gabriel said:
On Mon, 23 Feb 2009 23:31:20 -0200, bvdp wrote:
I need to use the \x20 because of my parser. [...]

You're confusing the python source with the actual contents of the
string. [...]

However, there's an easier way: use raw strings, which prevent python
from unescaping anything: [...]

Thank you. That is very clear. Appreciate your time.
 
