Windows file paths, again

Dan Guido

I'm trying to write a few methods that normalize Windows file paths.
I've gotten it to work in 99% of the cases, but it seems like my code
still chokes on '\x'. I've pasted my code below, can someone help me
figure out a better way to write this? This seems overly complicated
for such a simple problem...


# returns normalized filepath with arguments removed
def remove_arguments(filepath):
    #print "removing args from: " + filepath
    (head, tail) = os.path.split(filepath)
    pathext = os.environ['PATHEXT'].split(";")

    while(tail != ''):
        #print "trying: " + os.path.join(head,tail)

        # does it just work?
        if os.path.isfile(os.path.join(head, tail)):
            #print "it just worked"
            return os.path.join(head, tail)

        # try every extension
        for ext in pathext:
            if os.path.isfile(os.path.join(head, tail) + ext):
                return os.path.join(head, tail) + ext

        # remove the last word, try again
        tail = tail.split()[:-1]
        tail = " ".join(tail)

    return None

escape_dict={'\a':r'\a',
           '\b':r'\b',
           '\c':r'\c',
           '\f':r'\f',
           '\n':r'\n',
           '\r':r'\r',
           '\t':r'\t',
           '\v':r'\v',
           '\'':r'\'',
           #'\"':r'\"',
           '\0':r'\0',
           '\1':r'\1',
           '\2':r'\2',
           '\3':r'\3',
           '\4':r'\4',
           '\5':r'\5',
           '\6':r'\6',
           '\7':r'\a', #i have no idea
           '\8':r'\8',
           '\9':r'\9'}

def raw(text):
    """Returns a raw string representation of text"""
    new_string=''
    for char in text:
        try:
            new_string+=escape_dict[char]
            #print "escaped"
        except KeyError:
            new_string+=char
            #print "keyerror"
        #print new_string
    return new_string

# returns the normalized path to a file if it exists
# returns None if it doesn't exist
def normalize_path(path):
    #print "not normal: " + path

    # make sure it's not blank
    if(path == ""):
        return None

    # get rid of mistakenly escaped bytes
    path = raw(path)
    #print "step1: " + path

    # remove quotes
    path = path.replace('"', '')
    #print "step2: " + path

    #convert to lowercase
    lower = path.lower()
    #print "step3: " + lower

    # expand all the normally formed environ variables
    expanded = os.path.expandvars(lower)
    #print "step4: " + expanded

    # chop off \??\
    if expanded[:4] == "\\??\\":
        expanded = expanded[4:]
    #print "step5: " + expanded

    # strip a leading '/'
    if expanded[:1] == "\\":
        expanded = expanded[1:]
    #print "step7: " + expanded

    systemroot = os.environ['SYSTEMROOT']

    # sometimes systemroot won't have %
    r = re.compile('systemroot', re.IGNORECASE)
    expanded = r.sub(systemroot, expanded)
    #print "step8: " + expanded

    # prepend the %systemroot% if its missing
    if expanded[:8] == "system32" or "syswow64":
        expanded = os.path.join(systemroot, expanded)
    #print "step9: " + expanded

    stripped = remove_arguments(expanded.lower())

    # just in case you're running as LUA
    # this is a race condition but you can suck it
    if(stripped):
        if os.access(stripped, os.R_OK):
            return stripped

    return None

def test_normalize():
    test1 = "\??\C:\WINDOWS\system32\Drivers\CVPNDRVA.sys"
    test2 = "C:\WINDOWS\system32\msdtc.exe"
    test3 = "%SystemRoot%\system32\svchost.exe -k netsvcs"
    test4 = "\SystemRoot\System32\drivers\vga.sys"
    test5 = "system32\DRIVERS\compbatt.sys"
    test6 = "C:\Program Files\ABC\DEC Windows Services\Client Services.exe"
    test7 = "c:\Program Files\Common Files\Symantec Shared\SNDSrvc.exe"
    test8 = "C:\WINDOWS\system32\svchost -k dcomlaunch"
    test9 = ""
    test10 = "SysWow64\drivers\AsIO.sys"
    test11 = "\SystemRoot\system32\DRIVERS\amdsbs.sys"
    test12 = "C:\windows\system32\xeuwhatever.sys" #this breaks everything

    print normalize_path(test1)
    print normalize_path(test2)
    print normalize_path(test3)
    print normalize_path(test4)
    print normalize_path(test5)
    print normalize_path(test6)
    print normalize_path(test7)
    print normalize_path(test8)
    print normalize_path(test9)
    print normalize_path(test10)
    print normalize_path(test11)
    print normalize_path(test12)
 
Diez B. Roggisch


If I'm getting this right, what you are trying to do is convert characters
that come from string-literal escape codes back to their literal
representation. Why?

A simple

    test12 = r"C:\windows\system32\xeuwhatever.sys"

is all you need - note the leading r. Then

    test12[2] == "\\"

holds. (The right-hand side needs the escape because a raw string literal
cannot end in a single backslash.)
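
To make the difference concrete, here is a minimal sketch (Python 3 assumed; the path `C:\temp\new.txt` and the helper `describe` are made up for illustration and are not part of the original code). It only shows how the same characters typed in source code end up as different strings with and without the leading r:

```python
# Non-raw literal: "\t" becomes one TAB character and "\n" one newline.
plain = "C:\temp\new.txt"
# Raw literal: every backslash is kept as an ordinary character.
raw = r"C:\temp\new.txt"

def describe(s):
    """Return (length, contains a TAB, number of backslashes) for s."""
    return (len(s), "\t" in s, s.count("\\"))

print(describe(plain))   # the backslashes are gone, control characters took their place
print(describe(raw))     # both backslashes survive
print(raw[2] == "\\")    # the comparison from the post above
```

Only the raw form still contains the two backslashes; the plain form has silently swallowed them at compile time, before any normalizing code gets to see the string.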

Diez
 
Dan Guido

Hi Diez,

The source of the string literals is ConfigParser, so I can't just
mark them with an 'r'.

    config = ConfigParser.RawConfigParser()
    config.read(filename)
    crazyfilepath = config.get(name, "ImagePath")
    normalfilepath = normalize_path(crazyfilepath)

The ultimate origin of the strings is the _winreg module. Here I
also can't mark them with an 'r'.

    regkey = OpenKey(HKEY_LOCAL_MACHINE,
                     "SYSTEM\\CurrentControlSet\\Services\\" + name)
    crazyimagepath = QueryValueEx(regkey, "ImagePath")[0]
    CloseKey(regkey)

--
Dan Guido



 
Anthony Tolle

I just did a quick test using Python 2.5.1 with the following script
on Windows:

# start of test.py
import ConfigParser
config = ConfigParser.RawConfigParser()
config.read("cfg.ini")
x = config.get("foo", "bar")
print x
print repr(x)
from _winreg import *
regkey = OpenKey(HKEY_LOCAL_MACHINE,
                 r"SYSTEM\CurrentControlSet\Services\IPSec")
x = QueryValueEx(regkey, "ImagePath")[0]
CloseKey(regkey)
print x
print repr(x)
# end of test.py


Here are the contents of cfg.ini:

[foo]
bar=c:\dir\file.txt


Here is the output of the script:

c:\dir\file.txt
'c:\\dir\\file.txt'
system32\DRIVERS\ipsec.sys
u'system32\\DRIVERS\\ipsec.sys'


In either case, I don't see the functions returning strings that
require special handling. The backslashes are properly escaped in
the repr of both strings.

Something else must be going on if the strings are getting messed up
along the way.
 
Dan Guido

Hi Anthony,

Thanks for your reply, but I don't think your tests have any control
characters in them. Try again with a \v, a \n, or a \x in your input
and I think you'll find it doesn't work as expected.

--
Dan Guido



 
Dave Angel

Your first problem is that you're mixing tabs and spaces in your source
code. That is dangerous and confusing, not to mention an error in Python 3.x.

The second problem is that your test_normalize() is called with a bunch
of invalid literals. Backslashes in quoted literals need to be escaped,
or you need to use the raw form of the literal. Now this may have nothing
to do with the data you get from ConfigParser or QueryValueEx(), but it
sure makes testing confusing.

The third problem is your raw() function. It seems like you're trying
to somehow build a version of the string that would pass muster as a
literal string. Unless you're trying to generate Python source code, I
can't see where this can possibly help. Perhaps you're just trying to
compensate for the second problem? If the actual strings are coming
from the registry, you won't need any of this complexity.

I don't see what your original problem is. Is it to take a registry
entry that contains both filepath and some other data, and separate out
just the filepath portion?

Maybe it'd be best if you could show us your config file, or at least
the ImagePath portion of it (with some context). Then let's look at the
actual value of crazyfilepath:

    print crazyfilepath
    print repr(crazyfilepath)


Or you could tell us what registry entry is giving you grief. And maybe somebody could see what to do about it.

DaveA
 
Lie Ryan

Dan said:
Hi Anthony,

Thanks for your reply, but I don't think your tests have any control
characters in them. Try again with a \v, a \n, or a \x in your input
and I think you'll find it doesn't work as expected.

A path read from a file, config file, or winreg would never contain
control characters unless the stored value itself contains a control
character.

My crystal ball thinks that you used eval or exec somewhere in your
script, which may cause a perfectly escaped path to get unescaped, like
here:

# python 3
path = 'C:\\path\\to\\somewhere.txt'
script = 'open("%s")' % path # this calls str(path)
exec(script)

OR

you stored the path incorrectly. Try seeing what exactly is stored in
the registry using regedit.



Remember that escape sequences don't really exist in the in-memory
representation of the string. They exist only in string literals
(i.e. source code) and in the output you get when you print the string
using repr().
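
A small sketch of both points, assuming Python 3. The path is made up, and `ast.literal_eval` stands in here for the eval/exec call suspected above (it parses a string literal the same way, without running arbitrary code):

```python
import ast

# Escaped correctly in source; in memory this is C:\temp\new.txt (15 characters).
path = 'C:\\temp\\new.txt'

print(len(path))     # counts the characters actually stored
print(str(path))     # the characters themselves
print(repr(path))    # a source-code literal, so the backslashes appear doubled

# Pasting the str() form between quotes makes Python re-read \t and \n as escapes.
broken = ast.literal_eval('"%s"' % path)
# Pasting the repr() form gives a proper literal, so the round trip is lossless.
intact = ast.literal_eval('%r' % path)

print(broken == path)   # False: a TAB and a newline replaced "\t" and "\n"
print(intact == path)   # True
```

If generated source code really is part of the project, formatting with %r instead of %s is the usual way to keep backslashes intact.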
 
Dan Guido

I'm writing a test case right now, will update in a few minutes :).
I'm using Python 2.6.x

I need to read these values in from a configparser file or the windows
registry and get MD5 sums of the actual files on the filesystem and
copy the files to a new location. The open() method completely barfs
if I don't normalize the paths to the files first. I'll show the list,
just give me a little bit more time to separate the code from my
project that demonstrates this bug.
 
Matt McCredie

Dan Guido said:
Hi Anthony,

Thanks for your reply, but I don't think your tests have any control
characters in them. Try again with a \v, a \n, or a \x in your input
and I think you'll find it doesn't work as expected.


Why don't you try it yourself? He gave you the code. I changed cfg.ini to
contain the following:

[foo]
bar=C:\x\n\r\a\01\x32\foo.py


Which produced the following output:
C:\x\n\r\a\01\x32\foo.py
'C:\\x\\n\\r\\a\\01\\x32\\foo.py'

Looks like config parser worked just fine to me. There is a difference between a
python string literal written inside of a python script and a string read from a
file. When reading from a file (or the registry) what you see is what you get.
There is no need to do so much work.
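
The same experiment can be repeated without touching the disk. This is only a sketch and assumes Python 3, where the module is named `configparser` and `read_string` is available; the section and option names are the ones from the cfg.ini above, and `INI_TEXT` is a made-up name:

```python
import configparser

# Raw string, so the sample text holds literal backslashes exactly as a file would.
INI_TEXT = r"""
[foo]
bar=C:\x\n\r\a\01\x32\foo.py
"""

config = configparser.RawConfigParser()
config.read_string(INI_TEXT)
value = config.get("foo", "bar")

print(value)         # the characters as stored
print(repr(value))   # the same string shown as a source literal
```

The value comes back with all seven backslashes and no control characters, so nothing needs to be "un-escaped" afterwards.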

Matt McCredie
 
Terry Reedy

Dan said:
Hi Diez,

The source of the string literals is ConfigParser, so I can't just
mark them with an 'r'.

Python string literals only exist in Python source code. Functions and
methods only return *strings*, not literals. If you mistakenly put the
str() representation of a string (such as print gives you) into source
code, rather than the repr() output, then you may have trouble.

tjr
 
Jerry Hill

Dan said:
This doesn't give me quite the results I expected, so I'll have to
take a closer look at my project as a whole tomorrow. The test cases
clearly show the need for all the fancy parsing I'm doing on the path
though.

To get back to what I think was your original question, there is an
easy way to take a string with control characters and turn it back
into a string with the control characters escaped, which could replace
your escape_dict and raw() function in normalize.py:

(Python 2.6.1 on windows XP)

More generally, it sounds like you have some bad data in either the
registry, or your ini file. You shouldn't have control characters in
there (unless you really have directories with control characters in
their names). If you have control over how those values are written,
you should probably fix the bad data at the source instead of fixing
it as you pull it back in.
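
For the escaping step mentioned above, one possibility is sketched below. It is not necessarily the approach from the original post: it assumes Python 3 and uses the standard `unicode_escape` codec, and the helper name `escape_control_chars` is invented for this example. Note the caveats in the comments: genuine backslashes get doubled, and a character such as a vertical tab comes back as `\x0b` rather than `\v`.

```python
def escape_control_chars(text):
    """Return text with control characters written out as backslash escapes.

    Real backslashes in text are doubled by the codec as well.
    """
    return text.encode("unicode_escape").decode("ascii")

# A TAB that crept in via a non-raw literal is turned back into backslash + t.
print(escape_control_chars("C:\temp"))
# A vertical tab is written as \x0b, not \v, so the original spelling is lost.
print(escape_control_chars("drivers\vga.sys"))
# A path that was fine to begin with now has its backslash doubled.
print(escape_control_chars("a\\b"))
```

Because of those caveats this is a repair of last resort; as said above, fixing the data at the source is the more reliable route.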
 
Dave Angel

Dan said:
This doesn't give me quite the results I expected, so I'll have to
take a closer look at my project as a whole tomorrow. The test cases
clearly show the need for all the fancy parsing I'm doing on the path
though.

Looks like I'll return to this tomorrow and post an update as
appropriate. Thanks for the help so far!

For none of your test data does raw() change anything at all. These
strings do *not* need escaping.

Now some of the other things you do are interesting:

1) \??\ - presumably you're looking for a "long UNC." But that's
signaled by \\?\. It's used to indicate to some functions that
filenames over about 260 bytes are permissible.

2) The line:

    if expanded[:8] == "system32" or "syswow64":

doesn't do what you think it does. It'll always evaluate as true, since
== has higher priority than or, and "syswow64" is a non-empty string. If
you want to compare the string to both, you need to expand it out, either as

    if expanded[:8] == "system32" or expanded[:8] == "syswow64":

or, simpler:

    if expanded.startswith("system32") or expanded.startswith("syswow64"):

3) removing a leading backslash should imply that you replace it with
the current directory, at least in most contexts. I'm not sure what's
the right thing here.
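
Point 2 is easy to check in isolation. The sketch below assumes Python 3; `buggy_check` and `fixed_check` are names invented for the comparison, and the second one uses the fact that `str.startswith` accepts a tuple of prefixes:

```python
def buggy_check(expanded):
    # Parsed as (expanded[:8] == "system32") or "syswow64";
    # the non-empty string on the right makes the whole expression truthy.
    return bool(expanded[:8] == "system32" or "syswow64")

def fixed_check(expanded):
    # True only if the path starts with one of the two directory names.
    return expanded.startswith(("system32", "syswow64"))

for candidate in ("system32\\drivers\\compbatt.sys",
                  "syswow64\\drivers\\asio.sys",
                  "c:\\program files\\abc\\client services.exe"):
    print(candidate, buggy_check(candidate), fixed_check(candidate))
```

With the buggy version every path, including ones that already start with a drive letter, is treated as relative to the system root.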



DaveA
 
