How to get os.py to use ./ntpath.py instead of Lib/ntpath.py

ruck

In Python 2.7.2 on Windows 7,

os.walk() uses isdir(),
which comes from os.path,
which really comes from ntpath.py,
which really comes from genericpath.py

I want os.walk() to use a modified isdir() on my Windows 7.
Not knowing any better, it seems to me like ntpath.py would be a good place to intercept.

When os.py does "import ntpath as path",
how can I get python to process my customized ntpath.py
instead of Lib/ntpath.py ?

Thanks for any comments.
John

BTW, here's my mod to ntpath.py:
$ diff ntpath.py.standard ntpath.py
14c14,19
< from genericpath import *
---
> from genericpath import *
>
> def isdir(s):
>     return genericpath.isdir('\\\\?\\' + abspath(s + '\\'))
> def isfile(s):
>     return genericpath.isfile('\\\\?\\' + abspath(s + '\\'))

Why? Because the genericpath implementation relies on os.stat(), which
uses a Windows API function that presumes or enforces some naming
conventions, like "doesn't end with a space or a period".
But NTFS actually supports such filenames and dirnames, and some software
(like Cygwin) lets users create files and dirs without that restriction.
So Cygwin users like me may have a file 'voo...\\doo' which os.walk()
cannot ordinarily walk. That is, isdir('voo...') returns False
because the underlying os.stat is assessing 'voo' instead of 'voo...'.
The workaround is to pass os.stat a full path name that is prefixed
with r'\\?\' so the Windows API recognizes that you do NOT want the
name filtered.

Better said by Microsoft:
"For file I/O, the "\\?\" prefix to a path string tells
the Windows APIs to disable all string parsing and to
send the string that follows it straight to the file
system. For example, if the file system supports large
paths and file names, you can exceed the MAX_PATH limits
that are otherwise enforced by the Windows APIs."
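A minimal sketch of that prefixing step, assuming the input is already an absolute drive path. `extended_path` is a hypothetical helper, not a standard-library function. Because the prefix disables Windows' own path parsing, forward slashes and `.`/`..` components are no longer resolved once `\\?\` is present, so the sketch normalizes first with the pure-Python `ntpath` module, which works on any platform.

```python
import ntpath

# The four characters \\?\ as a Python literal. Unicode on purpose:
# Python 2 only hands it to the wide-character Windows APIs as unicode.
EXTENDED_PREFIX = u'\\\\?\\'


def extended_path(path):
    # Hypothetical helper: turn an absolute drive path into a \\?\ path.
    if path.startswith(EXTENDED_PREFIX):
        return path  # already prefixed; leave it alone
    if not ntpath.isabs(path):
        raise ValueError('expected an absolute path, got %r' % (path,))
    # normpath converts '/' to '\' and resolves '.' and '..', which Windows
    # will not do once the prefix is present. It is purely textual, so the
    # trailing dots in a name like 'voo...' are kept.
    return EXTENDED_PREFIX + ntpath.normpath(path)


print(extended_path(u'C:/Users/john/goo/voo...'))  # prints \\?\C:\Users\john\goo\voo...
```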
 
Steven D'Aprano

In Python 2.7.2 on Windows 7,

os.walk() uses isdir(),
which comes from os.path,
which really comes from ntpath.py,
which really comes from genericpath.py

I want os.walk() to use a modified isdir() on my Windows 7. Not knowing
any better, it seems to me like ntpath.py would be a good place to
intercept.

When os.py does "import ntpath as path", how can I get python to process
my customized ntpath.py instead of Lib/ntpath.py ?

import os
os.path.isdir = my_isdir

ought to do it.

This general technique is called "monkey-patching". The Ruby community is
addicted to it. Everybody else -- and a goodly number of the more
sensible Ruby crowd -- consider it a risky, dirty hack that 99 times out
of 100 will lead to blindness, moral degeneracy and subtle, hard-to-fix
bugs.

They are right to be suspicious of it. As a general rule, monkey-patching
is not for production code. You have been warned.

http://www.codinghorror.com/blog/2008/07/monkeypatching-for-humans.html


[...]
Why? Because the genericpath implementation relies on os.stat() which
uses Windows API function that presumes or enforces some naming
conventions like "doesn't end with a space or a period". But the NTFS
actually supports such filenames and dirnames, and some sw (like cygwin)
lets users make files & dirs without restricting. So, cygwin users like
me may have file 'voo...\\doo' which os.walk() cannot ordinarily walk.
That is, the isdir('voo...') returns false because the underlying
os.stat is assessing 'voo' instead of 'voo...' .

Please consider submitting a patch that adds support for cygwin paths to
the standard library. You'll need to target 3.4, though; 2.7 is now a
maintenance release with no new features allowed.

The workaround is to
pass os.stat a fullpathname that is prefixed with r'\\?\' so the Windows
API recognizes that you do NOT want the name filtered.

Better said by Microsoft:
"For file I/O, the "\\?\" prefix to a path string tells the Windows APIs
to disable all string parsing and to send the string that follows it
straight to the file system.

That's not so much a workaround as the officially supported API for
dealing with the situation you are in. Why don't you just prepend a '?'
to paths like they tell you to?
 
ruck

Steven says:
That's not so much a workaround as the officially supported API for
dealing with the situation you are in. Why don't you just prepend a '?'
to paths like they tell you to?

Good idea, but the first thing os.walk() does is a listdir(), and os.listdir() does not like the r'\\?\' prefix. In other words,
os.walk(r'\\?\C:Users\john\Desktop\sandbox\goo')
does not work.

Also, your recipe worked for me --
I'm walking 'goo' which contains 'voo.../doo'

import os

import genericpath
def my_isdir(s):
    # prefix the absolute path with \\?\ so Windows keeps the trailing dots
    return genericpath.isdir('\\\\?\\' + os.path.abspath(s + '\\'))

print 'os.walk(\'goo\') with standard isdir()'
for root, dirs, files in os.walk('goo'):
    print root, dirs, files

print 'os.walk(\'goo\') with modified isdir()'
os.path.isdir = my_isdir    # monkey-patch: os.walk() now calls my_isdir()
for root, dirs, files in os.walk('goo'):
    print root, dirs, files

yields

os.walk('goo') with standard isdir()
goo [] ['voo...']
os.walk('goo') with modified isdir()
goo ['voo...'] []
goo\voo... [] ['doo']
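Another option, offered as an alternative rather than anything proposed in the thread, is to leave the shared `os.path` module alone and use a small walker that takes the predicate as a parameter. `walk_with` below is a hypothetical, simplified top-down imitation of `os.walk()`: it has no `onerror` callback or bottom-up mode, and it sorts names only to make the output predictable.

```python
import os


def walk_with(top, isdir=os.path.isdir):
    # Hypothetical, simplified top-down os.walk(). The isdir default is bound
    # once, at definition time, so later patches to os.path.isdir do not
    # affect it; pass a different predicate explicitly instead.
    try:
        names = sorted(os.listdir(top))  # sorted only for predictable output
    except OSError:
        return  # like os.walk(), silently skip directories that cannot be listed
    dirs, files = [], []
    for name in names:
        if isdir(os.path.join(top, name)):
            dirs.append(name)
        else:
            files.append(name)
    yield top, dirs, files
    for name in dirs:
        path = os.path.join(top, name)
        if not os.path.islink(path):  # like os.walk(followlinks=False)
            for entry in walk_with(path, isdir):
                yield entry
```

With the `my_isdir` from the recipe, the call would be `walk_with('goo', isdir=my_isdir)`, and nothing else in the process sees a changed `os.path`.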

About monkeypatching, generally -- thanks for the pointer to that discussion. That sounded like a lot of wisdom and lessons learned being shared.
About me suggesting a patch -- I'll sleep on that :)

Thanks Steven!
John
 
Steven D'Aprano

Good idea, but the first thing os.walk() does is a listdir(), and
os.listdir() does not like the r'\\?\' prefix. In other words,
os.walk(r'\\?\C:Users\john\Desktop\sandbox\goo') does not work.

Now that sounds like a bug to me. If Microsoft officially support
leading ? in file names, then so should Python on Windows.

Also, your recipe worked for me --
I'm walking 'goo' which contains 'voo.../doo'

Good for you. (Sorry, that comes across as more condescending than it is
intended as.) Monkey-patching often gets used for quick scripts and tiny
pieces of code because it works.

Just beware that if you extend that technique to larger bodies of code,
say when using a large framework, or multiple libraries, your experience
may not be quite so good. Especially if *they* are monkey-patching too,
as some very large frameworks sometimes do. (Or so I am led to believe.)

The point is not that monkey-patching is dangerous and should never be
used, but that it is risky and should be used with caution.
 
Tim Golden

Now that sounds like a bug to me. If Microsoft officially support
leading ? in file names, then so should Python on Windows.

And so it does, but you'll notice from the MSDN docs that the \\?
syntax must be supplied as a Unicode string, which os.listdir
will do if you pass it a Python unicode object and not otherwise:

import os
os.listdir(u"\\\\?\\c:\\users")

# and consequently

for p, ds, fs in os.walk(u"\\\\?\\c:\\users"):
    print p
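A small sketch of Tim's point that runs on both Python 2 and 3; the helper names are hypothetical. In Python 2 a plain '...' literal is a byte string, so it is decoded to unicode before the \\?\ prefix is added; in Python 3 `str` is already Unicode and passes through unchanged. The 'utf-8' fallback for when `sys.getfilesystemencoding()` returns None is an assumption.

```python
import os
import sys

EXTENDED_PREFIX = u'\\\\?\\'  # the characters \\?\


def as_text_path(path):
    # Python 2 only uses the wide-character (W) Windows API, where \\?\ is
    # honoured, when given a unicode object; decode byte strings first.
    if isinstance(path, bytes):  # bytes is an alias of str on Python 2
        path = path.decode(sys.getfilesystemencoding() or 'utf-8')
    return path


def walk_extended(top):
    # Hypothetical wrapper: os.walk() over an absolute path with the prefix.
    return os.walk(EXTENDED_PREFIX + as_text_path(top))
```

On Windows under Python 2, `walk_extended('C:\\Users\\john\\Desktop\\sandbox\\goo')` should then behave like Tim's unicode `os.walk()` example.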


TJG
 
ruck

And so it does, but you'll notice from the MSDN docs that the \\?
syntax must be supplied as a Unicode string, which os.listdir
will do if you pass it a Python unicode object and not otherwise:

I was saying os.listdir doesn't like the r'\\?\' prefix.
But Tim corrects me -- so yes, Steven's earlier suggestion "Why don't you just prepend a '?' to paths like they tell you to?" does work, when I supply it in unicode.
Good:
[u'voo...']
Bad:

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    os.listdir('\\\\?\\C:\\Users\\john\\Desktop\\sandbox\\goo')
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: '\\\\?\\C:\\Users\\john\\Desktop\\sandbox\\goo/*.*'

Thanks to both of you for taking the time to teach.

BTW, when I posted the original, I was trying to supply my own customized ntpath module, and I was really puzzled as to why it wasn't getting picked up! According to sys.path I expected my custom ntpath.py to be chosen, instead of the standard Lib/ntpath.py.

Now I guess I understand why. I moved Lib/ntpath.* out of the way, and learned that during initialization Python imports the "site" module, which imports "os", which imports "ntpath" -- before my dir is added to sys.path. So later, when I import os, it and ntpath have already been imported, and Python doesn't attempt a fresh import.

To get my custom ntpath.py honored, I need to RELOAD it, like:
import os
import ntpath
reload(ntpath)
print 'os.walk(\'goo\') with isdir override in custom ntpath'
for root, dirs, files in os.walk('goo'):
    print root, dirs, files
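The same reload step can be wrapped in a helper that works on Python 2 and 3. This is a sketch with a hypothetical name, `import_fresh`. `reload()` is a builtin only in Python 2; Python 3.4+ provides it as `importlib.reload()`. Because a reload re-executes the module's code inside the existing module object, `os.path`, which is that same object, picks up the new definitions as well.

```python
import importlib
import sys

try:
    from importlib import reload  # Python 3.4+
except ImportError:
    pass  # Python 2: reload() is a builtin, so the name already exists


def import_fresh(name):
    # A plain `import name` does nothing if the module is already in
    # sys.modules (as ntpath was, via site -> os -> ntpath at startup),
    # so in that case re-execute its code with reload().
    module = sys.modules.get(name)
    if module is None:
        return importlib.import_module(name)
    return reload(module)
```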

where the diff between the standard ntpath.py and my ntpath.py is:
14c14,19
< from genericpath import *
---
> from genericpath import *
>
> def isdir(s):
>     return genericpath.isdir('\\\\?\\' + abspath(s + '\\'))
> def isfile(s):
>     return genericpath.isfile('\\\\?\\' + abspath(s + '\\'))

I'm not sure how I could have known that ntpath was already imported, since *I* didn't import it, but that was the key to my confusion.

Thanks again for the help.
John
 
Chris Angelico

I'm not sure how I could have known that ntpath was already imported, since *I* didn't import it, but that was the key to my confusion.

One way to find out is to peek at the cache.

There are quite a few of them in the 3.2 interactive that I just tried this in.

ChrisA
 
Dave Angel

<snip>

I'm not sure how I could have known that ntpath was already imported, since *I* didn't import it, but that was the key to my confusion.

import sys
print sys.modules
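A slightly more targeted version of the same idea, sketched for illustration: instead of printing the whole dictionary, ask whether the module behind `os.path` is already loaded. It is named `ntpath` on Windows and `posixpath` elsewhere. Starting the interpreter with `python -v` is another way to watch the imports that happen during startup.

```python
import os
import sys

# os.py picked the platform module at startup and stored it as os.path.
path_module = os.path.__name__          # 'ntpath' on Windows, 'posixpath' elsewhere
preloaded = path_module in sys.modules  # True: importing os already loaded it

print('%s already imported: %s' % (path_module, preloaded))
# Loaded modules whose names mention 'path', e.g. genericpath and os.path
print(sorted(name for name in sys.modules if 'path' in name))
```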
 
Thomas Rachel

On 11.09.2012 05:46, Steven D'Aprano wrote:
Good for you. (Sorry, that comes across as more condescending than it is
intended as.) Monkey-patching often gets used for quick scripts and tiny
pieces of code because it works.

Just beware that if you extend that technique to larger bodies of code,
say when using a large framework, or multiple libraries, your experience
may not be quite so good. Especially if *they* are monkey-patching too,
as some very large frameworks sometimes do. (Or so I am led to believe.)

This sounds like a good use case for a context manager, like the one in
decimal.Context.get_manager().

First shot:

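import contextlib
import os

@contextlib.contextmanager
def changed_os_path(**k):
    old = {}
    try:
        for name, new in k.items():
            old[name] = getattr(os.path, name)   # remember the original
            setattr(os.path, name, new)          # install the replacement
        yield None
    finally:
        # restore only the attributes that were actually replaced
        for name, orig in old.items():
            setattr(os.path, name, orig)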

and so for your code you can use

print 'os.walk(\'goo\') with modified isdir()'
with changed_os_path(isdir=my_isdir):
    for root, dirs, files in os.walk('goo'):
        print root, dirs, files

so the change is only effective as long as you are in the relevant code
part and is reverted as soon as you leave it.


Thomas
 
Aahz

On 11.09.2012 05:46, Steven D'Aprano wrote:

This sounds like a good use case for a context manager, like the one in
decimal.Context.get_manager().

Note that because get_manager() applies to a specific Context instance it
is safe in a threaded application, which is NOT true for monkey-patching
modules even with a context manager.
 
