Win XP: Problem with shell scripting in Python

A.M

Hi,

I am having difficulty with shell scripting in Python.

I use the following call to run a DOS command and capture its output in
Python:

print os.popen('DIR').read()

It works fine with the DIR command, but for commands like "MD :" the error
message does not end up in the string:

print os.popen('MD :').read()
# No error message

When I use Ruby, it works perfectly:

`md :`
The filename, directory name, or volume label syntax is incorrect.

I am also having a problem redirecting the Python script's output to a
file. I can redirect the output to a file like this:

Python.exe script.py >file.txt

But the order of the contents in file.txt doesn't match the order in which
the commands were executed! Without redirection, the output order is fine
when I watch it on the screen.

Am I missing anything? Considering that Ruby doesn't have any problem
redirecting STDOUT into files or string variables, is Python the right tool
for this kind of shell scripting?

Any help would be appreciated,

Alan
 
John Machin

> Hi,
> I am having difficulty with shell scripting in Python.
> I use the following command to run a DOS command and put the return value in
> a Python variable:
> print os.popen('DIR').read()
> It works very fine with DIR command, but for commands like "MD :" it doesn't
> return the error message into the string:
> print os.popen('MD :').read()
> # No error message
> When I use Ruby, it works perfect:
> `md :`

Irrelevant; different "it".

> The filename, directory name, or volume label syntax is incorrect.

I have never had occasion to use os.popen_anything before. Perhaps we
can aid each other on the path to enlightenment. I got the following
idea from looking at the manual.

|>>> handles = os.popen3('MD :')
|>>> [f.read() for f in handles[1:]]
['', 'The filename, directory name, or volume label syntax is incorrect.\n']
|>>> [f.close() for f in handles]
[None, None, 1]

|>>> handles = os.popen3('dir *.py')
|>>> [f.read() for f in handles[1:]]
[' Volume in drive C has no label.\n Volume Serial Number is **BIG
SNIP** bytes free\n', '']
|>>> [f.close() for f in handles]
[None, None, None]

Windows does occasionally adhere to *x conventions like "data to stdout,
error messages to stderr" :)
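
For anyone reading this on a current Python 3, where os.popen3 no longer exists, here is a minimal sketch of the same "stdout and stderr as separate strings" idea using the subprocess module. The helper name run_capture and the stand-in child command are made up for illustration; a cmd.exe built-in such as MD would additionally need shell=True on Windows.

```python
import subprocess
import sys


def run_capture(args):
    """Run a command and return (stdout_text, stderr_text, return_code)."""
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,   # collect normal output
        stderr=subprocess.PIPE,   # collect error messages separately
        universal_newlines=True,  # decode bytes to text, normalize newlines
    )
    return proc.stdout, proc.stderr, proc.returncode


# Hypothetical stand-in for a failing command such as "MD :":
# it writes one line to each stream and exits with status 1.
CHILD = [
    sys.executable, "-c",
    "import sys; print('normal output'); "
    "sys.stderr.write('error message\\n'); sys.exit(1)",
]

out, err, code = run_capture(CHILD)
print("stdout:", repr(out))
print("stderr:", repr(err))
print("return code:", code)
```

The error text lands in the second string, which is why a plain os.popen(...).read() never saw it.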

Now it's *your* turn to do something for the cause. It appears to me
that popen4 has exactly the same documentation as popen3, as recently as
2.5a2. I see no fourth gizmoid here.

Whoooaaah! "4" is not a gizmoid count:

|>>> handles = os.popen4('MD :')
|>>> handles
(<open file 'MD :', mode 'w' at 0x00AAF698>, <open file 'MD :', mode 'r' at 0x00AAF4E8>)

You might like to suss this out, and raise a request to have the docs fixed.

Cheers,
John
 
John Machin

> Now it's *your* turn to do something for the cause. It appears to me
> that popen4 has exactly the same documentation as popen3, as recently as
> 2.5a2. I see no fourth gizmoid here.
>
> Whoooaaah! "4" is not a gizmoid count:
>
> |>>> handles = os.popen4('MD :')
> |>>> handles
> (<open file 'MD :', mode 'w' at 0x00AAF698>, <open file 'MD :', mode 'r' at 0x00AAF4E8>)
>
> You might like to suss this out, and raise a request to have the docs
> fixed.

OK, OK, alright already. I didn't read the docs closely enough. Forget
the doc-fix request. It looks like it will pay you to investigate popen4
a bit further :)

Cheers,
 
Fredrik Lundh

A.M said:
> It works very fine with DIR command, but for commands like "MD :" it doesn't
> return the error message into the string:
>
> print os.popen('MD :').read()
>
> # No error message

in python, "MD" is spelled os.mkdir.

> Am I missing anything?

the difference between STDOUT and STDERR, and the difference between
buffered output and non-buffered output, and perhaps a few other things
related to how STDIO behaves on modern computers... however, if you
want to pretend that STDOUT and STDERR are the same thing, you can use
os.popen4:
'The filename, directory name, or volume label syntax is incorrect.\n'

or the subprocess module.
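
A rough sketch of the subprocess route, assuming a current Python 3: passing stderr=subprocess.STDOUT folds error messages into the same stream as normal output, which is what os.popen4 did. The name run_merged and the stand-in child command are illustrative only.

```python
import subprocess
import sys


def run_merged(args):
    """Return (combined_output, return_code) with stderr folded into stdout."""
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # send stderr down the stdout pipe
        universal_newlines=True,   # text instead of bytes
    )
    return proc.stdout, proc.returncode


# Hypothetical child that only complains on stderr and exits with status 1.
FAILING_CHILD = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('error message\\n'); sys.exit(1)",
]

text, code = run_merged(FAILING_CHILD)
print(repr(text), code)
```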

> Considering the fact that Ruby doesn't have any problem with redirecting
> STDOUT into files or string variables, is Python the right tool for
> this kinds of shell scripting?

rewriting BAT files as a series of os.system or os.popen calls isn't
exactly optimal (neither for the computer nor the programmer nor the
future user); better take an hour to skim the "generic operating system
services" section in the library reference, and use built-in functions
wherever you can:

http://docs.python.org/lib/allos.html

the following modules are especially useful:

http://docs.python.org/lib/module-os.html
http://docs.python.org/lib/module-os.path.html
http://docs.python.org/lib/module-glob.html
http://docs.python.org/lib/module-shutil.html

by using the built-in tools, you get better performance in many cases,
better error handling, and code that's a lot easier to reuse (also on
non-Windows platforms).
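
To make the "built-in tools" point concrete, here is a small sketch (written for Python 3; the helper name make_dir and the scratch directory are invented for the example). os.mkdir reports failure as an exception that the script can catch and log, so no stderr scraping is needed, while glob and shutil stand in for DIR-style listings and recursive deletes.

```python
import glob
import os
import shutil
import tempfile


def make_dir(path):
    """Create a directory; return an error message, or None on success."""
    try:
        os.mkdir(path)
    except OSError as exc:
        return str(exc)  # the error text arrives as data, not on stderr
    return None


work = tempfile.mkdtemp()            # scratch area for the demonstration
target = os.path.join(work, "logs")

print(make_dir(target))              # first call succeeds
print(make_dir(target))              # second call fails: directory exists

# glob replaces a "DIR *.py" style listing
open(os.path.join(target, "a.py"), "w").close()
print([os.path.basename(p) for p in glob.glob(os.path.join(target, "*.py"))])

shutil.rmtree(work)                  # recursive delete, no shell involved
```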

</F>
 
A.M

Thanks Fredrik for the help.

The "MD :" is just a sample. The actual script contains different commands.

The actual script that I am "translating" consolidates huge table data from
multiple SQL Server databases into Oracle. I have to use the BCP command line
on the SQL Server side and SQL*Loader on the Oracle side.

I must capture the stdout/stderr output of the command lines into log files
for future inspection and troubleshooting. Besides the stdout/stderr issue,
my main problem is that Python captures a command's output in a different
order. For example, popen captures BCP's output completely wrong: part of
the summary is at the top, the progress percentages are at the bottom, and
so on. This is just the stdout output.

I am going to investigate popen4 and the other popen forms per your
suggestion and try to fix the stdout sequence problem.

Regards,

Alan
 
Steve Holden

I dare hardly suggest this, but might it not be better to use Python's
database functionality to perform the task? The language can access both
databases, and you might find it quicker. Then again, if your database
experience is limited, you may not ...

regards
Steve
 
A.M

Hi Steven,



Based on my experience, the fastest possible way to import raw data into
Oracle is SQL*Loader. Similarly, the fastest way to extract raw data from
SQL Server is BCP.

My script transfers 40,000,000 records (big records, actually) from SQL
Server to Oracle in 20 minutes. I tried ODBC for the same work, with all
record locking and transactions turned off through query hints. That program
was written in C#. After 12 hours, I just stopped it.

I created DOS batch files to control the BCP and SQL*Loader steps. It is
faster than any fancy GUI tool. Now I am using Python.



I am thinking of adding comprehensive logging to the ETL (extract, transform,
load) process. All detailed command-line output will be stored in a database.
System administrators can query the database and watch how the ETL job is
working.
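
A possible shape for that logging step, sketched for Python 3 with sqlite3 standing in for whatever database would really be used. The table name etl_log, its columns, and both helper functions are hypothetical; the command text and output are assumed to have been captured already.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_log ("
        " id INTEGER PRIMARY KEY,"
        " logged_at TEXT DEFAULT CURRENT_TIMESTAMP,"
        " command TEXT,"
        " output TEXT,"
        " return_code INTEGER)"
    )
    return conn


def log_command(conn, command, output, return_code):
    """Store one command's captured output and exit status."""
    conn.execute(
        "INSERT INTO etl_log (command, output, return_code) VALUES (?, ?, ?)",
        (command, output, return_code),
    )
    conn.commit()


conn = open_log()
log_command(conn, "example command", "captured stdout/stderr text", 0)
for row in conn.execute("SELECT command, return_code FROM etl_log"):
    print(row)
```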



At this point I have quite a challenge capturing BCP's stdout/stderr
output into string variables in a Python program.



I'll post the final outcome here.



Regards, Alan
 
Steve Holden

A.M said:
> Hi Steven,
>
> Based on my experience, the fastest possible way to import raw data into
> Oracle is SQL*Loader. Similarly, the fastest way to extract raw data from
> SQL server is BCP.
>
> My script transfers 40,000,000 records (actually big records) from sql
> server to oracle in 20 Min. I tried ODBC to do the same work. I turned off
> all record locking and transactions through query hints. The actual program
> was a C# program. After 12 hours, I just stopped the program.

I'm not that surprised - it was just an inquiry. Usually the database
bulk dump and load utilities take advantage of every trick in their
respective books to provide speed. A native driver might have done
better than ODBC, but you are probably correct on going that route for
speed.

> I created DOS batch files to control BCP and SQL*Loader steps. It is faster
> than any fancy GUI tools. Now I am using Python.
>
> I am thinking to add comprehensive logging to the ETL (extract transform
> load) process. All details command line outputs will be stored in database.
> System administrators can query database and watch how the ETL job is
> working.
>
> At this point I have quite challenge with capturing BCP's stdout/stderr
> output to string variables in Python program.
>
> I'll post the final outcome here.

OK, good luck.

regards
Steve
 
Dennis Lee Bieber

> command line's output somehow differently. For example, popen captures BCP's
> command output completely wrong. Some part of summary is at the top and the
> progress percentages are at the bottom and more.! This is just stdout
> output.

That sounds suspiciously like BCP is doing some cursor control
operations to position stuff on screen which is not seen by Python.
Python is just capturing the "printable" text in the order it comes out.
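
If that guess is right and the tool redraws its progress line with bare carriage returns (an assumption; nothing in this thread confirms what BCP actually emits), the captured text could be tidied before logging with something like this sketch. The function name is made up for the example.

```python
def collapse_carriage_returns(text):
    """Approximate what a terminal would show after '\r' rewinds.

    For each line, only the text after the last carriage return is kept.
    This ignores the case where a shorter rewrite leaves old characters
    visible at the end of the line.
    """
    cleaned = []
    for line in text.split("\n"):
        # A trailing '\r' is just the first half of a Windows "\r\n" ending.
        line = line.rstrip("\r")
        cleaned.append(line.split("\r")[-1])
    return "\n".join(cleaned)


captured = "10%\r50%\r100%\nDone\r\n"
print(repr(collapse_carriage_returns(captured)))
```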
 
A.M

Here is what I came up with after John and Fredrik's help.



import os
import sys

def Execute(shell_command, logStream=sys.stdout):
    # Log the command, its combined stdout/stderr, and its return code.
    print >>logStream, shell_command
    child_stdin, child_stdout_and_stderr = os.popen4(shell_command)
    command_output = child_stdout_and_stderr.read()
    print >>logStream, command_output
    # Both streams must be closed; whichever close() yields a value
    # supplies the return code.
    return_code = child_stdout_and_stderr.close()
    return_code = return_code or child_stdin.close()
    print >>logStream, "Return Code: ", return_code

Execute("DIR")
Execute("MD :")



I tested it and so far it behaves the way that I want.



The tricky part is that when you use popen4, you have to close both returned
streams to be able to get the return code. I wasn't able to find that in the
documentation.
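
For comparison, a hedged sketch of how the same helper might look with the subprocess module on a current Python 3. This is not what was tested above; the lower-case name execute and the use of communicate() are choices made for this example. With subprocess the return code comes from the process object, so the order in which streams are closed no longer matters.

```python
import io
import subprocess
import sys


def execute(shell_command, log_stream=sys.stdout):
    """Run a shell command, log its combined output, return its exit code."""
    log_stream.write(shell_command + "\n")
    proc = subprocess.Popen(
        shell_command,
        shell=True,                # needed for built-ins such as DIR or MD
        stdin=subprocess.DEVNULL,  # child sees end-of-file if it reads input
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like popen4
        universal_newlines=True,
    )
    output, _ = proc.communicate()  # read to end-of-file, then wait for exit
    log_stream.write(output)
    log_stream.write("Return Code: %s\n" % proc.returncode)
    return proc.returncode


log = io.StringIO()
execute("echo hello", log)
print(log.getvalue())
```

Any object with a write() method can serve as log_stream, so an open log file works the same way as the StringIO used here.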



Alan
 
John Machin

> Here is what I came up with after John and Fredrik's help.
>
> import os
> import sys
> def Execute(shell_command,logStream = sys.stdout):
>     print >>logStream, shell_command
>     child_stdin, child_stdout_and_stderr = os.popen4(shell_command)
>     command_output = child_stdout_and_stderr.read()
>     print >>logStream, command_output
>     return_code = child_stdout_and_stderr.close()
>     return_code = return_code or child_stdin.close()
>     print >>logStream, "Return Code: " , return_code
>
> Execute ("DIR")
>
> Execute ("MD :")
>
> I tested it and so far it behaves the way that I want.

Does it overcome the problem that you reported earlier, that the
contents of the output file from BCP were out of order? If not, you may
like to try popen3(). It's quite possible (and indeed desirable) that
the child's stderr is not buffered (so that error messages appear
immediately) but the child's stdout is buffered (for efficiency), and
when the buffer is flushed governs the order of appearance in a single
output stream.
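
The same buffering effect can bite the parent script too, and it is one plausible (unconfirmed here) cause of the out-of-order file.txt in the original post: when stdout is redirected to a file, Python buffers its own print output, while a child process that inherits stdout writes to the file directly. A minimal Python 3 sketch of the usual remedy, flushing before each child starts, assuming the script mixes its own output with that of child commands; the function name and the stand-in child command are made up.

```python
import subprocess
import sys


def say_then_run(message, args):
    """Print a message, flush it, then run a child that shares our stdout."""
    print(message)
    sys.stdout.flush()  # without this, the text may sit in a buffer until exit
    return subprocess.call(args)  # child inherits stdout and writes directly


# Hypothetical child command: a second Python process that prints one line.
CHILD = [sys.executable, "-c", "print('child output')"]

say_then_run("about to run child", CHILD)
say_then_run("about to run child again", CHILD)
```

Running the interpreter with the -u option (unbuffered output) is another way to get the same ordering without explicit flush calls.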

> The tricky part is that when you use popen4, you have to close both returned
> streams to be able to get the return code. I wasn't able to find that in the
> documentation.

In general it is good practice to hand back resources (e.g. close files)
explicitly as soon as you are finished with them. This is especially
important for files open for writing, in case there are problems like
out of disk space, device not functioning etc. Also when you are dealing
with a child process it makes some sense to close its stdin first just
in case it is waiting for that, and will then write something to stdout,
which may fail, causing it to write to stderr. So I guess that the
documenter didn't stumble onto the "tricky part" :)

The "tricky part" for popen and friends seems to be that the return code
is handed back upon close of the *last* file:

|>>> h = os.popen3('md : ')
|>>> [h[x].close() for x in 2, 1, 0]
[None, None, 1]

Looks like you get to report a documentation "problem" after all :)

Cheers,
John
 
A.M

> Does it overcome the problem that you reported earlier, that the
> contents of the output file from BCP were out of order?



Yes, it does. But, to be honest, I don't know how!!!





 
