Problem while doing a cat on a tabbed file with pexpect


Saqib Ali

I am using Solaris 10, python 2.6.2, pexpect 2.4

I create a file called me.txt which contains the letters "A", "B", "C"
on the same line separated by tabs.

My shell prompt is "% "

I then do the following in the python shell:

>>> import pexpect
>>> x = pexpect.spawn("/bin/tcsh")
>>> x.sendline("cat me.txt")
11
>>> x.expect([pexpect.TIMEOUT, "% "])
1
>>> x.before
'cat me.txt\r\r\nA B C\r\n'
>>> x.before.split("\t")
['cat me.txt\r\r\nA B C\r\n']



Now, clearly the file contains tabs. But when I cat it through expect,
and collect cat's output, those tabs have been converted to spaces.
But I need the tabs!

Can anyone explain this phenomenon or suggest how I can fix it?
 
Dennis Lee Bieber

> Now, clearly the file contains tabs. But when I cat it through expect,
> and collect cat's output, those tabs have been converted to spaces.
> But I need the tabs!
>
> Can anyone explain this phenomenon or suggest how I can fix it?

My question is:

WHY are you doing this?

Based upon the problem description, as given, the solution would
seem to be to just open the file IN Python -- whether you read the lines
and use split() by hand, or pass the open file to the csv module for
reading/parsing is up to you.

-=-=-=-=-=-=-
import csv
import os

TESTFILE = "Test.tsv"

#create data file
fout = open(TESTFILE, "w")
for ln in ["abc",
           "defg",
           "hijA"]:
    fout.write("\t".join(list(ln)) + "\n")
fout.close()

#process tab-separated data
fin = open(TESTFILE, "rb")
rdr = csv.reader(fin, dialect="excel-tab")
for rw in rdr:
    print rw

fin.close()
del rdr
os.remove(TESTFILE)
-=-=-=-=-=-=-
['a', 'b', 'c']
['d', 'e', 'f', 'g']
['h', 'i', 'j', 'A']
 
Steven D'Aprano

> I am using Solaris 10, python 2.6.2, pexpect 2.4
>
> I create a file called me.txt which contains the letters "A", "B", "C"
> on the same line separated by tabs. [...]
> Now, clearly the file contains tabs.

That is not clear at all. How do you know it contains tabs? How was the
file created in the first place?

Try this:

text = open('me.txt', 'r').read()
print '\t' in text

My guess is that it will print False and that the file does not contain
tabs. Check your editor used to create the file.
 
Cameron Simpson

| On Sun, 15 Jan 2012 09:51:44 -0800, Saqib Ali wrote:
| > I am using Solaris 10, python 2.6.2, pexpect 2.4
| >
| > I create a file called me.txt which contains the letters "A", "B", "C"
| > on the same line separated by tabs.
| [...]
| > Now, clearly the file contains tabs.
|
| That is not clear at all. How do you know it contains tabs? How was the
| file created in the first place?
|
| Try this:
|
| text = open('me.txt', 'r').read()
| print '\t' in text
|
| My guess is that it will print False and that the file does not contain
| tabs. Check your editor used to create the file.

I was going to post an alternative theory but on more thought I think
Steven is right here.

What does:

od -c me.txt

show you? TABs or multiple spaces?

What does:

ls -ld me.txt

tell you about the file size? Is it 6 bytes long (three letters, two
TABs, one newline)?

Steven hasn't been explicit about it, but some editors will write spaces when
you type a TAB. I have configured mine to do so - it makes indentation more
reliable for others. If I really need a TAB character I have a special
finger contortion to get one, but the actual need is rare.

So first check that the file really does contain TABs.

Cheers,
 
Saqib Ali

Very good question. Let me explain why I'm not opening me.txt directly
in python with open.

The example I have posted is simplified for illustrative purpose. In
reality, I'm not doing pexpect.spawn("/bin/tcsh"). I'm doing
pexpect.spawn("ssh [email protected]"). Since I'm operating on a remote
system, I can't simply open the file in my own python context.


 
Saqib Ali

The file me.txt does indeed contain tabs. I created it with vi.

>>> text = open("me.txt", "r").read()
>>> print "\t" in text
True


% od -c me.txt
0000000 A \t B \t C \n
0000006


% ls -al me.txt
-rw-r--r-- 1 myUser myGroup 6 Jan 15 12:42 me.txt



 
Cameron Simpson

| The file me.txt does indeed contain tabs. I created it with vi.
|
| >>> text = open("me.txt", "r").read()
| >>> print "\t" in text
| True
|
| % od -c me.txt
| 0000000 A \t B \t C \n
| 0000006
|
| % ls -al me.txt
| -rw-r--r-- 1 myUser myGroup 6 Jan 15 12:42 me.txt

Ok, your file does indeed contain TABs.

Therefore something is turning the TABs into spaces. Pexpect should be
opening a pty and reading from that, and I do not expect that to expand
TABs. So:

1: Using subprocess.Popen, invoke "cat me.txt" and check the result
for TABs.

2: Using pexpect, run "cat me.txt" instead of "/bin/tcsh" (eliminates a
layer of complexity; I don't actually expect changed behaviour) and
check for TABs.
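Check 1 above could be sketched roughly like this. It is only an illustration, written in Python 3 syntax rather than the 2.6 used in this thread; the helper name cat_output and the throwaway temp file standing in for me.txt are made up for the example. The point is that subprocess talks to cat over a plain pipe, so no terminal sits between cat and the reader.

```python
import os
import subprocess
import tempfile


def cat_output(path):
    """Run `cat path` over a pipe (no pty involved) and return the raw bytes."""
    proc = subprocess.Popen(["cat", path], stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out


# Build a throwaway file with the same six bytes as me.txt: A<TAB>B<TAB>C<NL>.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"A\tB\tC\n")
os.close(fd)

output = cat_output(demo_path)
os.remove(demo_path)

print(repr(output))
print(b"\t" in output)
```

If the TABs survive here but not through pexpect, that would point at the pty rather than at cat or the file.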

On your Solaris system, read "man termios". Does it have an "expand
TABs" mode switch? This is about the only thing I can think of that
would produce your result: the pty line discipline is expanding
TABs for you (unwanted!), so cat is writing TABs to the terminal and the
terminal is passing expanded spaces to pexpect. Certainly terminal line
disciplines do rewrite stuff, most obviously "\n" into "\r\n", but a
quick glance through termios on a Linux box does not show a tab
expansion mode; I do not have access to a Solaris box at present.
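For what it is worth, the termios output flags do carry a tab setting: the TABDLY field, where the value TAB3 (also spelled XTABS on some systems) means "expand tabs to spaces". The sketch below tries this on a local pty using only the standard library. It is written in Python 3 syntax, the helper name through_pty is invented for the example, and it assumes a platform whose termios module defines TABDLY and TAB3. It has not been run on Solaris.

```python
import os
import pty
import termios


def through_pty(data, expand_tabs):
    """Write data to a pty slave and return what arrives on the master side."""
    master_fd, slave_fd = pty.openpty()
    try:
        attrs = termios.tcgetattr(slave_fd)
        # attrs[1] is the output-flags word (c_oflag).
        attrs[1] |= termios.OPOST        # make sure output processing is on
        attrs[1] &= ~termios.TABDLY      # clear the tab bits: tabs pass through
        if expand_tabs:
            attrs[1] |= termios.TAB3     # TAB3: expand tabs to spaces
        termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)
        os.write(slave_fd, data)
        return os.read(master_fd, 1024)
    finally:
        os.close(master_fd)
        os.close(slave_fd)


expanded = through_pty(b"A\tB\tC\n", expand_tabs=True)
preserved = through_pty(b"A\tB\tC\n", expand_tabs=False)
print(repr(expanded))
print(repr(preserved))
```

In a pexpect session over ssh the pty that matters is the remote one, so the equivalent step there would probably be sending "stty tab0" (or "stty tabs") to the remote shell before running cat. That is an assumption about the remote stty, not something tested in this thread.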

Cheers,
 
Steven D'Aprano

> I am using Solaris 10, python 2.6.2, pexpect 2.4

Are you sure about that? As far as I can see, pexpect's current version
is 2.3 not 2.4.

> I create a file called me.txt which contains the letters "A", "B", "C"
> on the same line separated by tabs.
>
> My shell prompt is "% "
>
> I then do the following in the python shell:

Can you try another shell, just in case tcsh is converting the tabs to
spaces?

What happens if you do this from the shell directly, without pexpect? It
is unlikely, but perhaps the problem lies with cat rather than pexpect.
You should eliminate this possibility.

> >>> x.expect([pexpect.TIMEOUT, "% "])
> 1
> >>> x.before
> 'cat me.txt\r\r\nA B C\r\n'


Unfortunately I can't replicate the same behaviour, however my setup is
different. I'm using pexpect2.3 on Linux, and I tried it using bash and
sh but not tcsh. In all my tests, the tabs were returned as expected.

(However, the x.expect call returned 0 instead of 1, even with the shell
prompt set correctly.)
 
Dennis Lee Bieber

> Very good question. Let me explain why I'm not opening me.txt directly
> in python with open.
>
> The example I have posted is simplified for illustrative purpose. In
> reality, I'm not doing pexpect.spawn("/bin/tcsh"). I'm doing
> pexpect.spawn("ssh [email protected]"). Since I'm operating on a remote
> system, I can't simply open the file in my own python context.

Ah... Now we are outside of my experience... And into the realms of
how your remote host is handling tab characters when sent to an
(apparent) console (stdout not redirected to a file)... That is, does it
expand \t into spaces up to the next multiple of 8 characters, or some
other size, rather than issue the tab character itself?

Is it an actual file on the remote end, or the output from some
interactive session? If an actual file, can you find an alternate
command to transfer the file to a local path for file processing (ftp,
scp, ?).
 
Michael Torrie

> Very good question. Let me explain why I'm not opening me.txt directly
> in python with open.
>
> The example I have posted is simplified for illustrative purpose. In
> reality, I'm not doing pexpect.spawn("/bin/tcsh"). I'm doing
> pexpect.spawn("ssh [email protected]"). Since I'm operating on a remote
> system, I can't simply open the file in my own python context.

There is a very nice python module called "paramiko" that you could use
to, from python, programmatically ssh to the remote system and cat the
file (bypassing any shells) or use sftp to access it. Either way you
don't need to use pexpect with it.
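A rough sketch of the sftp route, in Python 3 syntax. The function names fetch_remote_bytes and parse_tab_separated are invented for the example, and the connection assumes key-based login with the host already in the system known_hosts file, which may not match your setup. paramiko is a third-party package, so it is imported inside the function that needs it; the parsing half runs without it.

```python
import csv
import io


def fetch_remote_bytes(host, user, remote_path):
    """Read a remote file over SFTP, so no shell or pty touches the data."""
    import paramiko  # third-party; only needed for the network step

    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.connect(host, username=user)
    try:
        sftp = client.open_sftp()
        try:
            remote_file = sftp.open(remote_path, "rb")
            try:
                return remote_file.read()
            finally:
                remote_file.close()
        finally:
            sftp.close()
    finally:
        client.close()


def parse_tab_separated(data):
    """Split raw bytes into rows of tab-separated fields."""
    text = data.decode("utf-8")
    reader = csv.reader(io.StringIO(text, newline=""), dialect="excel-tab")
    return list(reader)


# Local check of the parsing step, using the same bytes as me.txt.
rows = parse_tab_separated(b"A\tB\tC\n")
print(rows)
```

With a real host the two would be chained as parse_tab_separated(fetch_remote_bytes(host, user, "me.txt")), where host and user are placeholders for your own values.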
 
