Insert string into string

Francesco Pietra

I am starting a new thread because the previous one became confusing to me. I take the
opportunity to ask advice on a second problem.

FIRST PROBLEM
For file xxx.pdb, insert letter "A" into each line that starts with
"ATOM". "A" should be inserted at position 22, i.e., one space after
"LEU", leaving all other characters at the same position as in the
original example:


ATOM      1  N   LEU     1     146.615  40.494 103.776  1.00 73.04       1SG   2

In all lines starting with "ATOM", only the position of "LEU" (18-20) is
constant; the residue name itself may be any three uppercase letters.
Therefore, the most direct way to specify the insertion point is position 22.
If restricting the edit to lines starting with "ATOM" complicates things,
forget about that: most lines begin with "ATOM", so correcting the rest by
hand will be easy.

Script

f = open("xxx.pdb", "w")
import sys

for line in sys.stdin:
    line = line[:22] + "A" + line[23:]
    sys.stdout.write(line)

destroys the xxx.pdb file, and Python exits with a nonzero exit status.

The same occurs with script

f = open("hASIC1a.B99990003.pdb", "w")
f.write(' line = line[:22] + "A" + line[23:]')
f.close()

I must have misunderstood the suggestion I received in my previous posting.
____________________________________
SECOND PROBLEM
File xxx.pdb above has 426 lines starting with "ATOM", with the serial
number occupying positions 7-11, right justified (thus 1, as in the
line example above, means first line). A second, similar file yyy.pdb
has to be concatenated to xxx.pdb. Before that, "A" should be inserted into
it as above, and it should be renumbered at positions 7-11, starting from 428
(there is an intermediate line to add). What should a script look like for
this string insertion combined with a serial number that increases by 1 on
each line?


Thanks
francesco
 
Mensanator

I don't know why you're using stdin if you're reading from a file.

Also, the serial number isn't 7-11, it's 6-10 (remember to
count from 0, so character 1 is position 0, etc.)

fx = open('xxx.pdb','r') # first input file
fy = open('yyy.pdb','r') # second input file
fz = open('zzz.pdb','w') # output file (to be created)

for xline in fx:         # read input one line at a time
  if len(xline) >= 80:   # don't process invalid lines
    line_index = int(xline[7:12]) # keep track of this
    if xline[:4]=='ATOM':
      fz.write(xline[:22] + 'A' + xline[23:])
    else:
      fz.write(xline)

fx.close() # done with first file

fz.write('the extra line \n')
line_index += 1               # don't forget to count it

for yline in fy:              # read second file
  if len(yline) >= 80:        # again, valid only
    line_index += 1           # ignore serial number, use
                              #   where we left off from
                              #   from first file
    if yline[:4]=='ATOM':
      # note use of .rjust(5) to create new serial number
      fz.write(yline[:6] + \
               str(line_index).rjust(5) + \
               yline[11:22] + 'A' + yline[23:])
    else:
      fz.write(yline[:6] + \
               str(line_index).rjust(5) + yline[11:])

fy.close() # done with second file

fz.close() # done with output file
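
The renumbering step in the second loop can be looked at in isolation. The following is only a sketch: the helper name renumber and the sample line are made up for illustration, and it assumes the serial number sits at indices 6 to 10 (counting from 0), as stated above.

```python
def renumber(line, serial):
    """Return line with the characters at indices 6..10 replaced by serial,
    right-justified in a field of width 5."""
    return line[:6] + str(serial).rjust(5) + line[11:]

# Illustrative ATOM line with the serial number "1" ending at index 10.
sample = "ATOM      1  N   LEU     1     146.615  40.494 103.776  1.00 73.04       1SG   2\n"

old_serial = int(sample[6:11])   # int() ignores the padding spaces
new_line = renumber(sample, 428)
```

Because rjust(5) always yields five characters for serial numbers up to 99999, the rest of the line stays in the same columns.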
 
Roy Smith

Francesco Pietra said:
f = open("xxx.pdb", "w")
import sys

for line in sys.stdin:
    line = line[:22] + "A" + line[23:]
    sys.stdout.write(line)

You're opening "xxx.pdb" for writing, but then not writing to it. You're
writing to stdout.
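
To make the point concrete, here is a minimal sketch, assuming the goal is to read one file and write the modified lines to a different one. The function name rewrite_atom_lines and the temporary file names are hypothetical; the index 22 is simply taken from the script above. The last lines show that opening an existing file with mode "w" empties it at once, which is how xxx.pdb got destroyed.

```python
import os
import tempfile

def rewrite_atom_lines(src_path, dst_path, letter="A", index=22):
    """Copy src_path to dst_path, overwriting one character in ATOM lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith("ATOM"):
                line = line[:index] + letter + line[index + 1:]
            dst.write(line)

# Demonstration on throwaway files (the names are made up).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "in.pdb")
dst_path = os.path.join(workdir, "out.pdb")

sample = "ATOM      1  N   LEU     1     146.615  40.494 103.776\nEND\n"
with open(src_path, "w") as f:
    f.write(sample)

rewrite_atom_lines(src_path, dst_path)

with open(dst_path) as f:
    result = f.read()

# Mode "w" truncates an existing file as soon as it is opened.
open(src_path, "w").close()
size_after_truncation = os.path.getsize(src_path)
```

Writing to a second file and renaming it afterwards keeps the original intact if something goes wrong halfway.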

BTW, you might want to take a look at http://biopython.org.
 
Mensanator

I don't know why you're using stdin if you're reading from a file.

Also, the serial number isn't 7-11, it's 6-10 (remember to
count from 0, so character 1 is position 0, etc.)

fx = open('xxx.pdb','r') # first input file
fy = open('yyy.pdb','r') # second input file
fz = open('zzz.pdb','w') # output file (to be created)

for xline in fx:         # read input one line at a time
  if len(xline) >= 80:   # don't process invalid lines
    line_index = int(xline[7:12]) # keep track of this

Forgot to fix this after I discovered your error.
It should be int(xline[6:11])
    if xline[:4]=='ATOM':
      fz.write(xline[:22] + 'A' + xline[23:])
    else:
      fz.write(xline)

fx.close() # done with first file

fz.write('the extra line \n')
line_index += 1               # don't forget to count it

for yline in fy:              # read second file
  if len(yline) >= 80:        # again, valid only
    line_index += 1           # ignore serial number, use
                              #   where we left off from
                              #   from first file
    if yline[:4]=='ATOM':
      # note use of .rjust(5) to create new serial number
      fz.write(yline[:6] + \
               str(line_index).rjust(5) + \
               yline[11:22] + 'A' + yline[23:])
    else:
      fz.write(yline[:6] + \
               str(line_index).rjust(5) + yline[11:])

fy.close() # done with second file

fz.close() # done with output file
 
Peter Otten

Mensanator said:
I don't know why you're using stdin if you're reading from a file.

From Francesco's initial post in his previous thread I inferred that he had
a script like

f = open("xxx.pdb")
for line in f:
    # process line
    print line

and was calling it

python script.py >outfile

My hope was that

import sys
for line in sys.stdin:
    # process line
    sys.stdout.write(line)

invoked as

python script.py <xxx.pdb >outfile

would be an improvement as it avoids hardcoding the filename, but instead
chaos ensued...
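
For what it is worth, the filter idea can be tried without touching any real file. This is only a sketch: run_filter and process are made-up names, the per-line edit is the one from Francesco's script, and in-memory streams stand in for stdin and stdout.

```python
import io
import sys

def process(line):
    """Hypothetical per-line edit: overwrite the character at index 22."""
    if line.startswith("ATOM"):
        return line[:22] + "A" + line[23:]
    return line

def run_filter(infile=sys.stdin, outfile=sys.stdout):
    """Copy infile to outfile, passing every line through process()."""
    for line in infile:
        outfile.write(process(line))

# In a script invoked as   python script.py <xxx.pdb >outfile
# the call would simply be run_filter(). Here in-memory streams are used.
fake_in = io.StringIO("ATOM      1  N   LEU     1     146.615\nREMARK untouched\n")
fake_out = io.StringIO()
run_filter(fake_in, fake_out)
output = fake_out.getvalue()
```

Note that the script never calls open() itself; the shell redirections decide which files are read and written.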

Francesco: Mensanator's script looks like you can take it "as is". If you
want to use Python to do other interesting things I highly recommend that
you work your way through a tutorial of your choice. This will make
subsequent trial-and-error much more fun.

Following Roy's suggestion I also had a brief look at Biopython's PDB parser
which has the advantage that it "understands" the file format.
Unfortunately it is probably too complex for you to use at this point of
your career as a pythonista ;)

By the way, are you trying to modify the chain ID? Biopython locates that at
position 21, so take this as a reminder that indices in Python start at 0,
i. e. line[21] gives you the 22nd character in the line.
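
A small sketch of the difference between the two ways of counting, using a made-up ATOM line. It assumes, as said above, that the character of interest is the 22nd one; the variable names are only illustrative.

```python
line = "ATOM      1  N   LEU     1     146.615  40.494 103.776"

residue = line[17:20]   # characters 18-20 when counting from 1
char_22 = line[21]      # the 22nd character

# Overwrite the 22nd character; the line length stays the same.
at_col_22 = line[:21] + "A" + line[22:]

# The slice used earlier in the thread overwrites the 23rd character instead.
at_col_23 = line[:22] + "A" + line[23:]
```

In both cases one character is replaced rather than inserted, which is what keeps all other characters at their original positions.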

Peter
 
Mensanator

Peter Otten said:
Francesco: Mensanator's script looks like you can take it "as is".

Well, I didn't bother to insert the serial number
into the extra line as the extra line wasn't given.
Hopefully, it's obvious how to do that.
 