Insert string into string

Francesco Pietra · Jul 26, 2008

I am starting a new thread because the previous one became confusing to
me. I take the opportunity to ask advice on a second problem.

FIRST PROBLEM
For file xxx.pdb, insert letter "A" into each line that starts with
"ATOM". "A" should be inserted at position 22, i.e., one space after
"LEU", leaving all other characters at the same position as in the
original example:

ATOM      1  N   LEU     1     146.615  40.494 103.776  1.00 73.04       1SG   2

In all lines starting with "ATOM", only the position of "LEU" (18-20) is
constant, i.e., "LEU" may be replaced by three other uppercase letters.
Therefore, the most direct indication would be position 22. If restricting
the change to lines starting with "ATOM" complicates things, forget about
that: most lines begin with "ATOM", so hand correction will be easy.

Script
f = open("xxx.pdb", "w")
import sys

for line in sys.stdin:
    line = line[:22] + "A" + line[23:]
    sys.stdout.write(line)

destroys the xxx.pdb file, and Python exits with a non-zero exit status.

The same occurs with script

f = open("hASIC1a.B99990003.pdb", "w")
f.write(' line = line[:22] + "A" + line[23:]')
f.close()

I must have misunderstood the suggestion I received in my previous posting.
____________________________________
SECOND PROBLEM
File xxx.pdb above has 426 lines starting with "ATOM", each with a serial
number occupying positions 7-11, right justified (thus 1, as in the
example line above, means the first line). A second, similar file yyy.pdb
has to be concatenated to xxx.pdb. Before that, "A" should be inserted
into it as above, and it should be renumbered at positions 7-11, starting
from 428 (there is an intermediate line to add). What should a script look
like for this string insertion with a running +1 increment?

Thanks
francesco

Mensanator · Jul 27, 2008

I don't know why you're using stdin if you're reading from a file.

Also, the serial number isn't 7-11, it's 6-10 (remember to
count from 0, so character 1 is position 0, etc.)
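
A minimal sketch of the column arithmetic, assuming the sample line from the question with its fixed-width spacing restored. The helper name `columns` is hypothetical; it only illustrates that 1-based, inclusive columns first..last correspond to the Python slice `[first - 1:last]`.

```python
def columns(line, first, last):
    """Return the text in 1-based, inclusive columns first..last."""
    return line[first - 1:last]

# Sample ATOM record from the question (spacing restored by assumption).
sample = "ATOM      1  N   LEU     1     146.615  40.494 103.776  1.00 73.04       1SG   2"

serial = int(columns(sample, 7, 11))   # same text as sample[6:11]
residue = columns(sample, 18, 20)      # same text as sample[17:20]
```

So "positions 7-11" in the file format and "indices 6-10" in Python name the same five characters.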

fx = open('xxx.pdb','r') # first input file
fy = open('yyy.pdb','r') # second input file
fz = open('zzz.pdb','w') # output file (to be created)

for xline in fx:                        # read input one line at a time
    if len(xline) >= 80:                # don't process invalid lines
        line_index = int(xline[7:12])   # keep track of this
        if xline[:4]=='ATOM':
            fz.write(xline[:22] + 'A' + xline[23:])
        else:
            fz.write(xline)

fx.close() # done with first file

fz.write('the extra line \n')
line_index += 1                         # don't forget to count it

for yline in fy:                        # read second file
    if len(yline) >= 80:                # again, valid only
        line_index += 1                 # ignore serial number, use
                                        # where we left off
                                        # from first file
        if yline[:4]=='ATOM':
            # note use of .rjust(5) to create new serial number
            fz.write(yline[:6] + \
                     str(line_index).rjust(5) + \
                     yline[11:22] + 'A' + yline[23:])
        else:
            fz.write(yline[:6] + \
                     str(line_index).rjust(5) + yline[11:])

fy.close() # done with second file

fz.close() # done with output file

Roy Smith · Jul 27, 2008

Francesco Pietra said:
Script
f = open("xxx.pdb", "w")
import sys

for line in sys.stdin:
    line = line[:22] + "A" + line[23:]
    sys.stdout.write(line)

You're opening "xxx.pdb" for writing, but then not writing to it. You're
writing to stdout.

BTW, you might want to take a look at http://biopython.org.
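
To make the point concrete, here is a small sketch run against throwaway files in a temporary directory. It assumes nothing about the real xxx.pdb; the output name `xxx_new.pdb` and the file contents are placeholders. It shows the safe pattern (read one file, write a different one) and then what mode "w" does to an existing file.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "xxx.pdb")      # stand-in for the real input file
dst = os.path.join(workdir, "xxx_new.pdb")  # hypothetical output name

with open(src, "w") as f:
    f.write("ATOM line 1\nATOM line 2\n")

# Safe pattern: read from one file, write to a different one.
with open(src) as fin, open(dst, "w") as fout:
    for line in fin:
        fout.write(line.upper())            # placeholder for the real processing

size_before = os.path.getsize(src)

# What the script in the question does: mode "w" truncates the file on open,
# even if nothing is ever written through the handle.
f = open(src, "w")
f.close()
size_after = os.path.getsize(src)
```

After the last three lines the input file is empty, which matches the "destroys the file" symptom.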

Mensanator · Jul 27, 2008

line_index = int(xline[7:12]) # keep track of this

I forgot to fix this after I discovered your error.
It should be int(xline[6:11]).

Peter Otten · Jul 27, 2008

Mensanator said:
I don't know why you're using stdin if you're reading from a file.

From Francesco's initial post in his previous thread I inferred that he had
a script like

f = open("xxx.pdb")
for line in f:
    # process line
    print line

and was calling it

python script.py >outfile

My hope was that

import sys
for line in sys.stdin:
    # process line
    sys.stdout.write(line)

invoked as

python script.py <xxx.pdb >outfile

would be an improvement as it avoids hardcoding the filename, but instead
chaos ensued...
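
A hedged sketch of that filter idea, written as a function so it can be tried without shell redirection. The name `filter_lines` and its `process` parameter are illustrative only, and in-memory streams stand in for the redirected files.

```python
import io

def filter_lines(infile, outfile, process=lambda line: line):
    """Copy infile to outfile, passing each line through `process`."""
    for line in infile:
        outfile.write(process(line))

# In a real script you would call filter_lines(sys.stdin, sys.stdout) and run
#   python script.py <xxx.pdb >outfile
# The script itself never opens xxx.pdb, so it cannot truncate it.
source = io.StringIO("ATOM first\nATOM second\n")
result = io.StringIO()
filter_lines(source, result, process=str.upper)
```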

Francesco: Mensanator's script looks like you can take it "as is". If you
want to use Python to do other interesting things I highly recommend that
you work your way through a tutorial of your choice. This will make
subsequent trial-and-error much more fun.

Following Roy's suggestion I also had a brief look at Biopython's PDB parser
which has the advantage that it "understands" the file format.
Unfortunately it is probably too complex for you to use at this point of
your career as a pythonista.

By the way, are you trying to modify the chain ID? Biopython locates that at
position 21, so take this as a reminder that indices in Python start at 0,
i.e. line[21] gives you the 22nd character in the line.

Peter
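
If the goal really is the chain ID (an assumption; the thread never confirms it), the slice boundaries move one to the left of those in the scripts above. A minimal sketch, with a hypothetical helper `set_chain_id` and the sample line truncated after the first coordinate:

```python
def set_chain_id(line, chain="A"):
    """Put `chain` in column 22 (index 21) of ATOM records; leave other lines alone."""
    if not line.startswith("ATOM") or len(line) < 22:
        return line
    # Overwrite one character so every other column keeps its position.
    return line[:21] + chain + line[22:]

sample = "ATOM      1  N   LEU     1     146.615"
fixed = set_chain_id(sample)
```

The line length is unchanged, and "A" lands one space after the residue name, as the question asks.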

Mensanator · Jul 27, 2008

Peter Otten said:
Francesco: Mensanator's script looks like you can take it "as is".

Well, I didn't bother to insert the serial number
into the extra line as the extra line wasn't given.
Hopefully, it's obvious how to do that.
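
One way it might look, as a sketch only: the extra line was never posted, so the "TER" record below is a pure placeholder, and `with_serial` is a hypothetical helper. It overwrites columns 7-11 (slice 6:11) with a right-justified serial number, padding short lines first.

```python
def with_serial(line, serial):
    """Overwrite columns 7-11 (slice 6:11) with `serial`, right-justified."""
    body = line.rstrip("\n")
    # Pad short records so the serial always starts at index 6.
    return body[:6].ljust(6) + str(serial).rjust(5) + body[11:] + "\n"

extra = with_serial("TER", 427)   # placeholder for the unspecified extra line
first = with_serial("ATOM      1  N   LEU     1\n", 428)
```

With 426 ATOM lines in the first file, the extra line takes 427 and the second file starts at 428, as described in the question.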

