ATTN : Georges ( [email protected])

Guest

First of all thanks for helping me out.

I have to admit I don't understand some of your suggestions, sorry.
I don't know what the "3D" thing is... Is there another way to make it
work, something simpler for a newbie like me? Thanks

What I want to do is:
First, check all the files in a folder and analyze only the ones with the .Seq extension.
Then get the reverse complement of each DNA sequence. If there is a problem
with some characters in the DNA sequence, I want the function to tell me.

Here are the comp and iupac:

iupac ="GgAaTtCcRrYyMmKkSsWwHhBbVvDdNn"

comp={"A":"T", "T":"A", "G":"C", "C":"G", "R":"Y", "Y":"R", "M":"K",
"K":"M", "S":"W", "W":"S", "B":"V", "V":"B", "D":"H", "H":"D", "r":"y",
"y":"r", "m":"k", "k":"m", "s":"w", "w":"s", "b":"v", "v":"b", "d":"h",
"h":"d", "a":"t", "t":"a", "g":"c", "c":"g", "N":"N","n":"n"}

So if a $ or Z appears in the DNA sequence, I want to know it.

My code so far:
# -*- coding: iso-8859-1 -*-
import sys
import os
from progadn import *

ab1seq = raw_input("Entrez le répertoire où sont les fichiers à analyser: ") or None
if ab1seq == None :
    print "Erreur: Pas de répertoire! \n" \
          "\nAu revoir \n"
    sys.exit()

listrep = os.listdir(ab1seq)
#print listrep

extseq=[]

for f in listrep:
    if f[-4:]==".Seq":
        extseq.append(f)
        #print extseq

for x in extseq:
    f=open(x, "r")
    seq=f.read()
    f.close()
    #s=seq


def checkDNA(seq):
    """Retourne une liste des caractères non conformes à l'IUPAC."""

    junk=[]
    for c in range (len(seq)):
        if seq[c] not in iupac:
            junk.append([seq[c],c])
            #print junk
            print "ATTN: Il y a le caractère %s en position %s " % (seq[c],c)
    if junk == []:
        indinv=range(len(seq))
        indinv.reverse()
        resultat=""
        for i in indinv:
            resultat +=comp[seq[i]]
        return resultat

seq=checkDNA(seq)

-------------------------------------------------------------------------------------------------------------------------

Newsgroups: comp.lang.python
Subject: Re: problem with the logic of read files
Date: 12 Apr 2005 10:47:17 -0700


I am new to Python and I am not in computer science. In fact I am a
biologist and I am trying to learn Python. So if someone can help me, I
will appreciate it.
Thanks


#!/cbi/prg/python/current/bin/python
# -*- coding: iso-8859-1 -*-
import sys
import os
from progadn import *

ab1seq = raw_input("Entrez le répertoire où sont les fichiers à analyser: ") or None
if ab1seq == None :
    print "Erreur: Pas de répertoire! \n" \
          "\nAu revoir \n"
    sys.exit()

listrep = os.listdir(ab1seq)
#print listrep

extseq=[]

for f in listrep:
    ###### Minor -- this is better said as: if f.endswith(".Seq"):
    if f[-4:]==".Seq":
        extseq.append(f)
        # print extseq

for x in extseq:
    f = open(x, "r")
    ###### seq=... discards previous data and refers only to that just read.
    ###### It would be simplest to process each file as it is read:
    @@@@@@ seq=f.read()
    @@@@@@ checkDNA(seq)
    seq=f.read()
    f.close()
    s=seq

def checkDNA(seq):
    """Retourne une liste des caractères non conformes à l'IUPAC."""

    junk=[]
    for c in range (len(seq)):
        if seq[c] not in iupac:
            junk.append([seq[c],c])
            #print junk
            print "ATTN: Il y a le caractère %s en position %s " % (seq[c],c)
    if junk == []:
        indinv=range(len(seq))
        indinv.reverse()
        resultat=""
        for i in indinv:
            resultat +=comp[seq[i]]
        return resultat

seq=checkDNA(seq)
print seq


##### The program segment you posted did not define "comp" or "iupac",
##### so it's a little hard to guess how it's supposed to work. It would
##### be helpful if you gave a concise description of what you want the
##### program to do, as well as a brief sample of input data.
##### I hope this helps! -- George
#I got the following (as you can see, only one file is processed by the
function, even though there are more files in extseq):
['B1-11_win3F_B04_04.ab1.Seq']
['B1-11_win3F_B04_04.ab1.Seq', 'B1-11_win3R_C04_06.ab1.Seq']
['B1-11_win3F_B04_04.ab1.Seq', 'B1-11_win3R_C04_06.ab1.Seq', 'B1-18_win3F_D04_08.ab1.Seq']
['B1-11_win3F_B04_04.ab1.Seq', 'B1-11_win3R_C04_06.ab1.Seq',
'B1-18_win3F_D04_08.ab1.Seq', 'B1-18_win3R_E04_10.ab1.Seq']
['B1-11_win3F_B04_04.ab1.Seq', 'B1-11_win3R_C04_06.ab1.Seq',
'B1-18_win3F_D04_08.ab1.Seq', 'B1-18_win3R_E04_10.ab1.Seq',
'B1-19_win3F_F04_12.ab1.Seq']
..
['B1-11_win3F_B04_04.ab1.Seq', 'B1-11_win3R_C04_06.ab1.Seq',
'B1-18_win3F_D04_08.ab1.Seq', 'B1-18_win3R_E04_10.ab1.Seq',
'B1-19_win3F_F04_12.ab1.Seq', 'B1-19_win3R_G04_14.ab1.Seq',
'B90_win3F_H04_16.ab1.Seq', 'B90_win3R_A05_01.ab1.Seq',
'DL2-11_win3F_H03_15.ab1.Seq', 'DL2-11_win3R_A04_02.ab1.Seq',
'DL2-12_win3F_F03_11.ab1.Seq', 'DL2-12_win3R_G03_13.ab1.Seq',
'M7757_win3F_B05_03.ab1.Seq', 'M7757_win3R_C05_05.ab1.Seq',
'M7759_win3F_D05_07.ab1.Seq', 'M7759_win3R_E05_09.ab1.Seq',
'TCR700-114_win3F_H05_15.ab1.Seq', 'TCR700-114_win3R_A06_02.ab1.Seq',
'TRC666-100_win3F_F05_11.ab1.Seq', 'TRC666-100_win3R_G05_13.ab1.Seq']
After this listing, my program processes only the last element of the
list (TRC666-100_win3R_G05_13.ab1.Seq):
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCCCGAAGTGTCCCAGAGCAAATAAATGGACCAAAACGTTTTTAG
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCCCGAAGTGTCCCAGAGCAAATAAATGGACCAAAACGTTTTTAG
AATACTTGAACGTGTAATCTCATTTTAA



**********End Of Post*************
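
George's note about processing each file as it is read could be sketched roughly as below. This is only an illustration written for Python 3 (the thread's code is Python 2). The names read_seq_files and process_all are hypothetical, and joining the directory onto each filename is an assumption about where the script is run from.

```python
import os

def read_seq_files(directory, suffix=".Seq"):
    """Yield (filename, contents) for each file in directory ending in suffix."""
    for name in sorted(os.listdir(directory)):
        if name.endswith(suffix):
            # os.listdir returns bare names, so join them with the directory;
            # otherwise open() only works when run from inside that directory.
            path = os.path.join(directory, name)
            with open(path, "r") as handle:
                yield name, handle.read().strip()

def process_all(directory, check):
    """Call check(seq) once per file, inside the loop, and collect the results."""
    results = {}
    for name, seq in read_seq_files(directory):
        # The call happens here, per file, so no sequence is overwritten
        # by the next one before it has been handled.
        results[name] = check(seq)
    return results
```

Called as process_all(ab1seq, checkDNA), this would hand every .Seq file to the function instead of only the last one read.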
 

Pierre-Frédéric Caillaud

My code so far:
# -*- coding: iso-8859-1 -*-
import sys
import os
from progadn import *

ab1seq = raw_input("Entrez le répertoire où sont les fichiers à analyser: ") or None

It would be better to use sys.argv to specify the directory on the
program's command line:
import sys
help(sys.argv)
if ab1seq == None :
    print "Erreur: Pas de répertoire! \n" \
          "\nAu revoir \n"
    sys.exit()

I suggest:

import os, os.path, sys

def usage():
    print "documentation..."
    sys.exit(-1)


args = sys.argv[1:]

if not args:
    usage()

files = []
for path in args:
    if os.path.isfile( path ):
        files.append( path )
    elif os.path.isdir( path ):
        files.extend( [os.path.join( path, fname ) for fname in os.listdir( path )] )
    else:
        print "%s n'est ni un fichier ni un répertoire..." % path
        usage()

files = [ fname for fname in files if fname.endswith( ".Seq" ) ]

if not files:
    print "Aucun fichier a traiter."
    usage()

print "Fichier à traiter :"
print ", ".join( files )

for path in files:
    print path
    checkDNA( open( path ).read() )
def checkDNA(seq):
    """Retourne une liste des caractères non conformes à l'IUPAC."""

    junk=[]
    for c in range (len(seq)):
        if seq[c] not in iupac:
            junk.append([seq[c],c])
            #print junk
            print "ATTN: Il y a le caractère %s en position %s " % (seq[c],c)
    if junk == []:
        indinv=range(len(seq))
        indinv.reverse()
        resultat=""
        for i in indinv:
            resultat +=comp[seq[i]]
        return resultat


I've rewritten your function a little in a more "Pythonic" way; it has to
be placed in the program before the point where it is called, of course!

def checkDNA( seq ):
    seq = seq.strip()
    if not seq:
        print "Fichier vide."
        return
    resultat = []
    for i,c in enumerate(seq):
        try:
            resultat.append( comp[c] )
        except KeyError:
            print "Caractère <%s> en position <%d> invalide" % (c,i)
    resultat.reverse()
    return ''.join( resultat )


seq=checkDNA(seq)
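
One thing to be aware of with the rewritten function: when it meets an invalid character it prints a warning but still returns the reverse complement of the remaining characters. If a stricter behaviour is wanted, something along these lines might work. It is a sketch for Python 3, not the code from the thread; find_invalid and reverse_complement are hypothetical names, and the COMP table is copied from the comp dictionary posted above (note that in the standard IUPAC convention S and W are each their own complement, so the table may need adjusting).

```python
# Complement table as posted in the thread (upper case only); the lower-case
# pairs are generated below so that case is preserved.
COMP = {
    "A": "T", "T": "A", "G": "C", "C": "G",
    "R": "Y", "Y": "R", "M": "K", "K": "M",
    "S": "W", "W": "S", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}
COMP.update({k.lower(): v.lower() for k, v in list(COMP.items())})

def find_invalid(seq):
    """Return a list of (position, character) for characters not in COMP."""
    return [(i, c) for i, c in enumerate(seq) if c not in COMP]

def reverse_complement(seq):
    """Return the reverse complement, or raise ValueError on invalid input."""
    seq = seq.strip()
    invalid = find_invalid(seq)
    if invalid:
        # Refuse to return a partial result when the sequence is not clean.
        raise ValueError("invalid characters: %r" % invalid)
    return "".join(COMP[c] for c in reversed(seq))
```

With this version a "$" or "Z" in a file stops that file's processing with a message listing every offending position, which the caller can catch and report per file.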

 
