Need help with a program


E

evilweasel

Hi folks,

I am a newbie to python, and I would be grateful if someone could
point out the mistake in my program. Basically, I have a huge text
file similar to the format below:

AAAAAGACTCGAGTGCGCGGA 0
AAAAAGATAAGCTAATTAAGCTACTGG 0
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
AAAAAGGTCGCCTGACGGCTGC 0

The text is nothing but DNA sequences, and there is a number next to
it. What I will have to do is, ignore those lines that have 0 in it,
and print all other lines (excluding the number) in a new text file
(in a particular format called as FASTA format). This is the program I
wrote for that:

seq1 = []
list1 = []
lister = []
listers = []
listers1 = []
a = []
d = []
i = 0
j = 0
num = 0

file1 = open(sys.argv[1], 'r')
for line in file1:
if not line.startswith('\n'):
seq1 = line.split()
if len(seq1) == 0:
continue

a = seq1[0]
list1.append(a)

d = seq1[1]
lister.append(d)


b = len(lister)
for j in range(0, b):
if lister[j] == 0:
listers.append(j)
else:
listers1.append(j)


print listers1
resultsfile = open("sequences1.txt", 'w')
for i in listers1:
resultsfile.write('\n>seq' + str(i) + '\n' + list1 + '\n')

But this isn't working. I am not able to find the bug in this. I would
be thankful if someone could point it out. Thanks in advance!

Cheers!
 
Ad

Advertisements

A

Alf P. Steinbach

* evilweasel:
Hi folks,

I am a newbie to python, and I would be grateful if someone could
point out the mistake in my program. Basically, I have a huge text
file similar to the format below:

AAAAAGACTCGAGTGCGCGGA 0
AAAAAGATAAGCTAATTAAGCTACTGG 0
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
AAAAAGGTCGCCTGACGGCTGC 0

The text is nothing but DNA sequences, and there is a number next to
it. What I will have to do is, ignore those lines that have 0 in it,
and print all other lines (excluding the number) in a new text file
(in a particular format called as FASTA format). This is the program I
wrote for that:

seq1 = []
list1 = []
lister = []
listers = []
listers1 = []
a = []
d = []
i = 0
j = 0
num = 0

file1 = open(sys.argv[1], 'r')
for line in file1:
if not line.startswith('\n'):
seq1 = line.split()
if len(seq1) == 0:
continue

a = seq1[0]
list1.append(a)

d = seq1[1]
lister.append(d)


b = len(lister)
for j in range(0, b):
if lister[j] == 0:
listers.append(j)
else:
listers1.append(j)


print listers1
resultsfile = open("sequences1.txt", 'w')
for i in listers1:
resultsfile.write('\n>seq' + str(i) + '\n' + list1 + '\n')

But this isn't working.


What do you mean by "isn't working"?

I am not able to find the bug in this. I would
be thankful if someone could point it out. Thanks in advance!

What do you expect as output, and what do you actually get as output?


Cheers,

- Alf
 
K

Krister Svanlund

Hi folks,

I am a newbie to python, and I would be grateful if someone could
point out the mistake in my program. Basically, I have a huge text
file similar to the format below:

AAAAAGACTCGAGTGCGCGGA   0
AAAAAGATAAGCTAATTAAGCTACTGG     0
AAAAAGATAAGCTAATTAAGCTACTGGGTT   1
AAAAAGGGGGCTCACAGGGGAGGGGTAT     1
AAAAAGGTCGCCTGACGGCTGC  0

The text is nothing but DNA sequences, and there is a number next to
it. What I will have to do is, ignore those lines that have 0 in it,
and print all other lines (excluding the number) in a new text file
(in a particular format called as FASTA format). This is the program I
wrote for that:

seq1 = []
list1 = []
lister = []
listers = []
listers1 = []
a = []
d = []
i = 0
j = 0
num = 0

file1 = open(sys.argv[1], 'r')
for line in file1:
   if not line.startswith('\n'):
       seq1 = line.split()
       if len(seq1) == 0:
           continue

       a = seq1[0]
       list1.append(a)

       d = seq1[1]
       lister.append(d)


b = len(lister)
for j in range(0, b):
   if lister[j] == 0:
       listers.append(j)
   else:
       listers1.append(j)


print listers1
resultsfile = open("sequences1.txt", 'w')
for i in listers1:
   resultsfile.write('\n>seq' + str(i) + '\n' + list1 + '\n')

But this isn't working. I am not able to find the bug in this. I would
be thankful if someone could point it out. Thanks in advance!

Cheers!


I'm not totaly sure what you want to do but try this (python2.6+):

newlines = []

with open(sys.argv[1], 'r') as f:
text = f.read();
for line in text.splitlines():
if not line.strip() and line.strip().endswith('1'):
newlines.append('seq'+line)

with open(sys.argv[2], 'w') as f:
f.write('\n'.join(newlines))
 
K

Krister Svanlund

Hi folks,

I am a newbie to python, and I would be grateful if someone could
point out the mistake in my program. Basically, I have a huge text
file similar to the format below:

AAAAAGACTCGAGTGCGCGGA   0
AAAAAGATAAGCTAATTAAGCTACTGG     0
AAAAAGATAAGCTAATTAAGCTACTGGGTT   1
AAAAAGGGGGCTCACAGGGGAGGGGTAT     1
AAAAAGGTCGCCTGACGGCTGC  0

The text is nothing but DNA sequences, and there is a number next to
it. What I will have to do is, ignore those lines that have 0 in it,
and print all other lines (excluding the number) in a new text file
(in a particular format called as FASTA format). This is the program I
wrote for that:

seq1 = []
list1 = []
lister = []
listers = []
listers1 = []
a = []
d = []
i = 0
j = 0
num = 0

file1 = open(sys.argv[1], 'r')
for line in file1:
   if not line.startswith('\n'):
       seq1 = line.split()
       if len(seq1) == 0:
           continue

       a = seq1[0]
       list1.append(a)

       d = seq1[1]
       lister.append(d)


b = len(lister)
for j in range(0, b):
   if lister[j] == 0:
       listers.append(j)
   else:
       listers1.append(j)


print listers1
resultsfile = open("sequences1.txt", 'w')
for i in listers1:
   resultsfile.write('\n>seq' + str(i) + '\n' + list1 + '\n')

But this isn't working. I am not able to find the bug in this. I would
be thankful if someone could point it out. Thanks in advance!

Cheers!


I'm not totaly sure what you want to do but try this (python2.6+):

newlines = []

with open(sys.argv[1], 'r') as f:
   text = f.read();
   for line in text.splitlines():
       if not line.strip() and line.strip().endswith('1'):

           newlines.append('seq'+line.strip()[:-1].strip())
with open(sys.argv[2], 'w') as f:
   f.write('\n'.join(newlines))

Gah, made some errors
 
K

Krister Svanlund

Hi folks,

I am a newbie to python, and I would be grateful if someone could
point out the mistake in my program. Basically, I have a huge text
file similar to the format below:

AAAAAGACTCGAGTGCGCGGA   0
AAAAAGATAAGCTAATTAAGCTACTGG     0
AAAAAGATAAGCTAATTAAGCTACTGGGTT   1
AAAAAGGGGGCTCACAGGGGAGGGGTAT     1
AAAAAGGTCGCCTGACGGCTGC  0

The text is nothing but DNA sequences, and there is a number next to
it. What I will have to do is, ignore those lines that have 0 in it,
and print all other lines (excluding the number) in a new text file
(in a particular format called as FASTA format). This is the program I
wrote for that:

seq1 = []
list1 = []
lister = []
listers = []
listers1 = []
a = []
d = []
i = 0
j = 0
num = 0

file1 = open(sys.argv[1], 'r')
for line in file1:
   if not line.startswith('\n'):
       seq1 = line.split()
       if len(seq1) == 0:
           continue

       a = seq1[0]
       list1.append(a)

       d = seq1[1]
       lister.append(d)


b = len(lister)
for j in range(0, b):
   if lister[j] == 0:
       listers.append(j)
   else:
       listers1.append(j)


print listers1
resultsfile = open("sequences1.txt", 'w')
for i in listers1:
   resultsfile.write('\n>seq' + str(i) + '\n' + list1 + '\n')

But this isn't working. I am not able to find the bug in this. I would
be thankful if someone could point it out. Thanks in advance!

Cheers!


I'm trying this again:

newlines = []

with open(sys.argv[1], 'r') as f:
   text = f.read();
   for line in (l.strip() for l in text.splitlines()):
       if line:
           line_elem = line.split()
           if len(line_elem) == 2 and line_elem[1] == '1':
               newlines.append('seq'+line_elem[0])

with open(sys.argv[2], 'w') as f:
   f.write('\n'.join(newlines))
 
Ad

Advertisements

D

D'Arcy J.M. Cain

I am a newbie to python, and I would be grateful if someone could
Welcome.

point out the mistake in my program. Basically, I have a huge text
file similar to the format below:

You don't say how it isn't working. As a first step you should read
http://catb.org/~esr/faqs/smart-questions.html.
The text is nothing but DNA sequences, and there is a number next to
it. What I will have to do is, ignore those lines that have 0 in it,

Your code doesn't completely ignore them. See below.
and print all other lines (excluding the number) in a new text file
(in a particular format called as FASTA format). This is the program I
wrote for that:

seq1 = []
list1 = []
lister = []
listers = []
listers1 = []
a = []
d = []
i = 0
j = 0
num = 0

This seems like an awful lot of variables for such a simple task.
file1 = open(sys.argv[1], 'r')
for line in file1:

This is good. You aren't trying to load the whole file into memory at
once. If the file is huge as you say then that would have been bad. I
would have made one small optimization that saves one assignment and
one extra variable.

for line in open(sys.argv[1], 'r'):
if not line.startswith('\n'):
seq1 = line.split()
if len(seq1) == 0:
continue

This is redundant and perhaps not even correct at the end of the file.
It assumes that the last line ends with a newline. Look at what
'\n'.split() gives you and see if you can't improve the above code.

Another small optimization - "if seq1" is better than "if len(seq1)".
a = seq1[0]
list1.append(a)

Aha! I may have found your bug. Are you mixing tabs and spaces?
Don't do that. Either always use spaces or always use tabs. My
suggestion is to use spaces and choose a short indent such as three or
even two but that's a religious issue.
d = seq1[1]
lister.append(d)

You can also do "a, d = seq1". Of course you must be sure that you
have two fields. Perhaps that's guaranteed for your input but a quick
sanity test wouldn't hurt here.

However, I don't understand all of the above. It may also be a source
of problems. You say the files are huge. Are you filling up memory
here? You did the smart thing reading the file but you lose it here.
In any case, see below.
b = len(lister)
for j in range(0, b):

Go lookup zip()
if lister[j] == 0:

I think that you will find that lister[j] is "0", not 0.
listers.append(j)
else:
listers1.append(j)

Why are you collecting the input? Just toss the '0' ones and write the
others lines directly to the output.

Hope this helps with this script and in further understanding the power
and simplicity of Python. Good luck.
 
E

evilweasel

I will make my question a little more clearer. I have close to 60,000
lines of the data similar to the one I posted. There are various
numbers next to the sequence (this is basically the number of times
the sequence has been found in a particular sample). So, I would need
to ignore the ones containing '0' and write all other sequences
(excluding the number, since it is trivial) in a new text file, in the
following format:
seq59902 TTTTTTTATAAAATATATAGT

TTTTTTTATTTCTTGGCGTTGT

TTTTTTTGGTTGCCCTGCGTGG

seq59905
TTTTTTTGTTTATTTTTGGG

The number next to 'seq' is the line number of the sequence. When I
run the above program, what I expect is an output file that is similar
to the above output but with the ones containing '0' ignored. But, I
am getting all the sequences printed in the file.

Kindly excuse the 'newbieness' of the program. :) I am hoping to
improve in the next few months. Thanks to all those who replied. I
really appreciate it. :)
 
N

nn

I will make my question a little more clearer. I have close to 60,000
lines of the data similar to the one I posted. There are various
numbers next to the sequence (this is basically the number of times
the sequence has been found in a particular sample). So, I would need
to ignore the ones containing '0' and write all other sequences
(excluding the number, since it is trivial) in a new text file, in the
following format:


TTTTTTTGTTTATTTTTGGG

The number next to 'seq' is the line number of the sequence. When I
run the above program, what I expect is an output file that is similar
to the above output but with the ones containing '0' ignored. But, I
am getting all the sequences printed in the file.

Kindly excuse the 'newbieness' of the program. :) I am hoping to
improve in the next few months. Thanks to all those who replied. I
really appreciate it. :)

People have already given you some pointers to your problem. In the
end you will have to "tweak the details" because only you have access
to the data not us.

Just as example here is another way to do what you are doing:

with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
partgen=(line.split() for line in infile)
dnagen=(str(i+1)+'\n'+part[0]+'\n'
for i,part in enumerate(partgen)
if len(part)>1 and part[1]!='0')
outfile.writelines(dnagen)
 
A

Arnaud Delobelle

nn said:
I will make my question a little more clearer. I have close to 60,000
lines of the data similar to the one I posted. There are various
numbers next to the sequence (this is basically the number of times
the sequence has been found in a particular sample). So, I would need
to ignore the ones containing '0' and write all other sequences
(excluding the number, since it is trivial) in a new text file, in the
following format:


TTTTTTTGTTTATTTTTGGG

The number next to 'seq' is the line number of the sequence. When I
run the above program, what I expect is an output file that is similar
to the above output but with the ones containing '0' ignored. But, I
am getting all the sequences printed in the file.

Kindly excuse the 'newbieness' of the program. :) I am hoping to
improve in the next few months. Thanks to all those who replied. I
really appreciate it. :)

People have already given you some pointers to your problem. In the
end you will have to "tweak the details" because only you have access
to the data not us.

Just as example here is another way to do what you are doing:

with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
partgen=(line.split() for line in infile)
dnagen=(str(i+1)+'\n'+part[0]+'\n'
for i,part in enumerate(partgen)
if len(part)>1 and part[1]!='0')
outfile.writelines(dnagen)

I think that generator expressions are overrated :) What's wrong with:

with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
for i, line in enumerate(infile):
parts = line.split()
if len(parts) > 1 and parts[1] != '0':
outfile.write(">seq%s\n%s\n" % (i+1, parts[0]))

(untested)
 
J

John Posner

I will make my question a little more clearer. I have close to 60,000
lines of the data similar to the one I posted. There are various
numbers next to the sequence (this is basically the number of times
the sequence has been found in a particular sample). So, I would need
to ignore the ones containing '0' and write all other sequences
(excluding the number, since it is trivial) in a new text file, in the
following format:

TTTTTTTGTTTATTTTTGGG

The number next to 'seq' is the line number of the sequence. When I
run the above program, what I expect is an output file that is similar
to the above output but with the ones containing '0' ignored. But, I
am getting all the sequences printed in the file.

Kindly excuse the 'newbieness' of the program. :) I am hoping to
improve in the next few months. Thanks to all those who replied. I
really appreciate it. :)

Your program is a good first try. It contains a newbie error (looking
for the number 0 instead of the string "0"). But more importantly,
you're doing too much work yourself, rather than letting Python do the
heavy lifting for you. These practices and tools make life a lot easier:

* As others have noted, don't accumulate output in a list. Just write
data to the output file line-by-line.

* You don't need to initialize every variable at the beginning of the
program. But there's no harm in it.

* Use the enumerate() function to provide a line counter:

for counter, line in enumerate(file1):

This eliminates the need to accumulate output data in a list, then use
the index variable "j" as the line counter.

* Use string formatting. Each chunk of output is a two-line string, with
the line-counter and the DNA sequence as variables:

outformat = """seq%05d
%s
"""

... later, inside your loop ...

resultsfile.write(outformat % (counter, sequence))

HTH,
John
 
Ad

Advertisements

J

Jean-Michel Pichavant

evilweasel said:
I will make my question a little more clearer. I have close to 60,000
lines of the data similar to the one I posted. There are various
numbers next to the sequence (this is basically the number of times
the sequence has been found in a particular sample). So, I would need
to ignore the ones containing '0' and write all other sequences
(excluding the number, since it is trivial) in a new text file, in the
following format:


TTTTTTTGTTTATTTTTGGG

The number next to 'seq' is the line number of the sequence. When I
run the above program, what I expect is an output file that is similar
to the above output but with the ones containing '0' ignored. But, I
am getting all the sequences printed in the file.

Kindly excuse the 'newbieness' of the program. :) I am hoping to
improve in the next few months. Thanks to all those who replied. I
really appreciate it. :)
Using regexp may increase readability (if you are familiar with it).
What about

import re

output = open("sequences1.txt", 'w')

for index, line in enumerate(open(sys.argv[1], 'r')):
match = re.match('(?P<sequence>[GATC]+)\s+1')
if match:
output.write('seq%s\n%s\n' % (index, match.group('sequence')))


Jean-Michel
 
D

D'Arcy J.M. Cain

Using regexp may increase readability (if you are familiar with it).

If you have a problem and you think that regular expressions are the
solution then now you have two problems. Regex is really overkill for
the OP's problem and it certainly doesn't improve readability.
 
J

Jean-Michel Pichavant

D'Arcy J.M. Cain said:
If you have a problem and you think that regular expressions are the
solution then now you have two problems. Regex is really overkill for
the OP's problem and it certainly doesn't improve readability.
It depends on the reader ability to understand a *simple* regexp.
It is also strange to get such answer after taking so much precautions,
so let me quote myself:

"Using regexp *may* increase readability (*if* you are *familiar* with it)."

I honestly find it quite readable in the sample code I provided and
spares all the if-len-startwith-strip logic, but If the OP does not
agree, fine with me. But there's no need to get certain that I'm
completly wrong.

JM
 
S

Steven Howe

evilweasel said:
I will make my question a little more clearer. I have close to 60,000
lines of the data similar to the one I posted. There are various
numbers next to the sequence (this is basically the number of times
the sequence has been found in a particular sample). So, I would need
to ignore the ones containing '0' and write all other sequences
(excluding the number, since it is trivial) in a new text file, in the
following format:

TTTTTTTGTTTATTTTTGGG

The number next to 'seq' is the line number of the sequence. When I
run the above program, what I expect is an output file that is similar
to the above output but with the ones containing '0' ignored. But, I
am getting all the sequences printed in the file.

Kindly excuse the 'newbieness' of the program. :) I am hoping to
improve in the next few months. Thanks to all those who replied. I
really appreciate it. :)
Using regexp may increase readability (if you are familiar with it).
What about

import re

output = open("sequences1.txt", 'w')

for index, line in enumerate(open(sys.argv[1], 'r')):
match = re.match('(?P<sequence>[GATC]+)\s+1')
if match:
output.write('seq%s\n%s\n' % (index, match.group('sequence')))


Jean-Michel

Finally!

After ready 8 or 9 messages about find a line ending with '1', someone
suggests Regex.
It was my first thought.

Steven
 
M

Mensanator

Using regexp may increase readability (if you are familiar with it).
What about
import re
output = open("sequences1.txt", 'w')
for index, line in enumerate(open(sys.argv[1], 'r')):
   match = re.match('(?P<sequence>[GATC]+)\s+1')
   if match:
       output.write('seq%s\n%s\n' % (index, match.group('sequence')))
Jean-Michel

Finally!

After ready 8 or 9 messages about find a line ending with '1', someone
suggests Regex.
It was my first thought.

And as a first thought, it is, of course, wrong.

You don't want lines ending in '1', you want ANY non-'0' amount.

Likewise, you don't want to exclude lines ending in '0' because
you'll end up excluding counts of 10, 20, 30, etc.

You need a regex that extracts ALL the numeric characters at the end
of the
line and exclude those that evaluate to 0.
 
Ad

Advertisements

M

MRAB

Steven said:
evilweasel said:
I will make my question a little more clearer. I have close to 60,000
lines of the data similar to the one I posted. There are various
numbers next to the sequence (this is basically the number of times
the sequence has been found in a particular sample). So, I would need
to ignore the ones containing '0' and write all other sequences
(excluding the number, since it is trivial) in a new text file, in the
following format:

seq59902
TTTTTTTATAAAATATATAGT

seq59903
TTTTTTTATTTCTTGGCGTTGT

seq59904
TTTTTTTGGTTGCCCTGCGTGG

seq59905
TTTTTTTGTTTATTTTTGGG

The number next to 'seq' is the line number of the sequence. When I
run the above program, what I expect is an output file that is similar
to the above output but with the ones containing '0' ignored. But, I
am getting all the sequences printed in the file.

Kindly excuse the 'newbieness' of the program. :) I am hoping to
improve in the next few months. Thanks to all those who replied. I
really appreciate it. :)
Using regexp may increase readability (if you are familiar with it).
What about

import re

output = open("sequences1.txt", 'w')

for index, line in enumerate(open(sys.argv[1], 'r')):
match = re.match('(?P<sequence>[GATC]+)\s+1')
if match:
output.write('seq%s\n%s\n' % (index, match.group('sequence')))


Jean-Michel

Finally!

After ready 8 or 9 messages about find a line ending with '1', someone
suggests Regex.
It was my first thought.
I'm a great fan of regexes, but I never though of using them for this
because it doesn't look like a regex type of problem to me.
 
N

nn

Arnaud said:
nn said:
I will make my question a little more clearer. I have close to 60,000
lines of the data similar to the one I posted. There are various
numbers next to the sequence (this is basically the number of times
the sequence has been found in a particular sample). So, I would need
to ignore the ones containing '0' and write all other sequences
(excluding the number, since it is trivial) in a new text file, in the
following format:

seq59902

TTTTTTTATAAAATATATAGT

seq59903

TTTTTTTATTTCTTGGCGTTGT

seq59904

TTTTTTTGGTTGCCCTGCGTGG

seq59905

TTTTTTTGTTTATTTTTGGG

The number next to 'seq' is the line number of the sequence. When I
run the above program, what I expect is an output file that is similar
to the above output but with the ones containing '0' ignored. But, I
am getting all the sequences printed in the file.

Kindly excuse the 'newbieness' of the program. :) I am hoping to
improve in the next few months. Thanks to all those who replied. I
really appreciate it. :)

People have already given you some pointers to your problem. In the
end you will have to "tweak the details" because only you have access
to the data not us.

Just as example here is another way to do what you are doing:

with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
partgen=(line.split() for line in infile)
dnagen=(str(i+1)+'\n'+part[0]+'\n'
for i,part in enumerate(partgen)
if len(part)>1 and part[1]!='0')
outfile.writelines(dnagen)

I think that generator expressions are overrated :) What's wrong with:

with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
for i, line in enumerate(infile):
parts = line.split()
if len(parts) > 1 and parts[1] != '0':
outfile.write(">seq%s\n%s\n" % (i+1, parts[0]))

(untested)

Nothing really,
After posting I was thinking I should have posted a more
straightforward version like the one you wrote. Now there is! It
probably is more efficient too. I just have a tendency to think in
terms of pipes: "pipe this junk in here, then in here, get output".
Probably damage from too much Unix scripting.Since I can't resist the
urge to post crazy code here goes the bonus round (don't do this at
work):

open('dnaout.dat','w').writelines(
'seq%s\n%s\n'%(i+1,part[0])
for i,part in enumerate(line.split() for line in open('dnain.dat'))
if len(part)>1 and part[1]!='0')
 
A

Arnaud Delobelle

nn said:
After posting I was thinking I should have posted a more
straightforward version like the one you wrote. Now there is! It
probably is more efficient too. I just have a tendency to think in
terms of pipes: "pipe this junk in here, then in here, get output".
Probably damage from too much Unix scripting.

This is funny, I did think *exactly* this when I saw your code :)
 
Ad

Advertisements

J

Johann Spies

Hi folks,

I am a newbie to python, and I would be grateful if someone could
point out the mistake in my program. Basically, I have a huge text
file similar to the format below:

AAAAAGACTCGAGTGCGCGGA 0
AAAAAGATAAGCTAATTAAGCTACTGG 0
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
AAAAAGGTCGCCTGACGGCTGC 0

I know this is a python list but if you really want to get the job
done quickly this is one method without writing python code:

$ cat /tmp/y
AAAAAGACTCGAGTGCGCGGA 0
AAAAAGATAAGCTAATTAAGCTACTGG 0
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
AAAAAGGTCGCCTGACGGCTGC 0
$ grep -v 0 /tmp/y > tmp/z
$ cat /tmp/z
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1

Regards
Johann
--
Johann Spies Telefoon: 021-808 4599
Informasietegnologie, Universiteit van Stellenbosch

"My son, if sinners entice thee, consent thou not."
Proverbs 1:10
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top