Need help with a program

Steven D'Aprano

I know this is a python list but if you really want to get the job done
quickly this is one method without writing python code:

$ cat /tmp/y
AAAAAGACTCGAGTGCGCGGA 0
AAAAAGATAAGCTAATTAAGCTACTGG 0
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
AAAAAGGTCGCCTGACGGCTGC 0
$ grep -v 0 /tmp/y > /tmp/z
$ cat /tmp/z
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1

That will do the wrong thing for lines like:

AAAAAGATAAGCTAATTAAGCTACTGGGTT 10
 
Johann Spies

That will do the wrong thing for lines like:

AAAAAGATAAGCTAATTAAGCTACTGGGTT 10

In that case change the grep pattern to ' 0$'; then only the lines with a
single digit '0' at the end of the line will be excluded.

One can do the same using regular expressions in Python, and it will
probably be a lot slower on large files.
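
A minimal sketch of that regular-expression approach in Python, assuming
the lines are already in a list. The names drop_zero_lines and
ZERO_AT_END are made up for illustration, and the third sample line is
the ' 10' case raised above:

```python
import re

# Space, a single "0", then end of line: the same pattern as grep ' 0$'.
ZERO_AT_END = re.compile(r" 0$")

def drop_zero_lines(lines):
    """Keep only lines that do not end in ' 0' (like grep -v ' 0$')."""
    return [line for line in lines if not ZERO_AT_END.search(line.rstrip("\n"))]

sample = [
    "AAAAAGACTCGAGTGCGCGGA 0",
    "AAAAAGATAAGCTAATTAAGCTACTGGGTT 1",
    "AAAAAGATAAGCTAATTAAGCTACTGGGTT 10",
]

# Only the first sample line is dropped; the line ending in ' 10' is kept.
for line in drop_zero_lines(sample):
    print(line)
```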

Regards
Johann
--
Johann Spies Telefoon: 021-808 4599
Informasietegnologie, Universiteit van Stellenbosch

"My son, if sinners entice thee, consent thou not."
Proverbs 1:10
 
D'Arcy J.M. Cain

I know this is a python list but if you really want to get the job
done quickly this is one method without writing python code:
[...]
$ grep -v 0 /tmp/y > /tmp/z

There are plenty of ways to do it without writing Python. C, C++, Perl,
Forth, Awk, BASIC, Intercal, etc. So what? Besides, your solution
doesn't work. You want "grep -vw 0 /tmp/y > /tmp/z" and even then it
doesn't meet the requirements. It extracts the lines the OP wants but
doesn't reformat them. It also assumes a Unix system or at least
something with grep installed so it isn't portable.

If you want to see how the same task can be done in many different
languages see http://www.roesler-ac.de/wolfram/hello.htm.
 
nn

Johann said:
I know this is a python list but if you really want to get the job
done quickly this is one method without writing python code:

$ cat /tmp/y
AAAAAGACTCGAGTGCGCGGA 0
AAAAAGATAAGCTAATTAAGCTACTGG 0
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
AAAAAGGTCGCCTGACGGCTGC 0
$ grep -v 0 /tmp/y > /tmp/z
$ cat /tmp/z
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1

I would rather use awk for this:

awk 'NF==2 && $2!~/^0$/ {printf("seq%s\n%s\n",NR,$1)}' dnain.dat

but I think that is getting a bit off topic...
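
To bring it back on topic, here is a rough Python equivalent of the awk
one-liner above, using a plain string comparison rather than a regular
expression. It is only a sketch: the function name reformat is made up,
and the "seqN" output format is taken from the awk command, not from
anything the OP confirmed.

```python
def reformat(lines):
    """Yield 'seqN' then the sequence for each two-field line whose
    second field is not exactly '0'."""
    for nr, line in enumerate(lines, start=1):  # nr plays the role of awk's NR
        fields = line.split()  # whitespace splitting, like awk's default
        if len(fields) == 2 and fields[1] != "0":
            yield f"seq{nr}"
            yield fields[0]

sample = [
    "AAAAAGACTCGAGTGCGCGGA 0",
    "AAAAAGATAAGCTAATTAAGCTACTGG 0",
    "AAAAAGATAAGCTAATTAAGCTACTGGGTT 1",
    "AAAAAGGGGGCTCACAGGGGAGGGGTAT 1",
    "AAAAAGGTCGCCTGACGGCTGC 0",
]

print("\n".join(reformat(sample)))
```

To run it on a real file, pass the open file object instead of the
sample list, e.g. reformat(open("dnain.dat")).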
 
Aahz

If you have a problem and you think that regular expressions are the
solution then now you have two problems. Regex is really overkill for
the OP's problem and it certainly doesn't improve readability.

If you're going to use a quote, it works better if you use the exact
quote and attribute it:

'Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.' --Jamie Zawinski
 
Nobody

If you're going to use a quote, it works better if you use the exact
quote and attribute it:

'Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.' --Jamie Zawinski

He may have mixed that one up with a different (and more generic) saying:

"If you think that X is the solution to your problem, then you don't
understand X and you don't understand your problem."

For most values of X.
 
