Extracting Text

Jake Gottlieb

I am trying to extract lines with:

GO:0009986

out of:


ENSG00000113494.3 AAA60174.1 GO:0009123 5618 216638_s_at
ENSG00000113494.3 AAD32032.1 GO:0009345 5618 216638_s_at
ENSG00000113494.3 AAK32703.1 GO:0009764 5618 216638_s_at
ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at

ENSG00000113494.3 AAA60174.1 GO:0009986 206346_at
ENSG00000113494.3 AAD32032.1 GO:0009867 206346_at
ENSG00000113494.3 AAK32703.1 GO:0004567 206346_at
ENSG00000113494.3 AAH59392.1 GO:0000678 206346_at

ENSG00000113494.3 AAA60174.1 GO:0009986 211917_s_at
ENSG00000113494.3 AAD32032.1 GO:0009986 211917_s_at
ENSG00000113494.3 AAK32703.1 GO:0005764 211917_s_at
ENSG00000113494.3 AAH59392.1 GO:0009986 211917_s_at

ENSG00000113494.3 AAA60174.1 GO:0009986 210476_s_at
ENSG00000113494.3 AAD32032.1 GO:0003765 210476_s_at
ENSG00000113494.3 AAK32703.1 GO:0009986 210476_s_at
ENSG00000113494.3 AAH59392.1 GO:0005876 210476_s_at

I have been trying to write a program for it, but can't seem to do it.
If someone could help, I would be very appreciative (I am sure it's
really easy, but Perl is new to me).

Thanks
 
Paul Lalli

I am trying to extract lines with:

GO:0009986

out of:


<snip>

I have been trying to write a program for it, but can't seem to do it.
If someone could help, I would be very appreciative (I am sure it's
really easy, but Perl is new to me).

Show us what you've written so far, so we can help you to see why it
"doesn't work". You've shown us the input and we can deduce the desired
output. Now show us your code, and what output it gave, so we may see how
it doesn't meet your specifications.

Paul Lalli
 
Jake Gottlieb

Gunnar Hjalmarsson said:

Here is my code. I am sure it's wrong, and would be grateful if someone
could correct and complete it. I would like to extract lines from the
original code, and put them into another text file. I have been trying
for a while:

while (<file.txt>) {
$line = $_;
$yes = (index $line, 'GO:000');
if ($yes > -1) {
print "YES : $line";
}
if ($line =~ /ENSG\d+.\d\s+\S+\s+GO:\d{7}\s+\d+\s+/){
print "La GO! $line \n";
}
}
 
Peter Hickman

Jake said:
Here is my code. I am sure it's wrong, and would be grateful if someone
could correct and complete it. I would like to extract lines from the
original code, and put them into another text file. I have been trying
for a while:

while (<file.txt>) {
$line = $_;
$yes = (index $line, 'GO:000');
if ($yes > -1) {
print "YES : $line";
}
if ($line =~ /ENSG\d+.\d\s+\S+\s+GO:\d{7}\s+\d+\s+/){
print "La GO! $line \n";
}
}

If all you want is to display lines that contain the string GO:0009986 then this
will do the trick.

[peter@wasabi xxx]$ cat prog
#!/usr/bin/perl -w

use strict;
use warnings;

while ( my $line = <> ) {
next unless $line =~ m/\s+GO:0009986\s+/;

print $line;
}
[peter@wasabi xxx]$

Basically it reads from the files named on the command line (or from standard
input if none are given), skips any line that does not match the regex, and
prints the rest to standard output.

[peter@wasabi xxx]$ perl prog file.txt
ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at
ENSG00000113494.3 AAA60174.1 GO:0009986 206346_at
ENSG00000113494.3 AAA60174.1 GO:0009986 211917_s_at
ENSG00000113494.3 AAD32032.1 GO:0009986 211917_s_at
ENSG00000113494.3 AAH59392.1 GO:0009986 211917_s_at
ENSG00000113494.3 AAA60174.1 GO:0009986 210476_s_at
ENSG00000113494.3 AAK32703.1 GO:0009986 210476_s_at
[peter@wasabi xxx]$

I'm not too sure what all the $yes stuff in your code was for, and <file.txt>
is not how you open or handle a file. You did get the idea of the regex,
although it seems over-specified for the problem.
 
Anno Siegel

Peter Hickman said:
Jake Gottlieb wrote:
[...]

If all you want is to display lines that contain the string GO:0009986
then this will do the trick.

[peter@wasabi xxx]$ cat prog
#!/usr/bin/perl -w

use strict;
use warnings;

while ( my $line = <> ) {
next unless $line =~ m/\s+GO:0009986\s+/;
                         ^            ^
The "+"es make no difference here.
print $line;
}

That can be simplified to

/\sGO:0009986\s/ and print while <>;

Anno
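Anno's remark about the "+"es can be checked with a small sketch. Because the pattern is not anchored, requiring "one or more" whitespace characters on each side accepts exactly the same lines as requiring one. The second and fourth sample lines are made up for illustration, and same_verdict is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

my @samples = (
    "ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at\n",
    "ENSG00000113494.3 AAA60174.1  GO:0009986   206346_at\n",   # extra spaces (made-up)
    "ENSG00000113494.3 AAK32703.1 GO:0005764 211917_s_at\n",
    "GO:0009986 at line start\n",   # no whitespace in front (made-up)
);

# True when both forms of the regex agree about the line.
sub same_verdict {
    my ($line) = @_;
    my $with_plus    = $line =~ /\s+GO:0009986\s+/ ? 1 : 0;
    my $without_plus = $line =~ /\sGO:0009986\s/   ? 1 : 0;
    return $with_plus == $without_plus;
}

for my $line (@samples) {
    my $tag = same_verdict($line) ? "same:      " : "DIFFERENT: ";
    print $tag, $line;
}
```

Note that both forms need whitespace before the GO term, so neither one matches a line that begins with it.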
 
Gunnar Hjalmarsson

Jake said:
Here is my code. I am sure it's wrong,

Please be more specific about the problem. You'd better study the
posting guidelines for this group:

http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

and would be grateful if someone could correct and complete it. I
would like to extract lines from the original code, and put them
into another text file.

Below please find a couple of comments. If you want to write something
to another file, you should open that file for writing...

while (<file.txt>) {

That does not open the file for reading. This does:

open my $fh, '< file.txt' or die $!;
while (<$fh>) {

See

perldoc -f open

$line = $_;
$yes = (index $line, 'GO:000');

You should have

use strict;
use warnings;

in the beginning of the program, and declare the variables you introduce:

my $line = $_;
my $yes = (index $line, 'GO:000');
----^^
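
Putting these comments together, here is a minimal sketch of reading one file and writing the matching lines to another. It assumes the exact term GO:0009986 is wanted. The sub name extract_lines and the idea of passing both file names on the command line are illustrative choices, not something specified in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy every line of $in_path that contains $wanted into $out_path.
# Returns the number of lines written.
sub extract_lines {
    my ($in_path, $out_path, $wanted) = @_;

    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    my $count = 0;
    while ( my $line = <$in> ) {
        next unless index($line, $wanted) >= 0;   # index gives -1 when absent
        print {$out} $line;
        $count++;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $count;
}

# Only runs when an input and an output file name are both supplied.
if (@ARGV == 2) {
    my $written = extract_lines($ARGV[0], $ARGV[1], 'GO:0009986');
    print "Wrote $written line(s) to $ARGV[1]\n";
}
```

Saved under a name of your choosing, say extract.pl, it would be run as "perl extract.pl file.txt matches.txt". Opening with '>' truncates an existing output file; '>>' would append instead.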
 
Tore Aursand

while (<file.txt>) {

That doesn't read from "file.txt". This one does (untested):

open( FH, '<', 'file.txt' ) or die "$!\n";
while ( <FH> ) {
# ...
}

$line = $_;
$yes = (index $line, 'GO:000');
if ($yes > -1) {
print "YES : $line";
}
if ($line =~ /ENSG\d+.\d\s+\S+\s+GO:\d{7}\s+\d+\s+/){
print "La GO! $line \n";
}
}

If you are sure that you can match on 'GO:000', you're on the right track
using 'index'. But you don't need any regular expressions (untested):

open( FH, '<', 'file.txt' ) or die "$!\n";
while ( <FH> ) {
next unless ( index($_, 'GO:000') >= 0 );
print;
}
close( FH );

Also: Be sure to 'use strict' and 'use warnings' in your script(s).
 
Tore Aursand

1 + index $_, 'GO:000' or next;

While we're at it: How about keeping those two lines (the check and the
print) on one line?

while ( <> ) {
index($_, 'GO:000') and print;
}
 
John Bokma

Tore said:
While we're at it: How about keeping those two lines (the check and the
print) on one line?

while ( <> ) {
index($_, 'GO:000') and print;
}

What if $_ is 'GO:000'?
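
The problem behind that question can be shown with a short sketch. index() returns the zero-based position of the substring, or -1 when it is absent. In a boolean test, 0 is false and -1 is true, so a bare "index(...) and print" skips lines that start with the string and prints lines that lack it. The helper name contains and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Explicit test: true only when $needle occurs somewhere in $haystack.
sub contains {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) >= 0 ? 1 : 0;
}

for my $line ("GO:000 at the start\n", "no match here\n", "x GO:0009986 y\n") {
    my $pos = index($line, 'GO:000');
    printf "pos=%2d  bare test: %-5s  explicit test: %s\n",
        $pos,
        ($pos ? 'true' : 'false'),
        (contains($line, 'GO:000') ? 'true' : 'false');
}
```

The two tests agree only for the third string, where the match sits at a position greater than zero.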
 
J. Romano

I am trying to extract lines with:

GO:0009986

out of:

ENSG00000113494.3 AAA60174.1 GO:0009123 5618 216638_s_at
ENSG00000113494.3 AAD32032.1 GO:0009345 5618 216638_s_at
ENSG00000113494.3 AAK32703.1 GO:0009764 5618 216638_s_at
ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at
<snip>

If all you want is to print out the lines that contain
"GO:0009986", you can just use the "grep" command (if you happen to be
on UNIX):

grep "GO:0009986" file.txt

If you really want to use Perl for this task, you can use a one-liner
that's almost as simple:

perl -ne "print if /GO:0009986/" file.txt

If that looks confusing to you, let me offer a short explanation:

The -ne switch tells perl to run the "print if /GO:0009986/"
command on every line of file.txt (with $_ as the current line).
Since the "print" statement has no arguments, it defaults to $_.
Therefore, the above line is identical to:

perl -ne '$line = $_; print $line if /GO:0009986/' file.txt

which means that, for every line, that line will only get printed if
the string "GO:0009986" is found in that line.
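
Written out as a script, the one-liner looks roughly like the sketch below: -e supplies the test and -n supplies the loop around it. The sub name matches is a made-up helper, and unlike real -n this sketch only runs the loop when file names are given, rather than also falling back to standard input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The test that the -e code performs on each line.
sub matches {
    my ($line) = @_;
    return $line =~ /GO:0009986/ ? 1 : 0;
}

# The loop that -n wraps around the -e code: <> reads each line of
# every file named in @ARGV, one line at a time.
if (@ARGV) {
    while ( my $line = <> ) {
        print $line if matches($line);
    }
}
```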

But it's not entirely clear to me if you wanted to search for the
exact string "GO:0009986" or just any string that matched "GO:" and
any seven digits. If the latter is the case, you can use the
following:

On Unix:
perl -ne 'print if /GO:\d{7}/' file.txt
On Win32:
perl -ne "print if /GO:[0-9]{7}/" file.txt

I hope this helps!

-- Jean-Luc
 
John Bokma

J. Romano said:
If all you want is to print out the lines that contain
"GO:0009986", you can just use the "grep" command (if you happen to be
on UNIX):

grep "GO:0009986" file.txt

the grep family is available on Windows, and many more OSes.
 
Jake Gottlieb

Tore Aursand said:
While we're at it: How about keeping those two lines (the check and the
print) on one line?

while ( <> ) {
index($_, 'GO:000') and print;
}

Thank you all. What is the command to save it to a text file? Thanks again.
 
Tad McClellan

John Bokma said:
the grep family is available on Windows, and many more OSes.


Perl runs lots of places.

Do grep(1) in perl.

perl -ne 'print if /GO:0009986/' file.txt
 
