Extracting Text

Discussion in 'Perl Misc' started by Jake Gottlieb, Jun 10, 2004.

  1. I am trying to extract lines with:

    GO:0009986

    out of:


    ENSG00000113494.3 AAA60174.1 GO:0009123 5618 216638_s_at
    ENSG00000113494.3 AAD32032.1 GO:0009345 5618 216638_s_at
    ENSG00000113494.3 AAK32703.1 GO:0009764 5618 216638_s_at
    ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at

    ENSG00000113494.3 AAA60174.1 GO:0009986 206346_at
    ENSG00000113494.3 AAD32032.1 GO:0009867 206346_at
    ENSG00000113494.3 AAK32703.1 GO:0004567 206346_at
    ENSG00000113494.3 AAH59392.1 GO:0000678 206346_at

    ENSG00000113494.3 AAA60174.1 GO:0009986 211917_s_at
    ENSG00000113494.3 AAD32032.1 GO:0009986 211917_s_at
    ENSG00000113494.3 AAK32703.1 GO:0005764 211917_s_at
    ENSG00000113494.3 AAH59392.1 GO:0009986 211917_s_at

    ENSG00000113494.3 AAA60174.1 GO:0009986 210476_s_at
    ENSG00000113494.3 AAD32032.1 GO:0003765 210476_s_at
    ENSG00000113494.3 AAK32703.1 GO:0009986 210476_s_at
    ENSG00000113494.3 AAH59392.1 GO:0005876 210476_s_at

    I have been trying to write a program for it, but can't seem to do it.
    If someone could help, I would be very appreciative (I am sure it's
    really easy, but Perl is new to me).

    Thanks
    Jake Gottlieb, Jun 10, 2004
    #1

  2. Paul Lalli Guest

    On Thu, 10 Jun 2004, Jake Gottlieb wrote:

    > I am trying to extract lines with:
    >
    > GO:0009986
    >
    > out of:
    >
    >
    > ENSG00000113494.3 AAA60174.1 GO:0009123 5618 216638_s_at
    > ENSG00000113494.3 AAD32032.1 GO:0009345 5618 216638_s_at
    > ENSG00000113494.3 AAK32703.1 GO:0009764 5618 216638_s_at
    > ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at
    >
    > ENSG00000113494.3 AAA60174.1 GO:0009986 206346_at
    > ENSG00000113494.3 AAD32032.1 GO:0009867 206346_at
    > ENSG00000113494.3 AAK32703.1 GO:0004567 206346_at
    > ENSG00000113494.3 AAH59392.1 GO:0000678 206346_at
    >
    > ENSG00000113494.3 AAA60174.1 GO:0009986 211917_s_at
    > ENSG00000113494.3 AAD32032.1 GO:0009986 211917_s_at
    > ENSG00000113494.3 AAK32703.1 GO:0005764 211917_s_at
    > ENSG00000113494.3 AAH59392.1 GO:0009986 211917_s_at
    >
    > ENSG00000113494.3 AAA60174.1 GO:0009986 210476_s_at
    > ENSG00000113494.3 AAD32032.1 GO:0003765 210476_s_at
    > ENSG00000113494.3 AAK32703.1 GO:0009986 210476_s_at
    > ENSG00000113494.3 AAH59392.1 GO:0005876 210476_s_at
    >
    > I have been trying to write a program for it, but can't seem to do it.
    > If someone could help, I would be very appreciative (I am sure it's
    > really easy, but Perl is new to me).


    Show us what you've written so far, so we can help you to see why it
    "doesn't work". You've shown us the input and we can deduce the desired
    output. Now show us your code, and what output it gave, so we may see how
    it doesn't meet your specifications.

    Paul Lalli
    Paul Lalli, Jun 10, 2004
    #2

  3. Jake Gottlieb wrote:
    > I am trying to extract lines with:
    >
    > GO:0009986


    <snip>

    > I have been trying to write a program for it, but can't seem to do
    > it. If someone could help, I would be very appreciative (I am sure
    > it's really easy, but Perl is new to me).


    http://learn.perl.org/

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Jun 10, 2004
    #3
  4. Gunnar Hjalmarsson wrote:
    > Jake Gottlieb wrote:
    > > I am trying to extract lines with:
    > >
    > > GO:0009986

    >
    > <snip>
    >
    > > I have been trying to write a program for it, but can't seem to do
    > > it. If someone could help, I would be very appreciative (I am sure
    > > it's really easy, but Perl is new to me).

    >
    > http://learn.perl.org/


    Here is my code. I am sure it's wrong, and would be grateful if someone
    could correct and complete it. I would like to extract lines from the
    original code, and put them into another text file. I have been trying
    for a while:

    while (<file.txt>) {
        $line = $_;
        $yes = (index $line, 'GO:000');
        if ($yes > -1) {
            print "YES : $line";
        }
        if ($line =~ /ENSG\d+.\d\s+\S+\s+GO:\d{7}\s+\d+\s+/){
            print "La GO! $line \n";
        }
    }
    Jake Gottlieb, Jun 11, 2004
    #4
  5. Jake Gottlieb wrote:
    > Here is my code. I am sure it's wrong, and would be grateful if someone
    > could correct and complete it. I would like to extract lines from the
    > original code, and put them into another text file. I have been trying
    > for a while:
    >
    > while (<file.txt>) {
    > $line = $_;
    > $yes = (index $line, 'GO:000');
    > if ($yes > -1) {
    > print "YES : $line";
    > }
    > if ($line =~ /ENSG\d+.\d\s+\S+\s+GO:\d{7}\s+\d+\s+/){
    > print "La GO! $line \n";
    > }
    > }


    If all you want is to display lines that contain the string GO:0009986 then this
    will do the trick.

    [peter@wasabi xxx]$ cat prog
    #!/usr/bin/perl -w

    use strict;
    use warnings;

    while ( my $line = <> ) {
        next unless $line =~ m/\s+GO:0009986\s+/;

        print $line;
    }
    [peter@wasabi xxx]$

    Basically it reads lines from the files named on the command line (or from
    standard input if none are given), skips any line that does not match the
    regex, and prints the rest to standard output.

    [peter@wasabi xxx]$ perl prog file.txt
    ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at
    ENSG00000113494.3 AAA60174.1 GO:0009986 206346_at
    ENSG00000113494.3 AAA60174.1 GO:0009986 211917_s_at
    ENSG00000113494.3 AAD32032.1 GO:0009986 211917_s_at
    ENSG00000113494.3 AAH59392.1 GO:0009986 211917_s_at
    ENSG00000113494.3 AAA60174.1 GO:0009986 210476_s_at
    ENSG00000113494.3 AAK32703.1 GO:0009986 210476_s_at
    [peter@wasabi xxx]$

    I'm not too sure what all the $yes stuff in your code was for, and <file.txt>
    is not how you open or read a file. You have got the idea of regexes, although
    yours seems over-specified for the problem.
    Peter Hickman, Jun 11, 2004
    #5
  6. Anno Siegel Guest

    Peter Hickman wrote in comp.lang.perl.misc:
    > Jake Gottlieb wrote:


    [...]

    > If all you want is to display lines that contain the string GO:0009986
    > then this will do the trick.
    >
    > [peter@wasabi xxx]$ cat prog
    > #!/usr/bin/perl -w
    >
    > use strict;
    > use warnings;
    >
    > while ( my $line = <> ) {
    > next unless $line =~ m/\s+GO:0009986\s+/;

                               ^            ^
    The "+"es make no difference here.

    > print $line;
    > }


    That can be simplified to

    /\sGO:0009986\s/ and print while <>;
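    A minimal sketch of why the "+" quantifiers make no difference, using three
    sample lines copied from the original post (the array names are made up for
    illustration). Both patterns only ask whether there is whitespace on each
    side of the GO term, so they should select the same lines:

```perl
use strict;
use warnings;

# Sample lines taken from the original post; the second has a different GO term.
my @lines = (
    "ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at\n",
    "ENSG00000113494.3 AAD32032.1 GO:0009867 206346_at\n",
    "ENSG00000113494.3 AAA60174.1 GO:0009986 210476_s_at\n",
);

# Select lines with the "+" quantifiers and without them.
my @with_plus    = grep { /\s+GO:0009986\s+/ } @lines;
my @without_plus = grep { /\sGO:0009986\s/   } @lines;

# If one whitespace character is present, "one or more" is present too,
# so the two selections agree.
print "same result\n" if "@with_plus" eq "@without_plus";
print @without_plus;
```

    The quantifier would only matter if the matched whitespace were captured or
    replaced, which is not the case here.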

    Anno
    Anno Siegel, Jun 11, 2004
    #6
  7. Jake Gottlieb wrote:
    > Here is my code. I am sure it's wrong,


    Please be more specific about the problem. You'd better study the
    posting guidelines for this group:

    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

    > and would be grateful if someone could correct and complete it. I
    > would like to extract lines from the original code, and put them
    > into another text file.


    Below please find a couple of comments. If you want to write something
    to another file, you should open that file for writing...

    > while (<file.txt>) {


    That does not open the file for reading. This does:

    open my $fh, '< file.txt' or die $!;
    while (<$fh>) {

    See

    perldoc -f open

    > $line = $_;
    > $yes = (index $line, 'GO:000');


    You should have

    use strict;
    use warnings;

    in the beginning of the program, and declare the variables you introduce:

    my $line = $_;
    my $yes = (index $line, 'GO:000');
    ^^

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Jun 11, 2004
    #7
  8. Tore Aursand Guest

    On Fri, 11 Jun 2004 00:35:59 -0700, Jake Gottlieb wrote:
    > while (<file.txt>) {


    That doesn't read from "file.txt". This one does (untested):

    open( FH, '<', 'file.txt' ) or die "$!\n";
    while ( <FH> ) {
        # ...
    }

    > $line = $_;
    > $yes = (index $line, 'GO:000');
    > if ($yes > -1) {
    > print "YES : $line";
    > }
    > if ($line =~ /ENSG\d+.\d\s+\S+\s+GO:\d{7}\s+\d+\s+/){
    > print "La GO! $line \n";
    > }
    > }


    If you are sure that you can match on 'GO:000', you're on the right track
    using 'index'. But you don't need any regular expressions (untested):

    open( FH, '<', 'file.txt' ) or die "$!\n";
    while ( <FH> ) {
        next unless ( index($_, 'GO:000') >= 0 );
        print;
    }
    close( FH );

    Also: Be sure to 'use strict' and 'use warnings' in your script(s).


    --
    Tore Aursand
    "Poor management can increase software costs more rapidly than any
    other factor." (Barry Boehm)
    Tore Aursand, Jun 11, 2004
    #8
  9. John Bokma Guest

    John Bokma, Jun 11, 2004
    #9
  10. Anno Siegel Guest

    John Bokma wrote in comp.lang.perl.misc:
    > Tore Aursand wrote:
    >
    > > next unless ( index($_, 'GO:000') >= 0 );

    >
    > index($_, 'GO:000') > -1 or next;


    1 + index $_, 'GO:000' or next;

    Anno
    Anno Siegel, Jun 11, 2004
    #10
  11. Anno Siegel wrote:

    > Peter Hickman wrote in comp.lang.perl.misc:
    >>while ( my $line = <> ) {
    >> next unless $line =~ m/\s+GO:0009986\s+/;

    >                           ^            ^
    > The "+"es make no difference here.


    Good catch.

    > That can be simplified to
    >
    > /\sGO:0009986\s/ and print while <>;


    Now you are just showing off ;-)
    Peter Hickman, Jun 11, 2004
    #11
  12. Tore Aursand Guest

    On Fri, 11 Jun 2004 11:08:59 +0000, Anno Siegel wrote:
    >>> next unless ( index($_, 'GO:000') >= 0 );


    >> index($_, 'GO:000') > -1 or next;


    > 1 + index $_, 'GO:000' or next;


    While we're at it: How about keeping those two lines (the check and the
    print) on one line?

    while ( <> ) {
        index($_, 'GO:000') and print;
    }


    --
    Tore Aursand
    "People that think logically are a nice contrast to the real world."
    (Matt Biershbach)
    Tore Aursand, Jun 11, 2004
    #12
  13. John Bokma Guest

    Tore Aursand wrote:

    > On Fri, 11 Jun 2004 11:08:59 +0000, Anno Siegel wrote:
    >
    >>>>next unless ( index($_, 'GO:000') >= 0 );

    >
    >
    >>>index($_, 'GO:000') > -1 or next;

    >
    >
    >>1 + index $_, 'GO:000' or next;

    >
    >
    > While we're at it: How about keeping those two lines (the check and the
    > print) on one line?
    >
    > while ( <> ) {
    > index($_, 'GO:000') and print;
    > }


    What if $_ is 'GO:000'?
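    A short sketch of the problem with using the result of index as a bare
    truth value (the variable names are made up for illustration). index
    returns the 0-based position of the match, or -1 when there is no match,
    so a match at the very start is false and a miss is true:

```perl
use strict;
use warnings;

# index returns the 0-based position of the substring, or -1 if it is absent.
my $at_start  = index("GO:000 at the start\n",    'GO:000');   # 0
my $in_middle = index("ENSG0 AAA1 GO:0009986\n",  'GO:000');   # 11
my $missing   = index("no match here\n",          'GO:000');   # -1

# Compare the bare truth test with an explicit ">= 0" check.
for my $pos ($at_start, $in_middle, $missing) {
    my $bare    = $pos      ? 'print' : 'skip';
    my $checked = $pos >= 0 ? 'print' : 'skip';
    print "index=$pos bare=$bare checked=$checked\n";
}
```

    The bare test skips the line that starts with 'GO:000' and prints the line
    that does not contain it at all, which is why the earlier versions compare
    against -1 or add 1 before testing.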

    --
    John MexIT: http://johnbokma.com/mexit/
    personal page: http://johnbokma.com/
    Experienced Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
    John Bokma, Jun 11, 2004
    #13
  14. J. Romano Guest

    Jake Gottlieb wrote:

    > I am trying to extract lines with:
    >
    > GO:0009986
    >
    > out of:
    >
    > ENSG00000113494.3 AAA60174.1 GO:0009123 5618 216638_s_at
    > ENSG00000113494.3 AAD32032.1 GO:0009345 5618 216638_s_at
    > ENSG00000113494.3 AAK32703.1 GO:0009764 5618 216638_s_at
    > ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at

    <snip>

    If all you want is to print out the lines that contain
    "GO:0009986", you can just use the "grep" command (if you happen to be
    on UNIX):

    grep "GO:0009986" file.txt

    If you really want to use Perl for this task, you can use a one-liner
    that's almost as simple:

    perl -ne "print if /GO:0009986/" file.txt

    If that looks confusing to you, let me offer a short explanation:

    The -n switch tells perl to run the code given with -e ("print if
    /GO:0009986/") on every line of file.txt, with $_ set to the current
    line. Since the "print" statement has no arguments, it defaults to $_.
    Therefore, the above line is equivalent to:

    perl -ne '$line = $_; print $line if /GO:0009986/' file.txt

    which means that, for every line, that line will only get printed if
    the string "GO:0009986" is found in that line.
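    Written out as a script, the loop that -n wraps around the -e code looks
    roughly like the sketch below. To keep it runnable on its own it writes two
    sample lines to a temporary file and pretends that file was named on the
    command line; the temporary file and the @printed array are assumptions
    added for illustration, not part of the one-liner:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write two sample lines to a temporary file so the sketch runs on its own.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at\n";
print {$tmp_fh} "ENSG00000113494.3 AAD32032.1 GO:0009867 206346_at\n";
close $tmp_fh or die "Cannot close $tmp_name: $!";

# Pretend the temporary file was given as the command-line argument.
@ARGV = ($tmp_name);

# Roughly the loop that -n supplies: read each line into $_ and run the code.
my @printed;
while (<>) {
    next unless /GO:0009986/;
    push @printed, $_;    # kept only so the result can be inspected
    print;                # no argument, so print uses $_
}
```

    In a real script the temporary-file lines would go away and the file name
    would simply be passed on the command line.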

    But it's not entirely clear to me if you wanted to search for the
    exact string "GO:0009986" or just any string that matched "GO:" and
    any seven digits. If the latter is the case, you can use the
    following:

    On Unix:
    perl -ne 'print if /GO:\d{7}/' file.txt
    On Win32:
    perl -ne "print if /GO:[0-9]{7}/" file.txt

    I hope this helps!

    -- Jean-Luc
    J. Romano, Jun 11, 2004
    #14
  15. John Bokma Guest

    John Bokma, Jun 11, 2004
    #15
  16. Tore Aursand wrote:
    > On Fri, 11 Jun 2004 11:08:59 +0000, Anno Siegel wrote:
    > >>> next unless ( index($_, 'GO:000') >= 0 );

    >
    > >> index($_, 'GO:000') > -1 or next;

    >
    > > 1 + index $_, 'GO:000' or next;

    >
    > While we're at it: How about keeping those two lines (the check and the
    > print) on one line?
    >
    > while ( <> ) {
    > index($_, 'GO:000') and print;
    > }


    Thank you all. What is the command to save it to a text file? Thanks again.
    Jake Gottlieb, Jun 11, 2004
    #16
  17. John Bokma wrote:
    > J. Romano wrote:
    >
    >> If all you want is to print out the lines that contain
    >> "GO:0009986", you can just use the "grep" command (if you happen to be
    >> on UNIX):
    >>
    >> grep "GO:0009986" file.txt

    >
    > the grep family is available on Windows, and many more OSes.



    Perl runs lots of places.

    Do grep(1) in perl.

    perl -ne 'print if /GO:0009986/' file.txt


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jun 12, 2004
    #17
  18. Anno Siegel Guest

    Jake Gottlieb wrote in comp.lang.perl.misc:
    > Tore Aursand wrote:
    > > On Fri, 11 Jun 2004 11:08:59 +0000, Anno Siegel wrote:
    > > >>> next unless ( index($_, 'GO:000') >= 0 );

    > >
    > > >> index($_, 'GO:000') > -1 or next;

    > >
    > > > 1 + index $_, 'GO:000' or next;

    > >
    > > While we're at it: How about keeping those two lines (the check and the
    > > print) on one line?
    > >
    > > while ( <> ) {
    > > index($_, 'GO:000') and print;
    > > }

    >
    > Thank you all. What is the command to save it to a text file? Thanks again.


    Read up on it. perldoc -f open, perldoc -f print.
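    For what those two perldoc pages lead to, here is a minimal sketch that
    opens one file for reading and another for writing, then prints the
    matching lines to the second. The scratch directory and the name
    matches.txt are assumptions made so the example can run without touching
    real files:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical file names inside a scratch directory that is removed at exit.
my $dir      = tempdir(CLEANUP => 1);
my $in_name  = "$dir/file.txt";
my $out_name = "$dir/matches.txt";

# Create a small input file from two lines of the original post.
open my $seed, '>', $in_name or die "Cannot write $in_name: $!";
print {$seed} "ENSG00000113494.3 AAH59392.1 GO:0009986 5618 216638_s_at\n";
print {$seed} "ENSG00000113494.3 AAD32032.1 GO:0009867 206346_at\n";
close $seed or die "Cannot close $in_name: $!";

# Open the input for reading and the output for writing (three-argument open).
open my $in,  '<', $in_name  or die "Cannot read $in_name: $!";
open my $out, '>', $out_name or die "Cannot write $out_name: $!";

# Print each matching line to the output filehandle instead of STDOUT.
while ( my $line = <$in> ) {
    print {$out} $line if $line =~ /\sGO:0009986\s/;
}

close $in;
close $out or die "Cannot close $out_name: $!";
```

    Alternatively, any of the earlier programs that print to standard output
    can be left unchanged and redirected by the shell, for example
    perl prog file.txt > matches.txt.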

    Anno
    Anno Siegel, Jun 12, 2004
    #18
