string capture regex

Discussion in 'Perl Misc' started by Cheez, Jan 7, 2004.

  1. Cheez

    Cheez Guest

    Howdy, newbie to Perl. I want to make a regex that will process a
    particular line of text from a large flatfile:

    >gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]


    I want the regex to:
    1. capture the 7 digit number that always follows >gi|
    2. then associate that number (in a hash?) with the "words"
    Hypothetical, ORF, Yal069wp. These "words" always follow the
    "NP_009331.1|" format and end before the "[SC]".

    I am a little overwhelmed by all the m// and s/// modifiers. Any nudge
    in the right direction about developing a regex would be greatly
    appreciated.

    I will post my code but it's really lame!

    Thanks,
    Cheez

    =====================
    $flatfile = "I.faa";

    open(FILE, "$flatfile") || die "Can't open '$flatfile': $!\n";

    @test2 = <FILE>;

    close(FILE);

    foreach (@test2) {

    chomp;

    $_ =~ s/\W/ /g; # getting rid of non word chunks..not sure it helps

    push @newtest, split(/ /);

    }

    open (FILE, ">parsed.txt") || die "Can't open 'parsed.txt': $!\n";

    print FILE "$_\n" for @newtest;

    close(FILE);

    print scalar(@newtest); # checking that the array is populated
     
    Cheez, Jan 7, 2004
    #1

  2. Sam Holden

    Sam Holden Guest

    On 6 Jan 2004 16:20:29 -0800, Cheez <> wrote:
    > Howdy, newbie to Perl. I want to make a regex that will process a
    > particular line of text from a large flatfile:
    >
    >>gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

    >
    > I want the regex to:
    > 1. capture the 7 digit number that always follows >gi|
    > 2. then associate that number (in a hash?) with the "words"
    > Hypothetical, ORF, Yal069wp. These "words" always follow the
    > "NP_009331.1|" format and end before the "[SC]".


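    my %hash;
    while (<>) {
        chomp;
        next unless /^>gi\|/;       # skip lines that are not gi headers
        # fields: ">gi", number, "ref", accession, description
        my (undef, $number, undef, undef, $words) = split /\|/;
        $words =~ s/^\s+//;         # drop the space after the last pipe
        $words =~ s/\s*\[SC\]$//;   # drop the trailing "[SC]" tag
        $hash{$number} = $words;
    }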


    >
    > I am little overwhelmed by all the m// and s/// modifiers. Any nudge
    > in the right direction about developing a regex would be greatly
    > appreciated.


    Just use split (which does use a regex but a very simple one).
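
    To make the split approach concrete, here is a minimal sketch that prints every field split produces for the sample line from the original post. The variable names $line and @fields are only illustrative.

```perl
use strict;
use warnings;

# Sample header line copied from the original post.
my $line = '>gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]';

# The pipe is a regex metacharacter (alternation), so it must be escaped.
my @fields = split /\|/, $line;

# Print each field with its index, in brackets so whitespace is visible.
for my $i (0 .. $#fields) {
    print "$i: [$fields[$i]]\n";
}
```

    Field 1 is the number and field 4 is the description. Field 4 keeps the space that follows the last pipe, so it may need trimming before it is stored.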

    [snip code]

    --
    Sam Holden
     
    Sam Holden, Jan 7, 2004
    #2

  3. Matt Garrish

    Matt Garrish Guest

    "Cheez" <> wrote in message
    news:...
    > Howdy, newbie to Perl.


    You're best not asking the newbie for help. He just doesn't get it... : )

    Matt
     
    Matt Garrish, Jan 7, 2004
    #3
  4. Gunnar Hjalmarsson

    Cheez wrote:
    > Howdy, newbie to Perl. I want to make a regex that will process a
    > particular line of text from a large flatfile:
    >
    >>gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

    >
    > I want the regex to:
    > 1. capture the 7 digit number that always follows >gi|
    > 2. then associate that number (in a hash?) with the "words"
    > Hypothetical, ORF, Yal069wp. These "words" always follow the
    > "NP_009331.1|" format and end before the "[SC]".


    <snip>

    > $flatfile = "I.faa";
    >
    > open(FILE, "$flatfile") || die "Can't open '$flatfile': $!\n";


    Yet another variant - from here you might want to do something like:

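    my %hash = ();
    while (<FILE>) {
        # the header line starts with ">gi|", so the ">" must be matched too
        if ( /^>gi\|(\d+)\S+\s+(\w+)\s+(\w+);\s+(\w+)/ ) {
            $hash{$1} = [ $2, $3, $4 ];
        }
    }
    close FILE;

    use Data::Dumper;
    print Dumper \%hash;    # pass a reference so the hash is dumped as one structure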
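
    As a rough sketch of how the stored words can be read back, the following runs the capture against the sample line from the original post and then dereferences the array stored under each number. It assumes the header line starts with ">gi|" exactly as posted. The names $line, $id and @words are only illustrative.

```perl
use strict;
use warnings;

# Sample header line copied from the original post.
my $line = '>gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]';

my %hash;
# Anchor on the leading '>' that the sample line carries.
if ( $line =~ /^>gi\|(\d+)\S+\s+(\w+)\s+(\w+);\s+(\w+)/ ) {
    $hash{$1} = [ $2, $3, $4 ];
}

# Each value is an array reference; dereference it to get the words.
for my $id ( sort keys %hash ) {
    my @words = @{ $hash{$id} };
    print "$id => @words\n";
}
```

    Note that this pattern expects exactly two words, a semicolon and a third word, so header lines with a different description layout would simply not match.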

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jan 7, 2004
    #4
  5. Cheez

    Cheez Guest

    I am really thankful for all of these suggestions. I will try some of
    these regexes and get back to you all. Looking at a regex that works
    helps me to work backwards (deconstructionist?) to see *why* it worked.
    This will be very helpful for not only this task but many more in the
    future.

    Thanks again everyone,
    Cheez

    (Cheez) wrote in message news:<>...
    > Howdy, newbie to Perl. I want to make a regex that will process a
    > particular line of text from a large flatfile:

    [snip]
     
    Cheez, Jan 7, 2004
    #5
  6. Kris Jenkins

    Kris Jenkins Guest

    Sam Holden wrote:
    > On 6 Jan 2004 16:20:29 -0800, Cheez <> wrote:
    >
    >>Howdy, newbie to Perl. I want to make a regex that will process a
    >>particular line of text from a large flatfile:
    >>
    >>
    >>>gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

    >>
    >>I want the regex to:
    >>1. capture the 7 digit number that always follows >gi|
    >>2. then associate that number (in a hash?) with the "words"
    >>Hypothetical, ORF, Yal069wp. These "words" always follow the
    >>"NP_009331.1|" format and end before the "[SC]".

    >
    >
    > my %hash;
    > while (<>) {
    > chomp;
    > my (undef, $number, undef, undef, $words) = split /\|/;
    > $words=~s/\s*\[SC\]$//;
    > $hash{$number} = $words;
    > }


    Just as another option, you could replace:

    my (undef, $number, undef, undef, $words) = split /\|/;

    With:

    my ( $number, $words) = ( split /\|/ )[1,4];
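
    A small sketch, using the sample line from the original post, to check that both forms pick out the same two fields. The names $n1, $w1, $n2 and $w2 are only illustrative.

```perl
use strict;
use warnings;

# Sample header line copied from the original post.
my $line = '>gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]';

# undef placeholders skip the fields that are not needed.
my ( undef, $n1, undef, undef, $w1 ) = split /\|/, $line;

# A list slice picks fields 1 and 4 directly.
my ( $n2, $w2 ) = ( split /\|/, $line )[ 1, 4 ];

print "same number\n" if $n1 eq $n2;
print "same words\n"  if $w1 eq $w2;
```

    The slice form avoids counting undef placeholders, which helps if the wanted fields sit far apart.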

    Cheers,
    Kris
     
    Kris Jenkins, Jan 8, 2004
    #6
