variable that I want to treat as information read from a txt file, using a while loop

Discussion in 'Perl Misc' started by bekijkfotos@gmail.com, May 17, 2006.

  1. Guest

    Dear newsgroup,

    I have this (typically newbie?) question.


    If you want to read the file "input.txt" and write it (filtered for the
    expression "<expression>") to "output.txt" you could use this code.


    input.txt:
    --
    <expression>start</expression>
    <expression>end</expression>
    --

    output.txt:
    --
    start
    end
    --

    script.pl:
    ---
    open (INPUT, "input.txt") or die "Can't open data file: $!";

    open(OFile, ">output.txt");

    while (<INPUT>) {

    if ( /\<expression>/ ) {
    /<expression>(.*)<\/expression>/;
    print OFile "$1 \n";

    print OFile "$_ \n";

    }
    ---

    However, I have a variable called @inputtext, that has all the
    information from input.txt in it. (e.g. if I would print @inputtext to
    a file, that file would be a copy of input.txt )

    And the while loop should work on the contents of @inputtext, line by
    line.
    Unfortunately,
    while ( @inputtext ) {

    or

    foreach(@results) {

    doesn't work.

    How can I make it working?


    with kind regards,

    Jaap
    , May 17, 2006
    #1

  2. Guest

    I noticed a mistake in my perl script. It should read:

    script.pl:
    ---
    open (INPUT, "input.txt") or die "Can't open data file: $!";


    open(OFile, ">output.txt");


    while (<INPUT>) {


    if ( /\<expression>/ ) {
    /<expression>(.*)<\/expression>/;
    print OFile "$1 \n";

    }
    }

    close OFile;

    close INPUT;
    --

    Jaap
    , May 17, 2006
    #2

  3. Paul Lalli Guest

    wrote:
    > Dear newsgroup,
    >
    > I have this (typically newbie?) question.
    >
    >
    > If you want to read the file "input.txt" and write it (filtered for the
    > expression "<expression>") to "output.txt" you could use this code.
    >
    >
    > input.txt:
    > --
    > <expression>start</expression>
    > <expression>end</expression>
    > --
    >
    > output.txt:
    > --
    > start
    > end
    > --
    >
    > script.pl:
    > ---


    You have forgotten the shebang, use strict, and use warnings.

    > open (INPUT, "input.txt") or die "Can't open data file: $!";


    Use lexical filehandles rather than global barewords, and the
    three-argument form of open:

    open my $INPUT, '<', 'input.txt' or die "Can't open data file: $!";

    > open(OFile, ">output.txt");


    open my $OFile, '>', 'output.txt' or die "Can't open output file: $!";


    > while (<INPUT>) {
    >
    > if ( /\<expression>/ ) {


    < is not special. No need to escape it.

    > /<expression>(.*)<\/expression>/;


    Ew. Why are you matching twice?

    > print OFile "$1 \n";


    Worse, why are you using $1 without verifying that the second pattern
    match succeeded?

    if (/<expression>(.*)<\/expression>/){
    print $OFile "$1 \n";
    }

    > print OFile "$_ \n";


    So you're printing both the thing you captured and the entire line as
    well? Why?

    >
    > }
    > ---
    >
    > However, I have a variable called @inputtext, that has all the
    > information from input.txt in it. (e.g. if I would print @inputtext to
    > a file, that file would be a copy of input.txt )


    And how have you determined that? Not that I don't believe you or
    anything, I would just like to see the code that you seem to be
    assuming works as you desire.

    > And the while loop should work on the contents of @inputtext, line by
    > line.


    No it shouldn't. You misunderstand while loops. while() simply
    executes so long as the condition is true. When you say
    while (<$INPUT>) {
    there's some special magic going on. Perl automatically
    translates that to:
    while ( defined ($_ = <$INPUT>) ) {

    That is, read a line from $INPUT, and put it in $_. Then check to
    make sure that $_ is actually defined. If that is true, then do the
    loop block.
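
    The translation described above can be checked with a small sketch. The in-memory filehandles and the names $text, @implicit and @explicit are assumptions made for illustration; they are not part of the original script. The last line of the sample data is a bare "0" to show why the test is for definedness rather than truth.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for input.txt; the final line is "0".
my $text = "first\nsecond\n0";

# Implicit form: Perl assigns each line to $_ and tests definedness.
open my $fh1, '<', \$text or die "Can't open in-memory file: $!";
my @implicit;
while (<$fh1>) {
    chomp;
    push @implicit, $_;
}
close $fh1;

# Explicit form: the same loop written out by hand.
open my $fh2, '<', \$text or die "Can't open in-memory file: $!";
my @explicit;
while ( defined( $_ = <$fh2> ) ) {
    chomp;
    push @explicit, $_;
}
close $fh2;

print "implicit: @implicit\n";
print "explicit: @explicit\n";
```

    Both loops collect the same three lines, including the final "0", which is false but defined.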

    > Unfortunately,
    > while ( @inputtext ) {


    This simply says "while @inputtext is a true value". @inputtext is
    being evaluated in a scalar context, which means it returns its size.
    So as long as @inputtext is not empty, this while loop will be true.
    Since you're never changing the contents of @inputtext, this is an
    infinite loop. Further, nowhere did you ever assign anything to $_.
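
    For comparison, a while loop over an array only terminates if the body shrinks the array. A minimal sketch, assuming a hypothetical two-line @inputtext and a working copy called @queue:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's @inputtext.
my @inputtext = (
    "<expression>start</expression>\n",
    "<expression>end</expression>\n",
);

# Work on a copy, because shift empties the array it is given.
my @queue = @inputtext;
my @seen;
while (@queue) {                # true while elements remain
    my $line = shift @queue;    # remove one element per iteration
    chomp $line;
    push @seen, $line;
}

print scalar(@seen), " lines processed, ", scalar(@queue), " left\n";
```

    Without the shift, the condition never changes and the loop never ends.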

    > or
    >
    > foreach(@results) {


    Where the hell did @results come from? What happened to @inputtext?

    > doesn't work.


    "doesn't work" is the worst of all possible error descriptions. How
    did it not work? Syntax error? Run-time error? Infinite loop?
    Segmentation fault? Wrong output? No output?

    > How can I make it working?


    By not just making stuff up and expecting it to work correctly. You
    need to read some very basic Perl documentation, rather than just
    taking existing code, changing random bits, and expecting it to do what
    you want.

    perldoc perlintro
    perldoc perldata
    perldoc perlsyn
    are good places for you to start.

    FWIW,

    foreach (@inputtext) {

    should indeed put each line of @inputtext into $_.
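
    Putting the pieces together, the original filter could be applied to an array roughly like this. The sample contents of @inputtext and the @captured array are assumptions for illustration only:

```perl
use strict;
use warnings;

# Hypothetical contents; the poster's real @inputtext was never shown.
my @inputtext = (
    "<expression>start</expression>\n",
    "no tags on this line\n",
    "<expression>end</expression>\n",
);

my @captured;
foreach (@inputtext) {
    # Match once and use $1 only when the match succeeded.
    if ( /<expression>(.*)<\/expression>/ ) {
        push @captured, $1;
    }
}

print "$_\n" for @captured;
```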

    Paul Lalli
    Paul Lalli, May 17, 2006
    #3
  4. wrote:
    > Dear newsgroup,
    >
    > I have this (typically newbie?) question.
    >
    >
    > If you want to read the file "input.txt" and write it (filtered for the
    > expression "<expression>") to "output.txt" you could use this code.
    >
    >
    > input.txt:
    > --
    > <expression>start</expression>
    > <expression>end</expression>
    > --
    >
    > output.txt:
    > --
    > start
    > end
    > --
    >
    > script.pl:
    > ---
    > open (INPUT, "input.txt") or die "Can't open data file: $!";
    >
    > open(OFile, ">output.txt");
    >
    > while (<INPUT>) {
    >
    > if ( /\<expression>/ ) {
    > /<expression>(.*)<\/expression>/;
    > print OFile "$1 \n";
    >
    > print OFile "$_ \n";
    >
    > }
    > ---
    >
    > However, I have a variable called @inputtext, that has all the
    > information from input.txt in it. (e.g. if I would print @inputtext to
    > a file, that file would be a copy of input.txt )
    >
    > And the while loop should work on the contents of @inputtext, line by
    > line.
    > Unfortunately,
    > while ( @inputtext ) {


    in scalar context, an array will equal the number of elements in it. so
    this is an infinite loop.

    >
    > or
    >
    > foreach(@results) {
    >
    > doesn't work.


    this is not surprising since never before this have you mentioned
    @results.

    at the top of your code, underneath the shebang line, write:

    use strict; use warnings;

    >
    > How can I make it working?


    here's an example: you can change the regex to suit your needs, as well
    as open a file for writing if you desire. this isn't that elegant
    considering i'm running the data through the regex twice, but when i
    applied a single map without the grep, i got an extra blank line.

    use strict;
    use warnings;

    my @input = qw(<hi>start</hi> <hi>end</hi> end2);

    my $pat = qr/<hi>(.*)<\/hi>/;
    my @res = map { /$pat/ and $1 } grep { /$pat/ } @input;

    print "$_\n" for @res;
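
    The extra blank line from the single map probably comes from the false value that a failed match returns. A match evaluated in list context returns its captures, or an empty list on failure, so the grep may not be needed. A sketch reusing the same data (the name @with_blank is made up for the comparison):

```perl
use strict;
use warnings;

my @input = qw(<hi>start</hi> <hi>end</hi> end2);
my $pat   = qr/<hi>(.*)<\/hi>/;

# A failed match contributes an empty string here, hence the blank line.
my @with_blank = map { /$pat/ and $1 } @input;

# In list context a match returns its captures, or an empty list on failure.
my @res = map { /$pat/ } @input;

print scalar(@with_blank), " elements versus ", scalar(@res), "\n";
print "$_\n" for @res;
```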
    it_says_BALLS_on_your forehead, May 17, 2006
    #4
  5. David Squire Guest

    wrote:
    > Dear newsgroup,
    >
    > I have this (typically newbie?) question.
    >
    >
    > If you want to read the file "input.txt" and write it (filtered for the
    > expression "<expression>") to "output.txt" you could use this code.
    >
    >
    > input.txt:
    > --
    > <expression>start</expression>
    > <expression>end</expression>
    > --
    >
    > output.txt:
    > --
    > start
    > end
    > --
    >
    > script.pl:
    > ---


    missing:

    #!/usr/bin/perl
    use strict;
    use warnings;

    > open (INPUT, "input.txt") or die "Can't open data file: $!";
    >
    > open(OFile, ">output.txt");


    The use of bare-word file handles is discouraged. Use lexical filehandle
    (references), e.g.

    open my $OFile, '>', 'output.txt' or die "Could not open output.txt for writing: $!";

    Note also the recommended three argument form of open... and why didn't
    you check for success with this one?
    >
    > while (<INPUT>) {
    >
    > if ( /\<expression>/ ) {
    > /<expression>(.*)<\/expression>/;
    > print OFile "$1 \n";

    missing }

    Please *always* post by cutting and pasting real scripts that you have
    compiled and tested. This one would not compile.
    >
    > print OFile "$_ \n";
    >
    > }
    > ---
    >
    > However, I have a variable called @inputtext, that has all the
    > information from input.txt in it. (e.g. if I would print @inputtext to
    > a file, that file would be a copy of input.txt )
    >
    > And the while loop should work on the contents of @inputtext, line by
    > line.
    > Unfortunately,
    > while ( @inputtext ) {



    Read the manual! perldoc perlsyn (section on Compound Statements)
    while does not take a list argument and set $_; it takes an expression.
    >
    > or
    >
    > foreach(@results) {
    >
    > doesn't work.
    >


    Interesting. In what way does this not work as you expect?

    ----
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @data = ("aAa\n", "bAc\n", "aCd\n", "dAb\n", "bBc\n");

    foreach (@data) {
    print if /^.A/;
    }
    ----

    produces:

    ----
    aAa
    bAc
    dAb
    ----

    .... sure looks like it works to me.

    If you don't post real scripts with real data and error descriptions,
    it's hard to help.

    DS
    David Squire, May 17, 2006
    #5
  6. wrote:

    > open (INPUT, "input.txt") or die "Can't open data file: $!";
    >
    > open(OFile, ">output.txt");
    >
    > while (<INPUT>) {


    > However, I have a variable called @inputtext, that has all the
    > information from input.txt in it. (e.g. if I would print @inputtext to
    > a file, that file would be a copy of input.txt )
    >
    > And the while loop should work on the contents of @inputtext, line by
    > line.
    > Unfortunately,
    > while ( @inputtext ) {
    >
    > or
    >
    > foreach(@results) {
    >
    > doesn't work.
    >
    > How can I make it working?


    Trick question?

    foreach(@inputtext) {
    Brian McCauley, May 17, 2006
    #6
  7. Paul Lalli Guest

    Ferry Bolhar wrote:
    > David Squire:
    >
    > >> open(OFile, ">output.txt");

    > >
    > > The use of bare-word file handles is discouraged.

    >
    > Since when?


    Perl Version 5.6

    Paul Lalli
    Paul Lalli, May 18, 2006
    #7
  8. David Squire Guest

    Ferry Bolhar wrote:
    > David Squire:
    >
    >>> open(OFile, ">output.txt");

    >> The use of bare-word file handles is discouraged.

    >
    > Since when?


    Since lexical filehandles have been supported (5.6.0, I think). It gives
    you filehandles with limited scope, and lets 'use strict;' help you with
    typos.
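
    A small sketch of the scoping point, using an in-memory scalar ($buffer, a made-up name) in place of output.txt so that nothing is written to disk:

```perl
use strict;
use warnings;

my $buffer = '';    # hypothetical in-memory target instead of output.txt

{
    # $out exists only inside this block.
    open my $out, '>', \$buffer or die "Can't open in-memory file: $!";
    print {$out} "start\n";
    # The handle is closed automatically when $out goes out of scope.
}

# Under "use strict", a misspelt handle variable such as $uot would be a
# compile-time error, whereas a misspelt bareword handle is only reported
# at run time (as a warning, if warnings are enabled).
print $buffer;
```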

    It might have been clearer to say "discouraged by most posters in this
    group".

    >
    >> Use lexical filehandle (references),

    >
    > ..assuming you're running a version of Perl which supports them.


    Indeed. It's not as if it's that recent a change.

    DS
    David Squire, May 18, 2006
    #8
  9. Guest

    Thank you all very much for your help!

    I'm sorry I didn't post my actual perl scripts, because I thought that
    the examples I gave were sufficient to point out my problem. I will
    post the real scripts I made in this post.

    Generally, I'm trying to modify an existing perl script (PREPv1-0.pl by
    Christopher M. Frenz) which is designed to do a PubMed database search
    using the command prompt. This script generates an HTML page with a
    description of all results from a certain PubMed query.
    I don't want an HTML page, I want a text database with only the
    information relevant to me (year, journal, title, authors).
    My current problem is that my filter in the foreach loop is only
    carried out once, even if there are more lines that match the query (in
    my example below it only gives me one author, whereas it should give me
    more authors). A corresponding while loop in another script does it
    correctly. But then I have to create a text file and run a separate
    script on that text file, while I want to perform all necessary
    actions in one script.

    With kind regards,

    Jaap

    my files:
    I run it on Windows 2000 (ActivePerl 5.8.8)
    c:\perl\bin\perl grabPubmed.pl van ingen jansen
    (This uses the query "van Ingen Jansen ", which results in one hit on
    PubMed)

    grabPubmed.pl
    --
    #c:\perl\bin\perl

    use strict;
    use warnings;



    # PREP (Perl RegExps for Pubmed) is a script that allows the use of
    # Perl regexs in the searching of Pubmed records, providing the ability to search
    # records for textual patterns as well as keywords

    # Copyright 2005- Christopher M. Frenz
    # This script is free software; it may be used, copied, redistributed, and/or modified
    # under the terms laid forth in the Perl Artistic License

    # Please cite this script in any publication in which literature cited within the
    # publication was located using the PREP.pl script.

    # Usage: perl PREPv1-0.pl PubmedQueryTerms

    # Usage of this script requires the LWP and XML::LibXML modules are installed
    use LWP;
    use XML::LibXML; #Version 1.58 used for development and testing


    my $request;
    my $response;
    my $query;

    # Concatenates arguments passed to script to form Pubmed query
    $query=join(" ", @ARGV);

    # Creates the URL to search Pubmed
    my $baseurl="http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?";
    my $url=$baseurl . "db=Pubmed&retmax=1&usehistory=y&term=" . $query;


    # Searches Pubmed and Returns the number of results
    # as well as the session information needed for results retrieval
    $request=LWP::UserAgent->new();
    $response=$request->get($url);
    my $results= $response->content;
    die unless $response->is_success;
    print "PubMed Search Results \n";

    $results=~/<Count>(\d+)<\/Count>/;
    my $NumAbstracts=$1;
    $results=~/<QueryKey>(\d+)<\/QueryKey>/;
    my $QueryKey=$1;
    $results=~/<WebEnv>(.*?)<\/WebEnv>/;
    my $WebEnv=$1;


    print "$NumAbstracts are Available \n";

    my $parser=XML::LibXML->new;

    my $retmax=500; #Number of records to be retrieved per request-Max 500
    my $retstart=0; #Record number to start retrieval from

    # Creates the URL needed to retrieve results
    $baseurl="http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?";
    my $url2="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=";

    my $Count=0;
    # Retrieves results in XML format
    for($retstart=0;$retstart<=$NumAbstracts;$retstart+=$retmax){
    print "Processing record # $retstart \n";
    $url=$baseurl .
    "rettype=abstract&retmode=xml&retstart=$retstart&retmax=$retmax&db=Pubmed&query_key=$QueryKey&WebEnv=$WebEnv";

    $response=$request->get($url);
    $results=$response->content;
    die unless $response->is_success;

    }

    open my $OFile, '>', 'output.txt' or die "Can't open output file: $!";


    # The tag "Year" occurs several times in the XML file, so only the
    # Year line beneath the PubDate tag should be read.
    my $tracker = 0;

    foreach ($results){
    next if /^#/; # skip comments
    next if /^\s*$/; # skip empty lines
    chomp; # remove line terminator



    if ( /<PMID>/ ) {
    /<PMID>(.*)<\/PMID>/;
    print $OFile "$1 \n";

    }

    if ( /<PubDate>/ ) {
    $tracker = 1;
    }

    if ( /<Year>/ ) {
    if ($tracker == 1) {
    /<Year>(.*)<\/Year>/;
    print $OFile "$1 \n";
    $tracker = 0;
    }
    }
    if ( /<Title>/ ) {
    /<Title>(.*)<\/Title>/;
    print $OFile "$1 \n";
    }
    if ( /<ArticleTitle>/ ) {
    /<ArticleTitle>(.*)<\/ArticleTitle>/;
    print $OFile "$1 \n";
    }
    if ( /<LastName>/ ) {
    /<LastName>(.*)<\/LastName>/;
    print $OFile "$1 \n";

    }

    }

    close $OFile;

    --

    The output file is not complete, it doesn't list all the authors.

    output.txt
    --
    14705930
    2004
    Biochemistry.
    Extension of the binding motif of the Sin3 interacting domain of the
    Mad family proteins.
    van Ingen
    --

    When I then write the XML ($results) to a file:

    grabPubmed_full.pl
    --
    #c:\perl\bin\perl

    use strict;
    use warnings;



    # PREP (Perl RegExps for Pubmed) is a script that allows the use of
    # Perl regexs in the searching of Pubmed records, providing the ability to search
    # records for textual patterns as well as keywords

    # Copyright 2005- Christopher M. Frenz
    # This script is free software; it may be used, copied, redistributed, and/or modified
    # under the terms laid forth in the Perl Artistic License

    # Please cite this script in any publication in which literature cited within the
    # publication was located using the PREP.pl script.

    # Usage: perl PREPv1-0.pl PubmedQueryTerms

    # Usage of this script requires the LWP and XML::LibXML modules are installed
    use LWP;
    use XML::LibXML; #Version 1.58 used for development and testing


    my $request;
    my $response;
    my $query;

    # Concatenates arguments passed to script to form Pubmed query
    $query=join(" ", @ARGV);

    # Creates the URL to search Pubmed
    my $baseurl="http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?";
    my $url=$baseurl . "db=Pubmed&retmax=1&usehistory=y&term=" . $query;


    # Searches Pubmed and Returns the number of results
    # as well as the session information needed for results retrieval
    $request=LWP::UserAgent->new();
    $response=$request->get($url);
    my $results= $response->content;
    die unless $response->is_success;
    print "PubMed Search Results \n";

    $results=~/<Count>(\d+)<\/Count>/;
    my $NumAbstracts=$1;
    $results=~/<QueryKey>(\d+)<\/QueryKey>/;
    my $QueryKey=$1;
    $results=~/<WebEnv>(.*?)<\/WebEnv>/;
    my $WebEnv=$1;


    print "$NumAbstracts are Available \n";

    my $parser=XML::LibXML->new;

    my $retmax=500; #Number of records to be retrieved per request-Max 500
    my $retstart=0; #Record number to start retrieval from

    # Creates the URL needed to retrieve results
    $baseurl="http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?";
    my $url2="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=";

    my $Count=0;
    # Retrieves results in XML format
    for($retstart=0;$retstart<=$NumAbstracts;$retstart+=$retmax){
    print "Processing record # $retstart \n";
    $url=$baseurl .
    "rettype=abstract&retmode=xml&retstart=$retstart&retmax=$retmax&db=Pubmed&query_key=$QueryKey&WebEnv=$WebEnv";

    $response=$request->get($url);
    $results=$response->content;
    die unless $response->is_success;

    }

    open my $OFile, '>', 'output_full.txt' or die "Can't open output file: $!";

    print $OFile $results;


    close $OFile;

    --

    resulting in this file:

    output_full.txt
    --
    <?xml version="1.0"?>
    <!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st
    January 2006//EN"
    "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_060101.dtd">
    <PubmedArticleSet>
    <PubmedArticle>
    <MedlineCitation Owner="NLM" Status="MEDLINE">
    <PMID>14705930</PMID>
    <DateCreated>
    <Year>2004</Year>
    <Month>01</Month>
    <Day>06</Day>
    </DateCreated>
    <DateCompleted>
    <Year>2004</Year>
    <Month>05</Month>
    <Day>12</Day>
    </DateCompleted>
    <DateRevised>
    <Year>2005</Year>
    <Month>11</Month>
    <Day>17</Day>
    </DateRevised>
    <Article PubModel="Print">
    <Journal>
    <ISSN IssnType="Print">0006-2960</ISSN>
    <JournalIssue CitedMedium="Print">
    <Volume>43</Volume>
    <Issue>1</Issue>
    <PubDate>
    <Year>2004</Year>
    <Month>Jan</Month>
    <Day>13</Day>
    </PubDate>
    </JournalIssue>
    <Title>Biochemistry. </Title>
    <ISOAbbreviation>Biochemistry</ISOAbbreviation>
    </Journal>
    <ArticleTitle>Extension of the binding motif of the Sin3
    interacting domain of the Mad family proteins.</ArticleTitle>
    <Pagination>
    <MedlinePgn>46-54</MedlinePgn>
    </Pagination>
    <Abstract>
    <AbstractText>Sin3 forms the scaffold for a
    multiprotein corepressor complex that silences transcription via the
    action of histone deacetylases. Sin3 is recruited to the DNA by several
    DNA binding repressors, such as the helix-loop-helix proteins of the
    Mad family. Here, we elaborate on the Mad-Sin3 interaction based on a
    binding study, solution structure, and dynamics of the PAH2 domain of
    mSin3 in complex to an extended Sin3 interacting domain (SID) of 24
    residues of Mad1. We show that SID residues Met7 and Glu23, outside the
    previously defined minimal binding motif, mediate additional
    hydrophobic and electrostatic interactions with PAH2. On the basis of
    these results we propose an extended consensus sequence describing the
    PAH2-SID interaction specifically for the Mad family, showing that
    residues outside the hydrophobic core of the SID interact with PAH2 and
    modulate binding affinity to appropriate levels.</AbstractText>
    </Abstract>
    <Affiliation>Departments of Biophysical Chemistry and
    Molecular Biology, NSRIM Center, University of Nijmegen, Toernooiveld
    1, 6525 ED Nijmegen, The Netherlands.</Affiliation>
    <AuthorList CompleteYN="Y">
    <Author ValidYN="Y">
    <LastName>van Ingen</LastName>
    <ForeName>Hugo</ForeName>
    <Initials>H</Initials>
    </Author>
    <Author ValidYN="Y">
    <LastName>Lasonder</LastName>
    <ForeName>Edwin</ForeName>
    <Initials>E</Initials>
    </Author>
    <Author ValidYN="Y">
    <LastName>Jansen</LastName>
    <ForeName>Jacobus F A</ForeName>
    <Initials>JF</Initials>
    </Author>
    <Author ValidYN="Y">
    <LastName>Kaan</LastName>
    <ForeName>Anita M</ForeName>
    <Initials>AM</Initials>
    </Author>
    <Author ValidYN="Y">
    <LastName>Spronk</LastName>
    <ForeName>Christian A E M</ForeName>
    <Initials>CA</Initials>
    </Author>
    <Author ValidYN="Y">
    <LastName>Stunnenberg</LastName>
    <ForeName>Henk G</ForeName>
    <Initials>HG</Initials>
    </Author>
    <Author ValidYN="Y">
    <LastName>Vuister</LastName>
    <ForeName>Geerten W</ForeName>
    <Initials>GW</Initials>
    </Author>
    </AuthorList>
    <Language>eng</Language>
    <DataBankList CompleteYN="Y">
    <DataBank>
    <DataBankName>PDB</DataBankName>
    <AccessionNumberList>
    <AccessionNumber>1PD7</AccessionNumber>
    </AccessionNumberList>
    </DataBank>
    </DataBankList>
    <PublicationTypeList>
    <PublicationType>Journal Article</PublicationType>
    </PublicationTypeList>
    </Article>
    <MedlineJournalInfo>
    <Country>United States</Country>
    <MedlineTA>Biochemistry</MedlineTA>
    <NlmUniqueID>0370623</NlmUniqueID>
    </MedlineJournalInfo>
    <ChemicalList>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>Basic Helix-Loop-Helix Leucine Zipper
    Transcription Factors</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>Caenorhabditis elegans
    Proteins</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>DNA-Binding Proteins</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>Fungal Proteins</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>MXD1 protein, human</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>Membrane Proteins</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>PAH2 protein, Pichia
    angusta</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>Repressor Proteins</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>SID-1 protein, C
    elegans</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>SIN3 protein, S
    cerevisiae</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>Saccharomyces cerevisiae
    Proteins</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>Solutions</NameOfSubstance>
    </Chemical>
    <Chemical>
    <RegistryNumber>0</RegistryNumber>
    <NameOfSubstance>Transcription
    Factors</NameOfSubstance>
    </Chemical>
    </ChemicalList>
    <CitationSubset>IM</CitationSubset>
    <MeshHeadingList>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Amino Acid
    Motifs</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Amino Acid
    Sequence</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName
    MajorTopicYN="N">Animals</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Basic Helix-Loop-Helix
    Leucine Zipper Transcription Factors</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Caenorhabditis elegans
    Proteins</DescriptorName>
    <QualifierName
    MajorTopicYN="N">chemistry</QualifierName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Comparative
    Study</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Conserved
    Sequence</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Crystallography,
    X-Ray</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">DNA-Binding
    Proteins</DescriptorName>
    <QualifierName
    MajorTopicYN="Y">chemistry</QualifierName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Fungal
    Proteins</DescriptorName>
    <QualifierName
    MajorTopicYN="N">chemistry</QualifierName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName
    MajorTopicYN="N">Humans</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Membrane
    Proteins</DescriptorName>
    <QualifierName
    MajorTopicYN="N">chemistry</QualifierName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Molecular Sequence
    Data</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Multigene
    Family</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Nuclear Magnetic
    Resonance, Biomolecular</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Protein
    Binding</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Protein Structure,
    Tertiary</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="Y">Repressor
    Proteins</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Research Support,
    Non-U.S. Gov't</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="Y">Saccharomyces
    cerevisiae Proteins</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Sequence
    Alignment</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="Y">Sequence Homology,
    Amino Acid</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName
    MajorTopicYN="N">Solutions</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Surface Plasmon
    Resonance</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName
    MajorTopicYN="N">Thermodynamics</DescriptorName>
    </MeshHeading>
    <MeshHeading>
    <DescriptorName MajorTopicYN="N">Transcription
    Factors</DescriptorName>
    <QualifierName
    MajorTopicYN="Y">chemistry</QualifierName>
    </MeshHeading>
    </MeshHeadingList>
    </MedlineCitation>
    <PubmedData>
    <History>
    <PubMedPubDate PubStatus="pubmed">
    <Year>2004</Year>
    <Month>1</Month>
    <Day>7</Day>
    <Hour>5</Hour>
    <Minute>0</Minute>
    </PubMedPubDate>
    <PubMedPubDate PubStatus="medline">
    <Year>2004</Year>
    <Month>5</Month>
    <Day>13</Day>
    <Hour>5</Hour>
    <Minute>0</Minute>
    </PubMedPubDate>
    </History>
    <PublicationStatus>ppublish</PublicationStatus>
    <ArticleIdList>
    <ArticleId IdType="pubmed">14705930</ArticleId>
    <ArticleId IdType="doi">10.1021/bi0355645</ArticleId>
    </ArticleIdList>
    </PubmedData>
    </PubmedArticle>



    </PubmedArticleSet>
    ---

    And then run another script:

    parseXML.pl
    --
    #c:\perl\bin\perl
    use strict;
    use warnings;



    open my $INPUT, '<', 'output_full.txt' or die "Can't open data file: $!";


    open my $OFile, '>', 'parsed_output.txt' or die "Can't open output file: $!";


    my $tracker = 0;

    #print OFile "$INPUT";

    while (<$INPUT>) {
    next if /^#/; # skip comments
    next if /^\s*$/; # skip empty lines
    chomp; # remove line terminator



    if ( /<PMID>/ ) {
    /<PMID>(.*)<\/PMID>/;
    print $OFile "$1 \n";

    }

    if ( /<PubDate>/ ) {
    $tracker = 1;
    }

    if ( /<Year>/ ) {
    if ($tracker == 1) {
    /<Year>(.*)<\/Year>/;
    print $OFile "$1 \n";
    $tracker = 0;
    }
    }
    if ( /<Title>/ ) {
    /<Title>(.*)<\/Title>/;
    print $OFile "$1 \n";
    }
    if ( /<ArticleTitle>/ ) {
    /<ArticleTitle>(.*)<\/ArticleTitle>/;
    print $OFile "$1 \n";
    }
    if ( /<LastName>/ ) {
    /<LastName>(.*)<\/LastName>/;
    print $OFile "$1 \n";

    }

    }


    close $OFile;

    close $INPUT;
    --

    I get what I want:

    parsed_output.txt
    --
    14705930
    2004
    Biochemistry.
    Extension of the binding motif of the Sin3 interacting domain of the
    Mad family proteins.
    van Ingen
    Lasonder
    Jansen
    Kaan
    Spronk
    Stunnenberg
    Vuister
    --
    , May 18, 2006
    #9
  10. Paul Lalli Guest

    wrote:
    > My current problem is that my filter in the foreach loop is only
    > carried out once, even if there are more lines that match the query


    <snip>

    > # Searches Pubmed and Returns the number of results
    > # as well as the session information needed for results retrieval
    > $request=LWP::UserAgent->new();
    > $response=$request->get($url);
    > my $results= $response->content;
    > die unless $response->is_success;
    > print "PubMed Search Results \n";


    <snip>

    > foreach ($results){


    Here's your problem. $results is a scalar. One variable. This loop
    says "for each element of the list containing ($results)". That list
    contains only one element, so your foreach loop executes only once.
    It has nothing to do with any of the code inside of the for loop.

    You need to figure out what you actually want to iterate over. Maybe
    you want to split $results on newlines and iterate over each "line"
    inside $results? I don't know, because I'm not going to take the time
    to parse this massive program to figure out what you're actually trying
    to do.
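
    If splitting on newlines is what is wanted, it might look like the sketch below. The short XML excerpt assigned to $results and the @authors array are assumptions for illustration; the real $results holds the full PubMed response.

```perl
use strict;
use warnings;

# Hypothetical excerpt standing in for the XML held in $results.
my $results = join "\n",
    '<AuthorList>',
    '  <LastName>van Ingen</LastName>',
    '  <LastName>Lasonder</LastName>',
    '</AuthorList>';

my @authors;
# split turns the single scalar into a list with one element per line.
foreach my $line ( split /\n/, $results ) {
    if ( $line =~ /<LastName>(.*)<\/LastName>/ ) {
        push @authors, $1;
    }
}

print "$_\n" for @authors;
```

    Splitting on /\n/ alone does not depend on how the lines happen to be indented.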

    Paul Lalli
    Paul Lalli, May 18, 2006
    #10
  11. Guest

    Thank you very much!, I inserted the code:

    --
    my @array = split /\n / , $results;

    foreach (@array){
    --

    And now it works,

    Jaap
    , May 18, 2006
    #11
  12. Ferry Bolhar <> wrote:
    >> I noticed a mistake in my perl script. It should read:

    >
    > Which mistake?
    >
    > You might try this:
    >
    > $ perl -ne '/<expression>(.*)<\/expression>/ and print "$1\n"' input.txt
    >>output.txt



    That will fail if there is more than one expression element on a line.
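
    One possible way around that, sketched with a hypothetical line holding two elements, is a non-greedy capture combined with /g in list context:

```perl
use strict;
use warnings;

# Hypothetical line holding two elements.
my $line = '<expression>start</expression><expression>end</expression>';

# Greedy (.*) runs to the last closing tag, so one oversized capture results.
my ($greedy) = $line =~ /<expression>(.*)<\/expression>/;

# Non-greedy (.*?) with /g in list context returns every capture.
my @all = $line =~ /<expression>(.*?)<\/expression>/g;

print "greedy: $greedy\n";
print "all:    @all\n";
```

    This still assumes the elements are not nested and do not span lines; a real XML parser would be the more robust choice.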


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, May 18, 2006
    #12
  13. <> wrote:

    > I inserted the code:
    >
    > my @array = split /\n / , $results;
    >
    > foreach (@array){



    That is a horrid choice of name.

    The at-sign already means array, so naming it "array" does not
    add any useful information.

    You should choose *meaningful* names for your variables.

    @authors

    perhaps?


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, May 18, 2006
    #13
