variable that I want to treat as information read from a txt file, using a while loop

bekijkfotos

Dear newsgroup,

I have this (typically newbie?) question.


If you want to read the file "input.txt" and write it (filtered for the
expression "<expression>") to "output.txt" you could use this code.


input.txt:
--
<expression>start</expression>
<expression>end</expression>
--

output.txt:
--
start
end
--

script.pl:
---
open (INPUT, "input.txt") or die "Can't open data file: $!";

open(OFile, ">output.txt");

while (<INPUT>) {

if ( /\<expression>/ ) {
/<expression>(.*)<\/expression>/;
print OFile "$1 \n";

print OFile "$_ \n";

}
---

However, I have a variable called @inputtext, that has all the
information from input.txt in it. (e.g. if I would print @inputtext to
a file, that file would be a copy of input.txt )

And the while loop should work on the contents of @inputtext, line by
line.
Unfortunately,
while ( @inputtext ) {

or

foreach(@results) {

doesn't work.

How can I make it working?


with kind regards,

Jaap
 
bekijkfotos

I noticed a mistake in my Perl script. It should read:

script.pl:
---
open (INPUT, "input.txt") or die "Can't open data file: $!";

open(OFile, ">output.txt");

while (<INPUT>) {
    if ( /\<expression>/ ) {
        /<expression>(.*)<\/expression>/;
        print OFile "$1 \n";
    }
}

close OFile;

close INPUT;
---
 
Paul Lalli

> If you want to read the file "input.txt" and write it (filtered for the
> expression "<expression>") to "output.txt" you could use this code.

You have forgotten the shebang, use strict, and use warnings.

> open (INPUT, "input.txt") or die "Can't open data file: $!";

Use lexical filehandles rather than global barewords, and the
three-argument form of open:

open my $INPUT, '<', 'input.txt' or die "Can't open data file: $!";

> open(OFile, ">output.txt");

open my $OFile, '>', 'output.txt' or die "Can't open output file: $!";

> while (<INPUT>) {
>
> if ( /\<expression>/ ) {

< is not special. No need to escape it.

> /<expression>(.*)<\/expression>/;

Ew. Why are you matching twice?

> print OFile "$1 \n";

Worse, why are you using $1 without verifying that the second pattern
match succeeded?

if (/<expression>(.*)<\/expression>/){
    print $OFile "$1 \n";
}

> print OFile "$_ \n";

So you're printing both the thing you captured and the entire line as
well? Why?

> }
> ---
>
> However, I have a variable called @inputtext, that has all the
> information from input.txt in it. (e.g. if I would print @inputtext to
> a file, that file would be a copy of input.txt )

And how have you determined that? Not that I don't believe you or
anything, I would just like to see the code that you seem to be
assuming works as you desire.

> And the while loop should work on the contents of @inputtext, line by
> line.

No it shouldn't. You misunderstand while loops. while() simply
executes so long as the condition is true. When you say

while (<$INFILE>) {

There's some special magic going on there. Perl automatically
translates that to:

while ( defined ($_ = <$INFILE>) ) {

That is, read a line from $INFILE, and put it in $_. Then check to
make sure that $_ is actually defined. If that is true, then do the
loop block.

> Unfortunately,
> while ( @inputtext ) {

This simply says "while @inputtext is a true value". @inputtext is
being evaluated in a scalar context, which means it returns its size.
So as long as @inputtext is not empty, this while loop will be true.
Since you're never changing the contents of @inputtext, this is an
infinite loop. Further, nowhere did you ever assign anything to $_.

> or
>
> foreach(@results) {

Where the hell did @results come from? What happened to @inputtext?

> doesn't work.

"doesn't work" is the worst of all possible error descriptions. How
did it not work? Syntax error? Run-time error? Infinite loop?
Segmentation fault? Wrong output? No output?

> How can I make it working?

By not just making stuff up and expecting it to work correctly. You
need to read some very basic Perl documentation, rather than just
taking existing code, changing random bits, and expecting it to do what
you want.

perldoc perlintro
perldoc perldata
perldoc perlsyn

are good places for you to start.

FWIW,

foreach (@inputtext) {

should indeed put each line of @inputtext into $_.

Paul Lalli
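
The two loop forms discussed above can be sketched as follows. This is only an illustration: the contents of @inputtext are made up, and @queue, @from_foreach and @from_while are names introduced here. foreach puts each element into $_ in turn, while a while loop over an array only ends if the body consumes the array, for example with shift.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's @inputtext.
my @inputtext = (
    "<expression>start</expression>\n",
    "no tags on this line\n",
    "<expression>end</expression>\n",
);

# foreach sets $_ to each element in turn and leaves the array intact.
my @from_foreach;
foreach (@inputtext) {
    if (/<expression>(.*)<\/expression>/) {
        push @from_foreach, $1;
    }
}

# while only tests a condition, so the body must consume the array itself.
# Work on a copy, because shift removes elements.
my @queue = @inputtext;
my @from_while;
while (defined(my $line = shift @queue)) {
    if ($line =~ /<expression>(.*)<\/expression>/) {
        push @from_while, $1;
    }
}

print "$_\n" for @from_foreach;   # prints "start" then "end"
```

Both loops collect the same two captures. The foreach form is usually the simpler choice when the array should survive the loop.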
 
it_says_BALLS_on_your forehead

> Unfortunately,
> while ( @inputtext ) {

in scalar context, an array will equal the number of elements in it. so
this is an infinite loop.

> or
>
> foreach(@results) {
>
> doesn't work.

this is not surprising since never before this have you mentioned
@results.

at the top of your code, underneath the shebang line, write:

use strict; use warnings;

> How can I make it working?

here's an example: you can change the regex to suit your needs, as well
as open a file for writing if you desire. this isn't that elegant
considering i'm running the data through the regex twice, but when i
applied a single map without the grep, i got an extra blank line.

use strict;
use warnings;

my @input = qw(<hi>start</hi> <hi>end</hi> end2);

my $pat = qr/<hi>(.*)<\/hi>/;
my @res = map { /$pat/ and $1 } grep { /$pat/ } @input;

print "$_\n" for @res;
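
The extra blank line mentioned above most likely comes from `/$pat/ and $1` yielding an empty string for elements that do not match. A possible single-pass variant, sketched here with the same sample data (@res2 is a name added for illustration), returns the empty list instead, so map contributes nothing for those elements:

```perl
use strict;
use warnings;

my @input = qw(<hi>start</hi> <hi>end</hi> end2);
my $pat   = qr/<hi>(.*)<\/hi>/;

# Return the empty list for non-matching elements so map adds nothing for them.
my @res = map { /$pat/ ? $1 : () } @input;

# Equivalent: in list context a successful match returns its captures,
# and a failed match returns the empty list.
my @res2 = map { /$pat/ } @input;

print "$_\n" for @res;   # prints "start" then "end"
```

Either form runs the regex once per element and needs no grep.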
 
David Squire

> If you want to read the file "input.txt" and write it (filtered for the
> expression "<expression>") to "output.txt" you could use this code.

missing:

#!/usr/bin/perl
use strict;
use warnings;

> open (INPUT, "input.txt") or die "Can't open data file: $!";
>
> open(OFile, ">output.txt");

The use of bare-word file handles is discouraged. Use lexical filehandle
(references), e.g.

open my $OFile, '>', 'output.txt' or die "Could not open output.txt for writing: $!";

Note also the recommended three argument form of open... and why didn't
you check for success with this one?

> while (<INPUT>) {
>
> if ( /\<expression>/ ) {
> /<expression>(.*)<\/expression>/;
> print OFile "$1 \n";

missing }

Please *always* post by cutting and pasting real scripts that you have
compiled and tested. This one would not compile.

> print OFile "$_ \n";
>
> }
> ---
>
> However, I have a variable called @inputtext, that has all the
> information from input.txt in it. (e.g. if I would print @inputtext to
> a file, that file would be a copy of input.txt )
>
> And the while loop should work on the contents of @inputtext, line by
> line.
> Unfortunately,
> while ( @inputtext ) {

Read the manual! perldoc perlsyn (section on Compound Statements).
while does not take a list argument and set $_; it takes an expression.

> or
>
> foreach(@results) {
>
> doesn't work.

Interesting. In what way does this not work as you expect?

----
#!/usr/bin/perl
use strict;
use warnings;

my @data = ("aAa\n", "bAc\n", "aCd\n", "dAb\n", "bBc\n");

foreach (@data) {
print if /^.A/;
}
----

produces:

----
aAa
bAc
dAb
----

.... sure looks like it works to me.

If you don't post real scripts with real data and error descriptions,
it's hard to help.

DS
 
Brian McCauley

> open (INPUT, "input.txt") or die "Can't open data file: $!";
>
> open(OFile, ">output.txt");
>
> while (<INPUT>) {

> However, I have a variable called @inputtext, that has all the
> information from input.txt in it. (e.g. if I would print @inputtext to
> a file, that file would be a copy of input.txt )
>
> And the while loop should work on the contents of @inputtext, line by
> line.
> Unfortunately,
> while ( @inputtext ) {
>
> or
>
> foreach(@results) {
>
> doesn't work.
>
> How can I make it working?

Trick question?

foreach(@inputtext) {
 
David Squire

Ferry said:
> David Squire:
>
> Since when?

Since lexical filehandles have been supported (5.6.0, I think). It gives
you filehandles with limited scope, and lets 'use strict;' help you with
typos.

It might have been clearer to say "discouraged by most posters in this
group".

> ..assuming you're running a version of Perl which supports them.

Indeed. It's not as if it's that recent a change.

DS
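
The limited scope mentioned above can be seen in a short sketch. It assumes a scratch directory created with the core File::Temp module; the path and the variable names are illustrative only. The lexical handle $out exists only inside its block, and Perl closes it when the block ends. A misspelled handle variable would also be rejected at compile time under 'use strict', which is not the case for a misspelled bareword handle.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory for the demonstration; removed when the program exits.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/output.txt";

{
    # $out exists only inside this block.
    open my $out, '>', $path or die "Can't open $path for writing: $!";
    print $out "start\n", "end\n";
}   # $out goes out of scope here; Perl flushes and closes the handle.

# The data is already on disk, so it can be read back immediately.
open my $in, '<', $path or die "Can't open $path for reading: $!";
my @lines = <$in>;
close $in;

print @lines;   # prints "start" and "end" on separate lines
```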
 
bekijkfotos

Thank you all very much for your help!

I'm sorry I didn't post my actual Perl scripts; I thought the examples
I gave were sufficient to point out my problem. I will post the real
scripts in this post.

I'm trying to modify an existing Perl script (PREPv1-0.pl by
Christopher M. Frenz), which is designed to do a PubMed database search
from the command prompt. The script generates an HTML page with a
description of all results from a given PubMed query.
I don't want an HTML page; I want a text database with only the
information relevant to me (year, journal, title, authors).
My current problem is that the filter in my foreach loop is only
carried out once, even if more lines match (in my example below it
gives me only one author, whereas it should give me several). A
corresponding while loop in another script does it correctly, but then
I have to create a text file and run a separate script on that file,
whereas I want to perform all the necessary actions in one script.

With kind regards,

Jaap

my files:
I run it in windows 2000 (Activeperl 5.8.8)
c:\perl\bin\perl grabPubmed.pl van ingen jansen
(This uses the query "van Ingen Jansen ", which results in one hit on
PubMed)

grabPubmed.pl
--
#c:\perl\bin\perl

use strict;
use warnings;



# PREP (Perl RegExps for Pubmed) is a script that allows the use of
# Perl regexs in the searching of Pubmed records, providing the ability
# to search records for textual patterns as well as keywords

# Copyright 2005- Christopher M. Frenz
# This script is free software; it may be used, copied, redistributed,
# and/or modified under the terms laid forth in the Perl Artistic License

# Please cite this script in any publication in which literature cited
# within the publication was located using the PREP.pl script.

# Usage: perl PREPv1-0.pl PubmedQueryTerms

# Usage of this script requires the LWP and XML::LibXML modules are
# installed
use LWP;
use XML::LibXML; #Version 1.58 used for development and testing


my $request;
my $response;
my $query;

# Concatenates arguments passed to script to form Pubmed query
$query=join(" ", @ARGV);

# Creates the URL to search Pubmed
my $baseurl="http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?";
my $url=$baseurl . "db=Pubmed&retmax=1&usehistory=y&term=" . $query;


# Searches Pubmed and Returns the number of results
# as well as the session information needed for results retrieval
$request=LWP::UserAgent->new();
$response=$request->get($url);
my $results= $response->content;
die unless $response->is_success;
print "PubMed Search Results \n";

$results=~/<Count>(\d+)<\/Count>/;
my $NumAbstracts=$1;
$results=~/<QueryKey>(\d+)<\/QueryKey>/;
my $QueryKey=$1;
$results=~/<WebEnv>(.*?)<\/WebEnv>/;
my $WebEnv=$1;


print "$NumAbstracts are Available \n";

my $parser=XML::LibXML->new;

my $retmax=500; #Number of records to be retrieved per request-Max 500
my $retstart=0; #Record number to start retrieval from

# Creates the URL needed to retrieve results
$baseurl="http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?";
my
$url2="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=";

my $Count=0;
# Retrieves results in XML format
for($retstart=0;$retstart<=$NumAbstracts;$retstart+=$retmax){
print "Processing record # $retstart \n";
$url=$baseurl .
"rettype=abstract&retmode=xml&retstart=$retstart&retmax=$retmax&db=Pubmed&query_key=$QueryKey&WebEnv=$WebEnv";

$response=$request->get($url);
$results=$response->content;
die unless $response->is_success;

}

open my $OFile, '>', 'output.txt' or die "Can't open output file: $!";


# The tag "Year" occurs several times in the XML file, so I only want
# to read the Year line beneath the PubDate tag.
my $tracker = 0;

foreach ($results){
next if /^#/; # skip comments
next if /^\s*$/; # skip empty lines
chomp; # remove line terminator



if ( /<PMID>/ ) {
/<PMID>(.*)<\/PMID>/;
print $OFile "$1 \n";

}

if ( /<PubDate>/ ) {
$tracker = 1;
}

if ( /<Year>/ ) {
if ($tracker == 1) {
/<Year>(.*)<\/Year>/;
print $OFile "$1 \n";
$tracker = 0;
}
}
if ( /<Title>/ ) {
/<Title>(.*)<\/Title>/;
print $OFile "$1 \n";
}
if ( /<ArticleTitle>/ ) {
/<ArticleTitle>(.*)<\/ArticleTitle>/;
print $OFile "$1 \n";
}
if ( /<LastName>/ ) {
/<LastName>(.*)<\/LastName>/;
print $OFile "$1 \n";

}

}

close $OFile;

--

The output file is not complete, it doesn't list all the authors.

output.txt
--
14705930
2004
Biochemistry.
Extension of the binding motif of the Sin3 interacting domain of the
Mad family proteins.
van Ingen
--

When I then write the XML ($results) to a file:

grabPubmed_full.pl
--
#c:\perl\bin\perl

use strict;
use warnings;



# PREP (Perl RegExps for Pubmed) is a script that allows the use of
# Perl regexs in the searching of Pubmed records, providing the ability
# to search records for textual patterns as well as keywords

# Copyright 2005- Christopher M. Frenz
# This script is free software; it may be used, copied, redistributed,
# and/or modified under the terms laid forth in the Perl Artistic License

# Please cite this script in any publication in which literature cited
# within the publication was located using the PREP.pl script.

# Usage: perl PREPv1-0.pl PubmedQueryTerms

# Usage of this script requires the LWP and XML::LibXML modules are
# installed
use LWP;
use XML::LibXML; #Version 1.58 used for development and testing


my $request;
my $response;
my $query;

# Concatenates arguments passed to script to form Pubmed query
$query=join(" ", @ARGV);

# Creates the URL to search Pubmed
my $baseurl="http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?";
my $url=$baseurl . "db=Pubmed&retmax=1&usehistory=y&term=" . $query;


# Searches Pubmed and Returns the number of results
# as well as the session information needed for results retrieval
$request=LWP::UserAgent->new();
$response=$request->get($url);
my $results= $response->content;
die unless $response->is_success;
print "PubMed Search Results \n";

$results=~/<Count>(\d+)<\/Count>/;
my $NumAbstracts=$1;
$results=~/<QueryKey>(\d+)<\/QueryKey>/;
my $QueryKey=$1;
$results=~/<WebEnv>(.*?)<\/WebEnv>/;
my $WebEnv=$1;


print "$NumAbstracts are Available \n";

my $parser=XML::LibXML->new;

my $retmax=500; #Number of records to be retrieved per request-Max 500
my $retstart=0; #Record number to start retrieval from

# Creates the URL needed to retrieve results
$baseurl="http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?";
my
$url2="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=";

my $Count=0;
# Retrieves results in XML format
for($retstart=0;$retstart<=$NumAbstracts;$retstart+=$retmax){
print "Processing record # $retstart \n";
$url=$baseurl .
"rettype=abstract&retmode=xml&retstart=$retstart&retmax=$retmax&db=Pubmed&query_key=$QueryKey&WebEnv=$WebEnv";

$response=$request->get($url);
$results=$response->content;
die unless $response->is_success;

}

open my $OFile, '>', 'output_full.txt' or die "Can't open output file: $!";

print $OFile $results;


close $OFile;

--

resulting in this file:

output_full.txt
--
<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st
January 2006//EN"
"http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_060101.dtd">
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID>14705930</PMID>
<DateCreated>
<Year>2004</Year>
<Month>01</Month>
<Day>06</Day>
</DateCreated>
<DateCompleted>
<Year>2004</Year>
<Month>05</Month>
<Day>12</Day>
</DateCompleted>
<DateRevised>
<Year>2005</Year>
<Month>11</Month>
<Day>17</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">0006-2960</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>43</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2004</Year>
<Month>Jan</Month>
<Day>13</Day>
</PubDate>
</JournalIssue>
<Title>Biochemistry. </Title>
<ISOAbbreviation>Biochemistry</ISOAbbreviation>
</Journal>
<ArticleTitle>Extension of the binding motif of the Sin3
interacting domain of the Mad family proteins.</ArticleTitle>
<Pagination>
<MedlinePgn>46-54</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText>Sin3 forms the scaffold for a
multiprotein corepressor complex that silences transcription via the
action of histone deacetylases. Sin3 is recruited to the DNA by several
DNA binding repressors, such as the helix-loop-helix proteins of the
Mad family. Here, we elaborate on the Mad-Sin3 interaction based on a
binding study, solution structure, and dynamics of the PAH2 domain of
mSin3 in complex to an extended Sin3 interacting domain (SID) of 24
residues of Mad1. We show that SID residues Met7 and Glu23, outside the
previously defined minimal binding motif, mediate additional
hydrophobic and electrostatic interactions with PAH2. On the basis of
these results we propose an extended consensus sequence describing the
PAH2-SID interaction specifically for the Mad family, showing that
residues outside the hydrophobic core of the SID interact with PAH2 and
modulate binding affinity to appropriate levels.</AbstractText>
</Abstract>
<Affiliation>Departments of Biophysical Chemistry and
Molecular Biology, NSRIM Center, University of Nijmegen, Toernooiveld
1, 6525 ED Nijmegen, The Netherlands.</Affiliation>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>van Ingen</LastName>
<ForeName>Hugo</ForeName>
<Initials>H</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Lasonder</LastName>
<ForeName>Edwin</ForeName>
<Initials>E</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Jansen</LastName>
<ForeName>Jacobus F A</ForeName>
<Initials>JF</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Kaan</LastName>
<ForeName>Anita M</ForeName>
<Initials>AM</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Spronk</LastName>
<ForeName>Christian A E M</ForeName>
<Initials>CA</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Stunnenberg</LastName>
<ForeName>Henk G</ForeName>
<Initials>HG</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Vuister</LastName>
<ForeName>Geerten W</ForeName>
<Initials>GW</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<DataBankList CompleteYN="Y">
<DataBank>
<DataBankName>PDB</DataBankName>
<AccessionNumberList>
<AccessionNumber>1PD7</AccessionNumber>
</AccessionNumberList>
</DataBank>
</DataBankList>
<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Biochemistry</MedlineTA>
<NlmUniqueID>0370623</NlmUniqueID>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>Basic Helix-Loop-Helix Leucine Zipper
Transcription Factors</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>Caenorhabditis elegans
Proteins</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>DNA-Binding Proteins</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>Fungal Proteins</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>MXD1 protein, human</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>Membrane Proteins</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>PAH2 protein, Pichia
angusta</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>Repressor Proteins</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>SID-1 protein, C
elegans</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>SIN3 protein, S
cerevisiae</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>Saccharomyces cerevisiae
Proteins</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>Solutions</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>Transcription
Factors</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Amino Acid
Motifs</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Amino Acid
Sequence</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName
MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Basic Helix-Loop-Helix
Leucine Zipper Transcription Factors</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Caenorhabditis elegans
Proteins</DescriptorName>
<QualifierName
MajorTopicYN="N">chemistry</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Comparative
Study</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Conserved
Sequence</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Crystallography,
X-Ray</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">DNA-Binding
Proteins</DescriptorName>
<QualifierName
MajorTopicYN="Y">chemistry</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Fungal
Proteins</DescriptorName>
<QualifierName
MajorTopicYN="N">chemistry</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName
MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Membrane
Proteins</DescriptorName>
<QualifierName
MajorTopicYN="N">chemistry</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Molecular Sequence
Data</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Multigene
Family</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Nuclear Magnetic
Resonance, Biomolecular</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Protein
Binding</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Protein Structure,
Tertiary</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Repressor
Proteins</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Research Support,
Non-U.S. Gov't</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Saccharomyces
cerevisiae Proteins</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Sequence
Alignment</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Sequence Homology,
Amino Acid</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName
MajorTopicYN="N">Solutions</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Surface Plasmon
Resonance</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName
MajorTopicYN="N">Thermodynamics</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Transcription
Factors</DescriptorName>
<QualifierName
MajorTopicYN="Y">chemistry</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="pubmed">
<Year>2004</Year>
<Month>1</Month>
<Day>7</Day>
<Hour>5</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2004</Year>
<Month>5</Month>
<Day>13</Day>
<Hour>5</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">14705930</ArticleId>
<ArticleId IdType="doi">10.1021/bi0355645</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>



</PubmedArticleSet>
---

And then run another script:

parseXML.pl
--
#c:\perl\bin\perl
use strict;
use warnings;



open my $INPUT, '<', 'output_full.txt' or die "Can't open data file: $!";

open my $OFile, '>', 'parsed_output.txt' or die "Can't open output file: $!";


my $tracker = 0;

#print OFile "$INPUT";

while (<$INPUT>) {
next if /^#/; # skip comments
next if /^\s*$/; # skip empty lines
chomp; # remove line terminator



if ( /<PMID>/ ) {
/<PMID>(.*)<\/PMID>/;
print $OFile "$1 \n";

}

if ( /<PubDate>/ ) {
$tracker = 1;
}

if ( /<Year>/ ) {
if ($tracker == 1) {
/<Year>(.*)<\/Year>/;
print $OFile "$1 \n";
$tracker = 0;
}
}
if ( /<Title>/ ) {
/<Title>(.*)<\/Title>/;
print $OFile "$1 \n";
}
if ( /<ArticleTitle>/ ) {
/<ArticleTitle>(.*)<\/ArticleTitle>/;
print $OFile "$1 \n";
}
if ( /<LastName>/ ) {
/<LastName>(.*)<\/LastName>/;
print $OFile "$1 \n";

}

}


close $OFile;

close $INPUT;
--

I get what I want:

parsed_output.txt
--
14705930
2004
Biochemistry.
Extension of the binding motif of the Sin3 interacting domain of the
Mad family proteins.
van Ingen
Lasonder
Jansen
Kaan
Spronk
Stunnenberg
Vuister
--
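
One way to keep the while loop from parseXML.pl without writing an intermediate file might be an in-memory filehandle: since Perl 5.8, open accepts a reference to a scalar and reads from the string as if it were a file. The sketch below uses a shortened, made-up stand-in for $results, and @found is a name introduced for illustration.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the XML text held in $results.
my $results = <<'XML';
<PMID>14705930</PMID>
<Author ValidYN="Y">
<LastName>van Ingen</LastName>
</Author>
<Author ValidYN="Y">
<LastName>Lasonder</LastName>
</Author>
XML

# Opening a reference to a scalar gives a filehandle that reads from the string.
open my $INPUT, '<', \$results or die "Can't open in-memory file: $!";

my @found;
while (<$INPUT>) {
    chomp;   # remove line terminator
    if (/<PMID>(.*)<\/PMID>/)         { push @found, $1 }
    if (/<LastName>(.*)<\/LastName>/) { push @found, $1 }
}
close $INPUT;

print "$_\n" for @found;
```

Because the loop reads one line at a time, every LastName line is visited, just as when reading output_full.txt from disk.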
 
Paul Lalli

> My current problem is that my filter in the foreach loop is only
> carried out once, even if there are more lines that match the query

> # Searches Pubmed and Returns the number of results
> # as well as the session information needed for results retrieval
> $request=LWP::UserAgent->new();
> $response=$request->get($url);
> my $results= $response->content;
> die unless $response->is_success;
> print "PubMed Search Results \n";

> foreach ($results){

Here's your problem. $results is a scalar. One variable. This loop
says "for each element of the list containing ($results)". That list
contains only one element. Your foreach loop is only executing once.
It has nothing to do with any of the code inside of the for loop.

You need to figure out what you actually want to iterate over. Maybe
you want to split $results on newlines and iterate over each "line"
inside $results? I don't know, because I'm not going to take the time
to parse this massive program to figure out what you're actually trying
to do.

Paul Lalli
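
The difference described above can be shown with a small sketch. The contents of $results are a made-up stand-in, and the counter names are introduced here only for illustration. Looping over the scalar runs the body once; looping over split /\n/, $results runs it once per line.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of $results.
my $results = "<LastName>van Ingen</LastName>\n"
            . "<ForeName>Hugo</ForeName>\n"
            . "<LastName>Lasonder</LastName>\n";

# foreach over a single scalar runs the body exactly once.
my $passes_scalar = 0;
foreach ($results) { $passes_scalar++ }

# Splitting on newlines yields one list element per line.
my $passes_lines = 0;
my @last_names;
foreach (split /\n/, $results) {
    $passes_lines++;
    push @last_names, $1 if /<LastName>(.*)<\/LastName>/;
}

print "scalar: $passes_scalar pass, lines: $passes_lines passes\n";
print "$_\n" for @last_names;
```

This also explains the incomplete output.txt: in the single pass, each pattern is matched once against the whole string, and because `.` does not match a newline, it finds only the first occurrence of each tag.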
 
Tad McClellan

> I inserted the code:
>
> my @array = split /\n / , $results;
>
> foreach (@array){

That is a horrid choice of name.

The at-sign already means array, so naming it "array" does not
add any useful information.

You should choose *meaningful* names for your variables.

@authors

perhaps?
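
If only the author names are wanted, a possible shortcut is a global match in list context, which returns every capture in the string without any explicit loop. This is a sketch under assumptions: $results below is a shortened, made-up stand-in for the fetched XML.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the XML text held in $results.
my $results = <<'XML';
<Author ValidYN="Y">
<LastName>van Ingen</LastName>
<ForeName>Hugo</ForeName>
</Author>
<Author ValidYN="Y">
<LastName>Lasonder</LastName>
<ForeName>Edwin</ForeName>
</Author>
XML

# With /g in list context, a match returns every capture in the string.
# The non-greedy .*? keeps each capture inside one element.
my @authors = $results =~ /<LastName>(.*?)<\/LastName>/g;

print "$_\n" for @authors;   # prints "van Ingen" then "Lasonder"
```

Regex matching on XML is fragile if the layout changes. A real parser, such as the XML::LibXML module the script already loads, would likely be more robust.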
 
