Problem with pattern match

Dafke8

Hi all,

I'm making a Perl script that takes the names of people out of a text, and
then I must link each person's name to the text it appeared in.
Right now I'm trying to get a list of all the names in the text. I already
have an array with the text, but when I try to get all the names into a list,
I only get the first name.
Below is the code I use for it:
(every name is inside a <span class="naam" id="n1"></span> tag)

foreach $i (@inputtext){
$i =~ /<span\sclass="naam"\sid="n[0-9]+">(.*?)<\/span>/g;
@test = $1;
}

Does anyone know what the problem is with my expression?


Below is a piece of the text I'm searching in:

<h3><font color="#0000FF">BA585 Akte nr.<span class="akte"
id="a0">1774-1</span> Bestand: 480 - 481</font></h3>
<p>Op 21 februari 1774 lenen <span class="naam" id="n0">Verachtert Petrus
Josephus</span> en zijn echtgenote <span class="naam" id="n1">Verboven Anna
Elisabeth</span> bij <span class="naam" id="n2">Bleirinckx Marten</span>,
<span class="naam" id="n3">Groenen Jan</span> en <span class="naam"
id="n4">Van Nuten Jan</span>, administrateurs van de fondatie gefondeerd
door wijlen de Heer <span class="naam" id="n5">Swinnen Henricus</span>, ten
gunste van de <span class="plaats" id="p1">kapel van Meren</span>.</p>
<p>Zij hypothekeren een huis, hof,schuur en binnenveld in <span
class="plaats" id="p2">Boeckel</span>. Palende oost: den <span
class="plaats" id="p3">Aert</span>, zuid: d'erfgen. <span class="naam"
id="n6">Bellens Adr.</span>, west: <span class="naam" id="n7">Van Eynde
Jan</span>, noord: <span class="naam" id="n8">Hermans Peeter</span> en een
perceel land genaamd den <span class="plaats" id="p4">langen reep</span>.
Palende oost: d'erfgen. <span class="naam" id="n9">Bellens Adr.</span> en
den <span class="plaats" id="p5">Aert</span>, zuid: de <span class="plaats"
id="p6">Bijlestraete</span> west: <span class="naam" id="n10">Verdonck
Maria</span>, noord: <span class="naam" id="n11">Van Hove Jan</span>.</p>
 
Mark Clements

Dafke8 said:
Right now I'm trying to get a list of all the names in the text. I already
have an array with the text, but when I try to get all the names into a list,
I only get the first name.

foreach $i (@inputtext){
$i =~ /<span\sclass="naam"\sid="n[0-9]+">(.*?)<\/span>/g;
@test = $1;
}

Take a look at perlop (perldoc perlop): in list context, a match with /g
returns a list of everything captured by the parentheses, for every match in
the string. You want something like:

my @matches = ();
foreach my $line (@inputtext) {
    # push puts the match in list context, so /g returns every captured name
    push @matches, $line =~ m!<span\s+class="naam"\s+id="n[0-9]+">(.*?)</span>!g;
}

Mark
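
To make the context difference concrete, here is a minimal sketch. The sample string, the $re variable and the array names are made up for illustration; only the pattern comes from the thread. It runs the same /g match once in scalar context (reading $1, as the original loop does) and once in list context.

```perl
use strict;
use warnings;

# Hypothetical one-line sample, modelled on the text in the question.
my $line = '<span class="naam" id="n2">Bleirinckx Marten</span>, '
         . '<span class="naam" id="n3">Groenen Jan</span>';

my $re = qr{<span\s+class="naam"\s+id="n[0-9]+">(.*?)</span>};

# Scalar context: one match attempt per evaluation, so $1 holds one name.
my @first_only;
if ($line =~ /$re/g) {
    @first_only = ($1);
}

# Reset the search position that the scalar /g match left behind.
pos($line) = undef;

# List context: every capture from every match comes back at once.
my @all = $line =~ /$re/g;

print "scalar context: @first_only\n";
print "list context:   ", join(' | ', @all), "\n";
```

The scalar-context match stops after the first name, which is the behaviour described in the question; the list-context match collects both.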
 
Paul Lalli

Dafke8 said:
foreach $i (@inputtext){
$i =~ /<span\sclass="naam"\sid="n[0-9]+">(.*?)<\/span>/g;
@test = $1;
}

Does anyone know what the problem is with my expression?


There are, unfortunately, several problems:
1) You're reassigning the value of @test each time through the loop,
rather than pushing new values onto it.
2) You're reassigning the value of @test *every* time through the loop,
regardless of whether the pattern match succeeded.
3) You're using a global pattern match but only (theoretically) storing
one of the possible matches, rather than all of them.
4) You're only looking for the whole pattern on each line, ignoring the
possibility that the pattern could span multiple lines of your input.
5) And finally, you're using regular expressions to parse HTML, rather
than one of the several HTML Parsing modules available on CPAN.

Paul Lalli
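
Point 4 can be illustrated with core Perl alone. This is only a sketch under assumptions: the three-line @inputtext below is a shortened, hypothetical stand-in for the real array, and it assumes each element is one line of the file. Joining the lines into a single string lets the pattern cross line breaks. It is still regex-based, so point 5 applies to it as well.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @inputtext: three lines in which one name and
# one opening tag are each split across a line break, as in the sample text.
my @inputtext = (
    qq{<p>Op 21 februari 1774 lenen <span class="naam" id="n0">Verachtert Petrus\n},
    qq{Josephus</span> bij <span class="naam" id="n3">Groenen Jan</span> en <span class="naam"\n},
    qq{id="n4">Van Nuten Jan</span>, administrateurs.</p>\n},
);

# Join the lines into one string so a match can cross line boundaries.
my $text = join '', @inputtext;

# /s lets . match a newline; /g in list context returns every capture.
my @names = $text =~ m{<span\s+class="naam"\s+id="n[0-9]+">(.*?)</span>}gs;

# Collapse the line breaks that were captured inside a name.
s/\s+/ /g for @names;

print "$_\n" for @names;
```

Matching these same three lines one at a time would find only "Groenen Jan", because the other two spans start on one line and end on the next.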
 
Dafke8

Mark Clements said:
take a look at man perlop - /g on a match operator means the expression returns a list of
matches in parentheses. You want something like:

my @matches = ();
foreach my $line( @inputtext){
push @matches,$line =~ m!<span\s+class="naam"\s+id="n[0-9]+">(.*?)</span>!g;
}

Thank you, now it works. One question, though: why do you use ! as the regex
delimiter and not /? I tried it with / and it did not work.
 
Mark Clements

Dafke8 said:
Thank you, now it works. One question, though: why do you use ! as the regex
delimiter and not /? I tried it with / and it did not work.

Because it means you can avoid escaping the / in </span>. With / as the
delimiter, an unescaped / inside the pattern ends the pattern early, which is
likely why your version failed. There are a number of characters that you can
use to delimit a regular expression.
See Paul's post: he makes a number of very good points.

Mark
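
As a small sketch of the delimiter point: the sample string and variable names are invented for illustration, and the pattern is simplified to any span. The same match is written with three different delimiters.

```perl
use strict;
use warnings;

my $html = '<span class="naam" id="n7">Van Eynde Jan</span>';

# With / as the delimiter, the / inside </span> must be escaped.
my ($with_slash) = $html =~ /<span[^>]*>(.*?)<\/span>/;

# With ! or a bracket pair as the delimiter, no escaping is needed.
my ($with_bang)  = $html =~ m!<span[^>]*>(.*?)</span>!;
my ($with_brace) = $html =~ m{<span[^>]*>(.*?)</span>};

print "$with_slash\n$with_bang\n$with_brace\n";
```

All three forms capture the same name; the choice of delimiter only changes which characters need a backslash.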
 
Ala Qumsieh

Mark Clements said:
There are a number of characters that you can use
to delimit a regular expression.

You can pretty much use *ANY* character. Whether that's a good thing or not
is totally debatable.

$x = 'you can even use an m as a delimiter';
print "yes\n" if $x =~ m m\mm;

--Ala
 
Dafke8

Paul Lalli said:
There are, unfortunately, several problems:
1) You're reassigning the value of @test each time through the loop,
rather than pushing new values onto it.
2) You're reassigning the value of @test *every* time through the loop,
regardless of whether the pattern match succeeded.
3) You're using a global pattern match but only (theoretically) storing
one of the possible matches, rather than all of them.
4) You're only looking for the whole pattern on each line, ignoring the
possibility that the pattern could span multiple lines of your input.
5) And finally, you're using regular expressions to parse HTML, rather
than one of the several HTML Parsing modules available on CPAN.

Paul Lalli

Thank you all for your help. I haven't been working with Perl for long, so
it's all quite new to me, and I'm learning more every day.

Tnx,

Dave
 
