newbie help

Ram

How do I search for just the ordsts start (<ordsts>) and end (</ordsts>) tags
and the data between them, and get just the last match? I would also like
an idea of how to get the last two matches.

Thanks for the pointers.


Sample Input file:
<logos>
<ordsts>
<gname>
</gname>
</ordsts>
<ordadd>
<aname>
</aname>
</ordadd>
</logos>
<customer>
<contact>
<pname>
</pname>
</contact>
<ordsts>
<name>
</name>
</ordsts>
<shipname>
<sname>
</sname>
</shipname>
</customer>
<ordsts>
<doc_hdr>
<type_code>ORDSTS</type_code>
<type_suffix>LE</type_suffix>
<direction>IN</direction>
</doc_hdr>
<ord_keys>
<ordno>200000</ordno>
</ord_keys>
<req_obj>
<obj>order_header</obj>
<obj>order_line</obj>
</req_obj>
</ordsts>
<order> <doc_hdr> <type_code>ORDER</type_code>
<type_suffix>LE</type_suffix> <direction>IN</direction>
<client_data>User Supplied Data</client_data>
<client_id>User Supplied Data</client_id>
<correlation_id>414D51204C455553434330332020202040001EEE00042583</correlation_id>
<response_channel>CC.ORDER.REPLY</response_channel>
<correlation_id>41,4d,51,20,4c,45,55,53,43,43,30,33,20,20,20,20,40,0,1e,ee,0,4,25,83,</correlation_id>
<response_channel>LEUSCS01::CC.ORDER.REPLY.CS.S.Q</response_channel>
</doc_hdr> <customer> <cus_num>3374831</cus_num>
<bill_to> <contact> <con_num>2</con_num> </contact> </bill_to>
<ship_to> <address> <adr_num>1</adr_num> </address>
<taxwaregeocode> <geocode>331003600</geocode></order>
<ordsts> <doc_hdr> <type_code>ORDER</type_code>
<type_suffix>LE</type_suffix> <direction>IN</direction>
<client_data>User Supplied Data</client_data>
<client_id>User Supplied Data</client_id>
<correlation_id>414D51204C455553434330332020202040001EEE00042583</correlation_id>
<response_channel>CC.ORDER.REPLY</response_channel>
<correlation_id>41,4d,51,20,4c,45,55,53,43,43,30,33,20,20,20,20,40,0,1e,ee,0,4,25,83,</correlation_id>
<response_channel>LEUSCS01::CC.ORDER.REPLY.CS.S.Q</response_channel>
</doc_hdr> <customer> <cus_num>3374831</cus_num>
<bill_to> <contact> <con_num>2</con_num> </contact> </bill_to>
<ship_to> <address> <adr_num>1</adr_num> </address>
<taxwaregeocode> <geocode>331003600</geocode></ordsts>
 
J Krugman

Gunnar Hjalmarsson said:
Assuming the data is in $_:
my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;

Why doesn't this match everything between the very first <ordsts>
in the file and the last </ordsts>? Isn't the regexp engine supposed
to give the longest match?

jill
 
Gunnar Hjalmarsson

J said:
Why doesn't this match everything between the very first <ordsts> in
the file and the last </ordsts>?

Because the first .* is greedy.

Isn't the regexp engine supposed to give the longest match?

Nope.

Please read about greediness in perldoc perlre.
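
To make the greediness point concrete, here is a minimal sketch. The sample string and the A/B/C labels are made up for illustration; only the regex comes from the thread. It compares the pattern with and without the leading .* on the same data.

```perl
use strict;
use warnings;

# Hypothetical sample: three <ordsts> blocks, labelled A, B and C.
my $data = join "\n",
    '<ordsts>', 'A', '</ordsts>',
    '<x>', '</x>',
    '<ordsts>', 'B', '</ordsts>',
    '<ordsts>', 'C', '</ordsts>',
    '';

# No leading .*: the match starts at the leftmost <ordsts>, and the greedy
# inner .* runs on to the final </ordsts>.
my ($span) = $data =~ /(<ordsts>.*<\/ordsts>)/s;

# Leading greedy .*: it swallows as much as it can and backtracks only far
# enough for the rest to match, so the capture starts at the last <ordsts>.
my ($last) = $data =~ /.*(<ordsts>.*<\/ordsts>)/s;

print "without leading .*:\n$span\n\n";
print "with leading .*:\n$last\n";
```

The first print shows everything from block A through block C; the second shows block C only.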
 
Ram

This string does not match if <ordsts> and </ordsts> have child tags spread
across multiple lines.

If I append this to the end of the file, it does not match:
<ordsts>
<gname>
</gname>
</ordsts>
But it matches:
<ordsts> <gname> </gname> </ordsts>

For my case, it should match both, including the child tags.

Thanks!!
 
Chris

Ram said:
How do I search for just the ordsts start(<ordsts>) and end tags(</ordsts>)
and the data between them, and get just the last matched one. Also would
need an idea of how to get the last two matches.

Thanks for the pointers.

[snipped sample XML]

If this is XML, as it appears to be, you might do better parsing and get
better overall mileage from using XML::Simple or one of its close cousins.

(Wondering if this is the "Ram" that *I* know. If so, I hope you are
doing well.)

Chris
 
Gunnar Hjalmarsson

[ Please do not top post! ]
This string does not match if <ordsts> and </ordsts> has child
tags spread across multiple lines.

It's not a string, it's a regular expression, and it does match over
multiple lines.

If I stick this to the end of file, it does not match:
<ordsts>
<gname>
</gname>
</ordsts>
But it matches:
<ordsts> <gname> </gname> </ordsts>

Would you mind showing us the code you used to reach that conclusion?

For my case, it should match the both, including the child tags.

And my suggestion does that perfectly well.

Have you begun to study perldoc perlre yet? You'd better do so right
away, and don't forget to read about the /s modifier.
 
gnari

[note: if you do not top-post, it is more likely that we will want to help.
It is annoying when you put your follow-up at the top of your message and
quote the message you are replying to under it (in this case in whole)]

This string does not match if <ordsts> and </ordsts> has child tags spread
across multiple lines.
...

key sentence, perhaps?

are you matching one line at a time?

gnari
 
James Willmore

[please don't top post - reordered to proper format]

On Wed, 04 Feb 2004
This string does not match if <ordsts> and </ordsts> has child tags
spread across multiple lines.

If I stick this to the end of file, it does not match:
<ordsts>
<gname>
</gname>
</ordsts>
But it matches:
<ordsts> <gname> </gname> </ordsts>

For my case, it should match the both, including the child tags.

I'd follow the suggestion offered by Chris Olive: use an XML module to
parse your data. It will save you a lot of time and effort, and reduce
the number of mistakes made in parsing. As it stands, if someone changes
the format of the file, you'll have to go through a similar exercise
again in the future.

Again, it's just a suggestion :)

HTH

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
You never know how many friends you have until you rent a house
on the beach.
 
Ram

Script I used:

#!/usr/bin/perl
use strict;
my $el;
open(ONE, "ordsts.txt" ) or die "Can't open file $! \n";
while (<ONE>) {
#print "$_ \n";
my @lastmatch = /.*(<ordsts>.*<\/ordsts>)/s;
print "@lastmatch \n";
$el= my @lastmatch;
}
print "$el \n";



I am not the Ram you know!!


Chris said:
Ram said:
How do I search for just the ordsts start(<ordsts>) and end tags(</ordsts>)
and the data between them, and get just the last matched one. Also would
need an idea of how to get the last two matches.

Thanks for the pointers.

[snipped sample XML]

If this is XML, as it appears to be, you might do better parsing and get
better overall mileage from using XML::Simple or one of its close cousins.

(Wondering if this is the "Ram" that *I* know. If so, I hope you are
doing well.)

Chris
-----
Chris Olive
chris -at- technologEase -dot- com
http://www.technologEase.com
(pronounced "technologies")
 
Gunnar Hjalmarsson

Ram said:
Script I used:

#!/usr/bin/perl
use strict;
my $el;
open(ONE, "ordsts.txt" ) or die "Can't open file $! \n";
while (<ONE>) {
#print "$_ \n";
my @lastmatch = /.*(<ordsts>.*<\/ordsts>)/s;
print "@lastmatch \n";
$el= my @lastmatch;
}
print "$el \n";

It proves that gnari guessed right: You are applying the regex to one
line at a time, which obviously can't work.

Try this instead:

#!/usr/bin/perl
use strict;
use warnings;
open ONE, "ordsts.txt" or die "Can't open file $!";
$_ = do { local $/; <ONE> }; # slurp file into $_
close ONE;
my ($el) = /.*(<ordsts>.*<\/ordsts>).*/s;
print "$el\n";
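
The original question also asked for the last two matches, which the single-capture regex above does not give. One possible approach, as a sketch only: collect every block with a non-greedy global match and take a slice from the end of the list. The sample data here is a shortened, hypothetical stand-in for ordsts.txt, and the sketch assumes <ordsts> elements are never nested inside each other.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the slurped contents of ordsts.txt.
my $data = <<'END';
<logos>
<ordsts>
<gname>
</gname>
</ordsts>
</logos>
<ordsts>
<name>
</name>
</ordsts>
<ordsts>
<ordno>200000</ordno>
</ordsts>
END

# With /g in list context and no capture groups, the match returns every
# matched substring. The non-greedy .*? stops each match at the nearest
# </ordsts>, which is only right if <ordsts> elements are not nested.
my @blocks = $data =~ /<ordsts>.*?<\/ordsts>/gs;

# Last match, and last two matches (or fewer if the file has fewer).
my $last     = $blocks[-1];
my @last_two = @blocks >= 2 ? @blocks[-2, -1] : @blocks;

print "$_\n---\n" for @last_two;
```

Changing the slice indices gives the last three, and so on. For anything beyond simple, flat input like this, the XML-module suggestion made elsewhere in the thread is probably the more robust route.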
 
Tad McClellan

[ Please do not post upside-down followups ]


Ram said:
This string does not match


Does not match *what* ?

if <ordsts> and </ordsts> has child tags spread
across multiple lines.


How are you getting the multiple lines into $_ ?



That _will_ match across multiple lines.

You are probably running afoul of this Frequently Asked Question:

I'm having trouble matching over more than one line. What's wrong?
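
That FAQ entry lives in perlfaq6 (perldoc -q "more than one line"). A small sketch of the problem it describes, using an in-memory filehandle as a hypothetical stand-in for the real file: the same regex finds nothing when applied line by line, and matches once the whole file is in a single string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the file: one element spread over four lines.
my $text = "<ordsts>\n<gname>\n</gname>\n</ordsts>\n";

# Line at a time: $_ holds a single line, so both tags are never in it together.
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my $line_hits = 0;
while (<$fh>) {
    $line_hits++ if /<ordsts>.*<\/ordsts>/s;
}
close $fh;

# Slurped: with $/ undefined, one read returns the whole file as one string.
open $fh, '<', \$text or die "Can't open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;
my $slurp_hits = $whole =~ /<ordsts>.*<\/ordsts>/s ? 1 : 0;

print "line-at-a-time matches: $line_hits\n";
print "slurped matches:        $slurp_hits\n";
```

The /s modifier only lets . match a newline; it cannot help if the newline-separated lines were never read into the same string in the first place.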
 
Ram

Excellent, a lot to learn!!
Gunnar Hjalmarsson said:
It proves that gnari guessed right: You are applying the regex to one
line at a time, which obviously can't work.

Try this instead:

#!/usr/bin/perl
use strict;
use warnings;
open ONE, "ordsts.txt" or die "Can't open file $!";
$_ = do { local $/; <ONE> }; # slurp file into $_
close ONE;
my ($el) = /.*(<ordsts>.*<\/ordsts>).*/s;
print "$el\n";
 
