newbie help

Discussion in 'Perl Misc' started by Ram, Feb 3, 2004.

  1. Ram

    Ram Guest

    How do I search for just the ordsts start (<ordsts>) and end (</ordsts>)
    tags and the data between them, and get just the last match? I would also
    like an idea of how to get the last two matches.

    Thanks for the pointers.


    Sample Input file:
    <logos>
    <ordsts>
    <gname>
    </gname>
    </ordsts>
    <ordadd>
    <aname>
    </aname>
    </ordadd>
    </logos>
    <customer>
    <contact>
    <pname>
    </pname>
    </contact>
    <ordsts>
    <name>
    </name>
    </ordsts>
    <shipname>
    <sname>
    </sname>
    </shipname>
    </customer>
    <ordsts>
    <doc_hdr>
    <type_code>ORDSTS</type_code>
    <type_suffix>LE</type_suffix>
    <direction>IN</direction>
    </doc_hdr>
    <ord_keys>
    <ordno>200000</ordno>
    </ord_keys>
    <req_obj>
    <obj>order_header</obj>
    <obj>order_line</obj>
    </req_obj>
    </ordsts>
    <order> <doc_hdr> <type_code>ORDER</type_code>
    <type_suffix>LE</type_suffix> <direction>IN</direction> <client_da
    a>User Supplied Data</client_data> <client_id>User Supplied
    Data</client_id> <correlation_id>414D51204C45555343433033202020
    040001EEE00042583</correlation_id>
    <response_channel>CC.ORDER.REPLY</response_channel>
    <correlation_id>41,4d,51,20,4c,45,55
    53,43,43,30,33,20,20,20,20,40,0,1e,ee,0,4,25,83,</correlation_id>
    <response_channel>LEUSCS01::CC.ORDER.REPLY.CS.S.Q</response_c
    annel> </doc_hdr> <customer> <cus_num>3374831</cus_num>
    <bill_to> <contact> <con_num>2</con_num> </
    ontact> </bill_to> <ship_to> <address>
    <adr_num>1</adr_num> </address> <taxwaregeocode> <
    eocode>331003600</geocode></order>
    <ordsts> <doc_hdr> <type_code>ORDER</type_code>
    <type_suffix>LE</type_suffix> <direction>IN</direction> <client_d
    ta>User Supplied Data</client_data> <client_id>User Supplied
    Data</client_id> <correlation_id>414D51204C4555534343303320202
    2040001EEE00042583</correlation_id>
    <response_channel>CC.ORDER.REPLY</response_channel>
    <correlation_id>41,4d,51,20,4c,45,5
    ,53,43,43,30,33,20,20,20,20,40,0,1e,ee,0,4,25,83,</correlation_id>
    <response_channel>LEUSCS01::CC.ORDER.REPLY.CS.S.Q</response_
    hannel> </doc_hdr> <customer> <cus_num>3374831</cus_num>
    <bill_to> <contact> <con_num>2</con_num> <
    contact> </bill_to> <ship_to> <address>
    <adr_num>1</adr_num> </address> <taxwaregeocode>
    geocode>331003600</geocode></ordsts>
     
    Ram, Feb 3, 2004
    #1

  2. Ram wrote:
    > How do I search for just the ordsts start(<ordsts>) and end
    > tags(</ordsts>) and the data between them, and get just the last
    > matched one.


    Assuming the data is in $_:

    my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;

    > Also would need an idea of how to get the last two matches.


    I leave that as an exercise to you. :)

    > Thanks for the pointers.


    http://www.perldoc.com/perl5.8.0/pod/perlre.html

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 3, 2004
    #2
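
A minimal sketch of the approach from post #2, plus one possible way to get the last two matches. The sample data and the variable names ($data, @all, @last_two) are illustrative assumptions, not taken from the thread, and the sketch assumes <ordsts> blocks are never nested.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the slurped file contents.
my $data = join "\n",
    '<logos>', '<ordsts>', '<gname>', '</gname>', '</ordsts>', '</logos>',
    '<ordsts>', '<name>', '</name>', '</ordsts>',
    '<ordsts> <ordno>200000</ordno> </ordsts>', '';

# The greedy leading .* pushes the capture to the last <ordsts>...</ordsts> pair.
my ($last) = $data =~ /.*(<ordsts>.*<\/ordsts>)/s;

# A non-greedy .*? with /g in list context returns every block, in order.
my @all = $data =~ /(<ordsts>.*?<\/ordsts>)/gs;

# Keep at most the final two blocks.
my @last_two = @all > 2 ? @all[-2, -1] : @all;

print "last:\n$last\n\n";
print "last two:\n", join("\n---\n", @last_two), "\n";
```

Collecting all blocks and slicing the end of the list keeps the regex simple; it trades some memory for readability, which is usually fine for files of this size.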

  3. J Krugman

    J Krugman Guest

    In <bvp3d5$ujeo2$-berlin.de> Gunnar Hjalmarsson <> writes:

    >Assuming the data is in $_:


    > my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;


    Why doesn't this match everything between the very first <ordsts>
    in the file and the last </ordsts>? Isn't the regexp engine supposed
    to give the longest match?

    jill
     
    J Krugman, Feb 3, 2004
    #3
  4. J Krugman wrote:
    > Gunnar Hjalmarsson writes:
    >> Assuming the data is in $_:
    >>
    >> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;

    >
    > Why doesn't this match everything between the very first <ordsts> in
    > the file and the last </ordsts>?


    Because the first .* is greedy.

    > Isn't the regexp engine supposed to give the longest match?


    Nope.

    Please read about greediness in perldoc perlre.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 3, 2004
    #4
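
A small sketch of the greediness point discussed in posts #3 and #4. Perl returns the leftmost match, and each greedy quantifier then takes as much as it can while still letting the rest of the pattern succeed. The test string and variable names here are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical one-line input with two blocks.
my $text = "<ordsts>A</ordsts> junk <ordsts>B</ordsts> tail";

# No leading .*: the match starts at the leftmost <ordsts>, and the greedy
# inner .* runs on to the last </ordsts>.
my ($first_to_last) = $text =~ /(<ordsts>.*<\/ordsts>)/s;

# Leading greedy .*: it swallows as much as possible first, so the capture
# group is left with only the last block.
my ($last_only) = $text =~ /.*(<ordsts>.*<\/ordsts>)/s;

# Non-greedy inner .*?: the leftmost block on its own.
my ($first_only) = $text =~ /(<ordsts>.*?<\/ordsts>)/s;

print "first to last: $first_to_last\n";
print "last only:     $last_only\n";
print "first only:    $first_only\n";
```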
  5. J Krugman

    J Krugman Guest

    In <bvp7nu$v8rpc$-berlin.de> Gunnar Hjalmarsson <> writes:

    >J Krugman wrote:
    >> Gunnar Hjalmarsson writes:
    >>> Assuming the data is in $_:
    >>>
    >>> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;

    >>
    >> Why doesn't this match everything between the very first <ordsts> in
    >> the file and the last </ordsts>?


    >Because the first .* is greedy.


    OK, I missed that. Thanks.

    jill
     
    J Krugman, Feb 3, 2004
    #5
  6. Ram

    Ram Guest

    This string does not match if <ordsts> and </ordsts> have child tags spread
    across multiple lines.

    If I append this to the end of the file, it does not match:
    <ordsts>
    <gname>
    </gname>
    </ordsts>
    But it matches:
    <ordsts> <gname> </gname> </ordsts>

    In my case, it should match both, including the child tags.

    Thanks!!



    "Gunnar Hjalmarsson" <> wrote in message
    news:bvp3d5$ujeo2$-berlin.de...
    > Ram wrote:
    > > How do I search for just the ordsts start(<ordsts>) and end
    > > tags(</ordsts>) and the data between them, and get just the last
    > > matched one.

    >
    > Assuming the data is in $_:
    >
    > my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;
    >
    > > Also would need an idea of how to get the last two matches.

    >
    > I leave that as an exercise to you. :)
    >
    > > Thanks for the pointers.

    >
    > http://www.perldoc.com/perl5.8.0/pod/perlre.html
    >
    > --
    > Gunnar Hjalmarsson
    > Email: http://www.gunnar.cc/cgi-bin/contact.pl
    >
     
    Ram, Feb 4, 2004
    #6
  7. Chris

    Chris Guest

    Ram wrote:
    > How do I search for just the ordsts start(<ordsts>) and end tags(</ordsts>)
    > and the data between them, and get just the last matched one. Also would
    > need an idea of how to get the last two matches.
    >
    > Thanks for the pointers.
    >
    > [snipped sample XML]


    If this is XML, as it appears to be, you might do better parsing and get
    better overall mileage from using XML::Simple or one of its close cousins.

    (Wondering if this is the "Ram" that *I* know. If so, I hope you are
    doing well.)

    Chris
    -----
    Chris Olive
    chris -at- technologEase -dot- com
    http://www.technologEase.com
    (pronounced "technologies")
     
    Chris, Feb 4, 2004
    #7
  8. [ Please do not top post! ]

    Ram wrote:
    > Gunnar Hjalmarsson wrote:
    >>
    >> Assuming the data is in $_:
    >>
    >> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;

    >
    > This string does not match if <ordsts> and </ordsts> has child
    > tags spread across multiple lines.


    It's not a string, it's a regular expression, and it does match over
    multiple lines.

    > If I stick this to the end of file, it does not match:
    > <ordsts>
    > <gname>
    > </gname>
    > </ordsts>
    > But it matches:
    > <ordsts> <gname> </gname> </ordsts>


    Would you mind showing us the code you used to reach that conclusion?

    > For my case, it should match the both, including the child tags.


    And my suggestion does that perfectly well.

    Have you begun to study perldoc perlre yet? You'd better do so right
    away, and don't forget to read about the /s modifier.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 4, 2004
    #8
  9. gnari

    gnari Guest

    "Ram" <> wrote in message
    news:bvr8mb$jaj$...

    [note: if you do not top-post, it is more likely that we will want to help.
    it is annoying when you put your follow-up at the top of your message,
    quoting the message you are replying to under that (in this case in whole)]


    > This string does not match if <ordsts> and </ordsts> has child tags
    > spread across multiple lines.
    > ...


    > "Gunnar Hjalmarsson" <> wrote in message
    > news:bvp3d5$ujeo2$-berlin.de...
    > >
    > > Assuming the data is in $_:


    key sentence, perhaps?

    > >
    > > my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;


    are you matching one line at a time?

    gnari
     
    gnari, Feb 4, 2004
    #9
  10. [please don't top post - reordered to proper format] On Wed, 04 Feb 2004
    11:05:06 -0600, Ram wrote:
    > "Gunnar Hjalmarsson" <> wrote in message
    > news:bvp3d5$ujeo2$-berlin.de...
    >> Ram wrote:
    >> > How do I search for just the ordsts start(<ordsts>) and end
    >> > tags(</ordsts>) and the data between them, and get just the last
    >> > matched one.

    >>
    >> Assuming the data is in $_:
    >>
    >> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;
    >>
    >> > Also would need an idea of how to get the last two matches.

    >>
    >> I leave that as an exercise to you. :)
    >>
    >> > Thanks for the pointers.

    >>
    >> http://www.perldoc.com/perl5.8.0/pod/perlre.html
    >>
    >> --
    >> Gunnar Hjalmarsson
    >> Email: http://www.gunnar.cc/cgi-bin/contact.pl
    >>

    > This string does not match if <ordsts> and </ordsts> has child tags
    > spread across multiple lines.
    >
    > If I stick this to the end of file, it does not match: <ordsts>
    > <gname>
    > </gname>
    > </ordsts>
    > But it matches:
    > <ordsts> <gname> </gname> </ordsts>
    >
    > For my case, it should match the both, including the child tags.


    I'd follow the suggestion offered by Chris Olive - use an XML module to
    parse your data. It will save you lots of time and effort - and reduce
    the number of "mistakes" made in parsing. Right now, if someone changes
    the format of the file, you'll have to go through a similar exercise
    again in the future.

    Again, it's just a suggestion :)

    HTH

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    You never know how many friends you have until you rent a house
    on the beach.
     
    James Willmore, Feb 4, 2004
    #10
  11. Ram

    Ram Guest

    Script I used:

    #!/usr/bin/perl
    use strict;
    my $el;
    open(ONE, "ordsts.txt" ) or die "Can't open file $! \n";
    while (<ONE>) {
    #print "$_ \n";
    my @lastmatch = /.*(<ordsts>.*<\/ordsts>)/s;
    print "@lastmatch \n";
    $el= my @lastmatch;
    }
    print "$el \n";



    I am not the Ram you know!!


    "Chris" <> wrote in message
    news:3uaUb.32014$...
    > Ram wrote:
    > > How do I search for just the ordsts start(<ordsts>) and end tags(</ordsts>)
    > > and the data between them, and get just the last matched one. Also would
    > > need an idea of how to get the last two matches.
    > >
    > > Thanks for the pointers.
    > >
    > > [snipped sample XML]

    >
    > If this is XML, as it appears to be, you might do better parsing and get
    > better overall mileage from using XML::Simple or one of its close cousins.
    >
    > (Wondering if this is the "Ram" that *I* know. If so, I hope you are
    > doing well.)
    >
    > Chris
    > -----
    > Chris Olive
    > chris -at- technologEase -dot- com
    > http://www.technologEase.com
    > (pronounced "technologies")
    >
     
    Ram, Feb 4, 2004
    #11
  12. Ram <> wrote:

    > Subject: newbie help



    Please put the subject of your article in the Subject of your article.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Feb 4, 2004
    #12
  13. Ram wrote:
    > Script I used:
    >
    > #!/usr/bin/perl
    > use strict;
    > my $el;
    > open(ONE, "ordsts.txt" ) or die "Can't open file $! \n";
    > while (<ONE>) {
    > #print "$_ \n";
    > my @lastmatch = /.*(<ordsts>.*<\/ordsts>)/s;
    > print "@lastmatch \n";
    > $el= my @lastmatch;
    > }
    > print "$el \n";


    It proves that gnari guessed right: You are applying the regex to one
    line at a time, which obviously can't work.

    Try this instead:

    #!/usr/bin/perl
    use strict;
    use warnings;
    open ONE, "ordsts.txt" or die "Can't open file $!";
    $_ = do { local $/; <ONE> }; # slurp file into $_
    close ONE;
    my ($el) = /.*(<ordsts>.*<\/ordsts>).*/s;
    print "$el\n";

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 4, 2004
    #13
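
A self-contained sketch of why the line-at-a-time loop in post #11 finds nothing while the slurping version in post #13 does. It reads from an in-memory filehandle instead of ordsts.txt so that it runs without any external file; the sample content and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical file content: one block spread over several lines.
my $content = "<ordsts>\n<gname>\n</gname>\n</ordsts>\n";

# Line at a time: each read holds a single line, so no single line contains
# both the opening and the closing tag, and the regex never matches.
open my $by_line, '<', \$content or die "Can't open in-memory file: $!";
my $line_hits = 0;
while (my $line = <$by_line>) {
    $line_hits++ if $line =~ /<ordsts>.*<\/ordsts>/s;
}
close $by_line;

# Slurp: with $/ undefined, one read returns the whole file, and /s lets
# the dot match the newlines inside the block.
open my $whole, '<', \$content or die "Can't open in-memory file: $!";
my $slurped = do { local $/; <$whole> };
close $whole;
my ($block) = $slurped =~ /.*(<ordsts>.*<\/ordsts>)/s;

print "line-by-line matches: $line_hits\n";
print "slurped match:\n$block\n";
```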
  14. [ Please do not post upside-down followups ]


    Ram <> wrote:

    > This string does not match



    Does not match *what* ?


    > if <ordsts> and </ordsts> has child tags spread
    > across multiple lines.



    How are you getting the multiple lines into $_ ?


    > "Gunnar Hjalmarsson" <> wrote in message
    > news:bvp3d5$ujeo2$-berlin.de...


    >> Assuming the data is in $_:
    >>
    >> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;



    That _will_ match across multiple lines.

    You are probably running afoul of this Frequently Asked Question:

    I'm having trouble matching over more than one line. What's wrong?
    (perlfaq6; try: perldoc -q "more than one line")


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Feb 5, 2004
    #14
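
If slurping the whole file is not desirable, one alternative is to keep reading line by line and collect the lines that fall between the tags. The sketch below uses Perl's range operator for that. It is only an illustration under stated assumptions: the input is made up, blocks are assumed not to be nested, and whole lines are kept, so any text sharing a line with a tag ends up in the block too.

```perl
use strict;
use warnings;

# Hypothetical input standing in for ordsts.txt, read via an in-memory filehandle.
my $content = join "\n",
    '<logos>', '<ordsts>', '<gname>', '</gname>', '</ordsts>', '</logos>',
    '<ordsts> <ordno>200000</ordno> </ordsts>', '<order>', '</order>', '';

open my $fh, '<', \$content or die "Can't open in-memory file: $!";

my @blocks;        # every completed <ordsts>...</ordsts> block, in file order
my $current = '';  # lines of the block currently being collected

while (my $line = <$fh>) {
    # The range operator is true from a line containing <ordsts> through the
    # next line containing </ordsts> (which may be the same line).
    if ($line =~ /<ordsts>/ .. $line =~ /<\/ordsts>/) {
        $current .= $line;
        if ($line =~ /<\/ordsts>/) {
            push @blocks, $current;
            $current = '';
        }
    }
}
close $fh;

print "found ", scalar(@blocks), " block(s); last one:\n$blocks[-1]";
```

The last match is then $blocks[-1], and the last two are the final two elements of @blocks.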
  15. Ram

    Ram Guest

    Excellent, a lot to learn!!
    "Gunnar Hjalmarsson" <> wrote in message
    news:bvs00f$109jb0$-berlin.de...
    > Ram wrote:
    > > Script I used:
    > >
    > > #!/usr/bin/perl
    > > use strict;
    > > my $el;
    > > open(ONE, "ordsts.txt" ) or die "Can't open file $! \n";
    > > while (<ONE>) {
    > > #print "$_ \n";
    > > my @lastmatch = /.*(<ordsts>.*<\/ordsts>)/s;
    > > print "@lastmatch \n";
    > > $el= my @lastmatch;
    > > }
    > > print "$el \n";

    >
    > It proves that gnari guessed right: You are applying the regex to one
    > line at a time, which obviously can't work.
    >
    > Try this instead:
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    > open ONE, "ordsts.txt" or die "Can't open file $!";
    > $_ = do { local $/; <ONE> }; # slurp file into $_
    > close ONE;
    > my ($el) = /.*(<ordsts>.*<\/ordsts>).*/s;
    > print "$el\n";
    >
    > --
    > Gunnar Hjalmarsson
    > Email: http://www.gunnar.cc/cgi-bin/contact.pl
    >
     
    Ram, Feb 5, 2004
    #15
  16. gnari

    gnari Guest

    "Ram" <> wrote in message
    news:bvu4h9$b9g$...
    > Excellent, a lot to learn!!


    especially about top-posting.

    [snipped top-posted quoted whole article]

    gnari
     
    gnari, Feb 5, 2004
    #16
  17. Ram

    Ram Guest

    top posting: response to gnari

    Is this the correct way to post (not top-posting) when responding to
    gnari?

    "Tad McClellan" <> wrote in message
    news:...
    > Ram <> wrote:
    >
    > > Subject: newbie help

    >
    >
    > Please put the subject of your article in the Subject of your article.
    >
    >
    > --
    > Tad McClellan SGML consulting
    > Perl programming
    > Fort Worth, Texas
     
    Ram, Feb 5, 2004
    #17
  18. Re: top posting: response to gnari

    Ram <> wrote:

    > Is this the correct way to post (not top-posting),



    No it isn't.

    *plonk*



    [ snip TOFU ]

    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Feb 5, 2004
    #18
