parsing XML using a regular expression

Discussion in 'Perl Misc' started by Leif Wessman, Sep 8, 2004.

  1. Leif Wessman

    Leif Wessman Guest

    Hi!

    I'm trying to parse some XML with a regular expression (yes, I know
    that there are several XML modules that I can use).

    My problem is that I'm not that good at creating regular expressions.
    The following code does not work as expected. I have a list of items in
    xml. Each item has an id and an optional name (no <name>-tag or
    <name/>). Each item can also have other tags that I'm not interested
    in.

    I'm trying to parse this simple xml document so that I extract the id
    for each item and the name (if it's there).

    However, the output of my program only displays the ids, not any name.
    That's my first problem. My second problem is that I would like to know
    if it's possible to make my code more efficient (faster and using less
    memory). In reality my xml-file can be quite large.

    My code:
    --------

    #!/usr/bin/perl
    use strict;
    use warnings;

    open (XML, "<items.xml") or die "open: $!";
    my $xml;
    while(my $line = <XML>) {
        $xml = $xml . $line;
    }

    while ($xml =~
        /<item>.*?<id>(.*?)<\/id>.*?(<name>(.*?)<\/name>)?.*?<\/item>/gs) {
        print "id : $1\n";
        if ($3) {
            print "name: $3\n";
        }
    }

    My xml-document:
    ----------------
    <xml>
    <item>
    <id>mf3</id>
    <color>blue</color>
    <name>moto F3</name>
    </item>
    <item>
    <id>nk1</id>
    </item>
    <item>
    <id>jk8</id>
    <name/>
    </item>
    <item>
    <id>la2</id>
    <name>labo 2</name>
    </item>
    </xml>

    My output:
    ----------
    id : mf3
    id : nk1
    id : jk8
    id : la2


    Leif
    Leif Wessman, Sep 8, 2004
    #1
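A note on the first problem: one likely reason the names never appear is that the optional group `(<name>(.*?)<\/name>)?` is tried immediately after `</id>`, at a point where the lazy `.*?` in front of it has matched nothing. No `<name>` starts at that position, so the group matches empty, and the final `.*?` then runs on to `</item>` without ever capturing a name. A minimal sketch of one way around this, assuming the flat, non-nested item layout from the post, is to match each item first and then search for the id and name inside that block only. The helper name `extract_items` is hypothetical and not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a list of
# [id, name] pairs; name is undef when there is no <name>...</name>.
# Assumes items are not nested and tags carry no attributes.
sub extract_items {
    my ($xml) = @_;
    my @items;

    # Step 1: isolate each <item>...</item> block.
    while ($xml =~ m{<item>(.*?)</item>}gs) {
        my $body = $1;

        # Step 2: search inside that block only, so the optional
        # name can neither be skipped nor taken from another item.
        my ($id)   = $body =~ m{<id>(.*?)</id>}s;
        my ($name) = $body =~ m{<name>(.*?)</name>}s;

        next unless defined $id;
        push @items, [ $id, $name ];
    }
    return @items;
}

# Sample data copied from the post.
my $xml = <<'XML';
<xml>
<item>
<id>mf3</id>
<color>blue</color>
<name>moto F3</name>
</item>
<item>
<id>nk1</id>
</item>
<item>
<id>jk8</id>
<name/>
</item>
<item>
<id>la2</id>
<name>labo 2</name>
</item>
</xml>
XML

for my $item (extract_items($xml)) {
    my ($id, $name) = @$item;
    print "id : $id\n";
    print "name: $name\n" if defined $name && length $name;
}
```

On the sample data this should print a name line for mf3 and la2 only. For a large file, the same two-step match could probably be applied one record at a time by setting `$/` to `"</item>"` before reading, which avoids holding the whole file in memory. None of this makes the regex a real XML parser: attributes, comments, CDATA sections, and nested items would all break it.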

  2. Leif Wessman wrote:

    > I'm trying to parse some XML with a regular expression (yes, I know
    > that there are several XML modules that I can use).



    You have headed off the 2nd question.

    The 1st question is: why do you want to do it with regular expressions
    rather than with a real parse?

    If you tell us the constraints that prompt your approach, that will
    help us a lot for providing advice...


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Sep 8, 2004
    #2

  3. ChrisO

    ChrisO Guest

    Bernard El-Hagin wrote:
    > "Leif Wessman" wrote:
    >
    > This
    >
    >> I'm trying to parse some XML with a regular expression (yes, I
    >> know that there are several XML modules that I can use).
    >
    > when put together with this
    >
    >> My problem is that I'm not that good at creating regular
    >> expressions. [...]
    >
    > suggests using one of the modules you claim to know about.


    But he's not allowed to use a module because his professor has
    specifically indicated that that is not an option for his assignment.
    Seems pretty clear to me... ;-)

    -ceo
    ChrisO, Sep 9, 2004
    #3
  4. On Thu, 09 Sep 2004 01:11:40 +0000, ChrisO wrote:
    > But he's not allowed to use a module because his professor has
    > specifically indicated that that is not an option for his assignment.
    > Seems pretty clear to me... ;-)


    Are you serious, or joking?

    Is there really a professor teaching regexes to parse XML? And giving
    assignments on it?

    In that case, the *correct* solution is to drop the class while you can
    still get a refund. All the wonderful examples for regexes in the world
    and (s)he chooses the one that can instill deep and abiding bad habits.

    (Caveat: This is acceptable if the end result is to teach that regexes
    aren't sufficient, in the school of hard knocks. In which case I think I
    admire him/her.)
    Jeremy Bowers, Sep 9, 2004
    #4
  5. ChrisO

    ChrisO Guest

    Jeremy Bowers wrote:
    > On Thu, 09 Sep 2004 01:11:40 +0000, ChrisO wrote:
    >
    >> But he's not allowed to use a module because his professor has
    >> specifically indicated that that is not an option for his assignment.
    >> Seems pretty clear to me... ;-)
    >
    > Are you serious, or joking?


    I can't think of any other reason why someone would want to parse XML
    and specifically state that they "didn't want" to use the XML modules...
    (twice)? It seems to me to say loudly that this is a classroom
    assignment. I've seen worse requirements handed out...

    -ceo
    ChrisO, Sep 9, 2004
    #5
  6. Helgi Briem

    Helgi Briem Guest

    On 9 Sep 2004 00:44:18 -0700, "Leif Wessman" wrote:

    Don't top-post. It annoys the regulars and severely damages
    your chances of receiving useful help with your questions.

    >I was trying to find a general solution to parsing both HTML and
    >xml-files. And I didn't know that regular expressions was such a bad
    >idea when parsing XML. Now I know, and now I will build a solution
    >using regular expressions for HTML and an XML-parser for the XML-files.


    Using regular expressions to parse HTML is just as bad
    as using them to parse XML. HTML, after all, is a close
    relative of XML (both are descended from SGML). Use the
    appropriate modules to parse HTML.

    For details on why this is a bad idea, read the FAQ:

    perldoc -q "remove HTML"

    --
    Helgi Briem hbriem AT simnet DOT is

    Never worry about anything that you see on the news.
    To get on the news it must be sufficiently rare
    that your chances of being involved are negligible!
    Helgi Briem, Sep 9, 2004
    #6
  7. Tim Green

    Tim Green Guest

    "Leif Wessman" <> wrote in
    news:chn2ah$:

    > Hi!
    >
    > I'm trying to parse some XML with a regular expression (yes, I know
    > that there are several XML modules that I can use).


    See <http://www.cs.sfu.ca/~cameron/REX.html>

    XML Shallow Parsing with Regular Expressions
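
To give a rough idea of what "shallow parsing" means here, the following is a deliberately simplified sketch that splits a string into markup tokens and text tokens without checking nesting or well-formedness. It is not the REX grammar from the linked page; the helper name `shallow_tokens` and the pattern are assumptions made for illustration, and the pattern would mis-split comments, CDATA sections, and attribute values that contain `>`.

```perl
use strict;
use warnings;

# Hypothetical helper (not the REX pattern): returns the markup and
# text tokens of $xml in document order, dropping whitespace-only text.
sub shallow_tokens {
    my ($xml) = @_;
    # Either one tag-like token "<...>" or a run of non-"<" characters.
    return grep { /\S/ } $xml =~ m{(<[^>]*>|[^<]+)}g;
}

my @tokens = shallow_tokens('<item><id>mf3</id><name>moto F3</name></item>');
print "$_\n" for @tokens;
```

A caller would still have to track which element it is inside, which is the part a real parser does for you.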

    --
    ###### |\^/| Timothy C. Green, CD, PEng, MEng
    ###### _|\| |/|_
    ###### > < TrainsCan, Train Scan News
    ###### >_./|\._< http://www.TrainsCan.com
    Tim Green, Sep 9, 2004
    #7