getElementsByTagName, tag does not exist

Discussion in 'Perl Misc' started by hymie!, Oct 11, 2005.

  1. hymie!

    hymie! Guest

    Greetings. I'm just starting to dabble in XML, and I've run across a
    problem.

    I'm going through my XML document using this construct:

    use XML::DOM;
    my $parser = new XML::DOM::Parser
        or bail ("Unable to create XML parser");
    my $story = $parser->parse($data);
    $out{"SOURCE"} = $story->getElementsByTagName("Source")->
        item(0)->getFirstChild->getData;
    $out{"DATE"} = $story->getElementsByTagName("Publication_Date")->
        item(0)->getFirstChild->getData;
    $out{"TEXT"} = $story->getElementsByTagName("Body_Text")->
        item(0)->getFirstChild->getData or die "$!";

    Everything works fine, until I get to a $story where Body_Text doesn't
    exist. I've looked through all of the XML::DOM docs that I can find,
    but I can't find either a way to test if Body_Text exists, or what happens
    when Body_Text doesn't exist. The script stops with no obvious
    diagnostic output -- it appears that the "die" never happens.

    Can somebody show me the light?

    hymie! http://www.smart.net/~hymowitz hymie_@_lactose.homelinux.net
    ===============================================================================
    I've got an answer. I'm going to fly away. What have I got to lose?
    --Crosby, Stills, and Nash
    ===============================================================================
    hymie!, Oct 11, 2005
    #1

  2. hymie!

    Matt Garrish Guest

    "hymie!" <hymie_@_lactose.homelinux.net> wrote in message
    news:...
    > Greetings. I'm just starting to dabble in XML, and I've run across a
    > problem.
    >
    > I'm going through my XML document using this construct:
    >
    > use XML::DOM;
    > my $parser = new XML::DOM::Parser
    > or bail ("Unable to create XML parser");
    > my $story = $parser->parse($data);
    > $out{"SOURCE"} = $story->getElementsByTagName("Source")->
    > item(0)-> getFirstChild->getData;
    > $out{"DATE"} = $story->getElementsByTagName("Publication_Date")->
    > item(0)-> getFirstChild->getData;
    > $out{"TEXT"} = $story->getElementsByTagName("Body_Text")->
    > item(0)-> getFirstChild->getData or die "$!";
    >
    > Everything works fine, until I get to a $story where Body_Text doesn't
    > exist. I've looked through all of the XML::DOM docs that I can find,
    > but I can't find either a way to test if Body_Text exists, or what happens
    > when Body_Text doesn't exist. The script stops with no obvious
    > diagnostic output -- it appears that the "die" never happens.
    >


    It's pretty clear from the docs that getElementsByTagName returns an array
    containing all the nodes found, so why don't you just check whether there
    are any nodes? As per the documentation:

    [untested]

    my $nodes = $story->getElementsByTagName('Body_Text');
    unless ($nodes->getLength > 0) {
        print "Sorry, no nodes!\n"
    }
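
    To put that check back into the original loop, here is a minimal sketch. The helper name first_text and the sample document are made up for illustration; they are not from the original script. As far as I can tell, item(0) on an empty NodeList returns undef, so the chained getFirstChild call fails before the "or die" is ever evaluated, which is why the guard has to come first. The sketch also assumes the first child of each element is a text node.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper (not part of XML::DOM): return the text of the
# first <$tag> element, or undef when the element is missing or empty.
sub first_text {
    my ($doc, $tag) = @_;
    my $nodes = $doc->getElementsByTagName($tag);   # NodeList in scalar context
    return unless $nodes->getLength > 0;            # no such element
    my $child = $nodes->item(0)->getFirstChild;
    return unless defined $child;                   # element has no children
    return $child->getData;                         # assumes a text node
}

# Sample document (an assumption): note that Body_Text is absent.
my $data = '<Story><Source>Wire</Source>'
         . '<Publication_Date>2005-10-11</Publication_Date></Story>';
my $story = XML::DOM::Parser->new->parse($data);

my %out;
for my $pair (['SOURCE', 'Source'],
              ['DATE',   'Publication_Date'],
              ['TEXT',   'Body_Text']) {
    my ($key, $tag) = @$pair;
    my $text = first_text($story, $tag);
    $out{$key} = defined $text ? $text : '';        # default for missing tags
}
print "SOURCE=$out{SOURCE} DATE=$out{DATE} TEXT=[$out{TEXT}]\n";
```

    Whether a missing element should become an empty string or a fatal error is up to the caller; the helper only reports it as undef.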

    I would recommend XML::LibXML as a better alternative to XML::DOM, though.
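
    For comparison, a rough sketch of the same existence check with XML::LibXML, using XPath. The sample document is again an assumption, not the poster's data.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample document (an assumption): Body_Text is absent.
my $data = '<Story><Source>Wire</Source></Story>';
my $doc  = XML::LibXML->load_xml(string => $data);

# findnodes returns a (possibly empty) list in list context,
# so $body is simply undef when the element is missing.
my ($body) = $doc->findnodes('//Body_Text');
my $text   = defined $body ? $body->textContent : '';

# findvalue returns an empty string when nothing matches.
my $source = $doc->findvalue('//Source');

print "SOURCE=$source TEXT=[$text]\n";
```

    Note that findvalue cannot tell a missing element from an empty one, so use findnodes when that difference matters.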

    Matt
    Matt Garrish, Oct 12, 2005
    #2

  3. hymie!

    Guest

    On Tue, 11 Oct 2005 10:28:01 -0500, hymie_@_lactose.homelinux.net
    (hymie!) wrote:

    >Greetings. I'm just starting to dabble in XML, and I've run across a
    >problem.
    >
    >I'm going through my XML document using this construct:
    >
    >use XML::DOM;
    >my $parser = new XML::DOM::Parser
    > or bail ("Unable to create XML parser");
    >my $story = $parser->parse($data);
    >$out{"SOURCE"} = $story->getElementsByTagName("Source")->
    > item(0)-> getFirstChild->getData;
    >$out{"DATE"} = $story->getElementsByTagName("Publication_Date")->
    > item(0)-> getFirstChild->getData;
    >$out{"TEXT"} = $story->getElementsByTagName("Body_Text")->
    > item(0)-> getFirstChild->getData or die "$!";
    >
    >Everything works fine, until I get to a $story where Body_Text doesn't
    >exist. I've looked through all of the XML::DOM docs that I can find,
    >but I can't find either a way to test if Body_Text exists, or what happens
    >when Body_Text doesn't exist. The script stops with no obvious
    >diagnostic output -- it appears that the "die" never happens.
    >
    >Can somebody show me the light?
    >
    >hymie! http://www.smart.net/~hymowitz hymie_@_lactose.homelinux.net
    >===============================================================================
    >I've got an answer. I'm going to fly away. What have I got to lose?
    > --Crosby, Stills, and Nash
    >===============================================================================

    The alternative to DOM is SAX, widely used in modern code.
    It's basically a simple event-driven model that calls handlers
    when the basic structural XML components are encountered. This
    lets you control going from XML to internal data structures
    and/or back out to XML. Expat provides handler hooks for most
    of the current W3C constructs; these are just the basic ones.
    It's up to you to extract the data into internal structures.
    For that, XML::Simple is a good tool. With Expat you can accumulate
    nested data in a single string; then Simple will create nested
    Perl structures using tag names, and you can Dumper it.
    But nobody uses XML without knowing ahead of time what those
    structures are, both out and in. This is a way to control, validate,
    and populate them. SAX gives you a much simpler model and allows
    much better control of the data. If you need more information,
    let me know. Getting Xerces working is a chore (you could do
    without it for now; it's only being used for schema checking here).
    This code sample is from a 7,000-line program I wrote that was
    converted to a binary with Perl2Exe (including Xerces). I've chopped
    it up, and you can't see or know what it does, so it will look nasty,
    but all the clues are there for you to investigate SAX, and that's enough.
    -gluck
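
    Since the chopped-up sample is not runnable, here is a much smaller self-contained sketch of the event-driven idea with XML::Parser::Expat. The sample document and the %out and $buffer names are assumptions for illustration only; it just collects the text of each leaf element by tag name.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my %out;          # collected text, keyed by element name
my $buffer = '';  # character data seen since the last start tag

my $parser = XML::Parser::Expat->new;
$parser->setHandlers(
    # A new element starts: forget any text seen so far.
    Start => sub { $buffer = '' },
    # Character data may arrive in several pieces, so append it.
    Char  => sub { my ($p, $str) = @_; $buffer .= $str },
    # An element ends: keep its text if it held any non-space data.
    End   => sub {
        my ($p, $element) = @_;
        $out{$element} = $buffer if $buffer =~ /\S/;
        $buffer = '';
    },
);

# Sample document (an assumption), parsed from a string.
$parser->parse('<Story><Source>Wire</Source>'
             . '<Publication_Date>2005-10-11</Publication_Date></Story>');
$parser->release;

print "$_=$out{$_}\n" for sort keys %out;
```

    A missing Body_Text is then just a key that never shows up in %out, which can be tested with exists.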


    ---
    This code is chopped out of a large practical XML code base and is NOT
    cut & paste workable. It's just for instructional purposes,
    to give the poster a flavor of SAX (Simple API for XML).

    use XML::Xerces;
    use XML::Parser::Expat;
    use XML::Simple;

    ## main
    {
        ## Initialize program / build list of xml files (ie: glob)
        for (@XmlFiles)
        {
            /.+$dlimsep(.+)$/;   # $dlimsep is defined elsewhere for win/unix os
            $XML_File = $1;
            Log ($XML_File);

            ## Validate Schema with Xerces
            ## note: Xerces is being used for schema validation and
            ## as backup xml integrity (done elsewhere)
            next if (!ValidateSchema ($_));

            if (!open(SAMP, $_)) {
                Log (...);
                next;
            }

            ## Parse xml and integrity check (Expat-SAX)
            my $parser = new XML::Parser::Expat;
            $parser->setHandlers('Start' => \&stag_h,
                                 'End'   => \&etag_h,
                                 'Char'  => \&cdata_h);
            $parser->setHandlers('Comment' => \&comment_h)
                if ($hVars{'CommentLogging'});

            eval {$parser->parse(*SAMP)};
            if ($@) {
                ## xml integrity failed -log this error
                $@ =~ s/^[\x20\n\t]+//; $@ =~ s/[\x20\n\t]+$//;
                # attempt strip off program line,col info at end
                $@ =~ s/(at line [0-9]+,.+)?at .+ line [0-9]+$/$1/;
                Log (...error...);
            }
            close(SAMP);
            $parser->release;
        }
    }

    ########################################################
    # EXPAT Event Handlers - start/end/content (defaults)
    ########################################################
    ##
    sub stag_h # -- Start Tag --
    {
        my ($p, $element, %atts) = @_;
        $element = uc($element);
        $last_content = '';
        $last_syntax_content = '';

        ## -- construct & Print start tag --
        my $tag = "\<$element\>";
        if ($XML_PRINT) {
            printf ("%3d", $p->current_line);
            print get_indent();
            print "$tag";
            print " Attr" if (keys %atts);
            foreach my $key (keys %atts) {
                print ", $key=".$atts{$key};
            }
            print "\n";
        }
        $tab_lev++;

        ## -- set Detached special content handler --
        if (exists ($Content_hash{$element}) &&
            $Content_hash{$element}->[1]) {
            $p->setHandlers('Char' => $Content_hash{$element}->[0]);
        }

        ## do something with attributes
        ## start keying (populating) your data structures
        ## set flags, etc ...
    }

    ##
    ##
    sub etag_h # -- End Tag --
    {
        my ($p, $element) = @_;
        $element = uc($element);

        ## -- Construct & Print end tag --
        my $tag = "\</$element\>";
        $tab_lev--;
        if ($XML_PRINT) {
            printf ("%3d", $p->current_line);
            print get_indent();
            print "$tag\n";
        }
        ## -- store last Content in hash (do more stuff)

        ## then:
        $last_content = '';

        ## -- Restore default content handler --
        if (exists ($Content_hash{$element}) &&
            $Content_hash{$element}->[1]) {
            $p->setHandlers('Char' => \&cdata_h);
            my $last = (@Action) - 1;
            my $aref = $Action[$last];
            $Content_hash{$element}->[2]($last_syntax_content,
                                         $aref, $element);
        }
    }

    ##
    ##
    sub cdata_h # -- Default Content Data --
    {
        my ($p, $str) = @_;
        # use original for entities, in case of reparse
        $str = $p->original_string;
        # remove leading/trailing space, newline, tab
        $str =~ s/^[\x20\n\t]+//; $str =~ s/[\x20\n\t]+$//;
        if (length ($str) > 0)
        {
            if ($XML_PRINT) {
                printf ("%3d", $p->current_line);
                print get_indent();
                print "$str (".length($str).")\n";
            }
            $last_content .= $str;
        }
    }

    ##
    ##
    sub comment_h # -- Default Comment Data --
    {
        my ($p, $str) = @_;
        # use original for entities, in case of reparse
        $str = $p->original_string;
        # remove leading/trailing space, newline, tab
        $str =~ s/^[\x20\n\t]+//; $str =~ s/[\x20\n\t]+$//;
        if (length ($str) > 0)
        {
            printf (" %d,%d\n",
                    $p->current_line, $p->current_column);
        }
    }

    ##
    ##
    sub cdata_x_h # -- Special Content Data --
    {
        my ($p, $str) = @_;
        cdata_h ($p, $str);
        # remove leading/trailing space, newline, tab
        $str =~ s/^[\x20\n\t]+//; $str =~ s/[\x20\n\t]+$//;
        $last_syntax_content .= $str if (length ($str) > 0);
    }

    ########################################################
    # Xerces - too much to explain
    ########################################################
    #
    sub ValidateSchema {
        my ($xfile) = @_;
        #my $valerr = 0;

        # Docs: http://xml.apache.org/xerces-c/apiDocs/classAbstractDOMParser.html#z869_9
        my $Xparser = XML::Xerces::XercesDOMParser->new();
        $Xparser->setValidationScheme(1);
        $Xparser->setDoNamespaces(1);
        $Xparser->setDoSchema(1);
        #$Xparser->setValidationSchemaFullChecking(1); # full constraint (if enabled, may be time-consuming)

        $Xparser->setExternalNoNamespaceSchemaLocation($hVdef{'Schema'});

        my $ERROR_HANDLER = XLoggingErrorHandler->new(\&LogX_warn,
            \&LogX_error, \&LogX_ferror, );
        #my $ERROR_HANDLER = XML::Xerces::PerlErrorHandler->new();
        $Xparser->setErrorHandler($ERROR_HANDLER);

        # no need for eval on parse with handlers.. just insurance on die
        eval {$Xparser->parse
            (XML::Xerces::LocalFileInputSource->new($xfile));};
        if ($@) {
        }
        return 1;
    }

    ## handlers (a lot more not shown)
    , Oct 12, 2005
    #3
  4. hymie!

    hymie! Guest

    In our last episode, the evil Dr. Lacto had captured our hero,
    "Matt Garrish" <>, who said:

    >"hymie!" <hymie_@_lactose.homelinux.net> wrote in message
    >news:...


    >> $out{"TEXT"} = $story->getElementsByTagName("Body_Text")->
    >> item(0)-> getFirstChild->getData or die "$!";
    >>
    >> Everything works fine, until I get to a $story where Body_Text doesn't
    >> exist. [...]
    >> The script stops with no obvious
    >> diagnostic output -- it appears that the "die" never happens.


    >It's pretty clear from the docs that getElementsByTagName returns an array
    >containing all the nodes found, so why don't you just check whether there
    >are any nodes? As per the documentation:


    Thanks for pointing out my mistake. My program doesn't die at the
    getElementsByTagName; it apparently fails at the ->item(0).

    >my $nodes = $story->getElementsByTagName('Body_Text');
    >unless ($nodes->getLength > 0) {
    > print "Sorry, no nodes!\n"
    >}


    All is happy now.

    hymie! http://www.smart.net/~hymowitz
    ===============================================================================
    My brothers and sisters all hated me, cuz I was an only child.
    --'Weird Al' Yankovic
    ===============================================================================
    hymie!, Oct 12, 2005
    #4
  5. hymie!

    Matt Garrish Guest

    <> wrote in message
    news:...
    > On Tue, 11 Oct 2005 23:11:25 -0700, wrote:


    [ TOFU corrected ]

    >>The alternative to DOM is SAX, widely used in modern code.
    >>Its basically a simple event driven model, calling handlers
    >>when the basic structured xml components are encountered. This
    >>allows you to control going from xml to internal data structures
    >>and/or back out to xml. Expat provides hooking handlers to most
    >>of the current W3c constructs. These are just the basic ones.
    >>Its up to you to extract the data into internal structures.
    >>For that XML:Simple is a good tool. With Expat you can accumulate
    >>nested data in a single string. Then Simple will create nested
    >>Perl structures using tag names. Then you can Dumper it.


    > You know, I'm gonna go one step further here and say:
    > If you use nodes your some kind of a dumb ass. Not the
    > modern thinking at all !!


    Your logic has me convinced. Oh wait, there is no logic. You do realize that
    you're just trying to create your own DOM tree by jumping through a bunch of
    hoops, right? SAX will let you do that, but I don't see how what you're
    suggesting would be an improvement over a regular DOM tree.

    SAX has its uses (especially for large documents you don't want to read into
    memory), but it's hardly a reason not to use DOM. If I need the whole
    document in memory before beginning to process, SAX is a needlessly complex
    way of doing that. If I just want to fire off events as I come across
    elements, DOM is a needlessly complex way of doing that.

    Picking one method over the other requires analyzing what your needs are.
    Simplistic statements like sax for everything aren't helpful.

    Matt
    Matt Garrish, Oct 20, 2005
    #5
  6. hymie!

    Guest

    On Thu, 20 Oct 2005 08:04:45 -0400, "Matt Garrish"
    <> wrote:

    Actually, you're right! Analysis always does the trick.
    I personally would not do DOM; I want intimate control.
    I feel I can control the entire event-driven model,
    on data large and small. DOM will fade, IMHO, but
    you have a point.

    >
    ><> wrote in message
    >news:...
    >> On Tue, 11 Oct 2005 23:11:25 -0700, wrote:

    >
    >[ TOFU corrected ]
    >
    >>>The alternative to DOM is SAX, widely used in modern code.
    >>>Its basically a simple event driven model, calling handlers
    >>>when the basic structured xml components are encountered. This
    >>>allows you to control going from xml to internal data structures
    >>>and/or back out to xml. Expat provides hooking handlers to most
    >>>of the current W3c constructs. These are just the basic ones.
    >>>Its up to you to extract the data into internal structures.
    >>>For that XML:Simple is a good tool. With Expat you can accumulate
    >>>nested data in a single string. Then Simple will create nested
    >>>Perl structures using tag names. Then you can Dumper it.

    >
    >> You know, I'm gonna go one step further here and say:
    >> If you use nodes your some kind of a dumb ass. Not the
    >> modern thinking at all !!

    >
    >Your logic has me convinced. Oh wait, there is no logic. You do realize that
    >you're just trying to create your own DOM tree by jumping through a bunch of
    >hoops, right? SAX will let you do that, but I don't see how what you're
    >suggesting would be an improvement over a regular DOM tree.
    >
    >SAX has its uses (especially for large documents you don't want to read into
    >memory), but it's hardly a reason not to use DOM. If I need the whole
    >document in memory before beginning to process, SAX is a needlessly complex
    >way of doing that. If I just want to fire off events as I come across
    >elements, DOM is a needlessly complex way of doing that.
    >
    >Picking one method over the other requires analyzing what your needs are.
    >Simplistic statements like sax for everything aren't helpful.
    >
    >Matt
    >
    , Oct 22, 2005
    #6
  7. hymie!

    Guest

    On Thu, 20 Oct 2005 08:04:45 -0400, "Matt Garrish"

    I didn't getcha on this statement..

    <> wrote:
    >Simplistic statements like sax for everything aren't helpful.
    >
    >Matt
    >
    , Oct 22, 2005
    #7
  8. hymie!

    Matt Garrish Guest

    <> wrote in message
    news:...
    > On Thu, 20 Oct 2005 08:04:45 -0400, "Matt Garrish"
    > <> wrote:
    >>Simplistic statements like sax for everything aren't helpful.

    >
    > I didn't getcha on this statement..
    >


    You only talk about DOM and SAX in your original post (and being the two
    most common and supported parsers, that's not surprising). By following up
    and saying you'll never use nodes again you're implying that SAX is the only
    way to go. That's all that was meant.

    Matt
    Matt Garrish, Oct 22, 2005
    #8
  9. hymie!

    Guest

    On Sat, 22 Oct 2005 09:57:21 -0400, "Matt Garrish"
    <> wrote:
    You're right, Matt. I sometimes make those comments on
    Friday nights after the pub. Disregard those,
    and sorry for them. But I am usually coherent on here
    and am interested in technical development.
    I mean no harm. Thanks!
    >
    ><> wrote in message
    >news:...
    >> On Thu, 20 Oct 2005 08:04:45 -0400, "Matt Garrish"
    >> <> wrote:
    >>>Simplistic statements like sax for everything aren't helpful.

    >>
    >> I didn't getcha on this statement..
    >>

    >
    >You only talk about DOM and SAX in your original post (and being the two
    >most common and supported parsers, that's not surprising). By following up
    >and saying you'll never use nodes again you're implying that SAX is the only
    >way to go. That's all that was meant.
    >
    >Matt
    >
    , Oct 23, 2005
    #9
  10. hymie!

    Guest

    On Sun, 23 Oct 2005 15:04:54 -0700, wrote:
    Actually, DOM is exponentially irrelevant in modern XML parsing.
    Its use is fading like an old '59 Buick in 1965.
    >On Sat, 22 Oct 2005 09:57:21 -0400, "Matt Garrish"
    ><> wrote:
    >Your right Matt. I sometimes make those comments on
    >Friday nights after the pub. Dissregard those,
    >and sorry for them. But I am usually coherent on here
    >and am interrested in technical development.
    >I mean no harm. - thanks!
    >>
    >><> wrote in message
    >>news:...
    >>> On Thu, 20 Oct 2005 08:04:45 -0400, "Matt Garrish"
    >>> <> wrote:
    >>>>Simplistic statements like sax for everything aren't helpful.
    >>>
    >>> I didn't getcha on this statement..
    >>>

    >>
    >>You only talk about DOM and SAX in your original post (and being the two
    >>most common and supported parsers, that's not surprising). By following up
    >>and saying you'll never use nodes again you're implying that SAX is the only
    >>way to go. That's all that was meant.
    >>
    >>Matt
    >>
    , Oct 26, 2005
    #10
  11. hymie!

    Matt Garrish Guest

    <> wrote in message
    news:...
    >
    > Actually, DOM is exponentionaly irrelavent in modern xml parsing.
    > Its use is fadinding like an old 59 Buick in 1965.
    >


    I get the impression you just like to hear yourself talk, so I'm not even
    going to bother arguing this. I would suggest in the future you not get
    carried away by the latest fad technology you just googled.

    Matt
    Matt Garrish, Oct 27, 2005
    #11