Re: ignoring namespaces?

Discussion in 'XML' started by Joe Kesselman, Jun 4, 2010.

  1. > So - using XML::LibXML, is there a way
    > of using XPaths, without namespaces?


    Can't vouch for that tool.

    You can, if you insist on doing so, write XPaths that test the local
    name rather than the qualified name:
    /*[local-name()="foo"]/@*[local-name()="bar"]
    though in some processors this variant will perform worse than the
    proper namespace-aware path. And of course the increased verbosity
    makes it harder to write, harder to read, and harder to maintain.
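
    To make the local-name idea concrete, here is a minimal sketch using
    Python's standard-library ElementTree rather than XML::LibXML. As far as
    I know, ElementTree's XPath subset has no local-name() function, so the
    hypothetical helpers local() and attrs_by_local_name() compare local
    names by hand. The sample document and its urn:example:a namespace are
    made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample: one prefixed and one unprefixed "foo" element.
DOC = """<root xmlns:a="urn:example:a">
  <a:foo a:bar="1"/>
  <foo bar="2"/>
</root>"""


def local(name):
    """Return the local part of an ElementTree '{uri}local' name."""
    return name.rsplit("}", 1)[-1]


def attrs_by_local_name(root, elem_name, attr_name):
    """Collect attribute values, matching elements and attributes by local name only."""
    values = []
    for elem in root.iter():
        if local(elem.tag) != elem_name:
            continue
        for key, value in elem.attrib.items():
            if local(key) == attr_name:
                values.append(value)
    return values


root = ET.fromstring(DOC)
print(attrs_by_local_name(root, "foo", "bar"))  # ['1', '2']
```

    Both foo elements are found regardless of namespace, which is exactly
    why this style of matching is convenient and also why it is risky.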

    If at all possible, I really recommend hammering on people to fix the
    documents and use namespaces correctly. This will continue to cause
    problems, and not every XML tool will let you construct this sort of
    workaround. You can pay the cost to fix them now, or you can wait and
    fix them in a complete panic (probably at greater cost) later.

    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
     
    Joe Kesselman, Jun 4, 2010
    #1

  2. bugbear wrote:
    >> If at all possible, I really recommend hammering on people to fix the
    >> documents and use namespaces correctly.

    >
    > Too late. Legacy applications and legacy files make this impossible.


    Understood. As I say, that's going to continue to add to their costs in
    the future, but if they can't/won't get everything fixed now, that's
    their choice.

    "The customer is not always right. The customer is the one with the
    money. Sometimes you have to choose between being right and getting the
    money."

    (This is one reason for always having file formats -- in XML or any
    other representation -- carry version numbers. That gives you some hope
    of being able to recognize newer data, and process it more efficiently,
    while still supporting the "quirks mode" needed by older/sloppier
    instances.)
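
    A minimal sketch of that idea, assuming the version lives in a
    "version" attribute on the root element (as it does in the sample
    data later in this thread). The function name pick_mode and the 2.0
    cut-over are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def pick_mode(xml_text, strict_from=(2, 0)):
    """Return 'strict' for documents at or above strict_from, else 'quirks'."""
    root = ET.fromstring(xml_text)
    raw = root.get("version")
    if raw is None:
        # No version at all: assume an old or sloppy instance.
        return "quirks"
    try:
        version = tuple(int(part) for part in raw.split("."))
    except ValueError:
        # Unparseable version string: fall back to the lenient path.
        return "quirks"
    return "strict" if version >= strict_from else "quirks"


print(pick_mode('<Profile version="1.1"/>'))  # quirks
print(pick_mode('<Profile version="2.0"/>'))  # strict
```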

     
    Joe Kesselman, Jun 5, 2010
    #2

  3. Peter Flynn (Guest)

    bugbear wrote:
    [...]
    > I also considered walking the entire tree REMOVING namespaces,
    > but that doesn't sound like a high performance solution.


    sed -e "s+<\([/]*\)\([^:]*:\)+<\1+g" ?

    ///Peter
     
    Peter Flynn, Jun 6, 2010
    #3
  4. P. Lepin (Guest)

    Peter Flynn wrote:
    > bugbear wrote:
    > [...]
    >> I also considered walking the entire tree REMOVING namespaces,
    >> but that doesn't sound like a high performance solution.

    >
    > sed -e "s+<\([/]*\)\([^:]*:\)+<\1+g" ?


    Haven't posted anything for a long while, but I cannot keep quiet after
    seeing this.

    That's barbarous, sir! Just barbarous!

    (smileys implied)

    --
    P. Lepin
     
    P. Lepin, Jun 7, 2010
    #4
  5. Peter Flynn (Guest)

    bugbear wrote:
    > Peter Flynn wrote:
    >> bugbear wrote:
    >> [...]
    >>> I also considered walking the entire tree REMOVING namespaces,
    >>> but that doesn't sound like a high performance solution.

    >>
    >> sed -e "s+<\([/]*\)\([^:]*:\)+<\1+g" ?

    >
    > Given that my problem (corrupt data) cannot be solved by a "squeaky
    > clean" solution (*), that's strangely appealing.


    I always counsel to avoid the non-XML approach because it carries no
    guarantee that the object you elect to operate on is actually what you
    think it is.

    (Admittedly, a formal XML method like XSLT/XPath doesn't carry any
    "guarantee" as such either, but I can at least be reasonably certain
    that if I select the fifth paragraph of section 4 of chapter 6, then
    that is what I will get, leaving aside my own programming errors.)

    But there are times (and invalid XML is one of them) when a combination
    of sed, awk, grep, tr, and the rest of the tribe, including Perl, Emacs,
    Python, and your own personal favourite, is the only viable solution.

    sed has the advantage and disadvantage of being spectacularly fast: get
    it wrong and it will eat your data. Properly tested, however, the above
    will remove all namespace prefixes from element type names within the
    document element. It will not remove the xmlns:* namespace binding
    attributes from the root element start-tag, nor will it remove
    namespace prefixes from attributes anywhere (the addition of more REs,
    alternations, subexpressions, and backreferences to achieve this is left
    as an exercise for the reader :). Because it is unparsed, it *will*
    remove the namespace prefixes from examples of XML markup in CDATA
    marked sections in documentation, for example.
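
    As a rough illustration of the trade-off described above, here is a
    Python sketch of a stricter variant of the same substitution. It is not
    the sed command from this thread: the hypothetical pattern PREFIXED_TAG
    only accepts name characters before the colon, whereas [^:]* in the
    original can, as far as I can tell, run past the end of an unprefixed
    tag name to a colon later on the same line. It is still unparsed, so it
    will also rewrite markup inside CDATA sections and comments, and it
    leaves attribute prefixes and xmlns:* declarations alone.

```python
import re

# Hypothetical stricter pattern: "<" or "</", then a prefix made only of
# name characters, then ":". It cannot match beyond the tag name itself.
PREFIXED_TAG = re.compile(r"<(/?)[A-Za-z_][\w.-]*:")


def strip_element_prefixes(text):
    """Drop the namespace prefix from element start-tags and end-tags."""
    return PREFIXED_TAG.sub(r"<\1", text)


# Made-up sample with colons in an attribute value and in character data.
sample = '<ns:a href="http://example.org/"><b>x: y</b></ns:a>'
print(strip_element_prefixes(sample))
# <a href="http://example.org/"><b>x: y</b></a>
```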

    P. Lepin wrote:
    > Peter Flynn wrote:
    >> bugbear wrote:
    >> [...]
    >>> I also considered walking the entire tree REMOVING namespaces,
    >>> but that doesn't sound like a high performance solution.

    >> sed -e "s+<\([/]*\)\([^:]*:\)+<\1+g" ?

    >
    > Haven't posted anything for a long while, but I cannot keep quiet
    > after seeing this.
    >
    > That's barbarous, sir! Just barbarous!
    > (smileys implied)


    Peh. I have seen *far* worse [better], both in the Humanities and the
    Natural Sciences, trying to coerce evilly-formed documents into XML :)

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
     
    Peter Flynn, Jun 7, 2010
    #5
  6. On Mon, 07 Jun 2010 09:55:07 +0100, bugbear wrote:

    > Peter Flynn wrote:
    >> bugbear wrote:
    >> [...]
    >>> I also considered walking the entire tree REMOVING namespaces, but
    >>> that doesn't sound like a high performance solution.

    >>
    >> sed -e "s+<\([/]*\)\([^:]*:\)+<\1+g" ?

    >
    > Given that my problem
    > (corrupt data) cannot be solved
    > by a "squeaky clean" solution (*),
    > that's strangely appealing.


    It is also very error-prone, but may be acceptable. To improve on the
    above solution, split it into two steps. In the first step, a custom
    program (instead of sed) cleans up the files and produces clean files
    without namespaces; in the second step, your program(s) process those
    clean files.

    By creating a separate program for the first step, you can have it do
    checks to see if the output it produces is sensible and die (to let you
    investigate the problem) if it is not.

    After cleaning the files, the programs that process them (second step)
    don't have to carry convoluted logic to deal with the dirty files.
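
    A minimal sketch of that two-step split, in Python and under stated
    assumptions: the cleaning rule is a hypothetical regex that strips
    element prefixes only, and "sensible output" is taken to mean that it
    parses and has no namespaces left. The names clean() and
    count_elements() are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical cleaning rule: an element prefix made only of name characters.
PREFIXED_TAG = re.compile(r"<(/?)[A-Za-z_][\w.-]*:")


def clean(text):
    """Step one: strip element prefixes, then refuse to return suspect output."""
    cleaned = PREFIXED_TAG.sub(r"<\1", text)
    # Check 1: the result must parse; leftover unbound prefixes fail here.
    try:
        root = ET.fromstring(cleaned)
    except ET.ParseError as exc:
        raise ValueError(f"cleaned output is not well-formed: {exc}") from exc
    # Check 2: no element may still be in a namespace ('{uri}local' form).
    for elem in root.iter():
        if "}" in elem.tag:
            raise ValueError(f"namespace left on element: {elem.tag}")
    return cleaned


def count_elements(cleaned):
    """Step two: downstream code only ever sees clean, prefix-free input."""
    return sum(1 for _ in ET.fromstring(cleaned).iter())


dirty = "<Profile><monday:Application Name='App1'/></Profile>"
clean_text = clean(dirty)
print(clean_text)                  # <Profile><Application Name='App1'/></Profile>
print(count_elements(clean_text))  # 2
```

    Raising an exception here plays the role of "die": the file is left
    for a human to investigate instead of being passed on half-cleaned.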

    M4
     
    Martijn Lievaart, Jun 7, 2010
    #6
  7. Guest

    On Mon, 07 Jun 2010 12:32:15 +0100, Peter Flynn wrote:

    >bugbear wrote:
    >> Peter Flynn wrote:
    >>> bugbear wrote:
    >>> [...]
    >>>> I also considered walking the entire tree REMOVING namespaces,
    >>>> but that doesn't sound like a high performance solution.
    >>>
    >>> sed -e "s+<\([/]*\)\([^:]*:\)+<\1+g" ?

    >>
    >> Given that my problem (corrupt data) cannot be solved by a "squeaky
    >> clean" solution (*), that's strangely appealing.

    >
    >I always counsel to avoid the non-XML approach because it carries no
    >guarantee that the object you elect to operate on is actually what you
    >think it is.
    >
    >(At least, a formal XML method like XSLT/XPath doesn't have any
    >"guarantee" as such, but at least I can be reasonably certain that if I
    >select the fifth paragraph of section 4 of chapter 6, then that is what
    >I will get, leaving aside my own programming errors.)
    >
    >But there are times (and invalid XML is one of them) when a combination
    >of sed, awk, grep, tr, and the rest of the tribe, including Perl, Emacs,
    >Python, and your own personal favourite, is the only viable solution.
    >
    >sed has the advantage and disadvantage of being spectacularly fast: get
    >it wrong and it will eat your data. Properly tested, however, the above
    >will remove all namespace prefixes from element type names within the
    >document element. It will not remove the xmlns:* namespace binding
    >attributes from the root element start-tag, nor will it remove
    >namespace prefixes from attributes anywhere (the addition of more REs,
    >alternations, subexpressions, and backreferences to achieve this is left
    >as an exercise for the reader :). Because it is unparsed, it *will*
    >remove the namespace prefixes from examples of XML markup in CDATA
    >marked sections in documentation, for example.
    >


    This might parse it (with a slight bit of validation) using a regex,
    while changing only the parts of the source XML that deal with
    namespace prefixes in tags and/or attributes.

    -sln

    # -----------------------------------------------------------
    # rx_xml_fixnamespace.pl
    # -sln, 6/7/2010
    #
    # Util to search/replace xml namespace from tags/attributes
    # -----------------------------------------------------------

    use strict;
    use warnings;

    ## Initialization
    ##

    my $Name = "[A-Za-z_:][\\w:.-]*";
    my $SkipName = "[A-Za-z_][\\w.-]*";
    my $rxskip_tag  = "(?: $SkipName )"; # Skip tags
    my $rxskip_attr = "(?: $SkipName )"; # Skip attributes
    my $rxtag       = "(?: $Name )";     # Tags
    my $rxattr      = "(?: $Name )";     # Attributes


    use re 'eval';
    my $topen = 0;

    my $Rxmarkup = qr
    {
      (?(?{$topen}) # Begin Conditional

        # Have open <TAG> ?
        (?:
          # Try to match next attribute
          (?:
            \s*=\s* (?:".*?"|'.*?') \K
          |
            \s* (?<=\s)
            (?: $rxskip_attr \K | \K (?<ATTR> $rxattr) )
            (?= \s*=\s* (?:".*?"|'.*?'))
          )
          (?= [^>]*? \s* /? > )
        |
          # No more attr's
          (?{$topen = 0})
        )
      |
        # Look for new open or close <TAG>
        (?:
          [^<]*
          (?:
            # Things that hide markup:
            # - Comments/CDATA
            (?: <!
              (?:
                \[CDATA\[.*?\]\]
              | --.*?--
              | \[[A-Z][A-Z\ ]*\[.*?\]\]
              )
              > \K
            )
          |
            # Specific markup we seek:
            # - TAG
            <
            (?:
              /* $rxskip_tag \K (?= \s* /* >)
            |
              /* \K (?<TAG> $rxtag ) (?= \s* /* >)
            |
              (?: $rxskip_tag \K | \K (?<TAG> $rxtag ) )
              (?= \s [^>]*? \s* /? > )
              (?{$topen = 1})
            )
          )
        |
          < \K
        )
      ) # End Conditional
    }xs;

    ## Code
    ##

    my $xml = join '', <DATA>;
    $xml =~ s/$Rxmarkup/ fixnamespace( $+{TAG}, $+{ATTR} ) /eg;
    print "\n",$xml;

    exit (0);


    ## Subs
    ##

    sub fixnamespace {

        if (defined $_[0]) {
            my $tag = $_[0];
            if ($tag =~ s/^[^:]*://) {
                print "Replaced\t$_[0]\n with \t$tag\n";
            }
            return $tag;
        }
        if (defined $_[1]) {
            my $attr = $_[1];
            if ($attr =~ s/^[^:]*://) {
                print "Replaced\t$_[1]\n with \t$attr\n";
            }
            return $attr;
        }
        return "";
    }


    __DATA__

    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>

    <Profile xmlns="xxxxxxxxx" name="" version="1.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" junk="">

    <monday:Application Name="App1" Id="/Local/App/App1"
    Id2="/Local/App/App2" services="1" policy=""
    StartApp="" Bal="5" sessInt="500" WaterMark="1.0"/>

    <AppProfileGuid>586e3456dt</AppProfileGuid>

    </Profile>

    <Application
    Name="App99" Id='/Dummy/Test/iii' Services="3"
    policy="99" monday:StartApp="2" Bal="7" sessInt="27"
    tuesday:WaterMark="4.3" />

    <wednesday:Application Id="/testing"
    Name="App100" monday:Id="/Dum
    my/Test/iii
    " Services="4"
    policy="99" StartApp="2" Bal="7" sessInt="27"
    WaterMark="4.3"/>

    <Application
    Name="Yyee" Id="/Dat/Inp/Out" Services="5"
    policy="88" StartApp="" Bal="1" sessInt="8"
    thrusday:WaterMark="2.1"/>

    <![CDATA[ <Applic:ation Name="App" Id=""/> ]]>

    <AppProfile:Guid>586e3456dt</AppProfile:Guid>
    <AppProfile:Guid>a46y2hktt7</AppProfile:Guid>
    <AppProfile:Guid>mi6j77mae6</AppProfile:Guid>
    </Profile>
     
    , Jun 7, 2010
    #7
  8. Peter Flynn (Guest)

    wrote:
    > On Mon, 07 Jun 2010 12:32:15 +0100, Peter Flynn wrote:

    [...]
    >> I always counsel to avoid the non-XML approach

    [...]
    > This might parse it (with a slight bit of validation)


    It occurs to me that you can combine both methods, iff the document is
    well-formed.

    Run onsgmls -wxml /usr/share/xml/declaration/xml.dcl doc.xml >doc.esis
    to get the ESIS, and then tweak the W3C's esis2xml.py script to re-form
    the XML document, omitting the namespaces. Or write your own in Perl...
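
    For comparison, a minimal sketch of the same parse-then-re-serialise
    idea with a different tool: Python's standard-library ElementTree
    stands in for onsgmls and esis2xml.py, neither of which is used here.
    It only works if the document is namespace-well-formed (every prefix
    declared), and the helper name strip_namespaces is hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(xml_text):
    """Parse, drop the namespace from every element and attribute, re-serialise."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree stores qualified names as '{uri}local'.
        elem.tag = elem.tag.rsplit("}", 1)[-1]
        items = list(elem.attrib.items())
        elem.attrib.clear()
        for key, value in items:
            # Caveat: attributes that differ only by namespace will collide.
            elem.set(key.rsplit("}", 1)[-1], value)
    return ET.tostring(root, encoding="unicode")


# Made-up sample document.
doc = '<a:r xmlns:a="urn:x" a:k="v"><a:c>t</a:c></a:r>'
print(strip_namespaces(doc))  # <r k="v"><c>t</c></r>
```

    Because the input goes through a real parser, CDATA sections and
    comments are not mistaken for markup, unlike the regex approaches.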

    ///Peter
     
    Peter Flynn, Jun 9, 2010
    #8
  9. Guest

    On Wed, 09 Jun 2010 22:02:32 +0100, Peter Flynn wrote:

    > wrote:
    >> On Mon, 07 Jun 2010 12:32:15 +0100, Peter Flynn wrote:

    >[...]
    >>> I always counsel to avoid the non-XML approach

    >[...]
    >> This might parse it (with a slight bit of validation)

    >
    >It occurs to me that you can combine both methods, iff the document is
    >well-formed.
    >
    >Run onsgmls -wxml /usr/share/xml/declaration/xml.dcl doc.xml >doc.esis
    >to get the ESIS, and then tweak the W3C's esis2xml.py script to re-form
    >the XML document, omitting the namespaces. Or write your own in Perl...
    >
    >///Peter


    Hey thanks!
     
    , Jun 12, 2010
    #9
