NITF: can't load objDOM because of HTML entities

Discussion in 'XML' started by Ragnar Heil, Nov 18, 2004.

  1. Ragnar Heil

    Ragnar Heil Guest

    Hi,

    I receive news from a press agency in NITF-XML and want to import
    it into my CMS using XML and SOAP.
    The import tool runs fine when the XML document contains real
    German special characters rather than HTML entities.

    Unfortunately the news arrives with entities, and I get this error
    (translated from German):
    Parse Error in input XML file: Reference to an undefined entity
    'auml'.

    My code:
    Set objDom = CreateObject("MSXML2.DOMDocument.3.0")
    ' Load synchronously so parseError is populated right after Load
    objDom.async = False
    objDom.setProperty "SelectionLanguage", "XPath"
    objDom.setProperty "SelectionNamespaces", _
        "xmlns:tcmapi='http://www.tridion.com/ContentManager/5.0/TCMAPI'"
    objDom.Load (strFilePath & strXmlFileName)
    ' errorCode is non-zero whenever the parser rejected the document
    If objDom.parseError.errorCode <> 0 Then
        WriteToLog "Parse Error in input XML file: " & _
            objDom.parseError.reason
    End If

    thanks for your help!
    Ragnar
    Ragnar Heil, Nov 18, 2004
    #1

  2. Ragnar Heil wrote:


    > I am receiving news from a press-agency in NITF-XML.
    > Then I want to import them into my CMS using XML&SOAP.
    > The import-tool runs fine if I have got an xml-document with real
    > German special characters, not HTML entities.
    >
    > Unfortunately I receive the news with entities and get this error
    > (translated from German):
    > Parse Error in input XML file: Reference to an undefined entity
    > 'auml'.
    >
    > my code:
    > Set objDom = CreateObject("MSXML2.DOMDocument.3.0")
    > objDom.setProperty "SelectionLanguage", "XPath"
    > objDom.async = False
    > objDom.setProperty "SelectionNamespaces", _
    > "xmlns:tcmapi='http://www.tridion.com/ContentManager/5.0/TCMAPI'"
    > objDom.Load (strFilePath & strXmlFileName)
    > If Not objDom.parseError.reason = "" Then
    > WriteToLog "Parse Error in input XML file: " & _
    > objDom.parseError.reason
    > End If


    Well, if an XML document uses entity references, those entities need to
    be defined. So if &auml; is used, there needs to be an entity declaration
    in the document type definition that declares it; otherwise the
    XML is not well-formed.
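
    To illustrate the point, here is a minimal sketch in Python (not the
    VBScript/MSXML setup from the original post). It assumes the standard
    library's expat-based xml.etree.ElementTree parser; the sample strings
    and the helper name try_parse are made up for the illustration. The
    same text is rejected without a declaration and accepted once an
    internal DTD subset declares the entity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text, not taken from a real NITF feed.
UNDECLARED = "<p>M&auml;rz</p>"

# Same content, but an internal DTD subset now declares the entity.
DECLARED = '<!DOCTYPE p [<!ENTITY auml "&#228;">]>' + UNDECLARED


def try_parse(xml_text):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        # An undeclared entity reference makes the input not well-formed.
        return None


without_decl = try_parse(UNDECLARED)
with_decl = try_parse(DECLARED)
```

    MSXML behaves the same way in principle: the reference is only legal
    if a declaration for it is reachable from the document.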

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Nov 18, 2004
    #2

  3. Ragnar Heil

    Ragnar Heil Guest

    Martin Honnen <> wrote in news:419ce15d$0$28979$9b4e6d93
    @newsread4.arcor-online.net:

    > Well if an XML document uses entity references those entities need to be
    > defined thus if &auml; is used there needs to be an entity declaration
    > in the document type definition that declares the entity, otherwise the
    > XML is not well-formed.


    Hi Martin,

    I have now seen that this thread discusses a similar issue:
    Subject: XML: "undefined entity"
    news:cnifpk$22e$

    Yes, you are right, entity references have to be defined in the DTD, like
    <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">

    I really wonder why the NITF files have no reference to a DTD.
    I could modify the NITF.dtd on our server, but not the incoming files.
    Would you do it, that is, take the incoming files and add a DTD reference
    to them?
    Alternatively I could use a different hack and replace all entities with
    the real special characters (umlauts).
    Ragnar Heil, Nov 18, 2004
    #3
  4. Ragnar Heil wrote:
    > I am receiving news from a press-agency in NITF-XML.
    > Then I want to import them into my CMS using XML&SOAP.
    > The import-tool runs fine if I have got an xml-document with real
    > German special characters, not HTML entities.
    >
    > Unfortunately I receive the news with entities


    Tell the press agency to send XML:
    a) use characters directly with the appropriate encoding, or
    b) use numeric character references (e.g. &#252; for German u umlaut),
    and to add a document type declaration.
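
    Both options can be checked quickly. The sketch below is in Python
    with the standard library's xml.etree.ElementTree (an assumption; the
    thread itself uses MSXML), and the sample strings are invented. It
    shows that numeric character references and directly encoded
    characters both parse without any entity declaration.

```python
import xml.etree.ElementTree as ET

# Option b): numeric character references, decimal or hexadecimal,
# are always legal and need no DTD.
numeric = ET.fromstring("<p>f&#252;r &#xFC;ber</p>").text

# Option a): the character itself, sent as bytes in the encoding
# that the XML declaration names.
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><p>f\xfcr</p>'.encode("latin-1")
direct = ET.fromstring(raw).text
```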

    If you have a contract with them to get NITF-XML, they have to fulfill
    their part (send NITF-XML and not some code that looks like XML).
    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
    Johannes Koch, Nov 19, 2004
    #4
  5. Ragnar Heil wrote:


    > I am really wondering why the NITF-files have no reference to a DTD.
    > I could modify the NITF.dtd on our server but not the incoming files.
    > Would you do it? take the incoming files and add a DTD-reference to them?


    If someone tells you that he is going to provide XML and it is not XML,
    then you should probably insist that XML is sent and not something
    that fulfills some rules of XML but not others. Otherwise you are
    forced to fix their non-well-formed markup, and since you can't use
    existing XML parsers to do that, you are left with some text processing.

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Nov 19, 2004
    #5
  6. Ragnar Heil

    Ragnar Heil Guest

    Johannes Koch <> wrote in news:305s27F2tclrsU1@uni-
    berlin.de:

    > If you have a contract with them to get NITF-XML, they have to fulfill
    > their part (send NITF-XML and not some code that looks like XML).


    Hi Johannes and Martin,

    I have now talked to a technical person from the press agency.
    They are aware that their NITF-XML documents are neither valid nor
    well-formed :-(

    Now I am thinking about ways to load the news file into my objDOM
    without the parser rejecting it with an error message.


    Ragnar
    Ragnar Heil, Nov 19, 2004
    #6
  7. Ragnar Heil wrote:
    > I have now talked to a technical person from the press agency.
    > They are aware that their NITF-XML documents are neither valid nor
    > well-formed :-(


    And they don't want to change it?
    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
    Johannes Koch, Nov 19, 2004
    #7
  8. Ragnar Heil

    Ragnar Heil Guest

    Johannes Koch <> wrote in
    news::

    > And they don't want to change it?


    well, I am going to mention this to DPA ;-)

    Are you aware of any tools that convert files with entities into files
    with umlauts?


    Ragnar
    Ragnar Heil, Nov 19, 2004
    #8
  9. Ragnar Heil wrote:
    > well, I am going to mention this to DPA ;-)


    Good luck :)

    > Are you aware of any tools that convert files with entities into files
    > with umlauts?


    Maybe recode can do this.
    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
    Johannes Koch, Nov 19, 2004
    #9
  10. Andy Dingley

    Andy Dingley Guest

    On 18 Nov 2004 09:09:44 -0800, (Ragnar Heil) wrote:

    >I am receiving news from a press-agency in NITF-XML.


    Most (some ? / many ? / nearly all ?) NITF / NewsML / RSS feeds become
    invalid whenever they encounter an accented character. You have no
    practical hope of fixing this, because the organisations are beyond
    your control and you really just have to deal with the garbage they're
    sending you. Raise the issue with them, complain as loudly as you
    can, but don't expect them to fix it.

    I use some very ugly pre-processor code before the parser. If the
    first parse attempt fails for this reason, I re-try with a version
    that has had a reference to an appropriate local DTD added to it.
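
    A rough sketch of that retry idea, in Python rather than the poster's
    own (unshown) code. Two things here are assumptions, not what the post
    describes: the standard library parser does not load external DTDs, so
    the sketch swaps in an internal DTD subset built from
    html.entities.name2codepoint instead of a reference to a local DTD
    file, and it retries on any parse error rather than only on an
    undefined-entity error. The names build_doctype and parse_with_retry
    and the default root name "nitf" are hypothetical. It also assumes the
    incoming text has no DOCTYPE of its own.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# XML predefines these five entities, so they are left alone.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def build_doctype(root_name):
    """Build a DOCTYPE whose internal subset declares the HTML named entities."""
    decls = "".join(
        '<!ENTITY %s "&#%d;">' % (name, codepoint)
        for name, codepoint in sorted(name2codepoint.items())
        if name not in PREDEFINED
    )
    return "<!DOCTYPE %s [%s]>" % (root_name, decls)


def parse_with_retry(xml_text, root_name="nitf"):
    """Parse as-is first; on failure, retry with entity declarations added."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        # The DOCTYPE has to come after the XML declaration, if there is one.
        if xml_text.startswith("<?xml"):
            end = xml_text.index("?>") + 2
            head, body = xml_text[:end], xml_text[end:]
        else:
            head, body = "", xml_text
        # A second failure propagates to the caller unchanged.
        return ET.fromstring(head + build_doctype(root_name) + body)


sample = '<?xml version="1.0"?><nitf><p>M&auml;rz &amp; Gr&uuml;&szlig;e</p></nitf>'
root = parse_with_retry(sample)
paragraph = root.find("p").text
```

    Well-formed input never reaches the retry branch, so clean feeds pay
    no extra cost.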

    --
    Smert' spamionam
    Andy Dingley, Nov 22, 2004
    #10
  11. Ragnar Heil

    Ragnar Heil Guest

    Johannes Koch <> wrote in
    news::

    > Maybe, recode can do this.


    Now I am using sed, which works fine. I also had HTML Tidy running,
    with the same positive results.
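
    The actual sed script is not shown here. As a rough equivalent, the
    following Python sketch replaces named HTML entities with the real
    characters before the file reaches the parser. The function name
    replace_named_entities is hypothetical, and the entity table comes
    from Python's html.entities module, which is an assumption about
    which names the feed uses. The five entities that XML itself
    predefines are left untouched so the markup stays well-formed.

```python
import re
from html.entities import name2codepoint

# Entities that XML predefines; replacing &amp; or &lt; would break the markup.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(text):
    """Swap known HTML named entities for their characters; keep the rest."""

    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Unknown names are left as-is so the problem stays visible.
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY_RE.sub(substitute, text)


cleaned = replace_named_entities("M&auml;rz &amp; Gr&uuml;&szlig;e &foo;")
```

    When writing the result back to disk, the file encoding has to match
    whatever the XML declaration says, otherwise the parser will stumble
    over the umlauts instead of the entities.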


    Ragnar
    Ragnar Heil, Nov 23, 2004
    #11