Is it possible with Xerces?

Discussion in 'XML' started by Manuel Yguel, Feb 18, 2004.

  1. Manuel Yguel

    Manuel Yguel Guest

    I am trying to parse an indented XML file with the Xerces-C++ DOM parser.
    The file looks like this:
    <root>
     <child1>
      <field1> foo </field1>
      <field2> bar </field2>
     </child1>
     <child2>
      <field1> foo </field1>
      <field2> bar </field2>
     </child2>
    </root>

    The XML file contains line breaks and white space. The program I
    wrote with DOM therefore gives me this tree:
    root has five children:
    text-node child1 text-node child2 text-node

    the text of the first text node is "\n "
    the text of the second text node is "\n "
    the text of the third text node is "\n"

    These whitespace text nodes occur at every level of the tree hierarchy.

    Is it possible to strip these nodes automatically?

    XML standard question: does the following XML conform to the XML standard?

    <child2> some text
    <field1> foo </field1>
    <field2> bar </field2>
    </child2>

    "some text" is at the same depth as field1 and field2, but it is text. So
    there is a mix of text and elements. I thought that text had to be a
    leaf of the tree... So does this conform to the standard?

    Thanks
     
    Manuel Yguel, Feb 18, 2004
    #1

  2. Manuel Yguel wrote:
    > Is it possible to strip these nodes automatically?


    Yes: there is an option to strip ignorable whitespace, but you must
    supply a grammar (DTD) that tells the parser where whitespace is
    ignorable, like this:

    <!ELEMENT root (child1,child2)>
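
    If no DTD is available, a different technique is to remove
    whitespace-only text nodes by hand after parsing. The sketch below
    illustrates that idea with Python's standard xml.dom.minidom rather than
    Xerces-C++; the helper name strip_whitespace_nodes and the one-space
    indentation of the sample are assumptions made for the example, and the
    same traversal should carry over to other DOM APIs.

```python
from xml.dom import minidom, Node

# Sample document from the question (indentation is an assumption).
XML = """<root>
 <child1>
  <field1> foo </field1>
  <field2> bar </field2>
 </child1>
 <child2>
  <field1> foo </field1>
  <field2> bar </field2>
 </child2>
</root>"""


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy because the child list is modified in the loop.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
            child.unlink()
        else:
            strip_whitespace_nodes(child)


doc = minidom.parseString(XML)
root = doc.documentElement

before = [c.nodeName for c in root.childNodes]
strip_whitespace_nodes(root)
after = [c.nodeName for c in root.childNodes]

print(before)  # ['#text', 'child1', '#text', 'child2', '#text']
print(after)   # ['child1', 'child2']
```

    Unlike the DTD-based option, this manual pass also drops whitespace-only
    text inside mixed content, where it may be significant.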

    >
    > XML standard question: does the following XML conform to the XML standard?
    >
    > <child2> some text
    > <field1> foo </field1>
    > <field2> bar </field2>
    > </child2>
    >
    > "some text" is at the same depth as field1 and field2, but it is text. So
    > there is a mix of text and elements. I thought that text had to be a
    > leaf of the tree... So does this conform to the standard?


    Yes: an element may contain:
    - nothing (empty element)
    - sub-elements
    - text
    - text and sub-elements (mixed content)
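
    To see the mixed-content case concretely, here is a small sketch using
    Python's standard xml.dom.minidom (a stand-in for illustration, not
    Xerces-C++). It parses the fragment from the question and lists the text
    and element children of child2 separately; the indentation of the sample
    is an assumption.

```python
from xml.dom import minidom, Node

# Fragment from the question: text and elements as siblings.
XML = """<child2> some text
 <field1> foo </field1>
 <field2> bar </field2>
</child2>"""

# parseString raises an ExpatError if the input is not well-formed,
# so a successful parse shows the fragment is acceptable XML.
doc = minidom.parseString(XML)
child2 = doc.documentElement

text_children = [c.data for c in child2.childNodes
                 if c.nodeType == Node.TEXT_NODE]
element_children = [c.tagName for c in child2.childNodes
                    if c.nodeType == Node.ELEMENT_NODE]

print(text_children)     # the first entry holds "some text"
print(element_children)  # ['field1', 'field2']
```

    In a DTD, such content would be declared as mixed, for example
    <!ELEMENT child2 (#PCDATA | field1 | field2)*>, and whitespace inside
    it is not ignorable.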


    --
    Cordialement,

    ///
    (. .)
    -----ooO--(_)--Ooo-----
    | Philippe Poulard |
    -----------------------
     
    Philippe Poulard, Feb 18, 2004
    #2

  3. Manuel Yguel

    Manuel Yguel Guest

    Philippe Poulard wrote:
    > Yes: there is an option to strip ignorable whitespace, but you must
    > supply a grammar (DTD) that tells the parser where whitespace is
    > ignorable, like this:
    >
    > <!ELEMENT root (child1,child2)>
    >

    Thanks, but then how do you use the grammar with the parser?
     
    Manuel Yguel, Feb 23, 2004
    #3
  4. Manuel Yguel wrote:
    > Philippe Poulard wrote:
    >
    >> Yes: there is an option to strip ignorable whitespace, but you must
    >> supply a grammar (DTD) that tells the parser where whitespace is
    >> ignorable, like this:
    >>
    >> <!ELEMENT root (child1,child2)>
    >>

    > Thanks, but then how do you use the grammar with the parser?
    >


    Use a <!DOCTYPE> declaration in the XML document, either to reference
    the DTD or to embed it as an internal subset.
    You should have a look at the XML spec.
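
    As a rough illustration of the DOCTYPE approach, the sketch below embeds
    a complete DTD for the sample document as an internal subset and parses
    it with Python's standard xml.dom.minidom. The element declarations other
    than the one for root are assumptions inferred from the sample file.

```python
from xml.dom import minidom

# The DTD is attached through an internal subset in the DOCTYPE declaration.
# An external DTD would instead look like: <!DOCTYPE root SYSTEM "root.dtd">
# (root.dtd is a hypothetical file name).
XML = """<!DOCTYPE root [
 <!ELEMENT root (child1, child2)>
 <!ELEMENT child1 (field1, field2)>
 <!ELEMENT child2 (field1, field2)>
 <!ELEMENT field1 (#PCDATA)>
 <!ELEMENT field2 (#PCDATA)>
]>
<root>
 <child1>
  <field1> foo </field1>
  <field2> bar </field2>
 </child1>
 <child2>
  <field1> foo </field1>
  <field2> bar </field2>
 </child2>
</root>"""

doc = minidom.parseString(XML)

# The parser records the document type named in the DOCTYPE declaration.
print(doc.doctype.name)             # root
print(doc.documentElement.tagName)  # root
```

    minidom's underlying parser does not validate, so the whitespace nodes
    are still present here. A validating parser can use these declarations
    to tell which whitespace is ignorable. In Xerces-C++ the relevant
    switches are, as far as I know, setValidationScheme and
    setIncludeIgnorableWhitespace(false) on XercesDOMParser; please check
    them against the API documentation for your version.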
     
    Philippe Poulard, Feb 24, 2004
    #4
