Parsing a generic data file

Discussion in 'XML' started by Jasper, Dec 14, 2007.

  1. Jasper

    Jasper Guest

    Hi, maybe this is off-topic, but perhaps you can help. I'm looking for ideas
    on how to parse a data file.

    I don't know XML, but I know it deals with data in text format.

    I have a structured data file of the general form shown below. I don't have
    any definition of the data. Basically it looks like it is hierarchical,
    token/data pairs defined by brackets and square brackets.

    I would like to parse this out into some sort of data object(s) in C++ so
    that I can gain programmatic access to the variables.

    My app is C++, so the solution must be in C++ too. It also has to be very
    lightweight and *very* fast, as I must decode multiple pages in real time.

    Would adapting an XML parser to do this be a possible solution?

    Any pointers/ideas/references/code snippets/observations appreciated.

    TIA

    Basic example showing data structure (whitespaces and carriage returns added
    by me for clarity).

    {

    "teacher":{
    "name":
    "Mr Borat",
    "age":
    "35",
    "Nationality":
    "Kazakhstan"},


    "Class":{
    "Semester":
    "Summer",
    "Room":
    null,
    "Subject":
    "Politics",
    "Notes":
    "We're happy, you happy?"},

    "Students":
    [
    {
    "Smith":
    [{"First Name":"Mary","sex":"Female"}],
    "Brown":
    [{"First Name":"John","sex":"Male"}],
    "Jackson":
    [{"First Name":"Jackie","sex":"Female"}]
    }
    ],


    "Grades":
    [
    {
    "Test":
    [{"grade":"A","points":68},{"grade":"B","points":25},{"grade":"C","points":15}],
    "Test":
    [{"grade":"C","points":2},{"grade":"B","points":29},{"grade":"A","points":55}],
    "Test":
    [{"grade":"C","points":2},{"grade":"A","points":72},{"grade":"A","points":65}]
    }
    ]

    }
     
    Jasper, Dec 14, 2007
    #1

  2. Pavel Lepin

    Pavel Lepin Guest

    Jasper <> wrote in
    <>:
    > I have a structured data file of the general form shown
    > below. I dont have any definition of the data. Basically
    > it looks like it is hierarchical, token/data pairs defined
    > by brackets and square brackets.
    >
    > I would like to parse this out into some sort of data
    > object(s) in C++ so that I can gain programmatic access
    > to the variables.
    >
    > My app is C++ so the solution must be the same. Also it
    > must be very lightweight and *very* fast as I must decode
    > multiple pages in realtime.


    Well, representing data like that in XML is not a problem in
    itself, even if you cannot define a more strict schema than
    just free-form key/value pairs. The problem is that you're
    probably not very likely to get the extreme performance you
    seem to want with a canned parser. DOM parsers,
    specifically, would be way too cumbersome for your needs.
    So it's likely to boil down to either writing your own
    streaming parser, or using a streaming parser like expat or
    any random SAX parser out there for maximum performance,
    and even then you might not get what you need.
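    To make the "write your own streaming parser" option concrete, here is a
    minimal C++ sketch of a hand-rolled, SAX-style parser for this JSON-like
    format. It walks the text once and reports each token through a callback
    instead of building a tree. The `JsonStream` class and `Event` names are
    hypothetical and are not part of expat or any SAX API. The sketch also
    cuts corners: numbers are scanned loosely and unicode escapes are left
    undecoded.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Events reported by the hypothetical streaming parser, SAX style.
enum class Event { StartObject, EndObject, StartArray, EndArray, Key, String, Number, Literal };

// Walks the text once and reports each token through a callback; no tree is built.
class JsonStream {
public:
    using Callback = std::function<void(Event, std::string_view)>;

    JsonStream(std::string_view text, Callback emit) : s_(text), emit_(std::move(emit)) {}

    void run() {
        value();
        skipWs();
        if (i_ != s_.size()) throw std::runtime_error("trailing characters");
    }

private:
    std::string_view s_;
    std::size_t i_ = 0;
    Callback emit_;
    std::string buf_;  // scratch buffer reused for every decoded string

    void skipWs() { while (i_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[i_]))) ++i_; }
    char peek() const { if (i_ >= s_.size()) throw std::runtime_error("unexpected end"); return s_[i_]; }
    void expect(char c) { if (peek() != c) throw std::runtime_error(std::string("expected ") + c); ++i_; }

    // After an element: consumes ',' and returns true, or consumes `close` and returns false.
    bool next(char close) {
        skipWs();
        if (peek() == ',') { ++i_; return true; }
        expect(close);
        return false;
    }

    void value() {
        skipWs();
        char c = peek();
        if (c == '{') object();
        else if (c == '[') array();
        else if (c == '"') emit_(Event::String, str());
        else if (c == '-' || std::isdigit(static_cast<unsigned char>(c))) number();
        else literal();
    }

    void object() {
        expect('{');
        emit_(Event::StartObject, {});
        skipWs();
        if (peek() == '}') { ++i_; }
        else {
            do {
                skipWs();
                emit_(Event::Key, str());
                skipWs();
                expect(':');
                value();
            } while (next('}'));
        }
        emit_(Event::EndObject, {});
    }

    void array() {
        expect('[');
        emit_(Event::StartArray, {});
        skipWs();
        if (peek() == ']') { ++i_; }
        else {
            do { value(); } while (next(']'));
        }
        emit_(Event::EndArray, {});
    }

    // Decodes one quoted string into buf_; the returned view is valid until the next call.
    std::string_view str() {
        expect('"');
        buf_.clear();
        for (char c = peek(); c != '"'; c = peek()) {
            ++i_;
            if (c != '\\') { buf_ += c; continue; }
            char e = peek();
            ++i_;
            const std::string_view from = "ntrbf", to = "\n\t\r\b\f";
            if (std::size_t k = from.find(e); k != std::string_view::npos) buf_ += to[k];
            else if (e == 'u') buf_ += "\\u";  // simplification: unicode escapes kept undecoded
            else buf_ += e;                    // quote, backslash or slash
        }
        ++i_;  // skip the closing quote
        return buf_;
    }

    void number() {  // loose scan over digits, sign, '.', 'e'; not full JSON validation
        std::size_t start = i_;
        while (i_ < s_.size() && (std::isdigit(static_cast<unsigned char>(s_[i_])) ||
                                  std::string_view("+-.eE").find(s_[i_]) != std::string_view::npos)) ++i_;
        emit_(Event::Number, s_.substr(start, i_ - start));
    }

    void literal() {
        for (std::string_view w : {"true", "false", "null"})
            if (s_.substr(i_, w.size()) == w) { i_ += w.size(); emit_(Event::Literal, w); return; }
        throw std::runtime_error("unexpected character");
    }
};
```

    Usage: construct `JsonStream(text, callback)` and call `run()`. The
    callback then sees, for example, `Key "Subject"` followed by
    `String "Politics"`. Nothing is allocated per node, so this style tends to
    be cheap. The trade-off is that the caller has to track where it is in
    the hierarchy.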

    > Would adapting an XML parser to do this be a possible
    > solution?


    Not enough data. Try it, profile it, there's no other way to
    know.
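    One crude way to "try it, profile it" is to time each candidate parser
    over many runs before reaching for a real profiler. The sketch below
    assumes the parser under test can be wrapped in a no-argument callable.
    `meanMicros` is a hypothetical helper name.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `work` `iterations` times and returns the mean
// wall-clock time per run in microseconds (0 if iterations is 0). Crude, but
// enough to compare candidate parsers on the same input.
template <typename Work>
double meanMicros(Work&& work, std::size_t iterations) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (std::size_t n = 0; n < iterations; ++n) work();
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return iterations == 0 ? 0.0 : elapsed.count() / static_cast<double>(iterations);
}
```

    For example, `meanMicros([&] { parse(page); }, 1000)` returns the average
    cost per page. Here `parse` and `page` stand in for your own code. That
    figure can then be compared against the real-time budget.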

    > Any pointers/ideas/references/code snippets/observations
    > appreciated.


    You might want to look into S-expressions as well. You'll
    save on overhead, and I believe there are some quite fast
    S-expression parsers written in C and C++ out there.
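    For comparison, an S-expression reader needs only a few dozen lines. This
    C++ sketch assumes the data has been re-encoded as S-expressions, for
    example `(teacher (name "Mr Borat") (age 35))`. That mapping is
    hypothetical, not an existing format for this file. `SExpr` and
    `SExprReader` are illustrative names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// A node is either an atom (symbol, number or quoted string) or a list of nodes.
struct SExpr {
    bool isList = false;
    std::string atom;          // set when !isList; quotes are stripped
    std::vector<SExpr> items;  // set when isList (incomplete element type is allowed since C++17)
};

class SExprReader {
public:
    explicit SExprReader(std::string_view text) : s_(text) {}

    // Reads exactly one expression and rejects anything left over.
    SExpr read() {
        SExpr e = parse();
        skipWs();
        if (i_ != s_.size()) throw std::runtime_error("trailing input");
        return e;
    }

private:
    std::string_view s_;
    std::size_t i_ = 0;

    void skipWs() {
        while (i_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[i_]))) ++i_;
    }

    SExpr parse() {
        skipWs();
        if (i_ >= s_.size()) throw std::runtime_error("unexpected end of input");
        SExpr e;
        if (s_[i_] == '(') {                       // list: parse children until ')'
            ++i_;
            e.isList = true;
            for (;;) {
                skipWs();
                if (i_ >= s_.size()) throw std::runtime_error("missing ')'");
                if (s_[i_] == ')') { ++i_; return e; }
                e.items.push_back(parse());
            }
        }
        if (s_[i_] == ')') throw std::runtime_error("unexpected ')'");
        if (s_[i_] == '"') {                       // quoted atom; a backslash escapes the next char
            ++i_;
            while (i_ < s_.size() && s_[i_] != '"') {
                if (s_[i_] == '\\' && i_ + 1 < s_.size()) ++i_;
                e.atom += s_[i_++];
            }
            if (i_ >= s_.size()) throw std::runtime_error("unterminated string");
            ++i_;
            return e;
        }
        while (i_ < s_.size() && !std::isspace(static_cast<unsigned char>(s_[i_])) &&
               s_[i_] != '(' && s_[i_] != ')')
            e.atom += s_[i_++];                    // bare symbol or number
        return e;
    }
};
```

    The grammar has only two node kinds, so the whole reader is a single
    recursive function. That is the "save on overhead" point in practice.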

    --
    ....also, I submit that we all must honourably commit seppuku
    right now rather than serve the Dark Side by producing the
    HTML 5 spec.
     
    Pavel Lepin, Dec 14, 2007
    #2

  3. "Jasper" <> wrote in message
    news:...
    > Hi, Maybe this is off-topic, but perhaps you can help. I'm looking for
    > ideas on how to parse a data file.
    >
    > I dont know XML but I know it parses data in text format.
    >
    > I have a structured data file of the general form shown below. I dont have
    > any definition of the data. Basically it looks like it is hierarchical,
    > token/data pairs defined by brackets and square brackets.
    >
    > I would like to parse this out into some sort of data object(s) in C++ so
    > that I can gain programmatic access to the variables.
    >
    > My app is C++ so the solution must be the same. Also it must be very
    > lightweight and *very* fast as I must decode multiple pages in realtime.
    >
    > Would adapting an XML parser to do this be a possible solution?
    >
    > Any pointers/ideas/references/code snippets/observations appreciated.
    >
    > TIA
    >
    > Basic example showing data structure (whitespaces and carriage returns
    > added by me for clarity).
    >
    > {
    >
    > "teacher":{
    > "name":
    > "Mr Borat",
    > "age":
    > "35",
    > "Nationality":
    > "Kazakhstan"},
    >
    >
    > "Class":{
    > "Semester":
    > "Summer",
    > "Room":
    > null,
    > "Subject":
    > "Politics",
    > "Notes":
    > "We're happy, you happy?"},
    >
    > "Students":
    > [
    > {
    > "Smith":
    > [{"First Name":"Mary","sex":"Female"}],
    > "Brown":
    > [{"First Name":"John","sex":"Male"}],
    > "Jackson":
    > [{"First Name":"Jackie","sex":"Female"}]
    > }
    > ],
    >
    >
    > "Grades":
    > [
    > {
    > "Test":
    >
    > [{"grade":"A","points":68},{"grade":"B","points":25},{"grade":"C","points":15}],
    > "Test":
    >
    > [{"grade":"C","points":2},{"grade":"B","points":29},{"grade":"A","points":55}],
    > "Test":
    >
    > [{"grade":"C","points":2},{"grade":"A","points":72},{"grade":"A","points":65}]
    > }
    > ]
    >
    > }
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >

    Looks like JSON to me; search for a JSON library.
    JSON is a way of representing objects as text, using JavaScript's object
    literal syntax; it is commonly used for passing data to clients that use
    JavaScript.

    --

    Joe Fawcett (MVP - XML)
    http://joe.fawcett.name
     
    msnews.microsoft.com, Dec 14, 2007
    #3
  4. Jasper

    Jasper Guest

    "Pavel Lepin" <> wrote in message
    news:fjtbuf$hju$...
    >
    > Jasper <> wrote in
    > <>:


    >
    > Well, representing data like that in XML is not a problem in
    > itself, even if you cannot define a more strict schema than
    > just free-form key/value pairs. The problem is that you're
    > probably not very likely to get the extreme performance you
    > seem to want with a canned parser. DOM parsers,
    > specifically, would be way too cumbersome for your needs.


    Yes, I thought as much.

    > So it's likely to boil down to either writing your own
    > streaming parser, or using a streaming parser like expat or
    > any random SAX parser out there for maximum performance,
    > and even then you might not get what you need.


    OK I'll take a look.

    >> Would adapting an XML parser to do this be a possible
    >> solution?

    >
    > Not enough data. Try it, profile it, there's no other way to
    > know.
    >
    >> Any pointers/ideas/references/code snippets/observations
    >> appreciated.

    >
    > You might want to look into S-expressions as well. You'll
    > save on overhead, and I believe there are some quite fast
    > S-expression parsers written in C and C++ out there.



    Thanks, again.

    ..
     
    Jasper, Dec 14, 2007
    #4
  5. Jasper

    Jasper Guest

    "msnews.microsoft.com" <> wrote in message
    news:...
    > "Jasper" <> wrote in message
    > news:...
    >> Hi, Maybe this is off-topic, but perhaps you can help. I'm looking for
    >> ideas on how to parse a data file.
    >>

    > Looks like JSON to me, search for a JSON library.
    > JSON is a way of representing objects using string literals that is used
    > for passing information to clients that use JavaScript.
    >


    Does it? Makes sense if that's true. I was sure it fit some sort of "web
    format" but I didn't know which.
    I presume there must be some sort of C++ code available to parse it out.

    I'll take a look.

    Thanks
     
    Jasper, Dec 14, 2007
    #5
  7. Lynn

    Lynn Guest

    "Jasper" <> wrote in message
    news:...

    > Hi, Maybe this is off-topic, but perhaps you can help. I'm looking for
    > ideas on how to parse a data file.


    Can't you create arrays of C++ structs or classes to hold this data? As for
    parsing it, if you don't want to write your own parser, there must be an
    abundance of libraries out there you could use out of the box, no?
    Efficiency will vary, but I can't see why any decent commercial product, if
    not your own code, would not be *very* fast.

    I guess I'm not seeing why you would use XML or XML tools to mediate
    this process when the data is not coming at you as XML and you've given no
    indication that you need to output it as XML for other processes to
    consume...?
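    Lynn's suggestion of plain C++ structs could look roughly like the model
    below. It is guessed from the sample data. The field names, and the use of
    `std::optional` for the `null` room, are assumptions. A parser, whether
    hand-written or from a library, would fill these in.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical typed model of the sample file; field names follow the JSON keys.
struct Teacher {
    std::string name;
    int age = 0;                       // "35" arrives as a string and must be converted
    std::string nationality;
};

struct ClassInfo {
    std::string semester;
    std::optional<std::string> room;   // JSON null maps to an empty optional
    std::string subject;
    std::string notes;
};

struct Student {
    std::string lastName;              // the sample uses the last name as the key
    std::string firstName;
    std::string sex;
};

struct GradeEntry {
    std::string grade;
    int points = 0;
};

struct Document {
    Teacher teacher;
    ClassInfo classInfo;
    std::vector<Student> students;
    std::vector<std::vector<GradeEntry>> tests;  // one inner vector per "Test"
};
```

    Each "Test" is kept as its own inner vector, so all three sets of grades
    are preserved.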


     
    Lynn, Dec 14, 2007
    #7
  8. "Jasper" <> wrote in message
    news:...
    >
    > "msnews.microsoft.com" <> wrote in message
    > news:...
    > > "Jasper" <> wrote in message
    > > news:...
    > >> Hi, Maybe this is off-topic, but perhaps you can help. I'm looking for
    > >> ideas on how to parse a data file.
    > >>

    > > Looks like JSON to me, search for a JSON library.
    > > JSON is a way of representing objects using string literals that is used
    > > for passing information to clients that use JavaScript.
    > >

    >
    > Does it? Makes sense if that's true. I was sure it fit some sort of "web
    > format" but I didn't know which.
    > I presume there must be some sort of C++ code available to parse it out.
    >


    It is JSON. In JavaScript you would use the eval method to parse it. The
    returned object would then have a hierarchy you could pull data from,
    e.g.:-

    var x = o.Class.Subject

    x == "Politics" // will be true

    However, the structure is somewhat suspect.

    The Students array contains only one object, on which all the students are
    placed as properties. Each student's last name is used as the property
    name for their entry (what happens if the class is attended by more than
    one Smith?), and each of those properties in turn holds an array
    containing only one object.

    The Grades array suffers from the same problem: again, inappropriate use of
    { } causes the array to contain only one object, and in this case the same
    key "Test" is used three times, so each occurrence redefines the previous
    one and only the last entry survives.
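    The lookup behaviour described above can be sketched in C++ with a small
    tree type. The `Json` node below is hypothetical and does not follow any
    particular library's API. Object members are stored in document order,
    and `at()` returns the last member with a given key. That mirrors what
    eval does with the repeated "Test" key.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical DOM node for a parsed JSON document. Object members are kept in
// document order in two parallel vectors (`keys`, `items`), so duplicate keys
// survive parsing; arrays use `items` only. std::vector may hold an incomplete
// element type since C++17, which is what makes the recursion legal.
struct Json {
    enum class Kind { Null, Bool, Number, String, Array, Object };
    Kind kind = Kind::Null;
    bool boolean = false;
    double number = 0.0;
    std::string text;
    std::vector<std::string> keys;
    std::vector<Json> items;

    // Member lookup, the C++ counterpart of `o.Class` in JavaScript. When a key
    // occurs more than once the LAST occurrence wins, mirroring eval().
    const Json& at(const std::string& key) const {
        if (kind != Kind::Object) throw std::runtime_error("not an object");
        for (std::size_t n = keys.size(); n-- > 0;)
            if (keys[n] == key) return items[n];
        throw std::out_of_range("no member named " + key);
    }
};

// Small builders for writing test documents inline.
Json makeString(std::string s) {
    Json j;
    j.kind = Json::Kind::String;
    j.text = std::move(s);
    return j;
}

Json makeObject(std::vector<std::pair<std::string, Json>> members) {
    Json j;
    j.kind = Json::Kind::Object;
    for (auto& [key, value] : members) {
        j.keys.push_back(key);
        j.items.push_back(std::move(value));
    }
    return j;
}
```

    With this type, `doc.at("Class").at("Subject")` plays the role of
    `o.Class.Subject`. The first two "Test" entries are still present in
    `keys`/`items`, but they cannot be reached through `at()`.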

    Here is a cleaner version (although I'm not entirely happy with the
    identifiers "Last Name" and "First Name" containing a space it is legal):-

    {

    "teacher":{
    "name": "Mr Borat",
    "age": 35,
    "Nationality": "Kazakhstan"
    },


    "Class":{
    "Semester": "Summer",
    "Room": null,
    "Subject": "Politics",
    "Notes": "We're happy, you happy?"
    },

    "Students":
    [
    {"Last Name":"Smith",
    "First Name":"Mary","sex":"Female"},
    {"Last Name":"Brown",
    "First Name":"John","sex":"Male"},
    {"Last Name":"Jackson",
    "First Name":"Jackie","sex":"Female"}
    ],


    "Grades":
    [
    {"Test":"Name of a Test",
    "Points": {"A":68,"B":25,"C":15}},
    {"Test":"Name of a different test",
    "Points": {"A":55,"B":29,"C":2}},
    {"Test": "Name of yet another test",
    "Points": {"A":72,"B":65,"C":2}}
    ]

    }


    --
    Anthony Jones - MVP ASP/ASP.NET
     
    Anthony Jones, Dec 14, 2007
    #8
  9. Dimitre Novatchev

    Guest

    The FXSL library has a json-document() function, written entirely in
    XSLT 2.0 and using FXSL's LR parsing framework (also written entirely
    in XSLT 2.0).

    When this transformation:

    <xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://fxsl.sf.net/"
    exclude-result-prefixes="f xs"
    >

    <xsl:import href="../f/func-json-document.xsl"/>

    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:variable name="vstrParam" as="xs:string">
    {

    "teacher":{
    "name":
    "Mr Borat",
    "age":
    "35",
    "Nationality":
    "Kazakhstan"
    },


    "Class":{
    "Semester":
    "Summer",
    "Room":
    "null",
    "Subject":
    "Politics",
    "Notes":
    "We're happy, you happy?"
    },

    "Students":
    [
    {
    "Smith":
    [{"First_Name":"Mary","sex":"Female"}],
    "Brown":
    [{"First_Name":"John","sex":"Male"}],
    "Jackson":
    [{"First_Name":"Jackie","sex":"Female"}]
    }
    ],


    "Grades":
    [
    {
    "Test":
    [{"grade":"A","points":68},{"grade":"B","points":25},
    {"grade":"C","points":15}],
    "Test":
    [{"grade":"C","points":2},{"grade":"B","points":29},
    {"grade":"A","points":55}],
    "Test":
    [{"grade":"C","points":2},{"grade":"A","points":72},
    {"grade":"A","points":65}]
    }
    ]

    }
    </xsl:variable>

    <xsl:template match="/">
    <xsl:sequence select="f:json-document($vstrParam)"/>
    </xsl:template>
    </xsl:stylesheet>

    is applied (containing essentially your original data, with "First Name"
    changed to "First_Name", and null changed to "null"),

    the following result is produced:

    <teacher>
    <name>Mr Borat</name>
    <age>35</age>
    <Nationality>Kazakhstan</Nationality>
    </teacher>
    <Class>
    <Semester>Summer</Semester>
    <Room>null</Room>
    <Subject>Politics</Subject>
    <Notes>We're happy, you happy?</Notes>
    </Class>
    <Students>
    <Smith>
    <First_Name>Mary</First_Name>
    <sex>Female</sex>
    </Smith>
    <Brown>
    <First_Name>John</First_Name>
    <sex>Male</sex>
    </Brown>
    <Jackson>
    <First_Name>Jackie</First_Name>
    <sex>Female</sex>
    </Jackson>
    </Students>
    <Grades>
    <Test>
    <grade>A</grade>
    <points>68</points>
    </Test>
    <Test>
    <grade>B</grade>
    <points>25</points>
    </Test>
    <Test>
    <grade>C</grade>
    <points>15</points>
    </Test>
    <Test>
    <grade>C</grade>
    <points>2</points>
    </Test>
    <Test>
    <grade>B</grade>
    <points>29</points>
    </Test>
    <Test>
    <grade>A</grade>
    <points>55</points>
    </Test>
    <Test>
    <grade>C</grade>
    <points>2</points>
    </Test>
    <Test>
    <grade>A</grade>
    <points>72</points>
    </Test>
    <Test>
    <grade>A</grade>
    <points>65</points>
    </Test>
    </Grades>

    One can use json-document() in any XPath expression. For example,
    getting all female students is as easy as:

    f:json-document($vstrParam)/Students/*[sex = 'Female']

    and produces:

    <Smith>
    <First_Name>Mary</First_Name>
    <sex>Female</sex>
    </Smith>
    <Jackson>
    <First_Name>Jackie</First_Name>
    <sex>Female</sex>
    </Jackson>
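    The same query in C++ is a single `std::copy_if` call. This assumes the
    students have already been parsed into a vector of records. The `Student`
    struct and the `withSex` function are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

struct Student {  // hypothetical record, one per entry under "Students"
    std::string lastName;
    std::string firstName;
    std::string sex;
};

// C++ counterpart of the XPath step Students/*[sex = 'Female']:
// keep every student whose `sex` field equals the requested value, in order.
std::vector<Student> withSex(const std::vector<Student>& all, const std::string& sex) {
    std::vector<Student> out;
    std::copy_if(all.begin(), all.end(), std::back_inserter(out),
                 [&](const Student& s) { return s.sex == sex; });
    return out;
}
```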


    I will fix the implementation of json-document() to replace whitespace
    in element names with underscores and to handle the unquoted literal
    null.
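    The name mapping mentioned here can be sketched as follows. This C++
    helper is hypothetical and only approximates the XML naming rules:

    - invalid ASCII characters, including spaces, become underscores;
    - a name that starts with a digit, `-` or `.` gets an underscore prefix;
    - non-ASCII bytes are passed through unchecked.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: turn an arbitrary JSON key into a usable XML element
// name. ASCII rules only; bytes >= 0x80 are passed through as if they were
// valid name characters, which is a simplification of the XML Name production.
std::string toElementName(const std::string& key) {
    auto nameChar = [](unsigned char c) {
        return std::isalnum(c) || c == '_' || c == '-' || c == '.' || c >= 0x80;
    };
    std::string out;
    for (unsigned char c : key)
        out += nameChar(c) ? static_cast<char>(c) : '_';  // spaces etc. become '_'
    // A name may not be empty or start with a digit, '-' or '.'; prefix an underscore.
    if (out.empty() || std::isdigit(static_cast<unsigned char>(out[0])) || out[0] == '-' || out[0] == '.')
        out.insert(out.begin(), '_');
    return out;
}
```

    For example, `toElementName("First Name")` gives `First_Name`, which
    matches the hand edit made to the sample above.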


    Cheers,
    Dimitre Novatchev



     
    , Dec 15, 2007
    #9
  10. <> wrote in message
    news:...
    > The FXSL library has a json-document() function (written entirely in
    > XSLT 2.0 and using the FXSL's LR parsing framework (also written
    > entirely in XSLT 2.0) ).
    > I will fix the implementation of json-document() to replace whitespace
    > in element names with underscores and to process the unquoted string
    > null.
    >


    The question arises as to whether the output XML should represent the data
    that would be available in the set of generated objects had the JSON been
    eval'd.

    Perhaps the Grades section should look like this:-

    <Grades>
    <Test>
    <grade>C</grade>
    <points>2</points>
    </Test>
    <Test>
    <grade>A</grade>
    <points>72</points>
    </Test>
    <Test>
    <grade>A</grade>
    <points>65</points>
    </Test>
    </Grades>

    since only this data would appear in an eval of the JSON?



    --
    Anthony Jones - MVP ASP/ASP.NET
     
    Anthony Jones, Dec 16, 2007
    #10
  11. Dimitre Novatchev

    Guest

    > The question arises as to whether the output XML should represent the data
    > that would be available in the set of generated objects had the JSON been
    > eval'd?
    >
    > Perhaps the Grades section should look like this:-
    >
    > <Grades>
    > <Test>
    > <grade>C</grade>
    > <points>2</points>
    > </Test>
    > <Test>
    > <grade>A</grade>
    > <points>72</points>
    > </Test>
    > <Test>
    > <grade>A</grade>
    > <points>65</points>
    > </Test>
    > </Grades>
    >
    > since only this data would appear in an eval of the JSON?
    >


    The answer is clearly: No.

    It is the convention of XML-to-JSON converters that a sequence of
    repeating XML elements with the same name is represented as an ARRAY in
    JSON.

    We don't care what a JScript interpreter would do with the data, but
    we must implement a truthful and lossless conversion. Not producing
    all the <Test /> and <grade /> elements results in data loss.
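    The two positions can be contrasted with a small C++ sketch over a
    hypothetical list of key/value members. `lastWins` behaves like eval: a
    later duplicate overwrites the earlier one. `allValues` keeps every
    occurrence in document order, as the lossless XML conversion does.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical member list: (key, raw value text) in document order.
using Member = std::pair<std::string, std::string>;

// Like eval(): storing members in a map lets a later duplicate key overwrite
// the earlier value, so only the last "Test" survives.
std::map<std::string, std::string> lastWins(const std::vector<Member>& members) {
    std::map<std::string, std::string> out;
    for (const auto& [key, value] : members) out[key] = value;
    return out;
}

// Lossless: every value stored under `key`, in document order, which is what
// the XML conversion emits as one element per occurrence.
std::vector<std::string> allValues(const std::vector<Member>& members, const std::string& key) {
    std::vector<std::string> out;
    for (const auto& [k, v] : members)
        if (k == key) out.push_back(v);
    return out;
}
```

    Which of the two is "right" depends on whether the consumer is a
    JavaScript program, which only ever sees the last "Test", or a converter
    that must not lose data.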


    Cheers,
    Dimitre Novatchev



     
    , Dec 17, 2007
    #11
  12. Dimitre Novatchev

    Guest

    I also think that a more appropriate JSON representation than:

    "Grades":
    [
    {
    "Test":
    [{"grade":"A","points":68},{"grade":"B","points":25},
    {"grade":"C","points":15}],
    "Test":
    [{"grade":"C","points":2},{"grade":"B","points":29},
    {"grade":"A","points":55}],
    "Test":
    [{"grade":"C","points":2},{"grade":"A","points":72},
    {"grade":"A","points":65}]
    }
    ]


    should have been:

    "Grades":

    {
    "Test":
    [
    {"grade":"A","points":68,"grade":"B","points":25,"grade":"C","points":15},

    {"grade":"C","points":2, "grade":"B","points":29,
    "grade":"A","points":55},

    {"grade":"C","points":2, "grade":"A","points":72,
    "grade":"A","points":65}
    ]
    }

    Also, instead of:

    "Students":
    [
    {
    "Smith":
    [{"First Name":"Mary","sex":"Female"}],
    "Brown":
    [{"First Name":"John","sex":"Male"}],
    "Jackson":
    [{"First Name":"Jackie","sex":"Female"}]
    }
    ],

    it is better to have just:

    "Students":
    {
    "Smith":
    {"First Name":"Mary","sex":"Female"},
    "Brown":
    {"First Name":"John","sex":"Male"},
    "Jackson":
    {"First Name":"Jackie","sex":"Female"}
    }
    ,


    Maybe the original data was produced by a faulty XML-to-JSON
    converter.

    BTW, I have updated the FXSL CVS with the newest f:json-document(),
    which correctly produces XML element names from any JSON string.

    The correct treatment of null will follow shortly.


    Cheers,
    Dimitre Novatchev


    On Dec 16, 8:10 am, "Anthony Jones" <> wrote:
    > <> wrote in message
    >
    > news:...
    >
    > > The FXSL library has a json-document() function (written entirely in
    > > XSLT 2.0 and using FXSL's LR parsing framework, which is also written
    > > entirely in XSLT 2.0).
    >
    > > When this transformation:
    >
    > > <xsl:stylesheet version="2.0"
    > > xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    > > xmlns:xs="http://www.w3.org/2001/XMLSchema"
    > > xmlns:f="http://fxsl.sf.net/"
    > > exclude-result-prefixes="f xs">
    >
    > > <xsl:import href="../f/func-json-document.xsl"/>
    >
    > > <xsl:output omit-xml-declaration="yes" indent="yes"/>
    >
    > > <xsl:variable name="vstrParam" as="xs:string">
    > > {
    >
    > > "teacher":{
    > > "name":
    > > "Mr Borat",
    > > "age":
    > > "35",
    > > "Nationality":
    > > "Kazakhstan"
    > > },
    >
    > > "Class":{
    > > "Semester":
    > > "Summer",
    > > "Room":
    > > "null",
    > > "Subject":
    > > "Politics",
    > > "Notes":
    > > "We're happy, you happy?"
    > > },
    >
    > > "Students":
    > > [
    > > {
    > > "Smith":
    > > [{"First_Name":"Mary","sex":"Female"}],
    > > "Brown":
    > > [{"First_Name":"John","sex":"Male"}],
    > > "Jackson":
    > > [{"First_Name":"Jackie","sex":"Female"}]
    > > }
    > > ],
    >
    > > "Grades":
    > > [
    > > {
    > > "Test":
    > > [{"grade":"A","points":68},{"grade":"B","points":25},
    > > {"grade":"C","points":15}],
    > > "Test":
    > > [{"grade":"C","points":2},{"grade":"B","points":29},
    > > {"grade":"A","points":55}],
    > > "Test":
    > > [{"grade":"C","points":2},{"grade":"A","points":72},
    > > {"grade":"A","points":65}]
    > > }
    > > ]
    >
    > > }
    > > </xsl:variable>
    >
    > > <xsl:template match="/">
    > > <xsl:sequence select="f:json-document($vstrParam)"/>
    > > </xsl:template>
    > > </xsl:stylesheet>
    >
    > > is applied (containing essentially your original data, with "First
    > > Name" changed to "First_Name", and null changed to "null"), the
    > > following result is produced:
    >
    > > <teacher>
    > > <name>Mr Borat</name>
    > > <age>35</age>
    > > <Nationality>Kazakhstan</Nationality>
    > > </teacher>
    > > <Class>
    > > <Semester>Summer</Semester>
    > > <Room>null</Room>
    > > <Subject>Politics</Subject>
    > > <Notes>We're happy, you happy?</Notes>
    > > </Class>
    > > <Students>
    > > <Smith>
    > > <First_Name>Mary</First_Name>
    > > <sex>Female</sex>
    > > </Smith>
    > > <Brown>
    > > <First_Name>John</First_Name>
    > > <sex>Male</sex>
    > > </Brown>
    > > <Jackson>
    > > <First_Name>Jackie</First_Name>
    > > <sex>Female</sex>
    > > </Jackson>
    > > </Students>
    > > <Grades>
    > > <Test>
    > > <grade>A</grade>
    > > <points>68</points>
    > > </Test>
    > > <Test>
    > > <grade>B</grade>
    > > <points>25</points>
    > > </Test>
    > > <Test>
    > > <grade>C</grade>
    > > <points>15</points>
    > > </Test>
    > > <Test>
    > > <grade>C</grade>
    > > <points>2</points>
    > > </Test>
    > > <Test>
    > > <grade>B</grade>
    > > <points>29</points>
    > > </Test>
    > > <Test>
    > > <grade>A</grade>
    > > <points>55</points>
    > > </Test>
    > > <Test>
    > > <grade>C</grade>
    > > <points>2</points>
    > > </Test>
    > > <Test>
    > > <grade>A</grade>
    > > <points>72</points>
    > > </Test>
    > > <Test>
    > > <grade>A</grade>
    > > <points>65</points>
    > > </Test>
    > > </Grades>
    >
    > > One can use json-document() in any XPath expression; for example,
    > > getting all female students is as easy as:
    >
    > > f:json-document($vstrParam)/Students/*[sex = 'Female']
    >
    > > and produces:
    >
    > > <Smith>
    > > <First_Name>Mary</First_Name>
    > > <sex>Female</sex>
    > > </Smith>
    > > <Jackson>
    > > <First_Name>Jackie</First_Name>
    > > <sex>Female</sex>
    > > </Jackson>
    >
    > > I will fix the implementation of json-document() to replace whitespace
    > > in element names with underscores and to process the unquoted string
    > > null.
    >
    > The question arises as to whether the output XML should represent the data
    > that would be available in the set of generated objects had the JSON been
    > eval'd?
    >
    > Perhaps the Grades section should look like this:-
    >
    > <Grades>
    > <Test>
    > <grade>C</grade>
    > <points>2</points>
    > </Test>
    > <Test>
    > <grade>A</grade>
    > <points>72</points>
    > </Test>
    > <Test>
    > <grade>A</grade>
    > <points>65</points>
    > </Test>
    > </Grades>
    >
    > since only this data would appear in an eval of the JSON?
    >
    > --
    > Anthony Jones - MVP ASP/ASP.NET
     
    , Dec 17, 2007
    #12
  13. <> wrote in message
    news:...
    > > The question arises as to whether the output XML should represent the
    > > data that would be available in the set of generated objects had the
    > > JSON been eval'd?
    > >
    > > Perhaps the Grades section should look like this:-
    > >
    > > <Grades>
    > > <Test>
    > > <grade>C</grade>
    > > <points>2</points>
    > > </Test>
    > > <Test>
    > > <grade>A</grade>
    > > <points>72</points>
    > > </Test>
    > > <Test>
    > > <grade>A</grade>
    > > <points>65</points>
    > > </Test>
    > > </Grades>
    > >
    > > since only this data would appear in an eval of the JSON?
    > >

    >
    > The answer is clearly: No.
    >


    Oh, I thought the raison d'être behind JSON was that a data structure could
    be serialised to a string that could be passed to JavaScript and
    re-assembled easily by calling eval().

    > It is the definition of JSON (and the convertors from XML to JSON use
    > this) that a sequence of repeating xml elements with the same name are
    > represented as an ARRAY in JSON.


    Is there a spec? Where does it say that?

    >
    > We don't care what a JScript interpreter would do with the data, but
    > we must implement a truthful and lossless conversion. Not producing
    > all <test /> and <grade /> elements results in data loss.
    >


    Agreed. I'm willing to be shown wrong on this, but if you're right then JSON
    is bust and pointless.

    --
    Anthony Jones - MVP ASP/ASP.NET
     
    Anthony Jones, Dec 22, 2007
    #13
  14. <> wrote in message
    news:...
    > I also think that a more appropriate JSON representation than:
    >
    > "Grades":
    > [
    > {
    > "Test":
    > [{"grade":"A","points":68},{"grade":"B","points":25},
    > {"grade":"C","points":15}],
    > "Test":
    > [{"grade":"C","points":2},{"grade":"B","points":29},
    > {"grade":"A","points":55}],
    > "Test":
    > [{"grade":"C","points":2},{"grade":"A","points":72},
    > {"grade":"A","points":65}]
    > }
    > ]
    >
    >
    > should have been:
    >
    > "Grades":
    >
    > {
    > "Test":
    > [
    > {"grade":"A","points":68,"grade":"B","points":25,
    > "grade":"C","points":15},
    >
    > {"grade":"C","points":2, "grade":"B","points":29,
    > "grade":"A","points":55},
    >
    > {"grade":"C","points":2, "grade":"A","points":72,
    > "grade":"A","points":65}
    > ]
    > }
    >


    We're just guessing at the intent, but that appears to be an object called
    Grades that contains just one member, an array called Test, containing what
    appears to be the grades required to pass each test. Seems a little
    convoluted, and how is each test identified? Ordinal position?

    > Also, instead of:
    >
    > "Students":
    > [
    > {
    > "Smith":
    > [{"First Name":"Mary","sex":"Female"}],
    > "Brown":
    > [{"1First Name":"John","sex":"Male"}],
    > "Jackson":
    > [{"2First Name":"Jackie","sex":"Female"}]
    > }
    > ],
    >
    > it is better to have just:
    >
    > "Students":
    > {
    > "Smith":
    > {"First Name":"Mary","sex":"Female"},
    > "Brown":
    > {"1First Name":"John","sex":"Male"},
    > "Jackson":
    > {"2First Name":"Jackie","sex":"Female"}
    > }
    > ,


    And if you have two students with the last name Smith? Smith magically
    becomes an array?

    >Maybe, the original data was produced by a faulty XML --> JSON
    >convertor.


    It's difficult to make sense of what appears to be faulty both as JSON and as
    a logical structure.

    --
    Anthony Jones - MVP ASP/ASP.NET
     
    Anthony Jones, Dec 22, 2007
    #14