Parsing a generic data file

Jasper · Dec 14, 2007

Hi, maybe this is off-topic, but perhaps you can help. I'm looking for ideas
on how to parse a data file.

I don't know XML, but I know it represents data in text format.

I have a structured data file of the general form shown below. I don't have
any definition of the data. Basically it looks hierarchical: token/data
pairs delimited by braces and square brackets.

I would like to parse this out into some sort of data object(s) in C++ so
that I can gain programmatic access to the variables.

My app is C++ so the solution must be the same. Also it must be very
lightweight and *very* fast as I must decode multiple pages in realtime.

Would adapting an XML parser to do this be a possible solution?

Any pointers/ideas/references/code snippets/observations appreciated.

TIA

Basic example showing data structure (whitespaces and carriage returns added
by me for clarity).

{

"teacher":{
"name":
"Mr Borat",
"age":
"35",
"Nationality":
"Kazakhstan"},

"Class":{
"Semester":
"Summer",
"Room":
null,
"Subject":
"Politics",
"Notes":
"We're happy, you happy?"},

"Students":
[
{
"Smith":
[{"First Name":"Mary","sex":"Female"}],
"Brown":
[{"First Name":"John","sex":"Male"}],
"Jackson":
[{"First Name":"Jackie","sex":"Female"}]
}
],

"Grades":
[
{
"Test":
[{"grade":"A","points":68},{"grade":"B","points":25},{"grade":"C","points":15}],
"Test":
[{"grade":"C","points":2},{"grade":"B","points":29},{"grade":"A","points":55}],
"Test":
[{"grade":"C","points":2},{"grade":"A","points":72},{"grade":"A","points":65}]
}
]

}

Pavel Lepin · Dec 14, 2007

Jasper said:
I have a structured data file of the general form shown
below. I don't have any definition of the data. Basically
it looks hierarchical: token/data pairs delimited by
braces and square brackets.

I would like to parse this out into some sort of data
object(s) in C++ so that I can gain programmatic access
to the variables.

My app is C++ so the solution must be the same. Also it
must be very lightweight and *very* fast as I must decode
multiple pages in realtime.

Well, representing data like that in XML is not a problem in
itself, even if you cannot define a more strict schema than
just free-form key/value pairs. The problem is that you're
probably not very likely to get the extreme performance you
seem to want with a canned parser. DOM parsers,
specifically, would be way too cumbersome for your needs.
So it's likely to boil down to either writing your own
streaming parser, or using a streaming parser like expat or
any random SAX parser out there for maximum performance,
and even then you might not get what you need.

Would adapting an XML parser to do this be a possible
solution?

Not enough data. Try it, profile it, there's no other way to
know.

Any pointers/ideas/references/code snippets/observations
appreciated.

You might want to look into S-expressions as well. You'll
save on overhead, and I believe there are some quite fast
S-expression parsers written in C and C++ out there.
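
To illustrate the streaming option mentioned above, here is a minimal sketch of a SAX-style pass in C++ that reports tokens through a callback instead of building a tree. The `Event` enum and the `scan` function are names made up for this illustration; they are not part of expat or any SAX library. The sketch assumes brace/bracket-delimited text with double-quoted strings and no escape sequences.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical event kinds for a SAX-style pass; not an existing library's API.
enum class Event { BeginObject, EndObject, BeginArray, EndArray, String };

// Single pass over the text; calls `on` for each structural token and string.
// Numbers, null and escape sequences are ignored in this sketch.
void scan(const std::string& text,
          const std::function<void(Event, const std::string&)>& on) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        switch (text[i]) {
            case '{': on(Event::BeginObject, ""); break;
            case '}': on(Event::EndObject, ""); break;
            case '[': on(Event::BeginArray, ""); break;
            case ']': on(Event::EndArray, ""); break;
            case '"': {
                const std::size_t end = text.find('"', i + 1);
                if (end == std::string::npos) return;  // unterminated string: stop
                on(Event::String, text.substr(i + 1, end - i - 1));
                i = end;  // continue after the closing quote
                break;
            }
            default: break;  // whitespace, ':', ',' and bare values are skipped
        }
    }
}
```

`std::function` keeps the sketch short; a template parameter for the callback would probably be cheaper in a hot loop. Whether either is fast enough is something only profiling on the real pages can answer.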

msnews.microsoft.com · Dec 14, 2007

Looks like JSON to me; search for a JSON library.
JSON is a text format that represents objects using JavaScript's literal
syntax, and it is used for passing information to clients that use JavaScript.
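
If an off-the-shelf library turns out to be too heavy, a hand-written recursive-descent parser for this kind of data is fairly small. The following is only a sketch, assuming C++17: `Value`, `Member`, `Parser` and `parse` are hypothetical names, string escape sequences are rejected rather than decoded, and number parsing is delegated to `std::strtod`, which accepts more than strict JSON allows. Object members are kept in file order, so repeated keys such as "Test" are all retained.

```cpp
#include <cctype>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for this sketch; they do not come from any library.
struct Member;
struct Value {
    enum Kind { Null, Bool, Number, String, Array, Object } kind = Null;
    bool boolean = false;
    double number = 0.0;
    std::string text;
    std::vector<Value> items;     // elements when kind == Array
    std::vector<Member> members;  // key/value pairs in file order when kind == Object
};
struct Member { std::string key; Value value; };

struct Parser {
    const std::string& s;
    std::size_t pos = 0;

    [[noreturn]] void fail(const std::string& why) const {
        throw std::runtime_error(why + " at offset " + std::to_string(pos));
    }
    void skipSpace() {
        while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
    }
    char peek() {  // next non-space character, not consumed
        skipSpace();
        if (pos >= s.size()) fail("unexpected end of input");
        return s[pos];
    }
    void expect(char c) { if (peek() != c) fail("unexpected character"); ++pos; }
    bool consumeWord(const std::string& w) {
        if (s.compare(pos, w.size(), w) != 0) return false;
        pos += w.size();
        return true;
    }
    std::string parseString() {  // escape sequences are left out of this sketch
        expect('"');
        const std::size_t end = s.find('"', pos);
        if (end == std::string::npos) fail("unterminated string");
        std::string out = s.substr(pos, end - pos);
        if (out.find('\\') != std::string::npos) fail("escapes are not supported");
        pos = end + 1;
        return out;
    }
    Value parseValue() {
        Value v;
        const char c = peek();
        if (c == '{' || c == '[') {
            const bool isObject = (c == '{');
            const char close = isObject ? '}' : ']';
            v.kind = isObject ? Value::Object : Value::Array;
            ++pos;
            if (peek() == close) { ++pos; return v; }  // empty container
            while (true) {
                if (isObject) {
                    std::string key = parseString();
                    expect(':');
                    v.members.push_back(Member{std::move(key), parseValue()});
                } else {
                    v.items.push_back(parseValue());
                }
                if (peek() != ',') break;
                ++pos;
            }
            expect(close);
        } else if (c == '"') {
            v.kind = Value::String;
            v.text = parseString();
        } else if (consumeWord("true") || consumeWord("false")) {
            v.kind = Value::Bool;
            v.boolean = (c == 't');
        } else if (!consumeWord("null")) {  // a matched "null" leaves kind == Null
            char* end = nullptr;
            v.number = std::strtod(s.c_str() + pos, &end);  // more lenient than strict JSON
            if (end == s.c_str() + pos) fail("invalid value");
            v.kind = Value::Number;
            pos = static_cast<std::size_t>(end - s.c_str());
        }
        return v;
    }
};

Value parse(const std::string& text) {
    Parser p{text};
    Value v = p.parseValue();
    p.skipSpace();
    if (p.pos != text.size()) p.fail("trailing characters");
    return v;
}
```

Calling `parse(text)` on the sample data would give a `Value` of kind `Object` whose `members` can be walked in order; looking a key up is a linear scan over `members`. A real library will handle escapes, Unicode and error reporting far more thoroughly.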

Jasper · Dec 14, 2007

Pavel Lepin said:
Well, representing data like that in XML is not a problem in
itself, even if you cannot define a more strict schema than
just free-form key/value pairs. The problem is that you're
probably not very likely to get the extreme performance you
seem to want with a canned parser. DOM parsers,
specifically, would be way too cumbersome for your needs.

Yes, I thought as much.

So it's likely to boil down to either writing your own
streaming parser, or using a streaming parser like expat or
any random SAX parser out there for maximum performance,
and even then you might not get what you need.

OK I'll take a look.

Not enough data. Try it, profile it, there's no other way to
know.

You might want to look into S-expressions as well. You'll
save on overhead, and I believe there are some quite fast
S-expression parsers written in C and C++ out there.

Thanks, again.

Jasper · Dec 14, 2007

msnews.microsoft.com said:
Looks like JSON to me; search for a JSON library.
JSON is a text format that represents objects using JavaScript's literal
syntax, and it is used for passing information to clients that use JavaScript.

Does it? Makes sense if that's true. I was sure it fit some sort of "web
format" but I didn't know which.
I presume there must be some sort of C++ code available to parse it out.

I'll take a look.

Thanks

Lynn · Dec 14, 2007

Jasper said:
Hi, maybe this is off-topic, but perhaps you can help. I'm looking for
ideas on how to parse a data file.

Can't you create arrays of C++ structs or classes to hold this data? As for
parsing it, if you don't want to write your own parser, there has to be an
abundance of libraries out there you could use out of the box, no?
Efficiency will vary, but I can't see why any decent commercial product, if
not your own code, would not be *very* fast.

I guess I'm not seeing why you would use XML or XML tools as an intermediate
step in this process when the data is not coming at you in XML and you've
given no indication that you need to output it as XML for other processes
to consume?

Anthony Jones · Dec 14, 2007

Jasper said:
Does it? Makes sense if that's true. I was sure it fit some sort of "web
format" but I didn't know which.
I presume there must be some sort of C++ code available to parse it out.

It is JSON. You would need to be looking at the JavaScript eval function to
parse it. The returned object would then have a hierarchy you could pull
data from, e.g.:-

var x = o.Class.Subject

x == "Politics" // will be true

However the structure is somewhat suspect.

The students array contains only one object on which all students are
placed. Each student having their last name as the attribute ID for their
object (what happens if the class is attended by more than one Smith?).
This object is in turn an array containing only one object.

The Grades array suffers the same problem where again inappropriate use of
{ } causes the array to contain only one object and in this case the same
identifier "Test" used multiple times resulting in it being redefined and
only containing the last entry.

Here is a cleaner version (although I'm not entirely happy with the
identifiers "Last Name" and "First Name" containing a space it is legal):-

{

"teacher":{
"name": "Mr Borat",
"age": 35,
"Nationality": "Kazakhstan"
},

"Class":{
"Semester": "Summer",
"Room": null,
"Subject": "Politics",
"Notes": "We're happy, you happy?"
},

"Students":
[
{"Last Name":"Smith",
"First Name":"Mary","sex":"Female"},
{"Last Name":"Brown",
"First Name":"John","sex":"Male"},
{"Last Name":"Jackson",
"First Name":"Jackie","sex":"Female"}
],

"Grades":
[
{"Test":"Name of a Test",
"Points": {"A":68,"B":25,"C":15}},
{"Test":"Name of a different test",
"Points": {"A":55,"B":29,"C":2}},
{"Test": "Name of yet another test",
"Points": {"A":72,"B":65,"C":2}}
]

}
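
Once the layout is regular like this, it maps onto plain C++ structs of the kind Lynn suggested. The following is a sketch only, assuming C++17 for `std::optional`; every type and field name here (`Teacher`, `ClassInfo`, `Student`, `TestResult`, `Course`, `countBySex`) is invented for illustration and simply mirrors the keys in the cleaned-up data.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record types mirroring the cleaned-up layout; names are illustrative.
struct Teacher {
    std::string name;
    int age = 0;
    std::string nationality;
};

struct ClassInfo {
    std::string semester;
    std::optional<std::string> room;  // JSON null maps to an empty optional
    std::string subject;
    std::string notes;
};

struct Student {
    std::string lastName;
    std::string firstName;
    std::string sex;
};

struct TestResult {
    std::string test;                   // name of the test
    std::map<std::string, int> points;  // grade letter -> points
};

struct Course {
    Teacher teacher;
    ClassInfo classInfo;
    std::vector<Student> students;  // one entry per student, so two Smiths are fine
    std::vector<TestResult> grades;
};

// Count the students whose "sex" field equals the given value.
std::size_t countBySex(const Course& course, const std::string& sex) {
    std::size_t n = 0;
    for (const Student& s : course.students)
        if (s.sex == sex) ++n;
    return n;
}
```

Because each student is a separate element of `students`, the question of what happens with more than one Smith does not arise in this layout.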

dnovatchev · Dec 15, 2007

The FXSL library has a json-document() function (written entirely in XSLT
2.0 and using FXSL's LR parsing framework, also written entirely in XSLT
2.0).

When this transformation:

<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:f="http://fxsl.sf.net/"
exclude-result-prefixes="f xs">

<xsl:import href="../f/func-json-document.xsl"/>

<xsl:output omit-xml-declaration="yes" indent="yes"/>

<xsl:variable name="vstrParam" as="xs:string">
{

"teacher":{
"name":
"Mr Borat",
"age":
"35",
"Nationality":
"Kazakhstan"
},

"Class":{
"Semester":
"Summer",
"Room":
"null",
"Subject":
"Politics",
"Notes":
"We're happy, you happy?"
},

"Students":
[
{
"Smith":
[{"First_Name":"Mary","sex":"Female"}],
"Brown":
[{"First_Name":"John","sex":"Male"}],
"Jackson":
[{"First_Name":"Jackie","sex":"Female"}]
}
],

"Grades":
[
{
"Test":
[{"grade":"A","points":68},{"grade":"B","points":25},
{"grade":"C","points":15}],
"Test":
[{"grade":"C","points":2},{"grade":"B","points":29},
{"grade":"A","points":55}],
"Test":
[{"grade":"C","points":2},{"grade":"A","points":72},
{"grade":"A","points":65}]
}
]

}
</xsl:variable>

<xsl:template match="/">
<xsl:sequence select="f:json-document($vstrParam)"/>
</xsl:template>
</xsl:stylesheet>

is applied (containing essentially your original data, with "First Name"
changed to "First_Name", and null changed to "null"),

the following result is produced:

<teacher>
<name>Mr Borat</name>
<age>35</age>
<Nationality>Kazakhstan</Nationality>
</teacher>
<Class>
<Semester>Summer</Semester>
<Room>null</Room>
<Subject>Politics</Subject>
<Notes>We're happy, you happy?</Notes>
</Class>
<Students>
<Smith>
<First_Name>Mary</First_Name>
<sex>Female</sex>
</Smith>
<Brown>
<First_Name>John</First_Name>
<sex>Male</sex>
</Brown>
<Jackson>
<First_Name>Jackie</First_Name>
<sex>Female</sex>
</Jackson>
</Students>
<Grades>
<Test>
<grade>A</grade>
<points>68</points>
</Test>
<Test>
<grade>B</grade>
<points>25</points>
</Test>
<Test>
<grade>C</grade>
<points>15</points>
</Test>
<Test>
<grade>C</grade>
<points>2</points>
</Test>
<Test>
<grade>B</grade>
<points>29</points>
</Test>
<Test>
<grade>A</grade>
<points>55</points>
</Test>
<Test>
<grade>C</grade>
<points>2</points>
</Test>
<Test>
<grade>A</grade>
<points>72</points>
</Test>
<Test>
<grade>A</grade>
<points>65</points>
</Test>
</Grades>

One can use json-document() in any XPath expression; for example, getting
all female students is as easy as:

f:json-document($vstrParam)/Students/*[sex = 'Female']

and produces:

<Smith>
<First_Name>Mary</First_Name>
<sex>Female</sex>
</Smith>
<Jackson>
<First_Name>Jackie</First_Name>
<sex>Female</sex>
</Jackson>

I will fix the implementation of json-document() to replace whitespace in
element names with underscores and to process the unquoted string null.

Cheers,
Dimitre Novatchev

Anthony Jones · Dec 16, 2007

The question arises as to whether the output XML should represent the data
that would be available in the set of generated objects had the JSON been
eval'd?

Perhaps the Grades section should look like this:-

<Grades>
<Test>
<grade>C</grade>
<points>2</points>
</Test>
<Test>
<grade>A</grade>
<points>72</points>
</Test>
<Test>
<grade>A</grade>
<points>65</points>
</Test>
</Grades>

since only this data would appear in an eval of the JSON?

dnovatchev · Dec 17, 2007

The answer is clearly: No.

It is the definition of JSON (and the converters from XML to JSON use
this) that a sequence of repeating XML elements with the same name is
represented as an ARRAY in JSON.

We don't care what a JScript interpreter would do with the data, but
we must implement a truthful and lossless conversion. Not producing
all <Test /> and <grade /> elements results in data loss.

Cheers,
Dimitre Novatchev
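
The two positions in this exchange come down to how a parser treats a repeated key such as "Test". The sketch below contrasts them in C++ under a deliberately simplified model: `Members`, `lastWins` and `keepAll` are hypothetical names, and values are kept as plain strings rather than nested structures. It is not how FXSL or any JavaScript engine is implemented.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat model for this sketch: one object's members as
// (key, value) pairs in file order, with values kept as plain strings.
using Members = std::vector<std::pair<std::string, std::string>>;

// Policy 1: assignment into a map, so a repeated key replaces the earlier
// value. This mirrors the eval behaviour described earlier in the thread.
std::map<std::string, std::string> lastWins(const Members& members) {
    std::map<std::string, std::string> out;
    for (const auto& m : members) out[m.first] = m.second;
    return out;
}

// Policy 2: collect every value under its key, in input order, so nothing
// is dropped when a key repeats.
std::map<std::string, std::vector<std::string>> keepAll(const Members& members) {
    std::map<std::string, std::vector<std::string>> out;
    for (const auto& m : members) out[m.first].push_back(m.second);
    return out;
}
```

With three "Test" entries as input, `lastWins` ends up with a single entry holding the third value, while `keepAll` holds all three under the one key. Which policy is appropriate depends on whether the consumer needs to match what a JavaScript client would see or needs every value in the file.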

dnovatchev · Dec 17, 2007

I also think that a more appropriate JSON representation than:

"Grades":
[
{
"Test":
[{"grade":"A","points":68},{"grade":"B","points":25},
{"grade":"C","points":15}],
"Test":
[{"grade":"C","points":2},{"grade":"B","points":29},
{"grade":"A","points":55}],
"Test":
[{"grade":"C","points":2},{"grade":"A","points":72},
{"grade":"A","points":65}]
}
]

should have been:

"Grades":

{
"Test":
[
{"grade":"A","points":68,"grade":"B","points":
25,"grade":"C","points":15},

{"grade":"C","points":2, "grade":"B","points":29,
"grade":"A","points":55},

{"grade":"C","points":2, "grade":"A","points":72,
"grade":"A","points":65}
]
}

Also, instead of:

"Students":
[
{
"Smith":
[{"First Name":"Mary","sex":"Female"}],
"Brown":
[{"1First Name":"John","sex":"Male"}],
"Jackson":
[{"2First Name":"Jackie","sex":"Female"}]
}
],

it is better to have just:

"Students":
{
"Smith":
{"First Name":"Mary","sex":"Female"},
"Brown":
{"1First Name":"John","sex":"Male"},
"Jackson":
{"2First Name":"Jackie","sex":"Female"}
}
,

Maybe the original data was produced by a faulty XML --> JSON
converter.

BTW, I have updated the FXSL CVS with the newest f:json-document(),
which correctly produces XML element names from any JSON string.

The correct treatment of null will follow shortly.

Cheers,
Dimitre Novatchev

Anthony Jones · Dec 22, 2007

dnovatchev said:
The answer is clearly: No.

Oh, I thought the raison d'être behind JSON was that a data structure could
be serialised to a string that could be passed to JavaScript and
re-assembled easily by using the eval function.

It is the definition of JSON (and the converters from XML to JSON use
this) that a sequence of repeating XML elements with the same name is
represented as an ARRAY in JSON.

Is there a spec? Where does it say that?

We don't care what a JScript interpreter would do with the data, but
we must implement a truthful and lossless conversion. Not producing
all <Test /> and <grade /> elements results in data loss.

Agreed. I'm willing to be shown wrong on this, but if you're right then JSON
is bust and pointless.

Anthony Jones · Dec 22, 2007

dnovatchev said:
I also think that a more appropriate JSON representation than:

"Grades":
[
{
"Test":
[{"grade":"A","points":68},{"grade":"B","points":25},
{"grade":"C","points":15}],
"Test":
[{"grade":"C","points":2},{"grade":"B","points":29},
{"grade":"A","points":55}],
"Test":
[{"grade":"C","points":2},{"grade":"A","points":72},
{"grade":"A","points":65}]
}
]

should have been:

"Grades":

{
"Test":
[
{"grade":"A","points":68,"grade":"B","points":
25,"grade":"C","points":15},

{"grade":"C","points":2, "grade":"B","points":29,
"grade":"A","points":55},

{"grade":"C","points":2, "grade":"A","points":72,
"grade":"A","points":65}
]
}

We're just guessing at the intent, but that appears to be an object called
Grades that contains just one member, an array called Test, containing what
appear to be the grades required to pass each test. Seems a little
convoluted, and how is each test identified? Ordinal position?

Also, instead of:

"Students":
[
{
"Smith":
[{"First Name":"Mary","sex":"Female"}],
"Brown":
[{"1First Name":"John","sex":"Male"}],
"Jackson":
[{"2First Name":"Jackie","sex":"Female"}]
}
],

it is better to have just:

"Students":
{
"Smith":
{"First Name":"Mary","sex":"Female"},
"Brown":
{"1First Name":"John","sex":"Male"},
"Jackson":
{"2First Name":"Jackie","sex":"Female"}
}
,

And if you have two students with the last name Smith? Smith magically
becomes an array?

Maybe the original data was produced by a faulty XML --> JSON
converter.

It's difficult to make sense of what appears to be faulty both as JSON and as
a logical structure.
