Parsing an XHTML fragment using JavaScript


jackwootton

Hello,

I register a mutation listener on a DIV element. When a node is
inserted into the DIV, I use the event object passed to my listener to
retrieve the inserted node. It could look like this:

<span class="message">
<b>Name</b>
<p>Hello World!</p>
</span>

Currently I parse this XHTML fragment using various DOM methods such
as hasChildNodes(), nextSibling, previousSibling, etc. (you get the idea).
However, it is possible that the XHTML fragment that is inserted may
change, for example once every 2 months. I have attempted to make my
JavaScript as loosely coupled with the XHTML fragment as possible, but
do not feel this goes far enough. I would like to have some kind of
document, perhaps a DTD, that I can use with JavaScript to parse the
XHTML fragment. Then, if the inserted node is changed to, for example:

<div class="message">
<b>Name</b>
<span>Hello World!</span>
</div>

Then all I would have to do would be to change the DTD (or whatever)
to reflect the new structure, and the program would parse it correctly.

I hope I have made sense, and someone can help.

Many thanks,

Jack
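
For readers following along, here is a minimal sketch of the kind of listener described in this post. It is not the poster's code. It swaps in MutationObserver for the Mutation Events API, which current browsers have deprecated, and the element id 'chat' and the names collectAddedElements and inserted are hypothetical.

```javascript
// Hypothetical helper: pull the inserted element nodes out of a batch
// of MutationRecord objects (text and comment nodes are skipped).
function collectAddedElements(records) {
  const added = [];
  for (const record of records) {
    for (const node of Array.from(record.addedNodes)) {
      if (node.nodeType === 1) { // 1 === Node.ELEMENT_NODE
        added.push(node);
      }
    }
  }
  return added;
}

// Registration only runs in a browser; 'chat' is an assumed element id.
if (typeof document !== 'undefined' && typeof MutationObserver !== 'undefined') {
  const container = document.getElementById('chat');
  if (container) {
    const inserted = [];
    new MutationObserver(function (records) {
      inserted.push.apply(inserted, collectAddedElements(records));
    }).observe(container, { childList: true });
  }
}
```

Keeping the node collection separate from the registration means the extraction logic can be exercised without a live page.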
 
Thomas 'PointedEars' Lahn

I register a mutation listener on a DIV element. When a node is
inserted to the DIV I use the event object passed to my listener to
retrieve the node inserted. It could look like this.

<span class="message">
<b>Name</b>
<p>Hello World!</p>
</span>

Currently I parse this XHTML fragment using various DOM methods such
as hasChildNodes(), nextSibling, previousSibling, etc (you get the idea).
However it is possible that the XHTML fragment that is inserted may
change, for example once every 2 months. I have attempted to make my
JavaScript as loosely coupled with the XHTML fragment as possible, but
do not feel this goes far enough. I would like to have some kind of
document, perhaps DTD, that I can use with JavaScript to parse the
XHTML fragment. Therefore if the node inserted is changed to, for
example:

<div class="message">
<b>Name</b>
<span>Hello World!</span>
</div>

Then all I would have to do would be to change the DTD (or whatever)

A DTD defines what element types, attributes and attribute values are
possible in what context of markup. It does not define which element
types, attributes or values are used.

to reflect the new structure, and the program would parse it correctly.

I don't see any parsing.

I hope I have made sense,

Unfortunately, you did not. First, Mutation Events are implemented
client-side, and it is unlikely that anyone will look at the document
for more than several hours, let alone more than 2 months. Second,
what are you actually trying to do when the event occurs? Third, why
do you do that?

and someone can help.

I know of no browser that uses and no DOM that provides a validating XML
parser, so you would not need to change anything to parse that with an
XML parser. But you don't even do that. Sorry, I really don't see the
point of your question.


PointedEars
 
jackwootton

A DTD defines what element types, attributes and attribute values are
possible in what context of markup. It does not define which element
types, attributes or values are used.


I don't see any parsing.


Unfortunately, you did not. First, Mutation Events are implemented
client-side, and it is unlikely that anyone will look at the document
for more than several hours, let alone more than 2 months. Second,
what are you actually trying to do when the event occurs? Third, why
do you do that?


I know of no browser that uses and no DOM that provides a validating XML
parser, so you would not need to change anything to parse that with an
XML parser. But you don't even do that. Sorry, I really don't see the
point of your question.

PointedEars

Firstly, thank you for replying. I hope I can be clearer.

1. I never said that a DTD is what I needed, or stated what it did. I
was using it as an example of some kind of file that describes
valid markup in an XML file.

2. The assumptions you make about people not wanting to look at a
document for longer than a couple of hours, let alone a couple of
months, are irrelevant. My question states the problem. I need to
overcome this problem. I have not even said that anyone will be
looking at the document, or how it will be used, or why.

3. I understand that mutation events are client side. JavaScript is
client side, and therefore I safely assumed that registering a
mutation event using JavaScript on an XHTML object would also be
client side.

4. The point of my question is that using DOM methods to parse XHTML /
XML is too tightly coupled to the structure of the markup. I do not
find it a satisfactory solution and am therefore looking for a better
one, perhaps one that can use some kind of document that dictates the
possible structure of the nodes to be inserted, so that updating
this document (or whatever) would allow the JavaScript to successfully
parse different XHTML fragments.

I hope this helps you.

Many thanks,

Jack
 
Thomas 'PointedEars' Lahn

(e-mail address removed) wrote:
[Full quote]

Firstly, thank you for replying. I hope I can be more clear.

I hope you learn to quote eventually; that would make a possible discussion
much easier. http://jibbering.com/faq/#FAQ2_3 pp. Quote style fixed for
the following.

A DTD defines what element types, attributes and attribute values are
possible in what context of markup. It does not define which element
types, attributes or values are used.

1. I never said that a DTD is what I needed [...]

Yes, you did propose that as a solution.

I was using it as an example of some kind of file that describes
valid markup in an XML file.

A DTD is not suitable for that, so I pointed that out.

I register a mutation listener on a DIV element. When a node is
inserted to the DIV I use the event object passed to my listener to
retrieve the node inserted. It could look like this.
<span class="message">
<b>Name</b>
<p>Hello World!</p>
</span>
Currently I parse this XHTML fragment using various DOM methods such
as hasChildNodes(), nextSibling, previousSibling, etc (you get the
idea).
However it is possible that the XHTML fragment that is inserted may
change, for example once every 2 months.

[...] Mutation Events are implemented client-side, and it is unlikely
that anyone will look at the document for more than several hours, let
alone more than 2 months.

2. The assumptions you make about people not wanting to look at a
document for longer than a couple of hours let alone a couple of
months are irrelevant [...]

Sorry, my bad.

My question states the problem.

Unfortunately, it does not. It states the problems you are having with
your solution to the problem. See below.

3. I understand that mutation events are client side. JavaScript is
client side, and therefore I safely assumed that registering a
mutation event using JavaScript on an XHTML object would also be
client side.

It is, on an XHTML _element_. But currently there is sparse support for
XHTML and that event type at best.

4. The point of my question is that using DOM methods to parse XHTML /
XML is too tightly coupled to the structure of the markup. I do not
find it a satisfactory solution and am therefore looking for a better
one, perhaps one that can use some kind of document that dictates the
possible structure of the nodes to be inserted, so that updating
this document (or whatever) would allow the JavaScript to successfully
parse different XHTML fragments.

Maybe XPath serves. Other than that, I can only think of designing your own
parsing language.

But again, it would greatly help to understand why you deem parsing to be
necessary in the first place. IOW: What is it that you are trying to accomplish
here?


PointedEars
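
As an illustration of the XPath suggestion, here is a minimal sketch. It assumes a browser whose document object provides evaluate(), and it assumes the fragment from the first post. The names messageFields, extractFields and resolveNamespace are hypothetical, as is the choice of the prefix "x" for the XHTML namespace.

```javascript
// XHTML elements live in this namespace; the prefix "x" is an arbitrary choice.
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

function resolveNamespace(prefix) {
  return prefix === 'x' ? XHTML_NS : null;
}

// Hypothetical field map: the only place that knows the fragment's shape.
const messageFields = {
  name: 'string(x:b)',
  text: 'string(x:p)'
};

// Evaluate every expression relative to the inserted node and return a
// plain object with one string per field.
function extractFields(doc, node, fields) {
  const STRING_TYPE = 2; // numeric value of XPathResult.STRING_TYPE
  const result = {};
  for (const key of Object.keys(fields)) {
    result[key] = doc.evaluate(fields[key], node, resolveNamespace,
                               STRING_TYPE, null).stringValue;
  }
  return result;
}
```

In a browser this would be called as extractFields(document, insertedNode, messageFields). If the fragment changed to the second shape from the first post, only the text entry would need to become 'string(x:span)'; the traversal code stays as it is.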
 
jackwootton

(e-mail address removed) wrote:
[Full quote]
Firstly, thank you for replying. I hope I can be more clear.

I hope you learn to quote eventually; that would make a possible discussion
much easier. http://jibbering.com/faq/#FAQ2_3 pp. Quote style fixed for
the following.

Assuming I don't know how to quote and wasting time mentioning it
really only serves the purpose of trying to make it look like I've
done something wrong, or am somehow stupid for not quoting. It's what
I think is called a cheap shot, normally reserved for the
playground.

1. I never said that a DTD is what I needed [...]

Yes, you did propose that as a solution.

I used the words '(or whatever)' to signify that I was using a DTD as
an example of something which may in some way present a solution to my
problem. I did not say 'I want to use a DTD to solve my problem', I
also did not say 'How can I use a DTD to solve my problem', or 'Using
a DTD does not solve my problem, why not?', and so I could go on. I
quite clearly used it as an example of the type / general area of a
solution I was looking for.

A DTD is not suitable for that, so I pointed that out.

See above.

I register a mutation listener on a DIV element. When a node is
inserted to the DIV I use the event object passed to my listener to
retrieve the node inserted. It could look like this.
<span class="message">
<b>Name</b>
<p>Hello World!</p>
</span>
Currently I parse this XHTML fragment using various DOM methods such
as hasChildNodes(), nextSibling, previousSibling, etc (you get the
idea).
However it is possible that the XHTML fragment that is inserted may
change, for example once every 2 months.

[...] Mutation Events are implemented client-side, and it is unlikely
that anyone will look at the document for more than several hours, let
alone more than 2 months.

2. The assumptions you make about people not wanting to look at a
document for longer than a couple of hours let alone a couple of
months are irrelevant [...]

Sorry, my bad.

My question states the problem.

Unfortunately, it does not. It states the problems you are having with
your solution to the problem. See below.

Yes, it does state the problems I am having with my solution (as you
put it). And then I go on to say that I therefore require a better
solution, perhaps based on something similar to a DTD. So we can see
from my post that I present firstly the problem, and then I suggest
the type of solution I'm looking for, and then ask if anyone could
help.

It is, on an XHTML _element_. But currently there is sparse support for
XHTML and that event type at best.

Yes, but since I am using it you can safely assume that I have read
about the sparse support and, what's more, that it works and serves my
purpose. Again, criticising my use of mutation listeners is an attempt
at another cheap shot which has very little to do with my original
question.

Maybe XPath serves. Other than that, I can only think of designing your own
parsing language.

Yes, XPath has come up a few times on another mailing list, but the
expression to extract data would still be really quite tightly coupled
with the markup itself. What's more, the expression is defined in
the code, something I would like to try and avoid (which was my reason
for mentioning a DTD, since it's a separate file).

But again, it would greatly help to understand why you deem parsing to be
necessary in the first place. IOW: What is it that you are trying to accomplish
here?

I need to parse the markup because I need to extract data from it.
 
pr

Yes, XPath has come up a few times on another mailing list, but the
expression to extract data would still be really quite tightly coupled
with the markup itself. What's more, the expression is defined in
the code, something I would like to try and avoid (which was my reason
for mentioning a DTD, since it's a separate file).

Why not XSLT? That way you could guarantee to receive the same XML in
your javascript every time because your stylesheet would handle the
conversion. It's a lot simpler to write XSLT/XPath rules than to use
equivalent DOM methods and you can run a transformation from script on
upwards of very small units of XML/XHTML.
 
Jack

Why not XSLT? That way you could guarantee to receive the same XML in
your javascript every time because your stylesheet would handle the
conversion. It's a lot simpler to write XSLT/XPath rules than to use
equivalent DOM methods and you can run a transformation from script on
upwards of very small units of XML/XHTML.

Hello,

Just to check I understand. I would create an XML document from the
XHTML fragment I receive. I then use XSLT to transform this
(temporary) XML document into another XML document (whose structure
remains the same). I then use JavaScript to parse the resulting XML
document.

Have I understood your suggestion?
 
pr

Jack said:
Just to check I understand. I would create an XML document from the
XHTML fragment I receive. I then use XSLT to transform this
(temporary) XML document into another XML document (whose structure
remains the same). I then use JavaScript to parse the resulting XML
document.

Have I understood your suggestion?

Yes, that's it.
 
Jack

Yes, that's it.

Sorry for the delay regarding this matter (I have moved to Coventry
from London since I last posted).

I have one possible problem with the solution. If the mutation event
occurs regularly, perhaps once every second, or maybe even every 0.5
seconds, then surely the overhead involved in your proposed solution
would be too great?

Jack
 
pr

Jack wrote:
[...]
I have 1 possible problem with the solution. If the mutation event
occurs regularly, perhaps 1 every second, or maybe even every 0.5
seconds, then surely the overhead involved in your proposed solution
would be too great?

I would have thought half a second would be fine. You're transforming a
minuscule XML fragment using a minuscule stylesheet, which in any case
would need to be loaded just once and then could be run repeatedly. I'm
guessing it wouldn't differ much from using the DOM to do the same,
although trial and error is the only way I can suggest to prove that.
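
The "load once, run repeatedly" idea can be sketched as follows. This is only an illustration under stated assumptions: it relies on the browser-provided DOMParser, XMLSerializer and XSLTProcessor objects, and the stylesheet text, the output element names (message, name, text) and the function name createNormalizer are all hypothetical.

```javascript
// Assumed stylesheet: maps the current fragment shape onto a stable
// <message><name/><text/></message> document. Only this string would
// need editing when the incoming markup changes.
const MESSAGE_XSLT = [
  '<xsl:stylesheet version="1.0"',
  '    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"',
  '    xmlns:x="http://www.w3.org/1999/xhtml"',
  '    exclude-result-prefixes="x">',
  '  <xsl:template match="/*">',
  '    <message>',
  '      <name><xsl:value-of select="x:b"/></name>',
  '      <text><xsl:value-of select="x:p"/></text>',
  '    </message>',
  '  </xsl:template>',
  '</xsl:stylesheet>'
].join('\n');

// Build the processor once; the returned function can then be called
// for every mutation event.
function createNormalizer(xsltText) {
  const parser = new DOMParser();
  const serializer = new XMLSerializer();
  const processor = new XSLTProcessor();
  processor.importStylesheet(parser.parseFromString(xsltText, 'application/xml'));

  return function normalize(fragmentNode) {
    // Serialize the inserted node, re-parse it as its own XML document,
    // then transform that document into the stable structure.
    const xml = serializer.serializeToString(fragmentNode);
    const source = parser.parseFromString(xml, 'application/xml');
    return processor.transformToDocument(source);
  };
}
```

Whether this is fast enough at one event every half second is something only measurement in the target browser can settle, as the post above says.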
 
David Golightly

Hello,

I register a mutation listener on a DIV element. When a node is
inserted to the DIV I use the event object passed to my listener to
retrieve the node inserted. It could look like this.

So before talking about solutions, let's talk about your problem.

Your script is somehow getting its hands on an XHTML fragment that is
mostly the same, but might vary once in a while. Once you get the
fragment, what are you hoping to retrieve from it? Parsing implies
some intended use, whether compiling, rendering, scraping, or data
retrieval of some kind - you don't just parse for its own sake. So
stepping back from the task of parsing, what exactly do you hope to
get out of this? Put it another way: if you were to have a function
that does what you want, and takes the XHTML fragment as its input,
what would the output be? Then we can talk about how to get there.

-David
 
Jack

So before talking about solutions, let's talk about your problem.

Your script is somehow getting its hands on an XHTML fragment that is
mostly the same, but might vary once in a while. Once you get the
fragment, what are you hoping to retrieve from it? Parsing implies
some intended use, whether compiling, rendering, scraping, or data
retrieval of some kind - you don't just parse for its own sake. So
stepping back from the task of parsing, what exactly do you hope to
get out of this? Put it another way: if you were to have a function
that does what you want, and takes the XHTML fragment as its input,
what would the output be? Then we can talk about how to get there.

-David

Hello David,

Thank you for responding.

My aim is to extract the data from the XHTML fragment. The data can
be both attribute values, and actual node values. I then store the
data in a custom JavaScript object. The object is then stored in an
array, along with any other objects that were put there from previous
mutation events.

I end up with an array of objects, storing data from a sequence of
mutation events.

Jack
 
Jack

So before talking about solutions, let's talk about your problem.

Your script is somehow getting its hands on an XHTML fragment that is
mostly the same, but might vary once in a while. Once you get the
fragment, what are you hoping to retrieve from it? Parsing implies
some intended use, whether compiling, rendering, scraping, or data
retrieval of some kind - you don't just parse for its own sake. So
stepping back from the task of parsing, what exactly do you hope to
get out of this? Put it another way: if you were to have a function
that does what you want, and takes the XHTML fragment as its input,
what would the output be? Then we can talk about how to get there.

-David


Hello David,

Thank you for responding.

I would like to extract data contained in the XHTML fragment. The
data may come from attributes or nested nodes, or leaf node values.
The aim is to store the newly extracted data in a custom JavaScript
object. The object will then be stored in an array. This way I
should end up with an array of data, modeled in a JavaScript custom
object, that contains all data linked with the mutation event.

Jack
 

David Golightly

I would like to extract data contained in the XHTML fragment. The
data may come from attributes or nested nodes, or leaf node values.
The aim is to store the newly extracted data in a custom JavaScript
object. The object will then be stored in an array. This way I
should end up with an array of data, modeled in a JavaScript custom
object, that contains all data linked with the mutation event.

So, since the structure of the (X)HTML may change without notice, how
do you plan on managing those changes? No matter what method you
choose, you will need to be able to adjust dynamically when a change
is made to take it into account. Will the hierarchy itself change, or
just tag names? I guess to get any further, we need a sense of what
we can rely on NOT changing, and a sense for the ways in which the
fragment MAY change. We need a set of rules to follow. Define the
problem in more detail than "I need to extract data from an XHTML
fragment" - specify *what* that data should be and *how* that fragment
might look, and we can approach a solution.

If you can count on the CSS classes and hierarchy being consistent
while intermediate nodes and tag names change, and can afford the
extra 20K+ overhead, you might look into one of the many fine CSS
selector engines out there. Jack Slocum's Ext-JS engine is
lightweight and performant, if you really need the flexibility of CSS
selectors over straight DOM traversal. But if you can specify your
requirements more rigidly, you can probably come up with a solution
that's tailored to your needs and more lightweight and performant even
than that.
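
A rough sketch of the selector idea, under some assumptions: it uses the querySelector method that browsers now provide natively rather than a separate selector engine, and the rule format, messageRules and extractWithSelectors are hypothetical names. The sample fragment has no classes on its inner nodes, so tag selectors are used here; class-based selectors would be the ones that survive a tag rename.

```javascript
// Hypothetical selector map, kept as plain data so it could live in a
// separate file. Each entry says where a value is and how to read it.
const messageRules = {
  name: { selector: 'b' },
  text: { selector: 'p' },
  kind: { selector: null, attribute: 'class' } // null: the fragment root itself
};

function extractWithSelectors(root, rules) {
  const record = {};
  for (const key of Object.keys(rules)) {
    const rule = rules[key];
    const el = rule.selector ? root.querySelector(rule.selector) : root;
    if (!el) {
      record[key] = null; // structure changed: signal it instead of throwing
    } else if (rule.attribute) {
      record[key] = el.getAttribute(rule.attribute);
    } else {
      record[key] = el.textContent;
    }
  }
  return record;
}
```

A null value in the result is a cheap way to notice that the fragment no longer matches the rules and that the map needs updating.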

Barring the above, your best solution, in your JS, is to have an object
with a method for each piece of information to extract. Each method
knows how to extract its piece of information. If the fragment
structure changes, just change the corresponding method(s)
appropriately:

var fragmentExtractor = {
    // *frag* is the root DOM node of your fragment
    getMessage: function (frag) {
        return frag.getElementsByTagName('p')[0].childNodes[0].nodeValue;
    },

    getOtherThing: function (frag) {
        // do more stuff
    }
};

-David
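
To connect this with the array of objects described earlier in the thread, here is one possible sketch. The object extractor repeats the shape of the extractor above so that the example stands alone; getName, records and recordFragment are hypothetical names, and the key-naming rule (getMessage becomes message) is only one convention among many.

```javascript
// Same shape as the extractor object above, repeated so the sketch
// stands alone. getName is a hypothetical second method.
const extractor = {
  getName: function (frag) {
    return frag.getElementsByTagName('b')[0].childNodes[0].nodeValue;
  },
  getMessage: function (frag) {
    return frag.getElementsByTagName('p')[0].childNodes[0].nodeValue;
  }
};

const records = []; // one entry per mutation event

// Call every getXxx method and store its result under the key "xxx".
function recordFragment(frag) {
  const record = {};
  for (const method of Object.keys(extractor)) {
    const key = method.charAt(3).toLowerCase() + method.slice(4);
    record[key] = extractor[method](frag);
  }
  records.push(record);
  return record;
}
```

The mutation listener would then call recordFragment with each inserted node, and a change in the markup only touches the method that reads the affected value.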
 
