Extract links from Javascript (not using Javascript)?

chrisspencer02

I am looking for a method to extract the links embedded within the
Javascript in a web page: an ActiveX component, or example code in
C++/Pascal/etc. I am looking for a general solution, not one tailored
to a particular page/script.

Hopefully, the problem can be solved without recreating a complete
Javascript interpreter. Any ideas?
 
Ira Baxter

> I am looking for a method to extract the links embedded within the
> Javascript in a web page: an ActiveX component, or example code in
> C++/Pascal/etc. I am looking for a general solution, not one tailored
> to a particular page/script.
>
> Hopefully, the problem can be solved without recreating a complete
> Javascript interpreter. Any ideas?

If you expect to have any chance at getting at links that are anything
other than coded directly in a string literal, you will need at least a full
JavaScript parser.
See http://www.semanticdesigns.com/Products/FrontEnds/index.html
for a JavaScript front end that is designed to be used in custom tasks
like this.
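
As a rough illustration of the string-literal case, here is a minimal sketch in JavaScript (run under Node.js) that scans script source for quoted strings that look like URLs. The function name `extractLiteralUrls` and the heuristics for what "looks like a URL" are assumptions made for this example; they are not taken from any product mentioned in this thread. It finds only URLs written out whole in a single literal, so anything concatenated or computed is missed.

```javascript
// Naive scan: collect string literals that look like URLs or page paths.
// Hypothetical helper for illustration; it does not parse JavaScript.
function extractLiteralUrls(source) {
  // Single- or double-quoted literal, allowing backslash escapes.
  const literal = /"((?:[^"\\\n]|\\.)*)"|'((?:[^'\\\n]|\\.)*)'/g;
  // Assumed heuristic: absolute http(s) URL, or a path ending in a
  // common page or image extension.
  const looksLikeUrl =
    /^(?:https?:\/\/\S+|[\w.\/-]+\.(?:html?|php|asp|gif|jpe?g|png))$/i;
  const found = [];
  let match;
  while ((match = literal.exec(source)) !== null) {
    const text = match[1] !== undefined ? match[1] : match[2];
    if (looksLikeUrl.test(text) && !found.includes(text)) {
      found.push(text);
    }
  }
  return found;
}

const sample = [
  'var home = "http://www.example.com/index.html";',
  "var img = new Image(); img.src = 'images/logo.gif';",
  'var built = "products/" + id + ".html";', // concatenated: not recovered
].join("\n");

console.log(extractLiteralUrls(sample));
```

Because this works on raw text, quotes inside comments or regular-expression literals can confuse it as well, which is one more reason a real parser is the minimum for anything beyond the simplest pages.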
 
Randy Webb

(e-mail address removed) said the following on 5/26/2006 3:03 PM:
> I am looking for a method to extract the links embedded within the
> Javascript in a web page: an ActiveX component, or example code in
> C++/Pascal/etc. I am looking for a general solution, not one tailored
> to a particular page/script.

There are too many possibilities to deal with for a solution to that
question to be simple and/or general. Just too many ways that a URL can
be put together in script.

Can you give a general example of what you are trying to do though?
 
Randy Webb

Ira Baxter said the following on 5/26/2006 3:44 PM:
> If you expect to have any chance at getting at links that are anything
> other than coded directly in a string literal, you will need at least a full
> JavaScript parser.

And even that is not a guarantee of success.

> See http://www.semanticdesigns.com/Products/FrontEnds/index.html
> for a JavaScript front end that is designed to be used in custom tasks
> like this.

It is designed to parse out any and all URLs that a document possesses?

I find that a dubious claim.
 
chrisspencer02

Randy said:
> (e-mail address removed) said the following on 5/26/2006 3:03 PM:
>
> There are too many possibilities to deal with for a solution to that
> question to be simple and/or general. Just too many ways that a URL can
> be put together in script.
>
> Can you give a general example of what you are trying to do though?

I would like to transform web pages "in the wild" into tables of links
for a site map, regardless of whether those links are encoded in HTML,
CSS, Flash, Javascript, etc. Sounds like this is not possible,
particularly for event-driven aspects of the script like rollover image
menus?
 
Randy Webb

(e-mail address removed) said the following on 5/26/2006 8:44 PM:
> I would like to transform web pages "in the wild" into tables of links
> for a site map, regardless of whether those links are encoded in HTML,
> CSS, Flash, Javascript, etc. Sounds like this is not possible,
> particularly for event-driven aspects of the script like rollover image
> menus?

It could be done with regards to the CSS, HTML, and JS aspects, but it
wouldn't be a pretty task to try to accomplish. Just trying to resolve
relative paths would be a major headache.
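
For the relative-path part, at least, a standard resolution algorithm exists today: the one in the WHATWG URL Standard, exposed in Node.js and current browsers through the `URL` constructor. What follows is a minimal sketch, assuming the URL of the page (or of its `<base href>`, if it has one) is already known. `resolveLink` and the example.com addresses are hypothetical names used only for illustration, and the sketch does nothing about the harder problem of finding the link text in script in the first place.

```javascript
// Resolve a link found in a page against the page's base URL.
// Returns null when the input cannot be parsed as a URL.
function resolveLink(href, baseUrl) {
  try {
    return new URL(href, baseUrl).href;
  } catch (e) {
    return null;
  }
}

const base = "http://www.example.com/products/list.html";
console.log(resolveLink("detail.html", base));       // http://www.example.com/products/detail.html
console.log(resolveLink("../images/a.gif", base));   // http://www.example.com/images/a.gif
console.log(resolveLink("/index.html", base));       // http://www.example.com/index.html
console.log(resolveLink("http://other.example/", base)); // http://other.example/
```

Note that a URL inside an external CSS file resolves against the stylesheet's own URL rather than the page's, so the base has to be tracked per resource.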
 
Thomas 'PointedEars' Lahn

Obviously you are not yet sure what to use, so a newsgroup dedicated to a
certain (group of) language(s), like this one, is not the place to start.
Try comp.infosystems.www.authoring.misc, or comp.lang.misc.

> I would like to transform web pages "in the wild" into tables of links
> for a site map,

A site map is best implemented using lists (in [X]HTML: `ul' and `ol'
elements), not tables. A table is a table is a table. [psf 3.8]

> regardless of whether those links are encoded in HTML, CSS, Flash,
> Javascript, etc. Sounds like this is not possible,

It is possible to a certain point (I don't think decompiling Flash is
possible easily). There is software for that already (Web spiders),
and you could use its output.

> particularly for event-driven aspects of the script like rollover image
> menus?

The rollover effect has to take place on existing markup, so it does not
matter here. You will have difficulty recognizing client-side generated
menus that do not degrade gracefully, and those that use pseudo-links
like (<a href="javascript:somefunction()">...</a>), though.
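
Once the anchors have been pulled out of the markup, the pseudo-links can at least be flagged rather than followed blindly. Here is a minimal sketch, assuming the `href` attribute values have already been extracted by an HTML parser; `classifyHref` and its category names are invented for this example.

```javascript
// Classify an href attribute value so a crawler can decide what to follow.
// Category names are assumptions for this sketch.
function classifyHref(href) {
  const value = href.trim();
  if (value === "" || value.startsWith("#")) {
    return "fragment-only"; // e.g. href="#" paired with an onclick handler
  }
  if (/^javascript:/i.test(value)) {
    return "pseudo-link";   // target is only known by running the script
  }
  if (/^(?:mailto|news|tel):/i.test(value)) {
    return "non-page";      // not something a site map would list as a page
  }
  return "link";            // candidate for URL resolution and crawling
}

for (const href of ["blurb.html", "#", "javascript:somefunction()", "mailto:a@example.com"]) {
  console.log(href, "->", classifyHref(href));
}
```

This only tells the crawler which anchors it cannot resolve from markup alone; it does not recover where those anchors lead.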

Which also tells you that unless you are using server-side J(ava)Script,
J(ava)Script is not the appropriate language for generating the site map.
However, e.g. it can help with letting the user expand/collapse it later.


PointedEars
 
chrisspencer02

Thomas said:
> Obviously you are not yet sure what to use, so a newsgroup dedicated to a
> certain (group of) language(s), like this one, is not the place to start.
> Try comp.infosystems.www.authoring.misc, or comp.lang.misc.

I am not *unsure* what language to use to solve this problem; actually
I don't care. My question is about algorithms for parsing and
interpreting Javascript.

>> I would like to transform web pages "in the wild" into tables of links
>> for a site map,
>
> A site map is best implemented using lists (in [X]HTML: `ul' and `ol'
> elements), not tables. A table is a table is a table. [psf 3.8]

I do not mean "table" as in HTML table, but "table" as in raw data set.

> It is possible to a certain point (I don't think decompiling Flash is
> possible easily). There is software for that already (Web spiders),
> and you could use its output.

Have you used any that actually extract links from Javascript? I have
not, though I know some claim to do so.

> The rollover effect has to take place on existing markup, so it does not
> matter here. You will have difficulty recognizing client-side generated
> menus that do not degrade gracefully, and those that use pseudo-links
> like (<a href="javascript:somefunction()">...</a>), though.
>
> Which also tells you that unless you are using server-side J(ava)Script,
> J(ava)Script is not the appropriate language for generating the site map.
> However, e.g. it can help with letting the user expand/collapse it later.

Again, I am not looking to write a solution *in* Javascript
(necessarily), I am looking to read links *from* Javascript using
whatever tools are available.
 
Thomas 'PointedEars' Lahn

> I am not *unsure* what language to use to solve this problem; actually
> I don't care. My question is about algorithms for parsing and
> interpreting Javascript.

Interpretation of "Javascript" would first include the recognition that
there are different implementations of ECMAScript: JavaScript, JScript,
Opera-ECMAScript, KJS; just to name the most widely distributed ones.

Whether script code executes or not, i.e. whether there is a "link" or
not, would depend entirely on how tightly something is coded to a specific
implementation, let alone a specific execution environment or object
model.

Second, if you would stick to strictly ECMAScript-conforming code as
should be expected by an interoperable Web site that is to be parsed,
the matter of interpretation includes how you want to recognize what
is a "link" or not. Because

var img = new Image();
img.src = "foo";

could be considered a link (to an image resource named `foo').

var img = new Object();
img.src = "foo";

could not.

As for recognizing links and pseudo-links such as

  function updateFrame(o)
  {
    var f = window.parent.frames['foo'];
    if (f && f.document)
    {
      f.document.URL = "bar/" + o.href;
      return false;
    }

    return true;
  }

  <a href="blurb.html" onclick="return updateFrame(this);">...</a>

or the ill-conceived

<a href="#" onclick="location = foo + 'bar'">...</a>

<a href="javascript:someFunction()">...</a>

or even something dynamically scripted like

  <script type="text/javascript">
    var a = document.createElement("a");
    if (a && isMethod(a.appendChild, a.addEventListener,
                      document.createTextNode, document.body.appendChild))
    {
      a.appendChild(document.createTextNode("foo"));
      a.addEventListener('click',
        function(e)
        {
          if (!e) e = window.event;
          if (e)
          {
            (dhtml.getElem("id", "bar") || {onclick: function(){}}).onclick();
            if (isMethod(e.stopPropagation)) e.stopPropagation();
            if (isMethod(e.preventDefault)) e.preventDefault();
            if (typeof e.cancelBubble != "undefined") e.cancelBubble = true;
          }
        },
        false);

      document.body.appendChild(a);
    }
  </script>

how would you even /know/ that there is a "link" and where it points to
without implementing the script engine along with its execution environment
itself? I think there are far too many variables here to make even an
educated guess.

> Have you used any that actually extract links from Javascript?

No. Probably for good reason.

> Again, I am not looking to write a solution *in* Javascript
> (necessarily), I am looking to read links *from* Javascript
> using whatever tools are available.

I don't think this is very much on topic here.


PointedEars
 
