I am not *unsure* what language to use to solve this problem; actually
I don't care. My question is about algorithms for parsing and
interpreting Javascript.
Interpreting "Javascript" would first require recognizing that there
are different implementations of ECMAScript: JavaScript, JScript,
Opera-ECMAScript and KJS, to name just the most widely distributed ones.
Whether script code executes or not, i.e. whether there is a "link" or
not, would depend entirely on how tightly that code is tied to a specific
implementation, let alone a specific execution environment or object
model.
Second, if you stick to strictly ECMAScript-conforming code, as should
be expected of an interoperable Web site that is to be parsed, the
matter of interpretation includes how you want to recognize what is a
"link" and what is not. For example,
  var img = new Image();
  img.src = "foo";
could be considered a link (to an image resource named `foo'), whereas
  var img = new Object();
  img.src = "foo";
could not.
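To make that distinction concrete, here is a minimal sketch of a purely textual heuristic. It assumes that only the constructors listed in RESOURCE_CONSTRUCTORS turn a `src' assignment into a resource reference; that table and the function name findSrcLinks are hypothetical and not taken from any existing tool. It ignores scoping, reassignment and every other way of creating an object, which is exactly why such guessing does not get very far.

```javascript
// Assumption: only these host constructors make `.src` a resource reference.
var RESOURCE_CONSTRUCTORS = { Image: true };

// Hypothetical helper: scan source text for `var x = new Ctor(...)` and
// `x.src = "..."`, and report the string only if Ctor is a known
// resource constructor. Purely textual; no script is executed.
function findSrcLinks(source) {
  var ctorOf = {};
  var links = [];
  var decl = /var\s+(\w+)\s*=\s*new\s+(\w+)\s*\(/g;
  var assign = /(\w+)\.src\s*=\s*"([^"]*)"/g;
  var m;

  // First pass: remember which constructor each variable was created with.
  while ((m = decl.exec(source)) !== null) {
    ctorOf[m[1]] = m[2];
  }

  // Second pass: keep only `src` assignments on resource-like objects.
  while ((m = assign.exec(source)) !== null) {
    if (RESOURCE_CONSTRUCTORS[ctorOf[m[1]]] === true) {
      links.push(m[2]);
    }
  }
  return links;
}

var withImage = findSrcLinks('var img = new Image();\nimg.src = "foo";');
var withObject = findSrcLinks('var img = new Object();\nimg.src = "foo";');
```

With the two snippets above, withImage contains "foo" and withObject is empty, although the assignment itself is character-for-character identical.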
As for recognizing links and pseudo-links such as
  function updateFrame(o)
  {
    var f = window.parent.frames['foo'];
    if (f && f.document)
    {
      f.document.URL = "bar/" + o.href;
      return false;
    }
    return true;
  }
  <a href="blurb.html" onclick="return updateFrame(this);">...</a>
or the ill-conceived
  <a href="#" onclick="location = foo + 'bar'">...</a>
  <a href="javascript:someFunction()">...</a>
or even something dynamically scripted like
  <script type="text/javascript">
    var a = document.createElement("a");
    if (a && isMethod(a.appendChild, a.addEventListener,
                      document.createTextNode, document.body.appendChild))
    {
      a.appendChild(document.createTextNode("foo"));
      a.addEventListener('click',
        function(e)
        {
          if (!e) e = window.event;
          if (e)
          {
            // fall back to a no-op handler if there is no element "bar"
            (dhtml.getElem("id", "bar") || {onclick: function(){}}).onclick();
            if (isMethod(e.stopPropagation)) e.stopPropagation();
            if (isMethod(e.preventDefault)) e.preventDefault();
            if (typeof e.cancelBubble != "undefined") e.cancelBubble = true;
          }
        },
        false);
      document.body.appendChild(a);
    }
  </script>
how would you even /know/ that there is a "link", and where it points,
without implementing the script engine itself along with its execution
environment? I think there are far too many variables here to make even
an educated guess.
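As an illustration of what "implementing the execution environment" means even for the simplest case, here is a minimal sketch that runs a handler's source text against a stub environment and records what gets assigned to `location'. The function name recordNavigations and the knownGlobals parameter are hypothetical, and the example URL is made up. The stub covers nothing but assignments to `location'; frames, the DOM, event listeners and everything else shown above would each need their own emulation.

```javascript
// Hypothetical helper: evaluate handler source against a stub environment
// and return every value the code assigned to `location`.
// Assumption: the caller already knows the values of the globals the
// handler uses (knownGlobals), which in practice is the hard part.
function recordNavigations(handlerSource, knownGlobals) {
  var targets = [];
  var env = {};
  var name;

  for (name in knownGlobals) {
    env[name] = knownGlobals[name];
  }

  // Trap assignments to `location` instead of navigating anywhere.
  Object.defineProperty(env, "location", {
    get: function () {
      return targets.length ? targets[targets.length - 1] : "";
    },
    set: function (value) {
      targets.push(String(value));
    }
  });

  // Function bodies created this way are non-strict, so `with` is allowed;
  // identifiers in the handler are resolved against `env` first.
  var run = new Function("env", "with (env) { " + handlerSource + " }");
  run(env);
  return targets;
}

var found = recordNavigations("location = foo + 'bar'",
                              { foo: "http://example.org/" });
```

Here found holds the single target "http://example.org/bar", but only because the value of `foo' was supplied by hand; with a different or unknown `foo' the "link" is a different one, or cannot be determined at all.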
Have you used any that actually extract links from Javascript?
No. Probably for good reason.
Again, I am not looking to write a solution *in* Javascript
(necessarily), I am looking to read links *from* Javascript
using whatever tools are available.
I don't think this is very much on topic here.
PointedEars