Extract links from Javascript (not using Javascript)?

Discussion in 'Javascript' started by chrisspencer02@yahoo.com, May 26, 2006.

  1. Guest

    I am looking for a method to extract the links embedded within the
    Javascript in a web page: an ActiveX component, or example code in
    C++/Pascal/etc. I am looking for a general solution, not one tailored
    to a particular page/script.

    Hopefully, the problem can be solved without recreating a complete
    Javascript interpreter. Any ideas?
    , May 26, 2006
    #1
  2. Ira Baxter Guest

    <> wrote in message
    news:...
    > I am looking for a method to extract the links embedded within the
    > Javascript in a web page: an ActiveX component, or example code in
    > C++/Pascal/etc. I am looking for a general solution, not one tailored
    > to a particular page/script.
    >
    > Hopefully, the problem can be solved without recreating a complete
    > Javascript interpreter. Any ideas?


    If you expect to have any chance of getting at links that are anything
    other than coded directly in a string literal, you will need at least a
    full JavaScript parser.
    See http://www.semanticdesigns.com/Products/FrontEnds/index.html
    for a JavaScript front end that is designed to be used in custom tasks
    like this.
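
As an illustration of the string-literal case only: a minimal sketch that scans script source for quoted literals that look like links. The function name `extractLiteralLinks` and the "looks like a link" heuristic are assumptions made for this example, not part of any product mentioned in the thread. It is a regex scan, not a parser, so it cannot see URLs that are assembled at run time.

```javascript
// Hypothetical helper: collect string literals that look like URLs or paths.
// This is a plain regex scan, not a JavaScript parser.
function extractLiteralLinks(source) {
  // Single- or double-quoted literals, allowing backslash escapes.
  const literal = /(["'])((?:\\.|(?!\1)[^\\\r\n])*)\1/g;
  // Assumed heuristic: an http(s) URL, a root-relative path,
  // or a value ending in a common page/image extension.
  const linkLike =
    /^(?:https?:\/\/|\/)[^\s]*$|\.(?:html?|php|asp|jpg|gif|png)(?:[?#].*)?$/i;
  const found = [];
  for (const match of source.matchAll(literal)) {
    const value = match[2];
    if (linkLike.test(value)) found.push(value);
  }
  return found;
}

const sample = `
  var a = "http://example.com/index.html";
  location = base + 'page' + n + ".html";
`;
console.log(extractLiteralLinks(sample));
// → [ 'http://example.com/index.html', '.html' ]
```

The first statement is recovered whole. For the second, the scan finds only the fragment `.html`; the real target depends on `base` and `n`, which are known only when the script runs.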

    --
    Ira Baxter, CTO
    www.semanticdesigns.com
    Ira Baxter, May 26, 2006
    #2
  3. Randy Webb Guest

    said the following on 5/26/2006 3:03 PM:
    > I am looking for a method to extract the links embedded within the
    > Javascript in a web page: an ActiveX component, or example code in
    > C++/Pascal/etc. I am looking for a general solution, not one tailored
    > to a particular page/script.


    There are too many possibilities to deal with for a solution to that
    question to be simple and/or general. Just too many ways that a URL can
    be put together in script.

    Can you give a general example of what you are trying to do though?
    --
    Randy
    comp.lang.javascript FAQ - http://jibbering.com/faq & newsgroup weekly
    Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
    Randy Webb, May 26, 2006
    #3
  4. Randy Webb Guest

    Ira Baxter said the following on 5/26/2006 3:44 PM:
    > <> wrote in message
    > news:...
    >> I am looking for a method to extract the links embedded within the
    >> Javascript in a web page: an ActiveX component, or example code in
    >> C++/Pascal/etc. I am looking for a general solution, not one tailored
    >> to a particular page/script.
    >>
    >> Hopefully, the problem can be solved without recreating a complete
    >> Javascript interpreter. Any ideas?

    >
    > If you expect to have any chance of getting at links that are anything
    > other than coded directly in a string literal, you will need at least a
    > full JavaScript parser.


    And even that is not a guarantee of success.

    > See http://www.semanticdesigns.com/Products/FrontEnds/index.html
    > for a JavaScript front end that is designed to be used in custom tasks
    > like this.


    It is designed to parse out any and all URLs that a document possesses?

    I find that a dubious claim.

    --
    Randy
    comp.lang.javascript FAQ - http://jibbering.com/faq & newsgroup weekly
    Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
    Randy Webb, May 26, 2006
    #4
  5. Guest

    Randy Webb wrote:
    > said the following on 5/26/2006 3:03 PM:
    > > I am looking for a method to extract the links embedded within the
    > > Javascript in a web page: an ActiveX component, or example code in
    > > C++/Pascal/etc. I am looking for a general solution, not one tailored
    > > to a particular page/script.

    >
    > There are too many possibilities to deal with for a solution to that
    > question to be simple and/or general. Just too many ways that a URL can
    > be put together in script.
    >
    > Can you give a general example of what you are trying to do though?


    I would like to transform web pages "in the wild" into tables of links
    for a site map, regardless of whether those links are encoded in HTML,
    CSS, Flash, Javascript, etc. Sounds like this is not possible,
    particularly for event-driven aspects of the script like rollover image
    menus?
    , May 27, 2006
    #5
  6. Randy Webb Guest

    said the following on 5/26/2006 8:44 PM:
    > Randy Webb wrote:
    >> said the following on 5/26/2006 3:03 PM:
    >>> I am looking for a method to extract the links embedded within the
    >>> Javascript in a web page: an ActiveX component, or example code in
    >>> C++/Pascal/etc. I am looking for a general solution, not one tailored
    >>> to a particular page/script.

    >> There are too many possibilities to deal with for a solution to that
    >> question to be simple and/or general. Just too many ways that a URL can
    >> be put together in script.
    >>
    >> Can you give a general example of what you are trying to do though?

    >
    > I would like to transform web pages "in the wild" into tables of links
    > for a site map, regardless of whether those links are encoded in HTML,
    > CSS, Flash, Javascript, etc. Sounds like this is not possible,
    > particularly for event-driven aspects of the script like rollover image
    > menus?
    >


    It could be done with regard to the CSS, HTML, and JS aspects, but it
    wouldn't be a pretty task to try to accomplish. Just trying to resolve
    relative paths would be a major headache.
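
For the relative-path part, a minimal sketch using the standard `URL` constructor found in current browsers and Node.js (it did not exist when this thread was written). The helper name `resolveLinks` and the example addresses are assumptions for illustration. The sketch resolves against the page URL only and ignores any `<base href>` element, which a real tool would have to honor.

```javascript
// Hypothetical helper: resolve extracted hrefs against the URL of the
// page they were found on. Ignores <base href> (an assumption of this sketch).
function resolveLinks(pageUrl, hrefs) {
  const resolved = [];
  for (const href of hrefs) {
    try {
      resolved.push(new URL(href, pageUrl).href);
    } catch (err) {
      // Values that cannot be parsed as a URL are skipped, not guessed at.
    }
  }
  return resolved;
}

console.log(resolveLinks("http://example.com/docs/index.html", [
  "blurb.html",
  "../img/foo.gif",
  "/faq",
  "http://example.org/",
]));
// → [ 'http://example.com/docs/blurb.html',
//     'http://example.com/img/foo.gif',
//     'http://example.com/faq',
//     'http://example.org/' ]
```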

    --
    Randy
    comp.lang.javascript FAQ - http://jibbering.com/faq & newsgroup weekly
    Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
    Randy Webb, May 27, 2006
    #6
  7. Thomas 'PointedEars' Lahn

    wrote:

    > Randy Webb wrote:
    >> said the following on 5/26/2006 3:03 PM:
    >> > I am looking for a method to extract the links embedded within the
    >> > Javascript in a web page: an ActiveX component, or example code in
    >> > C++/Pascal/etc.


    Obviously you are not yet sure what to use, so a newsgroup dedicated to a
    certain (group of) language(s), like this one, is not the place to start.
    Try comp.infosystems.www.authoring.misc, or comp.lang.misc.

    >> > I am looking for a general solution, not one tailored
    >> > to a particular page/script.

    >>
    >> There are too many possibilities to deal with for a solution to that
    >> question to be simple and/or general. Just too many ways that a URL can
    >> be put together in script.
    >>
    >> Can you give a general example of what you are trying to do though?

    >
    > I would like to transform web pages "in the wild" into tables of links
    > for a site map,


    A site map is best implemented using lists (in [X]HTML: `ul' and `ol'
    elements), not tables. A table is a table is a table. [psf 3.8]

    > regardless of whether those links are encoded in HTML, CSS, Flash,
    > Javascript, etc. Sounds like this is not possible,


    It is possible to a certain point (I don't think decompiling Flash is
    possible easily). There is software for that already (Web spiders),
    and you could use its output.

    > particularly for event-driven aspects of the script like rollover image
    > menus?


    The rollover effect has to take place on existing markup, so it does not
    matter here. You will have difficulty recognizing client-side generated
    menus that do not degrade gracefully, and those that use pseudo-links
    like (<a href="javascript:somefunction()">...</a>), though.
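
A crawler can at least flag such pseudo-links instead of recording them as real targets. The following is a minimal sketch; the function name `classifyHref` and its three labels are assumptions for this example, and treating `#`-only values as non-targets is a design choice, not a rule.

```javascript
// Hypothetical helper: sort an href attribute value into
//   "script"   - a javascript: pseudo-link, target unknown without running it
//   "fragment" - empty or same-page (#...) reference, no new page to map
//   "link"     - anything else, a candidate for the site map
function classifyHref(href) {
  const value = href.trim();
  if (/^javascript:/i.test(value)) return "script";
  if (value === "" || value.startsWith("#")) return "fragment";
  return "link";
}

console.log(classifyHref("javascript:somefunction()")); // → script
console.log(classifyHref("#"));                         // → fragment
console.log(classifyHref("blurb.html"));                // → link
```

Anything classified as "script" still needs the script itself to be analyzed or executed before its target is known.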

    Which also tells you that unless you are using server-side J(ava)Script,
    J(ava)Script is not the appropriate language for generating the site map.
    However, it can help later, e.g. by letting the user expand/collapse it.


    PointedEars
    --
    This is Usenet. It is a discussion group, not a helpdesk. You post
    something, we discuss it. If you have a question and that happens to get
    answered in the course of the discussion, then great. If not, you can
    have a full refund of your membership fees. -- Mark Parnell in alt.html
    Thomas 'PointedEars' Lahn, May 29, 2006
    #7
  8. Guest

    Thomas 'PointedEars' Lahn wrote:
    > wrote:
    >
    > > Randy Webb wrote:
    > >> said the following on 5/26/2006 3:03 PM:
    > >> > I am looking for a method to extract the links embedded within the
    > >> > Javascript in a web page: an ActiveX component, or example code in
    > >> > C++/Pascal/etc.

    >
    > Obviously you are not yet sure what to use, so a newsgroup dedicated to a
    > certain (group of) language(s), like this one, is not the place to start.
    > Try comp.infosystems.www.authoring.misc, or comp.lang.misc.


    I am not *unsure* what language to use to solve this problem; actually
    I don't care. My question is about algorithms for parsing and
    interpreting Javascript.


    > >> > I am looking for a general solution, not one tailored
    > >> > to a particular page/script.
    > >>
    > >> There are too many possibilities to deal with for a solution to that
    > >> question to be simple and/or general. Just too many ways that a URL can
    > >> be put together in script.
    > >>
    > >> Can you give a general example of what you are trying to do though?

    > >
    > > I would like to transform web pages "in the wild" into tables of links
    > > for a site map,

    >
    > A site map is best implemented using lists (in [X]HTML: `ul' and `ol'
    > elements), not tables. A table is a table is a table. [psf 3.8]


    I do not mean "table" as in HTML table, but "table" as in raw data set.


    > > regardless of whether those links are encoded in HTML, CSS, Flash,
    > > Javascript, etc. Sounds like this is not possible,

    >
    > It is possible to a certain point (I don't think decompiling Flash is
    > possible easily). There is software for that already (Web spiders),
    > and you could use its output.


    Have you used any that actually extract links from Javascript? I have
    not, though I know some claim to do so.


    > > particularly for event-driven aspects of the script like rollover image
    > > menus?

    >
    > The rollover effect has to take place on existing markup, so it does not
    > matter here. You will have difficulty recognizing client-side generated
    > menus that do not degrade gracefully, and those that use pseudo-links
    > like (<a href="javascript:somefunction()">...</a>), though.
    >
    > Which also tells you that unless you are using server-side J(ava)Script,
    > J(ava)Script is not the appropriate language for generating the site map.
    > However, it can help later, e.g. by letting the user expand/collapse it.


    Again, I am not looking to write a solution *in* Javascript
    (necessarily), I am looking to read links *from* Javascript using
    whatever tools are available.
    , May 30, 2006
    #8
  9. Thomas 'PointedEars' Lahn

    wrote:

    > Thomas 'PointedEars' Lahn wrote:
    >> wrote:
    >> > Randy Webb wrote:
    >> >> said the following on 5/26/2006 3:03 PM:
    >> >> > I am looking for a method to extract the links embedded within the
    >> >> > Javascript in a web page: an ActiveX component, or example code in
    >> >> > C++/Pascal/etc.

    >>
    >> Obviously you are not yet sure what to use, so a newsgroup dedicated to a
    >> certain (group of) language(s), like this one, is not the place to start.
    >> Try comp.infosystems.www.authoring.misc, or comp.lang.misc.

    >
    > I am not *unsure* what language to use to solve this problem; actually
    > I don't care. My question is about algorithms for parsing and
    > interpreting Javascript.


    Interpretation of "Javascript" would first include the recognition that
    there are different implementations of ECMAScript: JavaScript, JScript,
    Opera-ECMAScript, KJS; just to name the most widely distributed ones.

    Whether script code executes or not, i.e. whether there is a "link" or
    not, would depend entirely on how tightly something is coded to a specific
    implementation, let alone a specific execution environment or object
    model.

    Second, even if you stick to strictly ECMAScript-conforming code, as
    should be expected of an interoperable Web site that is to be parsed,
    the matter of interpretation includes how you want to recognize what
    is a "link" and what is not. For example,

    var img = new Image();
    img.src = "foo";

    could be considered a link (to an image resource named `foo').

    var img = new Object();
    img.src = "foo";

    could not.

    As for recognizing links and pseudo-links such as

    function updateFrame(o)
    {
      var f = window.parent.frames['foo'];
      if (f && f.document)
      {
        f.document.URL = "bar/" + o.href;
        return false;
      }

      return true;
    }

    <a href="blurb.html" onclick="return updateFrame(this);">...</a>

    or the ill-conceived

    <a href="#" onclick="location = foo + 'bar'">...</a>

    <a href="javascript:someFunction()">...</a>

    or even something dynamically scripted like

    <script type="text/javascript">
      var a = document.createElement("a");
      if (a && isMethod(a.appendChild, a.addEventListener,
                        document.createTextNode, document.body.appendChild))
      {
        a.appendChild(document.createTextNode("foo"));
        a.addEventListener('click',
          function(e)
          {
            if (!e) e = window.event;
            if (e)
            {
              (dhtml.getElem("id", "bar") || {click: function(){}}).onclick();
              if (isMethod(e.stopPropagation)) e.stopPropagation();
              if (isMethod(e.preventDefault)) e.preventDefault();
              if (typeof e.cancelBubble != "undefined") e.cancelBubble = true;
            }
          },
          false);

        document.body.appendChild(a);
      }
    </script>

    how would you even /know/ that there is a "link" and where it points to
    without implementing the script engine along with its execution environment
    itself? I think there are far too many variables here to make even an
    educated guess.

    >> > regardless of whether those links are encoded in HTML, CSS, Flash,
    >> > Javascript, etc. Sounds like this is not possible,

    >>
    >> It is possible to a certain point (I don't think decompiling Flash is
    >> possible easily). There is software for that already (Web spiders),
    >> and you could use its output.

    >
    > Have you used any that actually extract links from Javascript?


    No. Probably for good reason.

    > Again, I am not looking to write a solution *in* Javascript
    > (necessarily), I am looking to read links *from* Javascript
    > using whatever tools are available.


    I don't think this is very much on topic here.


    PointedEars
    Thomas 'PointedEars' Lahn, May 30, 2006
    #9