Grab Data Displayed On Other Web Page Using document.write

Discussion in 'Javascript' started by '69 Camaro, Jan 11, 2006.

  1. '69 Camaro

    Perhaps I'm Googling for the wrong terms. Does anyone have links to
    examples of the syntax necessary to read the HTML on another Web page when
    that HTML is produced from JavaScript using the document.write( ) method?

    For a simplified example, I have two Web pages. Page 1 uses JavaScript with
    the following:

    htmlData = "<B>This is bold text.</B>";
    document.write(htmlData);

    Page 1 displays in bold text:

    This is bold text.

    Page 2 needs to get the markup for page 1, i.e., just "<B>This is bold
    text.</B>" (which includes the tags), not the JavaScript code listed above.
    I've tried using the responseText property of the MSXML2.XMLHTTP.3.0 object,
    but it gives me the JavaScript used for rendering page 1, not the markup
    (that's stored in the htmlData variable).

    The ultimate goal is to grab the data displayed on a Web page and display
    only the items needed on another Web page. I can parse the HTML based upon
    the tags to target exactly the data I want. That's why I need to read the
    tags.
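Once the markup is available as a string, the tag-based parsing described above could start out like the minimal sketch below. It assumes simple, well-formed markup with no nested <B> elements. A real DOM parser is safer for anything more complex. The function name extractBoldText is hypothetical, and the pattern is case-insensitive because browsers may differ in the tag case they serialize.

```javascript
// Minimal sketch: collect the text inside every <B>...</B> pair in a markup
// string. Assumes simple markup without nested <B> elements.
// extractBoldText is a hypothetical helper name, not an existing API.
function extractBoldText(markup) {
  var results = [];
  // "i": match <B> and <b>; "g": keep scanning after each match.
  // <b\b skips tags such as <br> and <body>.
  var pattern = /<b\b[^>]*>([\s\S]*?)<\/b>/gi;
  var match;
  while ((match = pattern.exec(markup)) !== null) {
    results.push(match[1]); // captured text between the tags
  }
  return results;
}

// extractBoldText("<B>This is bold text.</B>") returns ["This is bold text."]
```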

    Suggestions for other approaches are welcome. I'm the author of both Web
    pages, so I have some leeway. Preference is for client-side JavaScript. I
    have some experience in JavaScript, but I'm a C/Java programmer.
    Cross-platform compatibility is preferred, and the majority of browsers will
    be IE6. I can also run Perl scripts on the Web server, but I have little
    experience with Perl, so this would be an opportunity to learn more. In
    case it matters, the Web server is Apache on Linux.

    Thanks.

    Gunny
    '69 Camaro, Jan 11, 2006

  2. Martin Honnen

    '69 Camaro wrote:


    > For a simplified example, I have two Web pages. Page 1 uses JavaScript with
    > the following:
    >
    > htmlData = "<B>This is bold text.</B>";
    > document.write(htmlData);
    >
    > Page 1 displays in bold text:
    >
    > This is bold text.
    >
    > Page 2 needs to get the markup for page 1, i.e., just "<B>This is bold
    > text.</B>" (which includes the tags), not the JavaScript code listed above.
    > I've tried using the responseText property of the MSXML2.XMLHTTP.3.0 object,
    > but it gives me the JavaScript used for rendering page 1, not the markup
    > (that's stored in the htmlData variable).


    That script code is executed by the browser/user agent when it
    renders the HTML document, so one way to access that content with script
    is to use script to automate a browser. On Windows, Microsoft IE can be
    automated, so you can use a Windows Script Host script to create an IE
    instance, load a URL and then read out the innerHTML or outerHTML of an
    element in the browser's document object model. Of course, if you have
    the complete document tree object model, I am not sure you need the
    serialized markup to reparse it yourself. You should also be aware that
    the browser object model will usually include both the contents created
    by script and the script element itself.
    That way you have a script application that runs on a system where IE is
    installed, and IE loads the remote URL.
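The Windows Script Host approach could look roughly like the minimal sketch below. It drives IE through COM and waits until the page, including its document.write calls, has finished loading. Then it reads the rendered body markup. The ProgID InternetExplorer.Application and its Navigate, Busy, ReadyState, Document and Quit members exist. The helper name readRenderedBody, the timeout and the URL are hypothetical. The IE object and the sleep function are passed in, so the waiting logic does not depend on where the script runs.

```javascript
// Minimal sketch of automating IE from Windows Script Host (JScript).
// readRenderedBody is a hypothetical helper; ie is expected to be
// new ActiveXObject("InternetExplorer.Application") and sleep a wrapper
// around WScript.Sleep.
var READYSTATE_COMPLETE = 4;

function readRenderedBody(ie, sleep, url, maxWaitMs) {
  var waited = 0;
  ie.Visible = false;            // keep the browser window hidden
  ie.Navigate(url);
  // Poll until IE is idle and the document has fully loaded.
  while (ie.Busy || ie.ReadyState != READYSTATE_COMPLETE) {
    if (waited >= maxWaitMs) {
      ie.Quit();
      throw new Error("Timed out loading " + url);
    }
    sleep(100);
    waited += 100;
  }
  // The DOM now contains whatever document.write produced.
  var html = ie.Document.body.innerHTML;
  ie.Quit();
  return html;
}

// Under cscript, something like:
// var ie = new ActiveXObject("InternetExplorer.Application");
// WScript.Echo(readRenderedBody(ie, function (ms) { WScript.Sleep(ms); },
//                               "http://example.com/page1.html", 30000));
```

As noted above, the body markup will usually include the script element as well as the content it wrote.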

    Another approach might be HTTP Unit
    <http://www.httpunit.org/>
    although I don't know how good their script support is and whether they
    allow access to elements and/or serialized markup.

    None of that, however, allows browser-side script in one HTML
    document to access the DOM created by script in another HTML document.
    Because of the same origin policy, that is only possible if both
    documents are on the same server; then you can use frames or windows and
    cross-frame or cross-window scripting techniques.
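The same origin comparison behind this restriction can be sketched as below. It assumes the usual rule that protocol, host name and port must all match. It does not model the document.domain relaxation. Browsers have also differed in the details; older IE, for example, is documented as not comparing ports. The function name sameOrigin is hypothetical. It accepts window.location objects and, in modern environments, URL objects.

```javascript
// Minimal sketch of a same origin check between two location-like objects
// (window.location, or URL objects in modern environments).
// sameOrigin is a hypothetical helper; it ignores document.domain.
function sameOrigin(a, b) {
  // The path does not matter; only protocol, host name and port do.
  return a.protocol === b.protocol &&
         a.hostname === b.hostname &&
         a.port === b.port;
}

// In a framed page: sameOrigin(window.location, parent.location)
```

Two pages in the same subdomain on the same Web server, as in this thread, would normally pass this check.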


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Jan 11, 2006

  3. Thomas 'PointedEars' Lahn

    Martin Honnen wrote:

    > None of that however allows you to have browser-side script in one HTML
    > document access the DOM created by script in another HTML document. With
    > the same origin policy that is only possible if you have two documents
    > on the same server, [...]


    It is still only the same second-level domain that is required, not the
    same server; will you recognize that?


    PointedEars
    Thomas 'PointedEars' Lahn, Jan 11, 2006
  4. '69 Camaro

    "Martin Honnen" wrote in message
    news:43c54ac2$0$20773$-online.net...
    > On Windows Microsoft IE can be automated so you use a Windows Script Host
    > script to create an IE instance, load a URL and then read out the
    > innerHTML or outerHTML of an element in the browser's document object
    > model.


    Thanks for that info. I now have an option for visitors with IE browsers.

    > And you should be aware that the browser object model will usually include
    > both the contents created by script and the script element itself.


    So the contents created by the script on Web page 1 can be accessed using
    the innerHTML property of the document.body element of Web page 1? If so,
    does this apply just to IE or does it also apply to other common browsers,
    such as Firefox, Netscape, and Opera?

    > Another approach might be HTTP Unit
    > <http://www.httpunit.org/>
    > although I don't know how good their script support is and whether they
    > allow access to elements and/or serialized markup.


    Thanks for that info. It can help with the automated testing of Web
    applications which I'd recently been wondering about, so thanks for
    answering another question of mine!

    > With the same origin policy that is only possible if you have two
    > documents on the same server, then you can use frames or windows and use
    > cross frame or cross window script techniques.


    Both HTML documents are in the same subdomain on the same Web server, so I
    think this satisfies the same origin policy. I've been Googling on
    cross-window scripting, looking for examples of cross-window scripting
    syntax, but all I've seen so far are examples that spawn a new window and
    then write to it. I'd rather avoid a new window unless I can make it
    invisible to the user (which I don't know how to do), and I need to read
    its contents, not write to them.

    Do you have any links to example syntax for accessing a Web page's DOM
    without spawning a new window (if this is even possible), or for hiding a
    spawned window, or for reading the window's contents if my guess on the
    innerHTML property of the document.body element of Web page 1 is incorrect?

    Thanks.

    Gunny
    '69 Camaro, Jan 11, 2006
  5. Randy Webb

    '69 Camaro said the following on 1/11/2006 3:45 PM:

    <snip>

    >
    > Do you have any links to example syntax for accessing a Web page's DOM
    > without spawning a new window (if this is even possible), or for hiding a
    > spawned window, or for reading the window's contents if my guess on the
    > innerHTML property of the document.body element of Web page 1 is incorrect?


    If all you are wanting to do is get the contents of a page, load it in a
    hidden IFrame and access it from there using the Frames collection.

    window.frames['IFrameNAMEnotID'].property
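A hidden IFrame can also be created from script, as in the minimal sketch below. It inserts an invisible iframe and loads a same-origin page into it. It then passes the frame's window object to a callback once the page has loaded. The helper name loadHiddenFrame and the page name are hypothetical. The sketch uses contentWindow rather than window.frames[name]. Names assigned to dynamically created iframes are reportedly not registered reliably in older IE. Older IE may also need attachEvent for the load handler, so treat the event wiring as something to test there.

```javascript
// Minimal sketch: create an invisible iframe, load a same-origin page and
// hand the frame's window to onReady after it loads.
// loadHiddenFrame is a hypothetical helper, not an existing API.
function loadHiddenFrame(doc, url, onReady) {
  var frame = doc.createElement("iframe");
  frame.style.display = "none";   // in the DOM, but never shown
  frame.onload = function () {
    // Same origin: the loaded page's DOM and globals are readable here.
    onReady(frame.contentWindow);
  };
  frame.src = url;
  doc.body.appendChild(frame);
  return frame;
}

// loadHiddenFrame(document, "page1.html", function (win) {
//   alert(win.document.body.innerHTML);
// });
```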

    --
    Randy
    comp.lang.javascript FAQ - http://jibbering.com/faq & newsgroup weekly
    Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
    Randy Webb, Jan 11, 2006
  6. '69 Camaro

    "Randy Webb" wrote in message news:...
    > If all you are wanting to do is get the contents of a page, load it in a
    > hidden IFrame and access it from there using the Frames collection.
    >
    > window.frames['IFrameNAMEnotID'].property


    Thanks. I'm not familiar with Frames or IFrames, but I figured out how to
    create an IFrame and make it invisible. Now I'm stuck on the syntax for
    trying to read the HTML produced by the document.write( ) method of the Web
    page loaded into that IFrame. To give a very simplified example, say the
    following Web page was loaded into the IFrame:

    <HTML>
    <BODY>
    <SCRIPT Language = "JavaScript" TYPE = "text/javascript">
    <!--
    var htmlData;
    htmlData = "<B>This is bold text.</B>";
    document.write(htmlData);
    //-->
    </SCRIPT>
    </BODY>
    </HTML>

    . . . and I want to read the HTML produced by the document.write( ) method.
    In the main window, I'd like to use something like the following:

    html = window.frames['StatsFrame'].whateverGoesHere;

    . . . where the variable, html, receives "<B>This is bold text.</B>".

    Any suggestions on the correct syntax for "whateverGoesHere"?

    Thanks.

    Gunny
    '69 Camaro, Jan 12, 2006
  7. Randy Webb

    '69 Camaro said the following on 1/11/2006 7:12 PM:
    > "Randy Webb" wrote in message news:...
    >
    >>If all you are wanting to do is get the contents of a page, load it in a
    >>hidden IFrame and access it from there using the Frames collection.
    >>
    >>window.frames['IFrameNAMEnotID'].property

    >
    >
    > Thanks. I'm not familiar with Frames or IFrames, but I figured out how to
    > create an IFrame and make it invisible. Now I'm stuck on the syntax for
    > trying to read the HTML produced by the document.write( ) method of the Web
    > page loaded into that IFrame. To give a very simplified example, say the
    > following Web page was loaded into the IFrame:
    >
    > <HTML>
    > <BODY>
    > <SCRIPT Language = "JavaScript" TYPE = "text/javascript">
    > <!--
    > var htmlData;
    > htmlData = "<B>This is bold text.</B>";
    > document.write(htmlData);
    > //-->
    > </SCRIPT>
    > </BODY>
    > </HTML>
    >
    > . . . and I want to read the HTML produced by the document.write( ) method.
    > In the main window, I'd like to use something like the following:
    >
    > html = window.frames['StatsFrame'].whateverGoesHere;
    >
    > . . . where the variable, html, receives "<B>This is bold text.</B>".
    >
    > Any suggestions on the correct syntax for "whateverGoesHere"?



    IFrame Code:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
    "http://www.w3.org/TR/REC-html40/strict.dtd">
    <html>
    <head>
    <title>Form Test Page</title>
    <script type="text/javascript">
    var htmlData;
    htmlData = "<B>This is bold text.</B>";
    document.write(htmlData);
    </script>
    </head>
    <body>
    </body>
    </html>

    Main Page Code:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    <title>Frame test</title>
    </head>
    <body>
    <iframe src="blank.html" name="myIFrame">Test text here</iframe>
    <button
    onclick="alert(window.frames['myIFrame'].document.body.innerHTML)">Show
    Markup</button>
    </body>
    </html>

    Opera 8, Firefox and IE all give me just the generated HTML,
    namely "<B>This is bold text.</B>".

    So it appears the above should be close to what you want or at least
    close enough to get you started on it.
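The button's expression can be wrapped in a small function that fails soft, as in the minimal sketch below. It returns null if the frame is missing or not yet loaded. It also returns null when reading the frame's document throws a security error because the pages are not same-origin. The name readFrameMarkup is hypothetical. The frame name myIFrame is taken from the page above.

```javascript
// Minimal sketch: read the rendered body markup of a named frame, or return
// null if it is missing, not loaded, or blocked by the same origin policy.
// readFrameMarkup is a hypothetical helper name.
function readFrameMarkup(win, frameName) {
  try {
    var frame = win.frames[frameName];
    if (!frame || !frame.document || !frame.document.body) {
      return null;                      // no such frame, or not loaded yet
    }
    return frame.document.body.innerHTML;
  } catch (e) {
    return null;                        // e.g. permission denied across origins
  }
}

// onclick="alert(readFrameMarkup(window, 'myIFrame'))"
```

Calling it only after the frame has loaded avoids reading an empty or partial document.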

    --
    Randy
    comp.lang.javascript FAQ - http://jibbering.com/faq & newsgroup weekly
    Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
    Randy Webb, Jan 12, 2006
  8. '69 Camaro

    "Randy Webb" wrote in message
    news:...
    > So it appears the above should be close to what you want or at least close
    > enough to get you started on it.


    Thank you! That gets me much closer to where I want to be. However, I'm
    still trying to get over the hurdle of determining from the second Web page
    what's in the string variable used for the document.write( ) method on
    the first Web page. While I gave a simplified example of the JavaScript on
    my Web page, the value passed to document.write( ) is actually the value of
    the property of an object, not a string literal. The object is returned
    from a function that I pass variables to, and the function dynamically
    creates the HTML needed to display a Web page with all of the data. For
    example:

    stats = getStats(uid, mid, gid, lg, crit, sUrl);
    htmlData = stats.Text;
    document.write(htmlData);

    . . . and the value in the stats.Text property would be the string
    containing the HTML with the data and the tags I need to parse, like
    "<B>This is bold text.</B>". This markup never actually appears on Web page
    1, so I don't know how to read it from Web page 2 and extract only the data
    I need.

    Any suggestions on how to read from Web page 2 either the value stored in
    htmlData or the Web page contents created by my script on Web page 1? I've
    tried using getElementByName( ) and getElementByID( ) to retrieve the
    htmlData variable (or value) in the frame object, but I must not have the
    correct syntax, because it bombs.

    Thanks.

    Gunny
    '69 Camaro, Jan 12, 2006
  9. Randy Webb

    '69 Camaro said the following on 1/12/2006 10:02 AM:
    > "Randy Webb" wrote in message
    > news:...
    >
    >>So it appears the above should be close to what you want or at least close
    >>enough to get you started on it.

    >
    >
    > Thank you! That gets me much closer to where I want to be. However, I'm
    > still trying to get over the hurdle of determining from the second Web page
    > what's in the string variable used for the document.write( ) method on
    > the first Web page. While I gave a simplified example of the JavaScript on
    > my Web page, the value passed to document.write( ) is actually the value of
    > the property of an object, not a string literal. The object is returned
    > from a function that I pass variables to, and the function dynamically
    > creates the HTML needed to display a Web page with all of the data. For
    > example:
    >
    > stats = getStats(uid, mid, gid, lg, crit, sUrl);
    > htmlData = stats.Text;
    > document.write(htmlData);


    If the name of that variable is always htmlData, then you can access it
    by window.frames['IFrameNAMEnotID'].htmlData
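This works because a global variable in the framed page becomes a property of that frame's window. That covers a variable declared with var at the top level or assigned without a declaration, like htmlData and stats in the poster's code. The minimal sketch below wraps the lookup and falls back to a default when the frame or variable is unavailable. The helper name readFrameVariable is hypothetical. The frame name StatsFrame comes from the earlier example.

```javascript
// Minimal sketch: read a global variable from a named same-origin frame,
// returning fallback if the frame or variable is not available.
// readFrameVariable is a hypothetical helper name.
function readFrameVariable(win, frameName, varName, fallback) {
  try {
    var frame = win.frames[frameName];
    if (frame && typeof frame[varName] !== "undefined") {
      return frame[varName];          // globals live on the frame's window
    }
  } catch (e) {
    // Cross-origin access or a frame that is not ready: use the fallback.
  }
  return fallback;
}

// var html = readFrameVariable(window, "StatsFrame", "htmlData", "");
```

If stats is also a global in the framed page, the whole object could be read the same way instead of only its Text property.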

    --
    Randy
    comp.lang.javascript FAQ - http://jibbering.com/faq & newsgroup weekly
    Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
    Randy Webb, Jan 13, 2006