Converting textbox contents to XML

Discussion in 'Javascript' started by Jeff North, Apr 16, 2005.

  1. Jeff North

    Jeff North Guest

    Problem:
    I need to copy the contents of another website into a textarea
    (actually an HTMLArea textarea that retains all of the HTML code) on
    my webpage. Then I need to extract certain parts of this page for my
    database. From this copy/paste action I would like to walk through the
    copied data. The trouble is that it is only data. I need to convert
    this data to XHTML format. Is there a method (either server-side or
    client-side) that will allow me to do this?

    Just to add to the problem: I've looked at the code and it is not
    XHTML-compliant (they don't close a lot of their tags, e.g. a
    <P> with no closing </P> tag). Will this cause problems with the
    conversion?

    I've looked at the XMLHttpRequest option, but this appears to work
    only on the same domain. Or have I got this totally wrong?

    Reason:
    The site I need to copy the data from wants to charge an exorbitant
    annual fee. The problem is that a) my department doesn't have the cash
    and b) we don't know if this project is going to receive funding to
    continue.

    Any help would be greatly appreciated.
    ---------------------------------------------------------------
    : Remove your pants to reply
    ---------------------------------------------------------------
     
    Jeff North, Apr 16, 2005
    #1
  2. Jeff North wrote:
    <snip>
    > Reason:
    > the site I need to copy the data from wants to charge an
    > exorbitant annual fee. The problem is that a) my department
    > doesn't have the cash and b) we don't know if this project
    > is going to receive funding to continue.
    >
    > Any help would be greatly appreciated.


    I don't know how it works in your part of the world but here assisting
    you in the theft of some third party's intellectual property would be
    illegal in itself, no matter how much you might appreciate it.

    Richard.
     
    Richard Cornford, Apr 16, 2005
    #2
  3. Jeff North

    Jeff North Guest

    On Sat, 16 Apr 2005 21:55:01 +0100, in comp.lang.javascript "Richard
    Cornford" wrote:

    >| Jeff North wrote:
    >| <snip>
    >| > Reason:
    >| > the site I need to copy the data from wants to charge an
    >| > exorbitant annual fee. The problem is that a) my department
    >| > doesn't have the cash and b) we don't know if this project
    >| > is going to receive funding to continue.
    >| >
    >| > Any help would be greatly appreciated.
    >|
    >| I don't know how it works in your part of the world but here assisting
    >| you in the theft of some third party's intellectual property would be
    >| illegal in itself, no matter how much you might appreciate it.


    That's right, jump to the wrong conclusions.
    FYI, it is the same government department - just different sections.
    FYI, *I* do have legal access to this data, in fact the first
    department *demands* that I do access *their* data. At a later date,
    when funding is guaranteed, then I will pay the necessary fee but in
    the meantime I have to make do without.
    ---------------------------------------------------------------
    : Remove your pants to reply
    ---------------------------------------------------------------
     
    Jeff North, Apr 16, 2005
    #3
  4. Jeff North wrote:

    > I need to copy the contents of another website into a textarea
    > (actually an HTMLArea textarea that retains all of the HTML code) on
    > my webpage. Then I need to extract certain parts of this page for my
    > database. From this copy/paste action I would like to walk through the
    > copied data. The trouble is that it is only data. I need to convert
    > this data to XHTML format. Is there a method (either server-side or
    > client-side) that will allow me to do this?


    Use XMLHttpRequest and then an XMLParser object to parse what is served.

    > Just to add to the problem: I've looked at the code and it is not
    > XHTML-compliant (they don't close a lot of their tags, e.g. a
    > <P> with no closing </P> tag). Will this cause problems with the
    > conversion?
    >
    > I've looked at the XMLHttpRequest option, but this appears to work
    > only on the same domain. Or have I got this totally wrong?


    Due to the Same Origin Policy it only works on the same second-level
    domain.
    <http://www.mozilla.org/projects/security/components/same-origin.html>
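
    As a rough illustration of that restriction: the policy is usually
    stated as requiring the same scheme, host, and port, which is stricter
    than merely sharing a second-level domain. The Python sketch below runs
    outside the browser and only shows the comparison; the helper names
    `origin` and `same_origin` and the default-port table are assumptions
    made up for this example, not part of any browser API.

```python
from urllib.parse import urlsplit

# Assumed default ports, so that "http://host/" and "http://host:80/" match.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple used for the comparison."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """True only if scheme, host, and port are all identical."""
    return origin(url_a) == origin(url_b)


print(same_origin("http://www.mydomain.com/a", "http://www.mydomain.com:80/b"))  # True
print(same_origin("http://www.mydomain.com/", "http://www.microsoft.com/"))      # False
```

    Under this check a page on one host cannot read a response from a
    different host through XMLHttpRequest, whatever the path.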

    > : Remove your pants to reply


    Remove `yourpants' to post in a standards-compliant way and to avoid
    being ignored in the future.


    PointedEars
     
    Thomas 'PointedEars' Lahn, Apr 19, 2005
    #4
  5. Randy Webb

    Randy Webb Guest

    Thomas 'PointedEars' Lahn wrote:
    > Jeff North wrote:


    <snip>

    >> : Remove your pants to reply

    >
    >
    > Remove `yourpants' to post in a standards-compliant way and to avoid
    > being ignored in the future.


    Here we go again........... What "Standard" are you babbling about?
     
    Randy Webb, Apr 20, 2005
    #5
  6. Jeff North

    Jeff North Guest

    On Wed, 20 Apr 2005 00:08:44 +0200, in comp.lang.javascript Thomas
    'PointedEars' Lahn wrote:

    >| Jeff North wrote:
    >|
    >| > I need to copy the contents of another website into a textarea
    >| > (actually an HTMLArea textarea that retains all of the HTML code) on
    >| > my webpage. Then I need to extract certain parts of this page for my
    >| > database. From this copy/paste action I would like to walk through the
    >| > copied data. The trouble is that it is only data. I need to convert
    >| > this data to XHTML format. Is there a method (either server-side or
    >| > client-side) that will allow me to do this?
    >|
    >| Use XMLHttpRequest and then an XMLParser object to parse what is served.


    Yep, tried that, but the data is on another web site.
    I was trying to automate a process for my users.
    Guess I'll have to use the old copy/paste method :-(

    >| > Just to add to the problem: I've looked at the code and it is not
    >| > XHTML-compliant (they don't close a lot of their tags, e.g. a
    >| > <P> with no closing </P> tag). Will this cause problems with the
    >| > conversion?
    >| >
    >| > I've looked at the XMLHttpRequest option, but this appears to work
    >| > only on the same domain. Or have I got this totally wrong?
    >|
    >| Due to the Same Origin Policy it only works on the same second-level
    >| domain.
    >| <http://www.mozilla.org/projects/security/components/same-origin.html>
    >|
    >| > : Remove your pants to reply
    >|
    >| Remove `yourpants' to post in a standards-compliant way and to avoid
    >| being ignored in the future.
    >|
    >|
    >| PointedEars


    ---------------------------------------------------------------
    : Remove your pants to reply
    ---------------------------------------------------------------
     
    Jeff North, Apr 20, 2005
    #6
  7. Jeff North wrote:

    > On Wed, 20 Apr 2005 00:08:44 +0200, in comp.lang.javascript Thomas
    > 'PointedEars' Lahn wrote:


    Your attribution contains superfluous, duplicate information for the
    most part.

    >>| Jeff North wrote:
    >>| > I need to copy the contents of another website into a textarea
    >>| > (actually an HTMLArea textarea that retains all of the HTML code)


    Why?

    >>| > my webpage. Then I need to extract certain parts of this page for my
    >>| > database. From this copy/paste action I would like to walk through the
    >>| > copied data. The trouble is that it is only data. I need to convert
    >>| > this data to XHTML format. Is there a method (either server-side or
    >>| > client-side) that will allow me to do this?
    >>|
    >>| Use XMLHttpRequest and then an XMLParser object to parse what is served.

    >
    > Yep, tried that, but the data is on another web site.


    Do you mean another second-level domain? If not, please re-read my
    previous article more thoroughly. And please trim your quotes.


    PointedEars
     
    Thomas 'PointedEars' Lahn, Apr 21, 2005
    #7
  8. Jeff North

    Jeff North Guest

    On Thu, 21 Apr 2005 01:02:21 +0200, in comp.lang.javascript Thomas
    'PointedEars' Lahn wrote:

    >| Your attribution contains superfluous, duplicate information for the
    >| most part.


    So.

    >| Why?


    It would be easier to walk through the DOM nodes than to try and get
    information out of plain text with \r\n control characters

    >| Do you mean another second-level domain?


    No. I mean it is on another web site i.e. my site
    http://www.mydomain.com and the other is on http://www.microsoft.com

    >| If not, please re-read my
    >| previous article more thoroughly. And please trim your quotes.


    Is this post trimmed enough for you?
    ---------------------------------------------------------------
    : Remove your pants to reply
    ---------------------------------------------------------------
     
    Jeff North, Apr 21, 2005
    #8
  9. JRS: In article <>, dated Thu, 21 Apr
    2005 01:02:21, seen in news:comp.lang.javascript, Thomas 'PointedEars'
    Lahn <> posted :
    >Jeff North wrote:
    >
    >> On Wed, 20 Apr 2005 00:08:44 +0200, in comp.lang.javascript Thomas
    >> 'PointedEars' Lahn wrote:

    >
    >Your attribution contains superfluous, duplicate information for the
    >most part.


    From your limited and inexperienced point of view, perhaps.

    However, the attribution quoted is compatible with the current thinking of
    Usefor, the News expert team; and objecting to it is childish.

    --
    © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
    <URL:http://www.jibbering.com/faq/> JL/RC: FAQ of news:comp.lang.javascript
    <URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
    <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
     
    Dr John Stockton, Apr 22, 2005
    #9
  10. Jeff North wrote:

    > [...] Thomas 'PointedEars' Lahn [...] wrote:
    >>| Your attribution contains superfluous, duplicate information for the
    >>| most part.

    >
    > So.


    If that is a statement: Yes.
    If that is a question, it begs the answer: Don't do it then.

    >>| Why?

    >
    > It would be easier to walk through the DOM nodes than to try and get
    > information out of plain text with \r\n control characters


    The question was why you

    | [...] need to copy the contents of another website into a textarea
    | (actually an HTMLArea textarea that retains all of the HTML code)

    in the first place. That seems to contradict your actual goal.

    >>| Do you mean another second-level domain?

    >
    > No. I mean it is on another web site i.e. my site
    > http://www.mydomain.com and the other is on http://www.microsoft.com


    I very much doubt this is possible with client-side scripting since
    the SOP, as mentioned, forbids that. Server-side scripting is a viable
    approach here, provided that laws are obeyed.
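
    To sketch what such a server-side step might look like, here is a
    minimal Python example using only the standard library. It re-emits
    markup with unclosed tags (such as <P> without </P>) as well-formed
    XML, which can then be walked as a tree. The names `SoupToXml`,
    `soup_to_xml`, `VOID_TAGS`, and `AUTO_CLOSE_TAGS` are hypothetical, the
    tag lists are deliberately short assumptions, and the `pasted` string
    merely stands in for whatever the textarea would submit. It is not a
    complete HTML parser.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Assumed, deliberately short lists; real pages will need more entries.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}
AUTO_CLOSE_TAGS = {"p", "li"}


class SoupToXml(HTMLParser):
    """Re-emit sloppy HTML as well-formed XML (simplified rules)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the XML being built
        self.stack = []  # names of elements still open

    def _close_through(self, tag):
        # Emit end tags until `tag` itself has been closed.
        while self.stack:
            name = self.stack.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_starttag(self, tag, attrs):
        # A second <p> (or <li>) directly inside an open one implies
        # that the first was meant to end there.
        if tag in AUTO_CLOSE_TAGS and self.stack and self.stack[-1] == tag:
            self._close_through(tag)
        unique = {}
        for name, value in attrs:
            # Valueless attributes (e.g. "checked") get their name as value.
            unique.setdefault(name, name if value is None else value)
        rendered = "".join(f" {n}={quoteattr(v)}" for n, v in unique.items())
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{rendered}/>")
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray end tags that match nothing currently open.
        if tag in self.stack:
            self._close_through(tag)

    def handle_data(self, data):
        self.out.append(escape(data))


def soup_to_xml(html):
    parser = SoupToXml()
    parser.feed(html)
    parser.close()
    parser._close_through(None)  # close whatever is still open
    return "<root>" + "".join(parser.out) + "</root>"


pasted = "<DIV><P>First<P>Second &amp; third<BR><B>bold</DIV>"
root = ET.fromstring(soup_to_xml(pasted))
paragraphs = ["".join(p.itertext()) for p in root.iter("p")]
print(paragraphs)  # ['First', 'Second & thirdbold']
```

    Once the markup is well-formed, picking out the parts needed for a
    database becomes a matter of iterating over elements instead of
    scanning plain text for line breaks.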

    >>| If not, please re-read my
    >>| previous article more thoroughly. And please trim your quotes.

    >
    > Is this post trimmed enough for you?


    Too much in some parts; context sometimes gets lost (e.g. the "Why?"
    quote). Quotation should be a friendly reminder for the reader only:
    snip neither too much nor too little. And quotes of quotes should
    be summarized where possible to save the reader time and bandwidth.

    Your quotation level style, however, is unusual (and as such as
    disturbing as --

    > ---------------------------------------------------------------
    > : Remove your pants to reply
    > ---------------------------------------------------------------


    -- while the above additionally does not really make sense, taking
    into account the content of your From/Reply-To headers.)

    You may want to read the newsgroup's FAQ about that:

    <http://jibbering.com/faq/#FAQ2_3>
    <http://www.jibbering.com/faq/faq_notes/pots1.html>


    PointedEars
     
    Thomas 'PointedEars' Lahn, Apr 23, 2005
    #10
  11. Jeff North

    Jeff North Guest

    On Sat, 23 Apr 2005 17:07:03 +0200, in comp.lang.javascript Thomas
    'PointedEars' Lahn wrote:

    >| Jeff North wrote:
    >|
    >| > [...] Thomas 'PointedEars' Lahn [...] wrote:
    >| >>| Your attribution contains superfluous, duplicate information for the
    >| >>| most part.
    >| >
    >| > So.
    >|
    >| If that is a statement: Yes.
    >| If that is a question, it begs the answer: Don't do it then.


    I'll set my newsreader the way *I* want it, thank you very much.

    [snip]

    >| > Is this post trimmed enough for you?
    >|
    >| Too much in some parts; context sometimes gets lost (e.g. the "Why?"
    >| quote). Quotation should be a friendly reminder for the reader only:
    >| snip neither too much nor too little. And quotes of quotes should
    >| be summarized where possible to save the reader time and bandwidth.


    Please make up your mind. The above method you stated is my usual
    style yet you complained.

    >| Your quotation level style, however, is unusual (and as such as
    >| disturbing as --
    >|
    >| > ---------------------------------------------------------------
    >| > : Remove your pants to reply
    >| > ---------------------------------------------------------------
    >|
    >| -- while the above additionally does not really make sense, taking
    >| into account the content of your From/Reply-To headers.)


    I don't have to explain my addresses to you or anyone else as it is
    quite obvious what I'm doing. The fact that you find it 'disturbing'
    is your problem.

    >| You may want to read the newsgroup's FAQ about that:
    >|
    >| <http://jibbering.com/faq/#FAQ2_3>
    >| <http://www.jibbering.com/faq/faq_notes/pots1.html>


    Which states absolutely nothing about address/Reply To headers.

    >| PointedEars


    Oh BTW

    PLONK
    ---------------------------------------------------------------
    : Remove your pants to reply
    ---------------------------------------------------------------
     
    Jeff North, Apr 24, 2005
    #11