Converting textbox contents to xml

Jeff North

Problem:
I need to copy the contents of another website into a textarea
(actually a HTMLArea textarea that retains all of the html code) on to
my webpage. Then I need to extract certain parts of this page for my
database. From this copy/paste action I would like to walk through the
copied data. The trouble is that it is only data. I need to convert
this data to XHTML format. Is there a method (either server-side or
client-side) that will allow me to do this?

Just to add to the problem: I've looked at the code and it is not
XHTML-compliant (they don't close a lot of their tags, e.g.
<P> with no closing </P> tag). Will this cause problems with the
conversion?

I've looked at the XMLHttpRequest option but this appears to work only on
the same domain - or have I got this totally wrong?

Reason:
the site I need to copy the data from wants to charge an exorbitant
annual fee. The problem is that a) my department doesn't have the cash
and b) we don't know if this project is going to receive funding to
continue.

Any help would be greatly appreciated
 
Richard Cornford

Jeff North wrote:
> Reason:
> the site I need to copy the data from wants to charge an
> exorbitant annual fee. The problem is that a) my department
> doesn't have the cash and b) we don't know if this project
> is going to receive funding to continue.
>
> Any help would be greatly appreciated

I don't know how it works in your part of the world but here assisting
you in the theft of some third party's intellectual property would be
illegal in itself, no matter how much you might appreciate it.

Richard.
 
Jeff North

| Jeff North wrote:
| <snip>
| > Reason:
| > the site I need to copy the data from wants to charge an
| > exorbitant annual fee. The problem is that a) my department
| > doesn't have the cash and b) we don't know if this project
| > is going to receive funding to continue.
| >
| > Any help would be greatly appreciated
|
| I don't know how it works in your part of the world but here assisting
| you in the theft of some third party's intellectual property would be
| illegal in itself, no matter how much you might appreciate it.

That's right, jump to the wrong conclusions.
FYI, it is the same government department - just different sections.
FYI, *I* do have legal access to this data, in fact the first
department *demands* that I do access *their* data. At a later date,
when funding is guaranteed, then I will pay the necessary fee but in
the meantime I have to make do without.
 
Thomas 'PointedEars' Lahn

Jeff said:
> I need to copy the contents of another website into a textarea
> (actually a HTMLArea textarea that retains all of the html code) on to
> my webpage. Then I need to extract certain parts of this page for my
> database. From this copy/paste action I would like to walk through the
> copied data. The trouble is that it is only data. I need to convert
> this data to XHTML format. Is there a method (either server-side or
> client-side) that will allow me to do this?

Use XMLHttpRequest and then an XMLParser object to parse what is served.

> Just to add to the problem: I've looked at the code and it is not
> XHTML-compliant (they don't close a lot of their tags, e.g.
> <P> with no closing </P> tag). Will this cause problems with the
> conversion?
>
> I've looked at the XMLHttpRequest option but this appears to work only on
> the same domain - or have I got this totally wrong?

Due to the Same Origin Policy it only works within the same origin
(protocol, host, and port); a different second-level domain is always
excluded.
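
For what it's worth, the check browsers apply is usually described as comparing protocol (scheme), host, and port, which is stricter than matching the second-level domain alone. A minimal sketch of that comparison in Python follows; the function names and the example.com/example.org URLs are placeholders of mine, not any browser API, and real browsers have additional rules this ignores.

```python
from urllib.parse import urlsplit

# Default ports, so that "http://host/" and "http://host:80/" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple used to compare origins."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """True only if both URLs share scheme, host, and port."""
    return origin(url_a) == origin(url_b)


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(same_origin("http://www.example.com/a", "http://www.example.com/b"))
    print(same_origin("http://www.example.com/", "http://www.example.org/"))
```

Under this rule a page on one site cannot read a response from a different host with XMLHttpRequest, whatever the path.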
> (e-mail address removed) : Remove your pants to reply

Remove `yourpants' to post standards compliant and to be not ignored
in the future.


PointedEars
 
Randy Webb

Thomas said:
> Remove `yourpants' to post standards compliant and to be not ignored
> in the future.

Here we go again........... What "Standard" are you babbling about?
 
Jeff North

| Jeff North wrote:
|
| > I need to copy the contents of another website into a textarea
| > (actually a HTMLArea textarea that retains all of the html code) on to
| > my webpage. Then I need to extract certain parts of this page for my
| > database. From this copy/paste action I would like to walk through the
| > copied data. The trouble is that it is only data. I need to convert
| > this data to XHTML format. Is there a method (either server-side or
| > client-side) that will allow me to do this?
|
| Use XMLHttpRequest and then an XMLParser object to parse what is served.

Yep, tried that, but the data is on another web site.
I was trying to automate a process for my users.
Guess I'll have to use the old copy/paste method :-(
 
Thomas 'PointedEars' Lahn

Your attribution contains superfluous, duplicate information for the
most part.
Why?


> Yep, tried that, but the data is on another web site.

Do you mean another second-level domain? If no, please re-read my
previous article more thoroughly. And please trim your quotes.


PointedEars
 
Jeff North

| Your attribution contains superfluous, duplicate information for the
| most part.

So.

| Why?

It would be easier to walk through the DOM nodes than to try and get
information out of plain text with \r\n control characters.

| Do you mean another second-level domain?

No. I mean it is on another web site, i.e. my site is
http://www.mydomain.com and the other is http://www.microsoft.com

| If no, please re-read my
| previous article more thoroughly. And please trim your quotes.

Is this post trimmed enough for you?
 
Dr John Stockton

JRS: In article <[email protected]>, dated Thu, 21 Apr
2005 01:02:21, Thomas 'PointedEars' Lahn said:
> Your attribution contains superfluous, duplicate information for the
> most part.

From your limited and inexperienced point of view, perhaps.

However, the attribution quoted is compatible with the current thinking of
Usefor, the News expert team; and objecting to it is childish.
 
Thomas 'PointedEars' Lahn

Jeff said:

> So.

If that is a statement: Yes.
If that is a question, it begs the answer: Don't do it then.

> It would be easier to walk through the DOM nodes than to try and get
> information out of plain text with \r\n control characters

The question was: Why do you

| [...] need to copy the contents of another website into a textarea
| (actually a HTMLArea textarea that retains all of the html code)

? That is somehow a contradiction to your actual goal.

> No. I mean it is on another web site i.e. my site
> http://www.mydomain.com and the other is on http://www.microsoft.com

I very much doubt this is possible with client-side scripting since
the SOP, as mentioned, forbids that. Server-side scripting is a viable
approach here, provided that laws are obeyed.
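
As a rough sketch of the server-side route, here is one way to walk markup that leaves <P> tags unclosed, using the HTMLParser class from Python's standard library. The class name ParagraphCollector and the helper extract_paragraphs are made up for this example, and collecting <p> text is only an assumption about which "certain parts" of the page are wanted.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p>, tolerating missing </p> tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraph texts
        self._current = None   # text pieces of the open paragraph, or None

    def _flush(self):
        # Store the open paragraph (if any) with whitespace normalised.
        if self._current is not None:
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
        self._current = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <P> and <p> are treated alike.
        if tag == "p":
            self._flush()          # a new <p> implicitly closes the last one
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def close(self):
        super().close()
        self._flush()              # paragraph still open at end of input


def extract_paragraphs(markup):
    """Return the text of every paragraph found in the given markup."""
    parser = ParagraphCollector()
    parser.feed(markup)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    sample = "<P>First<P>Second <B>bold</B> text</P><P>Third"
    for text in extract_paragraphs(sample):
        print(text)
```

The markup itself could be fetched on the server (for example with urllib.request.urlopen) or taken from the pasted textarea contents once the form has been submitted; either way the parsing happens where the Same Origin Policy does not apply.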

> Is this post trimmed enough for you?

Too much for some parts, context gets lost sometimes (e.g. the "Why?"
quote). Quotation should be a friendly reminder for the reader only.
Not snipped too much, not too little of it. And quotes of quotes should
be summarized where possible to save the reader time and bandwidth usage.

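Your quotation level style, however, is unusual (and as such as
disturbing as --

> ---------------------------------------------------------------
> (e-mail address removed) : Remove your pants to reply
> ---------------------------------------------------------------

-- while the above additionally does not really make sense, taking
into account the content of your From/Reply-To headers.)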

You may want to read the newsgroup's FAQ about that:

<http://jibbering.com/faq/#FAQ2_3>
<http://www.jibbering.com/faq/faq_notes/pots1.html>


PointedEars
 
Jeff North

| Jeff North wrote:
|
| > [...] Thomas 'PointedEars' Lahn [...] wrote:
| >>| Your attribution contains superfluous, duplicate information for the
| >>| most part.
| >
| > So.
|
| If that is a statement: Yes.
| If that is a question, it begs the answer: Don't do it then.

I'll set my newsreader the way *I* want it, thank you very much.

[snip]
| > Is this post trimmed enough for you?
|
| Too much for some parts, context gets lost sometimes (e.g. the "Why?"
| quote). Quotation should be a friendly reminder for the reader only.
| Not snipped too much, not too little of it. And quotes of quotes should
| be summarized where possible to save the reader time and bandwidth usage.

Please make up your mind. The above method you stated is my usual
style yet you complained.

| Your quotation level style, however, is unusual (and as such as
| disturbing as --
|
| > ---------------------------------------------------------------
| > (e-mail address removed) : Remove your pants to reply
| > ---------------------------------------------------------------
|
| -- while the above additionally does not really make sense, taking
| into account the content of your From/Reply-To headers.)

I don't have to explain my addresses to you or anyone else as it is
quite obvious what I'm doing. The fact that you find it 'disturbing'
is your problem.

| You may want to read the newsgroup's FAQ about that:
|
| <http://jibbering.com/faq/#FAQ2_3>
| <http://www.jibbering.com/faq/faq_notes/pots1.html>

Which states absolutely nothing about address/Reply-To headers.

| PointedEars

Oh BTW

PLONK
 
