How to save a webpage's contents to a file (with LWP)

Jack

Hi there, does anyone skilled in the art of LWP (or another Perl module)
and screen scraping know how to do the equivalent of "File", "Save
As" for HTML content? Some webpages aren't scrapeable, but when you save
their content to a local file it's available. Any ideas would be
great.

Also, if there is a drop-down + button to select content BUT the
HTML source has no "submit" entry at all, how does one remote-control a
user selection without this POST handle?

Thanks in advance,

Jack
 
A. Sinan Unur

Jack said:
Hi there, does anyone skilled in the art of LWP (or other perl module)
and screen scraping know how to do the equivalent of a "file", "save
as" html content ?

http://search.cpan.org/~gaas/libwww-perl-5.808/lib/LWP/Simple.pm

getstore($url, $file)

http://search.cpan.org/~gaas/libwww-perl-5.808/lib/LWP.pm#The_Response_Object

http://search.cpan.org/~gaas/libwww-perl-5.808/lib/HTTP/Response.pm

$r->content( $content )

This is used to get/set the raw content

$r->decoded_content( %options )

This will return the content after any Content-Encoding and charsets
has been decoded.
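
For illustration, a minimal sketch of both routes mentioned above:
getstore() from LWP::Simple for a raw "save as", and an
LWP::UserAgent fetch that writes decoded_content instead. The sub
names save_raw and save_decoded and the command-line usage are made
up for this example; save_decoded assumes the page is text (HTML),
not binary data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);
use LWP::UserAgent;

# Simplest route: fetch $url and write the raw response body to $file.
# getstore() returns the HTTP status code of the response.
sub save_raw {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    return is_success($status);
}

# Alternative: fetch with LWP::UserAgent and write the decoded content
# (Content-Encoding undone, charset decoded) to $file as UTF-8.
# Intended for textual pages only.
sub save_decoded {
    my ($url, $file) = @_;
    my $res = LWP::UserAgent->new->get($url);
    return 0 unless $res->is_success;
    my $text = $res->decoded_content;
    return 0 unless defined $text;
    open my $fh, '>:encoding(UTF-8)', $file
        or die "Cannot write $file: $!";
    print {$fh} $text;
    close $fh or die "Cannot close $file: $!";
    return 1;
}

# Hypothetical usage: perl save_page.pl http://www.example.com/ page.html
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    save_raw($url, $file) or die "Could not save $url\n";
    print "Saved $url to $file\n";
}
```

Note that neither route runs any script on the page or follows
frames, so the saved file holds only what the server sent for that
one URL.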

Jack said:
Also, if there is a drop down + button to select content BUT in the
HTML source no "submit" entry at all, how does one remote control a
user selection without this post handle ?

If the page uses Javascript to dynamically post form contents, you will
have to figure out what the Javascript does and replicate it.
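
As a rough sketch of what "replicate it" could look like once you
know what the script submits: build the same POST yourself and send
it with LWP::UserAgent. The sub name build_selection_request, the
URL, and the field name/value in the usage line are hypothetical
placeholders, not taken from any real page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;

# Build the form POST that the page's script would otherwise send.
# %fields holds the form field names and the values to "select".
sub build_selection_request {
    my ($action_url, %fields) = @_;
    return POST($action_url, [ %fields ]);
}

# Hypothetical usage:
#   perl post_form.pl http://www.example.com/select.asp choice GGCC
if (@ARGV == 3) {
    my ($url, $name, $value) = @ARGV;
    my $req = build_selection_request($url, $name => $value);
    my $res = LWP::UserAgent->new->request($req);
    die $res->status_line, "\n" unless $res->is_success;
    print $res->decoded_content;
}
```

The request is built separately from sending it, so you can print
$req->as_string and compare it with what the browser actually sends
before hitting the server.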

Sinan
 
Jack

A. Sinan Unur said:
http://search.cpan.org/~gaas/libwww-perl-5.808/lib/LWP/Simple.pm

getstore($url, $file)

http://search.cpan.org/~gaas/libwww-perl-5.808/lib/LWP.pm#The_Respons...

http://search.cpan.org/~gaas/libwww-perl-5.808/lib/HTTP/Response.pm

$r->content( $content )

    This is used to get/set the raw content

$r->decoded_content( %options )

    This will return the content after any Content-Encoding and charsets
    has been decoded.


If the page uses Javascript to dynamically post form contents, you will
have to figure out what the Javascript does and replicate it.

Sinan

Hi Sinan, the site uses ASP, with no JS files. This is all there is in
the HTML:
<!--<SCRIPT>
//
</SCRIPT>-->
<FRAMESET ROWS="70,*" FRAMESPACING=0>
<FRAME NAME="header" SRC="./header_default.asp?
NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
MARGINHEIGHT="0">

<FRAME NAME="bodyx" SRC=
body.asp?centerin=GGCC
SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">


</FRAMESET>

</HTML>
 
A. Sinan Unur

Do *not* quote sigs.

Jack said:
Hi Sinan the site uses ASP, no JS files.. this is all there is in the
html
<!--<SCRIPT>
//
</SCRIPT>-->
<FRAMESET ROWS="70,*" FRAMESPACING=0>
<FRAME NAME="header" SRC="./header_default.asp?
NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
MARGINHEIGHT="0">

<FRAME NAME="bodyx" SRCbody.asp?centerin=GGCC

I am assuming you retyped the source rather than copying and pasting it.
Please don't retype code.

Jack said:
SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">

Oh, but there is more. How about them frames?

Anyway, this forum is for help with the Perl aspect of things. If you
need to learn HTML, there is a group for that as well.

Sinan
 
Gunnar Hjalmarsson

Jack said:
this is all there is in the html
<!--<SCRIPT>
//
</SCRIPT>-->
<FRAMESET ROWS="70,*" FRAMESPACING=0>
<FRAME NAME="header" SRC="./header_default.asp?
NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
MARGINHEIGHT="0">

<FRAME NAME="bodyx" SRC=
body.asp?centerin=GGCC
SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">


</FRAMESET>

</HTML>

Then get the bodyx frame, not the frameset.
 
Jack

Gunnar Hjalmarsson said:
Then get the bodyx frame, not the frameset.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

How exactly does one get the bodyx frame, and more importantly how do
you auto select from the select box when there is no such mention of
it or a submit button in the HTML for this ASP application?
Thank you,
Jack
 
Gunnar Hjalmarsson

Jack said:
How exactly does one get the bodyx frame,

Assuming the URL of the frameset is
http://www.example.com/somepage/index.asp, you probably use the URL
http://www.example.com/somepage/body.asp?centerin=GGCC
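
To illustrate, here is a small sketch that pulls the SRC of a named
frame out of a frameset and resolves it against the frameset's URL,
using HTML::TokeParser and URI. The sub name frame_url_by_name is
made up for this example, the frameset URL is the assumed one from
above, and the HTML is a tidied copy of the posted frameset.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

# Return the absolute URL of the <FRAME> whose NAME is $name,
# or undef if the frameset has no such frame.
sub frame_url_by_name {
    my ($html, $name, $frameset_url) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot parse HTML";
    while (my $tag = $parser->get_tag('frame')) {
        my $attr = $tag->[1];    # attribute names are lowercased
        next unless defined $attr->{name} && $attr->{name} eq $name;
        next unless defined $attr->{src};
        # Resolve the (possibly relative) SRC against the frameset URL.
        return URI->new_abs($attr->{src}, $frameset_url)->as_string;
    }
    return;
}

my $frameset = <<'HTML';
<FRAMESET ROWS="70,*" FRAMESPACING=0>
<FRAME NAME="header" SRC="./header_default.asp" SCROLLING="no">
<FRAME NAME="bodyx" SRC="body.asp?centerin=GGCC" SCROLLING="auto">
</FRAMESET>
HTML

my $url = frame_url_by_name($frameset, 'bodyx',
    'http://www.example.com/somepage/index.asp');
print defined $url ? "$url\n" : "no such frame\n";
```

The resulting URL can then be fetched like any other page, for
example with getstore() from LWP::Simple.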

Jack said:
and more importantly how do
you auto select from the select box when there is no such mention of
it or a submit button in the HTML for this ASP application?

As Sinan mentioned, you apparently need to learn some basics about HTML.
Asking questions in a Perl group is not the right way to do so.

Recommended reading: http://www.w3.org/TR/html4/present/frames.html
 
