urllib can't load webpage with JavaScript

Uwe Mayer

Hi,

I want to use urllib to scan a webpage which is only accessible from within
my LAN environment. While Mozilla manages to load the target URL properly,
neither wget nor urllib or urllib2 does.
I had a closer look at the HTML source and discovered a lot of JavaScript,
including cookies.

My suspicion is that the JavaScript code needs to be executed for the page
to work properly. Also, I don't know how urllib deals with cookies, but
since they are handled by the JavaScript in the source code they are
probably not considered at all.

In any case I get an IOError: connection refused, Error Code 111.

Does anyone know a way out of this?

Thanks for any hints,
Ciao
Uwe
 
Lorenzo Gatti

Mozilla is a web browser, and it implements cookies, a DOM for HTML
pages, and a JavaScript interpreter with objects for browser
automation.
It's unlikely, and arguably inappropriate, for low-level HTTP
implementations like wget and urllib to support that kind of advanced web
feature; you may have to handle cookies and JavaScript in your own
application.

In the specific case of "IOError: connection refused, Error Code 111",
however, the failure seems to happen at a lower protocol level: wrong
host names or port numbers, unavailable servers and maybe proxy
authentication requirements are the usual causes of refused
connections.
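
To separate a network-level failure from anything related to JavaScript or cookies, one option is to try a bare TCP connection first. The following is only a sketch in present-day Python 3 (the thread predates it); `probe` and `direct_opener` are illustrative names, and the host and port in the usage part are placeholders. On Linux, error code 111 is `ECONNREFUSED`. Since a browser may be configured with different proxy settings than command-line tools pick up from the environment, the second helper builds an opener that ignores any environment proxy.

```python
import errno
import socket
import urllib.request


def probe(host, port, timeout=3.0):
    """Attempt a plain TCP connection and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing accepted the connection on that host/port (errno 111 on Linux).
        return "refused"
    except socket.timeout:
        return "timeout"
    except socket.gaierror:
        # The host name could not be resolved.
        return "unknown host"
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc))


def direct_opener():
    """Build an opener that bypasses proxies taken from the environment."""
    # Passing an empty mapping disables proxy auto-detection.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


if __name__ == "__main__":
    # Placeholder host and port; substitute the ones from the target URL.
    print(probe("intranet.example", 80))
```

If `probe` reports "refused" for the host and port the browser reaches without trouble, comparing the browser's proxy configuration with the `http_proxy` environment variable is a reasonable next step.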

Lorenzo Gatti
 
John J. Lee

Uwe Mayer said:
I had a closer look at the HTML source and discovered a lot of JavaScript,
including cookies.
[...]

Lorenzo Gatti said:
Mozilla is a web browser, and it implements cookies, DOM for HTML
pages, and a Javascript interpreter with objects representing browser
automation.
It's unlikely and inappropriate for low level HTTP implementations
like wget and urllib to have that kind of support for advanced web
[...]

JavaScript support is rare, but many libraries and tools support
cookies (including wget and my library, ClientCookie -- essentially a
drop-in replacement for urllib2). For JS, see my FAQ here (under
"Embedded script is messing up my web-scraping. What do I do?"):

http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html
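
As a rough illustration of cookie support at the library level, here is a sketch that uses the Python 3 standard library (`http.cookiejar` with `urllib.request`) rather than ClientCookie itself; `make_cookie_opener` is an illustrative name and the URL in the usage part is a placeholder. Note the limitation: the jar only records cookies the server sends in `Set-Cookie` headers. A cookie created by script in the page (via `document.cookie`) never reaches it, so such a value would have to be worked out by reading the script and sent by hand.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


if __name__ == "__main__":
    opener, jar = make_cookie_opener()
    # Placeholder URL; the first request collects cookies from the server,
    # later requests through the same opener send them back automatically.
    opener.open("http://intranet.example/")
    for cookie in jar:
        print(cookie.name, cookie.value)
```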

Lorenzo Gatti said:
In the specific case of "IOError: connection refused, Error Code 111",
however, the failure seems to happen at a lower protocol level: wrong
[...]

Right.


John
 
