Looking for browser emulator

Roy Smith · Oct 14, 2011

I've got to write some tests in python which simulate getting a page of
HTML from an http server, finding a link, clicking on it, and then
examining the HTML on the next page to make sure it has certain features.

I can use urllib to do the basic fetching, and lxml gives me the tools
to find the link I want and extract its href attribute. What's missing
is dealing with turning the href into an absolute URL that I can give to
urlopen(). Browsers implement all sorts of stateful logic such as "if
the URL has no hostname, use the same hostname as the current page".
I'm talking about something where I can execute this sequence of calls:

urlopen("http://foo.com:9999/bar")
urlopen("/baz")

and have the second one know that it needs to get
"http://foo.com:9999/baz". Does anything like that exist?

I'm really trying to stay away from Selenium and go strictly with
something I can run under unittest.
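A stdlib-only sketch (Python 3) of the flow described above, written as a unittest test case. It serves two hypothetical pages from a local `http.server` on a free port, fetches the first page, finds the link with `html.parser` instead of lxml, resolves the href against the page's own URL with `urllib.parse.urljoin` (`urlparse.urljoin` in Python 2), and checks the second page. The page contents, link text, and all class and function names are invented for illustration:

```python
import threading
import unittest
from html.parser import HTMLParser
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin
from urllib.request import ProxyHandler, build_opener

# Hypothetical site under test: /bar links to /baz with a root-relative href.
PAGES = {
    "/bar": '<html><body><a href="/baz">Next</a></body></html>',
    "/baz": '<html><body><h1 id="done">Arrived</h1></body></html>',
}

# Bypass any proxy configured in the environment; the server is local.
fetch = build_opener(ProxyHandler({})).open


class FakeSite(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep test output quiet


class LinkFinder(HTMLParser):
    """Collects (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


class ClickThroughTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Port 0 lets the OS pick a free port.
        cls.server = HTTPServer(("127.0.0.1", 0), FakeSite)
        cls.base = "http://127.0.0.1:%d" % cls.server.server_address[1]
        threading.Thread(target=cls.server.serve_forever, daemon=True).start()

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()

    def test_follow_next_link(self):
        with fetch(self.base + "/bar") as resp:
            page_url = resp.geturl()  # the URL actually served, after redirects
            html = resp.read().decode("utf-8")
        finder = LinkFinder()
        finder.feed(html)
        href = next(h for h, text in finder.links if text == "Next")
        # Resolve "/baz" against the page it came from, as a browser would.
        next_url = urljoin(page_url, href)
        self.assertEqual(next_url, self.base + "/baz")
        with fetch(next_url) as resp:
            self.assertIn('id="done"', resp.read().decode("utf-8"))
```

Saved as a test module, this runs with `python -m unittest` like any other test; the only "browser state" it needs is the URL of the page the link came from.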

Jon Clements · Oct 14, 2011

lxml.html.make_links_absolute() ?
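A minimal sketch of that suggestion, assuming the `lxml` package on Python 3; the HTML snippet, URL, and the `absolute_link` helper are made up for illustration. It parses the page, rewrites every link against the page's own URL with `make_links_absolute()`, then picks out the href by its link text:

```python
import lxml.html


def absolute_link(html, page_url, link_text):
    """Hypothetical helper: return the absolute href of the first <a>
    whose text is link_text, or None if there is no such link."""
    doc = lxml.html.fromstring(html)
    # Rewrites link attributes (href, src, ...) in place, resolved against page_url.
    doc.make_links_absolute(page_url)
    hrefs = doc.xpath("//a[normalize-space(.)=$text]/@href", text=link_text)
    return hrefs[0] if hrefs else None


html = '<html><body><a href="/baz">Next</a></body></html>'
print(absolute_link(html, "http://foo.com:9999/bar", "Next"))
# → http://foo.com:9999/baz
```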

Roy Smith · Oct 14, 2011

Jon Clements said:
lxml.html.make_links_absolute() ?

Interesting. That might be exactly what I'm looking for. Thanks!

Miki Tebeka · Oct 14, 2011

IIRC mechanize can do that.

Gary Herron · Oct 14, 2011

Try mechanize
http://wwwsearch.sourceforge.net/mechanize/
billed as "Stateful programmatic web browsing in Python."

It handles clicking on links, cookies, logging in/out, and filling in
forms in the same way as a "real" browser, but it's all under
programmatic control from Python.

In Ubuntu, it's the python-mechanize package.

Roy Smith · Oct 14, 2011

Gary Herron said:
Try mechanize
http://wwwsearch.sourceforge.net/mechanize/
billed as
Stateful programmatic web browsing in Python.

Wow, this is cool, thanks! It even does cookies!

Similar Threads

Trying to access hdml from an open browser using Python.	1	Jan 18, 2023
Looking for someone to take alook at this code and help	2	Mar 10, 2023
How to position the tooltip comment on these buttons?	9	Nov 4, 2023
Brython - Python in the browser	52	Dec 19, 2012
Brython (Python in the browser)	6	Dec 27, 2013
Looking for UNICODE to ASCII Conversioni Example Code	15	Oct 18, 2013
Align img inside nav tabs section	5	Dec 29, 2023
Background image not showing up on html page	3	Sep 23, 2023