scraping from bundes-telefonbuch.de with python

davidgp · Jun 19, 2010

Hello, I'm new to this group and quite new to Python!
I'm trying to scrape some address data from bundes-telefonbuch.de, but I've
run into a problem.
The search URL looks like this: http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20
and it is basically the same for every search query,
so I need to submit POST data to the web server. I'm trying to do it
like this:

import urllib
import urllib2

# Send a browser-like User-Agent with every request
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)')]
urllib2.install_opener(opener)

# Form fields for the search
data = urllib.urlencode({
    'F0': 'mySearchKeyword', 'B': 'T', 'F8': 'A || G', 'W': '1', 'Z': '0',
    'HA': '10', 'SAS_static_0_treffer_treffer': 'Suche starten', 'S': '1',
    'translationtemplate': 'checkstrasse'})

url = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
response = urllib2.urlopen(url, data)  # passing data makes this a POST

This returns a page saying I have to re-enter my search terms.
What's going wrong here?

Thanks!!

Rebelo · Jun 19, 2010

Try mechanize: http://wwwsearch.sourceforge.net/mechanize/

import mechanize

response = mechanize.urlopen("http://www.bundes-telefonbuch.de/")
forms = mechanize.ParseResponse(response, backwards_compat=False)
form = forms[0]  # take the first form on the page
form["F0"] = "query"  # enter your search query here
html = mechanize.urlopen(form.click()).read()
f = open("tmp.html", "w")
f.write(html)  # html is a single string, so write() rather than writelines()
f.close()

Or you can try to parse the response yourself, but I think their HTML is
not valid.

Michael Torrie · Jun 19, 2010

Most likely you need a cookie. You'll probably have to set up a cookie
store for use with urllib2, then request the page that the search form
is on so that the cookie is generated, and then make your post with your
search terms.
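
The cookie approach described above could look roughly like the sketch below. It is written for Python 3 (`urllib.request`, `http.cookiejar`, `urllib.parse`); on Python 2 the same classes live in `urllib2` and `cookielib`, and `urlencode` is in `urllib`. Whether the site really needs a cookie is only a guess, and the choice of the front page as the page that sets it is an assumption. The function names (`build_opener_with_cookies`, `encode_search`, `search`) are made up for illustration; the form fields are copied from the original post.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Assumption: the front page is where the server hands out its session cookie.
FORM_PAGE = 'http://www.bundes-telefonbuch.de/'
SEARCH_URL = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
USER_AGENT = ('Mozilla/5.0 (compatible; Konqueror/3.5; Linux) '
              'KHTML/3.5.4 (like Gecko)')


def build_opener_with_cookies():
    """Return (opener, jar). The jar stores cookies from every response
    and sends them back on later requests made through the same opener."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [('User-Agent', USER_AGENT)]
    return opener, jar


def encode_search(keyword):
    """URL-encode the form fields from the original post as POST bytes."""
    fields = {
        'F0': keyword, 'B': 'T', 'F8': 'A || G', 'W': '1', 'Z': '0',
        'HA': '10', 'SAS_static_0_treffer_treffer': 'Suche starten',
        'S': '1', 'translationtemplate': 'checkstrasse',
    }
    return urllib.parse.urlencode(fields).encode('ascii')


def search(keyword):
    """Fetch the form page first, then POST the search with the same opener."""
    opener, jar = build_opener_with_cookies()
    opener.open(FORM_PAGE).read()  # first request: lets the server set cookies
    return opener.open(SEARCH_URL, encode_search(keyword)).read()
```

Calling `search('mySearchKeyword')` returns the raw response body. If the result page still asks for the search terms again, inspecting the cookie jar after the first request shows whether the server set any cookie at all.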
