scraping from bundes-telefonbuch.de with python

D

davidgp

hello, i'm new on this group, and quiet new to python!
i'm trying to scrap some adress data from bundes-telefonbuch.de but i
run into a problem:
the link is like this: http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20
and it is basically the same for every search query.
thus i need to submit post data to the webserver, i try to do this
like this:

opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible;
Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)')]
urllib2.install_opener(opener)

data = urllib.urlencode({'F0': 'mySearchKeyword','B': 'T','F8': 'A ||
G','W': '1','Z': '0','HA': '10','SAS_static_0_treffer_treffer': 'Suche
starten','S': '1','translationtemplate': 'checkstrasse'})

url = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
response = urllib2.urlopen(url, data)

this returns a page saying i have to reenter my search terms..
what's going wrong here?

Thanks!!
 
R

Rebelo

hello, i'm new on this group, and quiet new to python!
i'm trying to scrap some adress data from bundes-telefonbuch.de but i
run into a problem:
the link is like this:http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20
and it is basically the same for every search query.
thus i need to submit post data to the webserver, i try to do this
like this:

opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible;
Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)')]
urllib2.install_opener(opener)

data = urllib.urlencode({'F0': 'mySearchKeyword','B': 'T','F8': 'A ||
G','W': '1','Z': '0','HA': '10','SAS_static_0_treffer_treffer': 'Suche
starten','S': '1','translationtemplate': 'checkstrasse'})

url = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
response = urllib2.urlopen(url, data)

this returns a page saying i have to reenter my search terms..
what's going wrong here?

Thanks!!

Try mechanize : http://wwwsearch.sourceforge.net/mechanize/

import mechanize
response = mechanize.urlopen("http://www.bundes-telefonbuch.de/")
forms = mechanize.ParseResponse(response, backwards_compat=False)
form = forms[0]
form["F0"] = "query" #enter query
html = mechanize.urlopen(form.click()).read()
f = open("tmp.html","w")
f.writelines(html)
f.close()

Or you can try to parse response but I think that their HTML is not
valid
 
M

Michael Torrie

opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible;
Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)')]
urllib2.install_opener(opener)

data = urllib.urlencode({'F0': 'mySearchKeyword','B': 'T','F8': 'A ||
G','W': '1','Z': '0','HA': '10','SAS_static_0_treffer_treffer': 'Suche
starten','S': '1','translationtemplate': 'checkstrasse'})

url = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
response = urllib2.urlopen(url, data)

this returns a page saying i have to reenter my search terms..
what's going wrong here?

Most likely you need a cookie. You'll probably have to set up a cookie
store for use with urllib2, then request the page that the search form
is on so that the cookie is generated, and then make your post with your
search terms.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,776
Messages
2,569,603
Members
45,201
Latest member
KourtneyBe

Latest Threads

Top