Downloading a link with JavaScript in it

Jetus

I am able to download this page (enclosed code), but I then want to
download a PDF file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems to use JavaScript.
The line in the page source says:
href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>

So, in summary, when I download this page, I would like to follow the
"view" link for each record.
Can anyone point me in the right direction?

When the "view" link is clicked in IE or Firefox, it returns a PDF
file, so I should be able to download it with
urllib.urlretrieve(pdf_url, r'c:\temp\pdffile')
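For reference, a Python 3 sketch of that last step (the thread uses Python 2's urllib; in Python 3 the function is urllib.request.urlretrieve). The names download_pdf, looks_like_pdf and DEST_PATH are hypothetical, and the "%PDF" check is an added safeguard, not something from the original post: a server that rejects a scripted request often answers with an HTML error page, which would otherwise be saved silently.

```python
import urllib.request

PDF_MAGIC = b'%PDF'  # PDF files begin with these bytes


def looks_like_pdf(first_bytes):
    """Return True if the data starts with the PDF signature."""
    return first_bytes.startswith(PDF_MAGIC)


def download_pdf(url, dest_path):
    """Save url to dest_path; return True if the saved file looks like a PDF."""
    filename, _headers = urllib.request.urlretrieve(url, dest_path)
    with open(filename, 'rb') as f:
        return looks_like_pdf(f.read(len(PDF_MAGIC)))


# Raw string on purpose: in a normal string, the \t in c:\temp is a TAB.
DEST_PATH = r'c:\temp\pdffile.pdf'
```

Once the real PDF URL is known, download_pdf(pdf_url, DEST_PATH) saves it and reports whether a PDF actually came back.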

Here is the code I have been using:
----------------------------------------------------------------
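import urllib, urllib2

# Form fields posted by the search page
params = [
    ('booktype', 'L'),
    ('book', '930'),
    ('page', ''),
    ('hidPageName', 'S3Search'),
    ('DoItButton', 'Search'),
]

data = urllib.urlencode(params)

# Passing data makes urlopen() send a POST request
f = urllib2.urlopen(
    "http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp", data)
s = f.read()
f.close()

# Save the result page ('wb' so the bytes are written unchanged)
out = open('jcolib.html', 'wb')
out.write(s)
out.close()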
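For anyone on Python 3, where urllib and urllib2 were merged into urllib.request and urllib.parse, the same POST could be sketched as below. It assumes the form fields in the script above still match the live form; build_search_request and fetch_search_page are hypothetical helper names.

```python
import urllib.parse
import urllib.request

SEARCH_URL = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'

# Same form fields as the Python 2 script.
PARAMS = [
    ('booktype', 'L'),
    ('book', '930'),
    ('page', ''),
    ('hidPageName', 'S3Search'),
    ('DoItButton', 'Search'),
]


def build_search_request(params=PARAMS):
    """Build (but do not send) the POST request for the search form."""
    # In Python 3 the request body must be bytes, not str.
    data = urllib.parse.urlencode(params).encode('ascii')
    # Supplying data makes urllib send a POST instead of a GET.
    return urllib.request.Request(SEARCH_URL, data=data)


def fetch_search_page(path='jcolib.html'):
    """Send the search request and save the raw HTML to path."""
    with urllib.request.urlopen(build_search_request()) as resp:
        html = resp.read()
    # Write the bytes unchanged; no decoding is attempted here.
    with open(path, 'wb') as out:
        out.write(html)
    return html
```

Separating request construction from sending makes it easy to inspect the exact body before hitting the server.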
 
Diez B. Roggisch

Use something like the Firebug extension to see what request the
openimagewin function ultimately creates. Then issue that request
yourself, parametrised with information parsed out of the above href.

There is no way to interpret the JS in Python, let alone mimic possible
browser DOM behavior.
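A Python 3 sketch of the "parametrise from the href" part: pull the refnum out of the openimagewin(...) call and drop it into the request pattern found with Firebug. REQUEST_TEMPLATE is a hypothetical placeholder (example.invalid is not a real endpoint), and the assumption that the real request takes a refnum parameter has to be checked against what the browser actually sends.

```python
import re
import urllib.parse

# The href from the page source quoted in the question.
HREF = "javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');"

# Hypothetical placeholder: substitute the URL that Firebug (or a similar
# tool) shows the browser requesting when "view" is clicked.
REQUEST_TEMPLATE = 'http://example.invalid/getimage?{query}'


def refnum_from_href(href):
    """Return the refnum value passed to openimagewin(...), or None."""
    match = re.search(r"openimagewin\('([^']*)'\)", href)
    if match is None:
        return None
    # match.group(1) is the relative path, e.g. 'JCCOGetImage.jsp?refnum=...'
    query = urllib.parse.urlsplit(match.group(1)).query
    values = urllib.parse.parse_qs(query).get('refnum')
    return values[0] if values else None


def build_request_url(refnum):
    """Fill the (hypothetical) captured request pattern with one refnum."""
    query = urllib.parse.urlencode({'refnum': refnum})
    return REQUEST_TEMPLATE.format(query=query)
```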

Diez
 
7stud

I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says:

href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>

1) Use BeautifulSoup to extract the path:

JCCOGetImage.jsp?refnum=DN2007036179

from the html page.


2) The path is relative to the current url, so if the current url is:

http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp

Then the url to the page you want is:

http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179

You can use urlparse.urljoin() to join a relative path to the current
url:


import urlparse

base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'

target_url = urlparse.urljoin(base_url, relative_url)
print target_url

--output:--
http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179



3) Python has a webbrowser module that allows you to open urls in a
browser:

import webbrowser

webbrowser.open("http://www.google.com")


You could also use os.system() or, on Windows, os.startfile() to do the
same thing:

import os

os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')

# You don't have to worry about directory names
# with spaces in them if you use startfile():
os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')


All the urls you posted give me errors when I try to open them in a
browser, so you will have to sort out those problems first.
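The same join in Python 3, where the urlparse module became urllib.parse, alongside two other kinds of input for comparison. The '/cgi-bin/search' path and the example.org URL are made-up inputs used only to show how urljoin treats them.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'

# A bare relative path replaces the last segment of the base path.
image_page = urljoin(BASE_URL, 'JCCOGetImage.jsp?refnum=DN2007036179')

# A path starting with '/' replaces the whole path but keeps the host.
root_relative = urljoin(BASE_URL, '/cgi-bin/search')

# An absolute URL, even on another host, is returned unchanged.
other_host = urljoin(BASE_URL, 'http://example.org/file.pdf')
```

The last case means that if the browser's real request goes to a different server, no join against the page URL can produce it; the full URL has to be captured instead.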
 
7stud

1) Use BeautifulSoup to extract the path:

JCCOGetImage.jsp?refnum=DN2007036179

from the html page.

BeautifulSoup will allow you to locate and extract the href attribute:

javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');

See: "The attributes of Tags" in the BS docs.

Then you can use string functions (preferable) or a regex to get
everything between the parentheses (remove the quotes around the path,
too).
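Putting both steps together, here is a Python 3 sketch that swaps in the standard library's html.parser for BeautifulSoup (so nothing needs installing) and uses plain string methods, as suggested, to cut out the quoted path. JavascriptHrefParser, path_from_js_href and image_paths are hypothetical names, and the code assumes the path is the single quoted argument of the JavaScript call.

```python
from html.parser import HTMLParser


class JavascriptHrefParser(HTMLParser):
    """Collect href values of <a> tags that start with 'javascript:'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        # attrs is a list of (name, value); value is None for bare attributes.
        href = dict(attrs).get('href') or ''
        if href.startswith('javascript:'):
            self.hrefs.append(href)


def path_from_js_href(href):
    """Return the text between the outer parentheses, minus quotes."""
    start = href.find('(')
    end = href.rfind(')')
    if start == -1 or end <= start:
        return None
    return href[start + 1:end].strip().strip('\'"')


def image_paths(html):
    """Return the relative paths found in all javascript: links."""
    parser = JavascriptHrefParser()
    parser.feed(html)
    parser.close()
    paths = (path_from_js_href(h) for h in parser.hrefs)
    return [p for p in paths if p]
```

Each returned path can then be joined to the page URL with urljoin.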
 
Jetus

Thanks Diez;
I had never used Firebug and could not find the HTTP header section, but
it led me to Tamper Data, which was perfect for getting the headers.
Thanks for the input.
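Once Tamper Data shows the headers, the request can be replayed with the same headers from a script. A Python 3 sketch; the header values below are hypothetical placeholders to be replaced with the captured ones. If the server ties the PDF to a session, a Cookie header (or an opener built with http.cookiejar) may also be needed.

```python
import urllib.request

# Hypothetical values: copy the real ones from the headers Tamper Data shows.
CAPTURED_HEADERS = {
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp',
}


def build_replay_request(url, headers=CAPTURED_HEADERS):
    """Build a GET request carrying the same headers the browser sent."""
    return urllib.request.Request(url, headers=headers)


def fetch_bytes(url):
    """Send the replayed request and return the raw response body."""
    with urllib.request.urlopen(build_replay_request(url)) as resp:
        return resp.read()
```

Note that urllib stores header names via str.capitalize(), so 'User-Agent' is looked up as 'User-agent' with Request.get_header().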
 
Jetus

7Stud;
Thanks for sharing your knowledge!!

1) The proper URL for the website is http://www.landrecords.jcc.ky.gov/records/S0Search.html.

2) The join won't work. I found that the request it sends is
http://206.196.0.195/cgi-bin/webview/SEND2.PGM?dispfmt=&itype=Q&authorization=&parm2=SDAAAA76070B
It looks like it generates a random code for parm2...
I have two ways of triggering this JavaScript: I can click on "View",
or, in the form, I can put an "i" in the code and click on the option
link; either way it sends me the PDF file.

3) I was not sure why you suggested the webbrowser module, but I am glad
to find out about it.
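To check whether parm2 is really random or derived from something on the search page (a refnum, for instance), it helps to take several captured requests apart and compare them. A Python 3 sketch; query_params is a hypothetical helper, and CAPTURED is the URL quoted above.

```python
from urllib.parse import parse_qs, urlsplit

CAPTURED = ('http://206.196.0.195/cgi-bin/webview/SEND2.PGM'
            '?dispfmt=&itype=Q&authorization=&parm2=SDAAAA76070B')


def query_params(url):
    """Split a captured request URL into host, path and query parameters."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps empty fields such as dispfmt= and
    # authorization=, which the server may still expect to receive.
    params = parse_qs(parts.query, keep_blank_values=True)
    return parts.netloc, parts.path, params
```

Running it over the URLs captured for a few different records shows at a glance which parameters change and which stay fixed.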
 
