downloading a link with javascript in it..

Jetus · May 12, 2008

I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascript

penimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>

So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?

When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')

Here is the following code I have been using
----------------------------------------------------------------
import urllib, urllib2

params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]

data = urllib.urlencode(params)

f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)

s = f.read()
f.close()
open('jcolib.html','w').write(s)

Diez B. Roggisch · May 12, 2008

Jetus said:
I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascriptpenimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>

So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?

When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')

Here is the following code I have been using
----------------------------------------------------------------
import urllib, urllib2

params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]

data = urllib.urlencode(params)

f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)

s = f.read()
f.close()
open('jcolib.html','w').write(s)

Use something like the FireBug-extension to see what the
openimagewin-function ultimately creates as reqest. Then issue that,
parametrised from parsed information out of the above href.

There is no way to interpret the JS in Python, let alone mimic possible
browser dom behavior.

Diez

7stud · May 12, 2008

I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says

href="javascriptpenimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>

1) Use BeautifulSoup to extract the path:

JCCOGetImage.jsp?refnum=DN2007036179

from the html page.

2) The path is relative to the current url, so if the current url is:

http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp

Then the url to the page you want is:

http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179

You can use urlparse.urljoin() to join a relative path to the current
url:

import urlparse

base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'

target_url = urlparse.urljoin(base_url, relative_url)
print target_url

--output:--
http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179

3) Python has a webbrowser module that allows you to open urls in a
browser:

import webbrowser

webbrowser.open("www.google.com")

You could also use system() or os.startfile()[Windows], to do the same
thing:

os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')

#You don't have to worry about directory names
#with spaces in them if you use startfile():
os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')

All the urls you posted give me errors when I try to open them in a
browser, so you will have to sort out those problems first.

7stud · May 12, 2008

1) Use BeautifulSoup to extract the path:

JCCOGetImage.jsp?refnum=DN2007036179

from the html page.

BeautifulSoup will allow you to locate and extract the href attribute:

javascript

penimagewin('JCCOGetImage.jsp?refnum=DN2007036179');

See: "The attributes of Tags" in the BS docs.

Then you can use string functions(preferable) or a regex to get
everything between the parentheses(remove the quotes around the path,
too)

Jetus · May 13, 2008

Jetus schrieb:

I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says
href="javascriptpenimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>

Click to expand...

So, in summary, when I download this page, for each record, I would
like to initiate the "view" link.
Can anyone point me in the right direction?

Click to expand...

When the "view" link is clicked on in IE or Firefox, it returns a pdf
file, so I should be able to download it with
urllib.urlretrieve('pdffile, 'c:\temp\pdffile')

Click to expand...

Here is the following code I have been using

Click to expand...

params = [
('booktype', 'L'),
('book', '930'),
('page', ''),
('hidPageName', 'S3Search'),
('DoItButton', 'Search'),]

Click to expand...

data = urllib.urlencode(params)

Click to expand...

f = urllib2.urlopen("http://www.landrecords.jcc.ky.gov/records/
S3DataLKUP.jsp", data)

Click to expand...

s = f.read()
f.close()
open('jcolib.html','w').write(s)

Click to expand...

Use something like the FireBug-extension to see what the
openimagewin-function ultimately creates as reqest. Then issue that,
parametrised from parsed information out of the above href.

There is no way to interpret the JS in Python, let alone mimic possible
browser dom behavior.

Diez

Thanks Diez;
Never used Firebug, and could not find the http-header section, but it
lead me to Tamper Data, and that was perfect to give me the headers.
Thanks for the input.

Jetus · May 13, 2008

I am able to download this page (enclosed code), but I then want to
download a pdf file that I can view in a regular browser by clicking
on the "view" link. I don't know how to automate this next part of my
script. It seems like it uses Javascript.
The line in the page source says

Click to expand...

href="javascriptpenimagewin('JCCOGetImage.jsp?
refnum=DN2007036179');" tabindex=-1>

Click to expand...

1) Use BeautifulSoup to extract the path:

JCCOGetImage.jsp?refnum=DN2007036179

from the html page.

2) The path is relative to the current url, so if the current url is:

http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp

Then the url to the page you want is:

http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2...

You can use urlparse.urljoin() to join a relative path to the current
url:

import urlparse

base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'

target_url = urlparse.urljoin(base_url, relative_url)
print target_url

--output:--http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2...

3) Python has a webbrowser module that allows you to open urls in a
browser:

import webbrowser

webbrowser.open("www.google.com")

You could also use system() or os.startfile()[Windows], to do the same
thing:

os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')

#You don't have to worry about directory names
#with spaces in them if you use startfile():
os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')

All the urls you posted give me errors when I try to open them in a
browser, so you will have to sort out those problems first.

7Stud;
Thanks for sharing your knowledge!!

1)The proper url to the website is http://www.landrecords.jcc.ky.gov/records/S0Search.html.

2) The join won't work. I found that the request it sends is
http://206.196.0.195/cgi-bin/webview/SEND2.PGM?dispfmt=&itype=Q&authorization=&parm2=SDAAAA76070B
It looks like it generates a random code for param2...
I have two choices for generating this javascript,
I can click on the View, or in the form, if I put a "i" in the code
and click on the
option link, it will send me pdf file.

3) Was not sure why you suggested I use the Webbrowser module?
But I am glad to find out about it.

Jan Claeys · May 16, 2008

Op Mon, 12 May 2008 22:06:28 +0200, schreef Diez B. Roggisch:

There is no way to interpret the JS in Python,

There is at least one way:
<http://wwwsearch.sourceforge.net/python-spidermonkey/>

I want to have a button in a web page with action for open new phone contact page and fill it with some date	1	Aug 31, 2023
Problem with a login script, SESSION user rights and put this together so it works with the other pages and MySQL. Code examples.	2	May 5, 2023
Final chapter of "Learn PHP, MySQL and JavaScript"	3	Jun 4, 2024
Search Results with Pagination	1	Oct 25, 2024
Need Help With a Signup Form	10	Apr 7, 2022
Add recipes using JavaScript in table	20	Apr 17, 2023
Perhaps it is because of your home page, still in the strikingposition put python2 download link,	0	May 22, 2014
How can I structure the final array to meet the requirements of Bootstrap Tree View for building a tree in JavaScript?	1	Mar 29, 2024

downloading a link with javascript in it..

Jetus

Diez B. Roggisch

7stud

7stud

Jetus

Jetus

Jan Claeys

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads