Understanding Perl's get($url) function

KK

Hi there,
I need to access several pages of the website
'www.ieeexplore.ieee.org/Xplore/DynWel.jsp' (only accessible to
members). Unfortunately, I cannot open more than 10 browsers on that
website from my PC (a constraint set by the number of licenses
bought). However, one can navigate as many pages as one wishes
in a single browser.

Coming to the problem: I have a Perl program which browses through
several pages of the above-mentioned website using the get($url)
function. After some number of get($url) calls, I run
into the limited-licenses problem mentioned above. Does this
mean that every time I execute get($url), it's equivalent to
opening the URL in a new browser? If yes,
what is a possible alternative to get around this problem?

Awaiting possible help,
regards,

-KK
 
Malcolm Dew-Jones

KK wrote:
: Hi there,
: I need to access several pages of the website viz.,
: 'www.ieeexplore.ieee.org/Xplore/DynWel.jsp' (only accessible to
: members). Unfortunately, I cannot open multiple browsers (more than
: 10) of that website on my PC (a constraint set by the number of
: licenses bought). However, one can navigate as many pages as you wish
: in a single browser.

: Coming to the problem, I have a perl program which browses through
: several pages of the above mentioned website through use of get($url)
: function. Doing so, after some executions of get($url) function, I run
: into the problem of limited licenses (mentioned above). Does this
: mean, every time I execute get($url) function, it's equivalent to
: opening the url in a new browser? If yes,
: what is the possible alternative to get around this problem?
:
: awaiting for possible help,
: regards,


It's hard to say without seeing the http traffic between your pc and the
server.

One technique that a server _might_ be using would be to send a cookie to
the browser after the first request so that on later requests the server
knows which browser is contacting it.

_If_ this were the case, then your first get($url) request would have to
save any cookies that the server sends, and then send the same cookie(s)
back to the server with any later requests.

You would have to read the docs for the get($url) function you are using
to see how to add and receive cookies to/from each request.
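
(A minimal sketch of that save-and-resend cycle, assuming the LWP family of modules is installed. The URLs and the SESSIONID cookie are made up for illustration, and the server's response is simulated here so that nothing goes over the network; the real site may well behave differently.)

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# In-memory cookie jar that remembers what the server sends.
my $jar = HTTP::Cookies->new;

# Simulated first exchange: the (hypothetical) server answers the first
# request with a Set-Cookie header identifying the session.
my $first_request  = HTTP::Request->new(GET => 'http://www.example.com/page1');
my $first_response = HTTP::Response->new(200, 'OK');
$first_response->header('Set-Cookie' => 'SESSIONID=abc123; path=/');
$first_response->request($first_request);

# Save any cookies found in the response.
$jar->extract_cookies($first_response);

# On a later request to the same host, send the saved cookie(s) back.
my $second_request = HTTP::Request->new(GET => 'http://www.example.com/page2');
$jar->add_cookie_header($second_request);

print "Cookie header on second request: ",
      $second_request->header('Cookie'), "\n";
```

If the server is tracking sessions this way, the second request should look to it like the same browser as the first.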

$0.02
 
Ben Morrow

KK wrote:
: Hi there,
: I need to access several pages of the website viz.,
: 'www.ieeexplore.ieee.org/Xplore/DynWel.jsp' (only accessible to
: members). Unfortunately, I cannot open multiple browsers (more than
: 10) of that website on my PC (a constraint set by the number of
: licenses bought). However, one can navigate as many pages as you wish
: in a single browser.

: Coming to the problem, I have a perl program which browses through
: several pages of the above mentioned website through use of get($url)
: function. Doing so, after some executions of get($url) function, I run
: into the problem of limited licenses (mentioned above). Does this
: mean, every time I execute get($url) function, it's equivalent to
: opening the url in a new browser? If yes,
: what is the possible alternative to get around this problem?
:
: awaiting for possible help,
: regards,


It's hard to say without seeing the http traffic between your pc and the
server.

One technique that a server _might_ be using would be to send a cookie to
the browser after the first request so that on later requests the server
knows which browser is contacting it.

One thing that has to be said at this point is that you must read the
T&C of your license. Are you allowed to get the pages automatically at
all?

Ben
 
gnari

KK said:
Hi there,
I need to access several pages of the website viz.,
'www.ieeexplore.ieee.org/Xplore/DynWel.jsp' (only accessible to
members). Unfortunately, I cannot open multiple browsers (more than
10) of that website on my PC (a constraint set by the number of
licenses bought). However, one can navigate as many pages as you wish
in a single browser.

Coming to the problem, I have a perl program which browses through
several pages of the above mentioned website through use of get($url)
function. Doing so, after some executions of get($url) function, I run
into the problem of limited licenses (mentioned above). Does this
mean, every time I execute get($url) function, it's equivalent to
opening the url in a new browser? If yes,
what is the possible alternative to get around this problem?

actually, this depends on exactly how the site is tracking sessions.
2 things come to mind: referer and cookies (most likely)
you should figure if the site is using one of these, and alter your
program accordingly

when you have a perl question come back to this group, but
there are other groups more suited to HTTP discussions.

gnari
 
Tad McClellan

Malcolm Dew-Jones said:
KK wrote:

: Coming to the problem, I have a perl program


Yes, but you do NOT have a perl (nor Perl) problem.

If you were using Java or Python you would be having the same problem.

: after some executions of get($url) function, I run
: into the problem of limited licenses

It's hard to say without seeing the http traffic between your pc and the
server.


This is a great tool for seeing that traffic:

Web Scraping Proxy

http://www.research.att.com/~hpk/wsp/

One technique that a server _might_ be using would be to send a cookie to
the browser after the first request so that on later requests the server
knows which browser is contacting it.

You would have to read the docs for the get($url) function you are using
to see how to add and receive cookies to/from each request.


He might not even "have to" read the docs, as wsp.pl will write
the Perl code for him!


I love that thing for reverse-engineering my many web scraping tools...
 
KK

Thank you one & all for the ideas. As I mentioned in my previous
message, the problem I'm facing is 'login concurrency'. I could
conclude that the site is keeping track of my navigation through
cookies. One of the replies suggested saving and resending the cookie
through the get($url) function. Apparently the CPAN and other
documentation for get($url) that I've consulted does not describe any
such feature. So my question boils down to how to implement saving and
resending cookies with the functions get($url) & getstore($url,$filename)
(my program uses both). I'm a newbie to Perl programming and would like
to request a somewhat detailed explanation. Thank you fellow-perls.
 
Uri Guttman

K> Thank you one & all, for the ideas. In my previous missive, as I
K> mentioned, the problem I'm facing is the 'login concurrency'. I could
K> conclude that the site is keeping track of my navigation through
K> cookies. One of the replies, suggested to save/resend the cookie
K> through get($url) function. Apparently CPAN/other documentation of
K> get($url) I've referred, does not contain any such feature. Now my
K> question boils down to implementation of save/resend of cookies with
K> modules get($url) & getstore($url,$filename) (my program uses both).
K> I'm a newbie to perl programming and would like to request for a
K> little detailed explanation. Thank you fellow-perls.

it is in LWP::UserAgent. get and getstore are only in LWP::Simple and as
that name implies, it doesn't support fancier stuff like using cookies.

and if you want to make your life much easier, use WWW::Mechanize which
will do the cookies, fetching, and parsing all for you in one nice clean
api. it is designed to do what you are attempting.
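
(a rough sketch of the LWP::UserAgent route, assuming the original program only needs plain GET requests. the helper names fetch and fetch_to_file are made up here as stand-ins for LWP::Simple's get and getstore, they are not part of LWP. the point is that every request goes through the same agent, so the cookie jar is shared.)

```perl
use strict;
use warnings;
use LWP::UserAgent;

# One agent shared by every request. The empty hash asks LWP to create
# an in-memory HTTP::Cookies jar for it.
my $ua = LWP::UserAgent->new(cookie_jar => {});

# Hypothetical stand-in for LWP::Simple's get():
# returns the page body, or undef if the request failed.
sub fetch {
    my ($url) = @_;
    my $response = $ua->get($url);
    return $response->is_success ? $response->decoded_content : undef;
}

# Hypothetical stand-in for LWP::Simple's getstore():
# saves the body to $filename and returns the HTTP status code.
sub fetch_to_file {
    my ($url, $filename) = @_;
    my $response = $ua->get($url, ':content_file' => $filename);
    return $response->code;
}
```

a call such as fetch('http://www.example.com/') (placeholder url) would then carry whatever cookies earlier calls picked up.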

uri
 
J. Gleixner

[...] Apparently CPAN/other documentation of
get($url) I've referred, does not contain any such feature. Now my
question boils down to implementation of save/resend of cookies with
modules get($url) & getstore($url,$filename) (my program uses both).
I'm a newbie to perl programming and would like to request for a
little detailed explanation. Thank you fellow-perls.

You actually looked on CPAN for "cookies" and didn't find anything???
How odd.. :)

Check the documentation for LWP and look for helpful examples in the LWP
cookbook:

perldoc lwpcook

look for "cookies". Could also check the documentation for
HTTP::Cookies and you'll see examples of using the cookie_jar method.
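
Something along these lines is what those docs describe; treat it as a sketch rather than a drop-in fix. The file name cookies.txt is just a placeholder, and whether you need ignore_discard depends on whether the site marks its cookies as session-only, which I am only guessing at.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Placeholder file name; any writable path would do.
my $cookie_file = 'cookies.txt';

my $jar = HTTP::Cookies->new(
    file           => $cookie_file,
    autosave       => 1,  # write the jar back to disk when it goes away
    ignore_discard => 1,  # also save session cookies that would otherwise be dropped
);

# Attach the jar to the agent; every request made through $ua will now
# store cookies from responses and send them back on later requests.
my $ua = LWP::UserAgent->new;
$ua->cookie_jar($jar);
```

With the jar saved to a file, a second run of the program can pick up the same cookies instead of starting a fresh session.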
 
KK

Hello everyone, needless to say I'm greatly indebted to everyone who
has given valuable suggestions here. I DID IT FINALLY! I used
LWP::UserAgent to save and resend the cookie. I'm
so glad that many hours of manual work are cut down to just an hour
of simulation work. Perl programming rocks!

Gleixner! dude! :)! please! I did not say CPAN does not have
modules/documentation for "cookies" - I meant a different
thing... never mind.

I did not have to try mechanize. Useragent did the job for me.
 
Uri Guttman

<don't top post. read the group guidelines which are posted regularly>

K> I did not have to try mechanize. Useragent did the job for me.

it will still save you much coding. but what do i care about saving you
work?

uri
 
KK

Uri Guttman said:
<don't top post. read the group guidelines which are posted regularly>

K> I did not have to try mechanize. Useragent did the job for me.

it will still save you much coding. but what do i care about saving you
work?

uri

Hi Uri, I did not mean to turn down your suggestion. It's just that I
came across it a bit late, when I had already
got my job done with UserAgent. I am/was pressed for time. regards,
-KK
 
