How to share session with IE

Discussion in 'Python' started by zdp, Oct 10, 2006.

  1. zdp

    zdp Guest

    Hello!

    I need to process some web pages from a forum powered by Discuz!.
    When I log in, there are options for how long to keep the cookies:
    forever, a month, a week, etc. If I choose forever, I don't need to
    log in each time, and when I open Internet Explorer I can access any
    page directly. The page URLs look like this:

    http://www.somesite.com/bbs/viewthread.php?tid=12345&extra=page=1

    However, I now need to process some pages with a Python program. When
    I use urllib.urlopen(theurl), I only get a page telling me that I need
    to log in. I think that's reasonable, because my program isn't in a
    logged-in session the way IE is.

    So how can I do this? I want to get the right page from the URL. I
    have searched the groups but didn't find a clear answer. Should I use
    win32com or urllib? Any reply or information is appreciated. I hope I
    have made myself clear.

    Dapu
     
    zdp, Oct 10, 2006
    #1

  2. zdp

    Bernard Guest

    Hello Dapu,

    You can do the same thing as IE on your forum using urllib2 and
    cookielib. In short, you need to code a small web crawler. I can give
    you my browser module if necessary.
    If you don't have the time to fiddle with the coding part or with my
    browser module, you can also use this particularly useful module:
    http://wwwsearch.sourceforge.net/mechanize/
    The documentation is pretty clear for an experienced Python programmer.
    If that's not your case, I'd recommend reading some ebooks on the Python
    language first to get used to it.

    Bernard




     
    Bernard, Oct 10, 2006
    #2

  3. zdp

    zdp Guest

    It's exactly what I want. I'll try. Thanks!

     
    zdp, Oct 10, 2006
    #3
  4. zdp

    John J. Lee Guest

    "Bernard" <> writes:
    [...]


    In particular, if you're following the approach Bernard suggests, you
    can either:

    1. Log in every time your program runs, by going through the sequence
    of clicks, pages, etc. that you would use in a browser to log in.

    2. Once only (or once a month, or whatever), log in by hand using IE
    with a "Remember me"-style feature (if the website offers that) --
    where the webapp asks the browser to save the cookie rather than
    just keeping it in memory until you close your browser. Then your
    program can load the cookies from your real browser's cookie store
    using this:

    http://wwwsearch.sourceforge.net/mechanize/doc.html#browsers


    There are other alternatives too, but they depend on knowing a little
    bit more about how cookies and web apps work, and may or may not work
    depending on what exactly the server does. I'm thinking specifically
    here of saving *session* cookies (the kind that usually go away when
    you close your browser) in a file -- but the server may not like them
    when you send them back the next time, depending how much time has
    elapsed since the last run. Of course, you can always detect the
    "need to login" condition, and react accordingly.


    John
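
    The last point above, detecting the "need to login" condition and
    reacting to it, could be sketched roughly as follows. This is only an
    illustration in present-day Python 3: the marker text and the names
    needs_login, fetch_logged_in, fetch and login are assumptions made for
    the example, not anything the forum software or a library defines.

```python
from typing import Callable

# Hypothetical marker text; the real forum's login page may be worded differently.
LOGIN_MARKER = "you need to login"


def needs_login(html: str, marker: str = LOGIN_MARKER) -> bool:
    """Return True if the page looks like the 'please log in' page."""
    return marker.lower() in html.lower()


def fetch_logged_in(fetch: Callable[[], str],
                    login: Callable[[], None],
                    max_logins: int = 1) -> str:
    """Fetch a page; if the login page comes back, log in and fetch again."""
    html = fetch()
    for _ in range(max_logins):
        if not needs_login(html):
            break
        login()          # e.g. submit the login form with a cookie-aware opener
        html = fetch()   # retry with the cookies the login just set
    return html
```

    Here fetch and login would be small functions wrapping whatever
    cookie-aware opener the program uses, so saved cookies are tried first
    and a fresh login happens only when the server rejects them.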
     
    John J. Lee, Oct 10, 2006
    #4
  5. John J. Lee wrote:
    > "Bernard" <> writes:
    > [...]
    >



    Another option, instead of making your program run through a series of
    clicks and text inputs (which is difficult to program), is to browse the
    HTML source until you find the name of the script that processes the
    login, and use Python to request that page with the necessary form
    fields encoded in the request. Request something like
    http://www.targetsite.com/login.cgi?username=pyuser&password="fhqwhgads"
    This format is not guaranteed to work, since the login script or server
    might support only one of GET and POST. If it accepts only POST,
    creating the request is slightly more involved, and to be honest I
    haven't looked into how to do it.
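
    For comparison, here is a minimal sketch of the GET and POST forms of
    the same login request, written for present-day Python 3 (urllib2's
    Request now lives in urllib.request). The script name and field names
    are the hypothetical ones from the example URL above, and nothing is
    sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical login script and field names, as in the example URL above.
LOGIN_URL = "http://www.targetsite.com/login.cgi"
fields = {"username": "pyuser", "password": "fhqwhgads"}

encoded = urlencode(fields)

# GET: the form fields travel in the URL's query string.
get_request = Request(LOGIN_URL + "?" + encoded)

# POST: the same fields travel in the request body, which must be bytes.
post_request = Request(LOGIN_URL, data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

    Either request object can then be passed to the open() method of an
    opener built with an HTTPCookieProcessor, so that any cookie the login
    script sets is kept for later page requests.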

    Thereafter, you will have to pass the environment to every page request
    so the server can read the cookie. That makes me wonder whether it is
    possible to do this manually once, export the environment variable to a
    file, and reload this file each time the program is run, or to generate
    the cookie in the environment yourself. Quite frankly, any server
    application that allows the client to control whether or not they have
    logged in sucks, but I've seen a fair few that do.[citation required]

    Cameron.
     
    Cameron Walsh, Oct 11, 2006
    #5
  6. It just occurred to me: your original question was whether or not it
    was possible to share your browser session with IE. Unless you do this
    explicitly, you may need a different login for your Python program
    and for your IE user. If the Python program does not get the same
    cookie as used by IE, or vice versa, and tries to log in, you may find
    this resets the login in the other browser.

    Oh and at risk of starting a flame war <a href="www.getfirefox.com"
    reason="obligatory pro open source link">get a real browser.</a>
     
    Cameron Walsh, Oct 11, 2006
    #6
  7. zdp

    zdp Guest

    I found some similar topics in the newsgroup and got some ideas from
    them.
    http://groups.google.com/group/comp.lang.python/browse_thread/thread/2fe0be6c386adce4
    http://groups.google.com/group/comp.lang.python/browse_thread/thread/a51cec8747f64619

    According to all your suggestions, there are at least two ways to get
    my result.

    1. Use IE's cookies, so I don't need to write code to log in. That means
    I must use ClientCookie. I found some examples in the docs and the
    newsgroup. Below is some code based on the ClientCookie docs. But
    the page I get is still the one telling me I must log in (I CAN get the
    right page in IE).

    import ClientCookie, urllib2

    # the page I want to get
    url_string = "http://www.targetsite.com/bbs/viewthread.php?tid=12345"

    cj = ClientCookie.MSIECookieJar(delayload=True)
    cj.load_from_registry()
    print cj  # I want to know what I get

    opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
    ClientCookie.install_opener(opener)
    f = ClientCookie.urlopen(url_string)
    print f.read()  # NOT the right page html


    2. Log in myself with Python. First, I access the login page and submit
    the form with the username and password. The form has many fields other
    than username and password, so the dict "data" contains all of them,
    even the hidden ones. Then, if the login succeeds, I can get my page
    using the opener with the CookieJar.

    import urllib, urllib2, cookielib

    # the page I want to get
    url_string = "http://www.targetsite.com/bbs/viewthread.php?tid=12345"
    # the login page
    url_login = "http://www.targetsite.com/bbs/logging.php?action=login"

    headers = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    urllib2.install_opener(opener)
    data = {
        'formhash': '3bd8bc0a',
        "referer": "index.php",
        "loginfield": "username",
        'username': 'myname',
        'password': 'mypass',
        "questionid": 0,
        "answer": "",
        "cookietime": "315360000",
        "loginmode": "",
        "styleid": ""
    }
    # urlencode lives in urllib, not urllib2, hence the extra import
    req = urllib2.Request(url_login, urllib.urlencode(data), headers)
    f = opener.open(req)
    print req.get_data()
    print req.header_items()
    print f.info()
    print f.read()

    # if the login succeeds, I can get my page
    f = opener.open(url_string)


    However, neither way worked for me. I don't know what's wrong. Is it
    because the server checks the headers, or because the form submission
    is wrong?

    I haven't studied the mechanize module yet. I want a solution that is
    as simple as possible, for distribution reasons.
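
    One thing that might be worth checking, offered as an assumption
    rather than anything confirmed in this thread: hidden fields such as
    formhash may be generated afresh for each visit, in which case a value
    copied once from the browser would no longer match. The sketch below,
    in present-day Python 3, reads the fields out of the login page
    instead of hard-coding them. The HTML fragment is made up for
    illustration, and FormFieldCollector and collect_form_fields are
    hypothetical helper names.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the name/value pairs of every <input> element on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are recorded as empty strings.
            self.fields[name] = attributes.get("value") or ""


def collect_form_fields(html):
    """Return a dict of all named <input> fields found in the HTML text."""
    parser = FormFieldCollector()
    parser.feed(html)
    return parser.fields


# Made-up fragment standing in for the real login page.
SAMPLE_LOGIN_PAGE = """
<form method="post" action="logging.php?action=login">
  <input type="hidden" name="formhash" value="a1b2c3d4">
  <input type="hidden" name="referer" value="index.php">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

fields = collect_form_fields(SAMPLE_LOGIN_PAGE)
fields.update({"username": "myname", "password": "mypass"})
print(fields)
```

    In a real run the HTML would come from fetching the login page with
    the same cookie-aware opener that later submits the form, so the
    hidden values and the session cookie belong to the same visit.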

     
    zdp, Oct 12, 2006
    #7
  8. zdp

    John J. Lee Guest

    Cameron Walsh <> writes:
    [...]
    > Another option instead of making your program run through a series of
    > clicks and text inputs, which is difficult to program, is to browse
    > the html source until you find the name of the script that processes
    > the login, and use python to request the page with the necessary form
    > fields encoded in the request. Request something like
    > http://www.targetsite.com/login.cgi?username=pyuser&password="fhqwhgads"
    > This format is not guaranteed to work, since the login script or
    > server might only support one of GET and POST. If this is the case,
    > creating the request is slightly more involved and to be honest I
    > haven't looked into how to do it.


    Absolutely, that's often a great way to do things, since it's very
    simple, and is not in conflict with handling cookies (where that's
    required).

    (But of course if you need to handle cookies, you still need to
    arrange to actually handle the cookies somewhere.)


    > Thereafter, you will have to pass the environment to every page
    > request so the server can read the cookie. Which brings me to
    > question whether or not it is possible to do this manually once,
    > export the environment variable to a file, and reload this file each
    > time the program is run. Or to generate the cookie in the environment
    > yourself.

    [...]

    Standard library module cookielib (or mechanize, which is not part of
    the stdlib, and does some more stuff automatically and provides some
    extra features for page navigation and form handling) does all this
    automatically:

    http://docs.python.org/lib/cookielib-examples.html

    import cookielib, urllib2
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    r = opener.open("http://example.com/")


    For loading and saving (including Firefox support):

    http://docs.python.org/lib/file-cookie-jar-classes.html

    http://docs.python.org/lib/cookie-jar-objects.html
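
    As a rough illustration of loading and saving, here is a sketch for
    present-day Python 3, where cookielib is named http.cookiejar. The
    cookie, its values and the temporary file are made up so that the
    example runs without a network; make_session_cookie is a hypothetical
    helper, and in a real program the jar would be filled by the opener.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_session_cookie(name, value, domain):
    """Build a session cookie: no expiry time, normally dropped at exit."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# First run: add a made-up cookie by hand, then write the jar to disk.
jar = LWPCookieJar(cookie_file)
jar.set_cookie(make_session_cookie("sid", "abc123", "example.com"))
# ignore_discard=True also writes session cookies, which are skipped by default.
jar.save(ignore_discard=True)

# Later run: read the saved cookies back before making any requests.
restored = LWPCookieJar(cookie_file)
restored.load(ignore_discard=True)
print([(c.name, c.value) for c in restored])
```

    Passing restored to HTTPCookieProcessor then sends the saved cookies
    with each request; whether the server still accepts them depends on
    how long it keeps sessions alive.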


    For loading IE cookies, use mechanize.

    http://wwwsearch.sourceforge.net/mechanize/


    John
     
    John J. Lee, Oct 14, 2006
    #8
  9. zdp

    John J. Lee Guest

    "zdp" <> writes:
    [...]
    > 1. Use the cookie of IE, so I don't need to code to logon. That means I
    > must use ClientCookie. I found some example in the docs and the
    > newsgroup. Below is some code based on the docs of ClientCookie. But
    > the page I get is still the page told me must login ( I CAN get the
    > right page in IE).


    Try mechanize (same website as ClientCookie -- though right now, that
    part of sourceforge seems to be down for me). It supports more
    browser features automatically.

    [...]
    > However, both ways didn't work for me. I don't know what's wrong. If
    > it's because the server page check the header or the submit of the form
    > is wrong?


    Changing the HTTP headers you send may solve your problem, yes.
    Either that, or the response body ;-)


    > I didn't study Mechanize module yet. I want a solution as simple as
    > possible for distribution reason.


    OK, then you should compare the HTTP requests that a real browser
    sends with the HTTP requests that your Python script sends. The
    following pages give some help with that (from memory, since the site
    is down right now):

    http://wwwsearch.sf.net/mechanize/doc.html#debugging

    http://wwwsearch.sf.net/bits/GeneralFAQ.html
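
    To make that comparison concrete, a small helper can print what the
    script is about to send, for checking against what the browser sends.
    This is a sketch in present-day Python 3; describe_request is a
    hypothetical helper, and the URL, credentials and Referer value are
    placeholders modelled on the login attempt earlier in the thread.

```python
from urllib.parse import urlencode
from urllib.request import Request


def describe_request(request):
    """Return the method, URL, sorted headers and body a Request would send."""
    return {
        "method": request.get_method(),
        "url": request.full_url,
        "headers": sorted(request.header_items()),
        "body": request.data,
    }


# Placeholder values, loosely modelled on the login attempt in this thread.
request = Request(
    "http://www.targetsite.com/bbs/logging.php?action=login",
    data=urlencode({"username": "myname", "password": "mypass"}).encode("ascii"),
    headers={
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)",
        "Referer": "http://www.targetsite.com/bbs/index.php",
    },
)

for key, value in describe_request(request).items():
    print(key, "=", value)
```

    This shows only the headers set on the Request itself. Headers that
    the opener adds when sending (Host, Content-Length, Cookie and so on)
    can be seen by building the opener with
    urllib.request.HTTPHandler(debuglevel=1), which prints the outgoing
    request to standard output.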


    John
     
    John J. Lee, Oct 14, 2006
    #9