some site login problem help plz..

Discussion in 'Python' started by james27, Oct 5, 2009.

  1. james27

    james27 Guest

    Hello,
    I'm new to Python and I have a problem with mechanize.
    I have used mechanize before without problems, but now I can't get a
    login to work properly on one site. I have been looking for a solution
    for several days without success.
    The login itself seems to succeed, but I can't retrieve the HTML source
    of the page that opens afterwards. All I get back is a small piece of
    HTML like this:

    <html>
    <script language=javascript>
    location.replace("http://www.naver.com");
    </script>
    </html>

    I want to retrieve the full HTML source, but I can't. I have tried twill,
    mechanize, urllib and so on.
    I'm out of ideas. Can anyone help me?

    Here is the full source code.
    Thanks in advance!

    # -*- coding: cp949 -*-
    import sys,os
    import mechanize, urllib
    import cookielib
    import re
    import BeautifulSoup

    params = urllib.urlencode({'url':'http://www.naver.com',
    'svctype':'',
    'viewtype':'',
    'postDataKey':'',

    'encpw':'3a793b174d976d8a614467eb0466898230f39ca68a8ce2e9c866f9c303e7c96a17c0e9bfd02b958d88712f5799abc5d26d5b6e2dfa090e10e236f2afafb723d42d2a2aba6cc3f268e214a169086af782c22d0c440c876a242a4411860dd938c4051acce987',
    'encnm':'100003774',
    'saveID':'0',
    'enctp':'1',
    'smart_level':'1',
    'id':'lbu142vj',
    'pw':'wbelryl',
    'x':'24',
    'y':'4'
    })
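    # Post the login form fields to the login URL and print the response.
    rq = mechanize.Request("http://nid.naver.com/nidlogin.login", params)
    rs = mechanize.urlopen(rq)
    data = rs.read()
    print data
    # Then request the mail page, which is where only the short stub comes back.
    rq = mechanize.Request("http://mail2.naver.com")
    rs = mechanize.urlopen(rq)
    data = rs.read()
    print data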
    james27, Oct 5, 2009
    #1

  2. james27 wrote:

    > Hello,
    > I'm new to Python and I have a problem with mechanize.
    > I have used mechanize before without problems, but now I can't get a
    > login to work properly on one site. I have been looking for a solution
    > for several days without success.
    > The login itself seems to succeed, but I can't retrieve the HTML source
    > of the page that opens afterwards. All I get back is a small piece of
    > HTML like this:
    >
    > <html>
    > <script language=javascript>
    > location.replace("http://www.naver.com");
    > </script>
    > </html>
    >
    > I want to retrieve the full HTML source, but I can't. I have tried twill,
    > mechanize, urllib and so on.
    > I'm out of ideas. Can anyone help me?


    Your problem is that the site uses JavaScript to replace itself. Mechanize
    can't do anything about that. You might have more luck with scripting a
    browser. No idea if there are any special packages available for that
    though.
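
    If the stub page is all you get back, one partial workaround is to read
    the redirect target out of the script yourself and request that URL next.
    A minimal sketch, assuming the redirect always has the
    location.replace("...") form shown in the quoted page; find_js_redirect is
    a hypothetical helper name, and the code is written for Python 3 rather
    than the Python 2 of the original script. It does not execute any
    JavaScript, so it will not help with pages that build their content in
    script.

```python
import re

# Matches location.replace("...") with single or double quotes.
# Assumption: the page uses exactly this simple form of redirect.
_JS_REDIRECT = re.compile(r"""location\.replace\(\s*(["'])(.*?)\1\s*\)""")


def find_js_redirect(html):
    """Return the URL passed to location.replace(), or None if there is none."""
    match = _JS_REDIRECT.search(html)
    return match.group(2) if match else None


stub = """<html>
<script language=javascript>
location.replace("http://www.naver.com");
</script>
</html>"""

print(find_js_redirect(stub))  # http://www.naver.com
```

    The returned URL can then be fetched with the same opener (and cookies)
    that was used for the login request.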

    Diez
     
    Diez B. Roggisch, Oct 5, 2009
    #2

  3. james27

    james27 Guest

    I'm still looking for a good solution.
    Anyway, thanks Diez :)

    james27, Oct 5, 2009
    #3
  4. lkcl

    lkcl Guest

    On Oct 5, 8:26 am, "Diez B. Roggisch" wrote:
    > Your problem is that the site uses JavaScript to replace itself. Mechanize
    > can't do anything about that. You might have more luck with scripting a
    > browser. No idea if there are any special packages available for that
    > though.


    yes, there are. i've mentioned them a few times on comp.lang.python
    (so you can search for them) and have the instances documented here:

    http://wiki.python.org/moin/WebBrowserProgramming

    basically, you're not going to like this, but you actually need
    a _full_ web browser engine, and to _execute_ the javascript.
    then, after a suitable period of time (or after the engine's
    "stopped executing" callback has been called, if it has one)
    you can then node-walk the DOM of the engine, grab the engine's
    document.body.innerHTML property, or use the engine's built-in
    XPath support (if it has it) to find specific parts of the DOM
    faster than if you extracted the text (into lxml etc).
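
    a minimal sketch of that sequence (load the page in a real engine, wait
    for it to finish, then read document.body.innerHTML). it assumes Selenium
    with Firefox as the engine, which is a choice made for this illustration
    and not one of the engines discussed above; fetch_rendered_html and its
    timeout and poll parameters are hypothetical names. the browser is only
    started when no driver object is passed in, so any object with get,
    execute_script and quit methods can stand in for it.

```python
import time


def fetch_rendered_html(url, driver=None, timeout=10.0, poll=0.25):
    """Load url in a browser engine, let its JavaScript run, and return
    the resulting document.body.innerHTML as a string."""
    own_driver = driver is None
    if own_driver:
        # Assumption: the selenium package and a Firefox driver are installed.
        from selenium import webdriver
        driver = webdriver.Firefox()
    try:
        driver.get(url)
        deadline = time.monotonic() + timeout
        # Poll until the engine reports that the document has finished loading,
        # or give up waiting once the timeout has passed.
        while time.monotonic() < deadline:
            if driver.execute_script("return document.readyState") == "complete":
                break
            time.sleep(poll)
        # Read the DOM as it stands after the page's scripts have run.
        return driver.execute_script("return document.body.innerHTML")
    finally:
        # Only close a browser that this function started itself.
        if own_driver:
            driver.quit()
```

    calling fetch_rendered_html("http://mail2.naver.com") would start a
    browser, follow the script redirect the way a user's browser does, and
    return the final markup. logging in would still have to happen in that
    same browser session first.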

    you should not be shocked by this - by the fact that it takes
    a whopping 10 or 20mb library, including a graphical display
    mechanism, to execute a few bits of javascript.

    also, if you ask him nicely, flier liu is currently working on
    http://code.google.com/p/pyv8 and on implementing the W3C DOM
    standard as a "daemon" service (i.e. with no GUI component) and
    he might be able to help you out. the pyv8 project comes with
    an example w3c.py file which implements DOM partially, but i
    know he's done a lot more.

    so - it's all doable, but for a given value of "do" :)

    l.
     
    lkcl, Oct 12, 2009
    #4