how can i use lxml with win32com?

Discussion in 'Python' started by elca, Oct 25, 2009.

  1. elca

    elca Guest

    Hello,
    if anyone knows, please help me!
    I really want to know this; I searched Google for a long time,
    but couldn't find a clear solution, partly because of my lack of Python
    knowledge.
    I want to use the IE.navigate function with BeautifulSoup or lxml.
    If anyone knows about this, or has a sample,
    please help me!
    Thanks in advance.
    --
    View this message in context: http://www.nabble.com/how-can-i-use-lxml-with-win32com--tp26044339p26044339.html
    Sent from the Python - python-list mailing list archive at Nabble.com.
     
    elca, Oct 25, 2009
    #1

  2. Stefan Behnel

    Hi,

    elca, 25.10.2009 02:35:
    > Hello,
    > if anyone knows, please help me!
    > I really want to know this; I searched Google for a long time,
    > but couldn't find a clear solution, partly because of my lack of Python
    > knowledge.
    > I want to use the IE.navigate function with BeautifulSoup or lxml.
    > If anyone knows about this, or has a sample,
    > please help me!
    > Thanks in advance.


    You wrote a message with nine lines, only one of which gives a tiny hint on
    what you actually want to do. What about providing an explanation of what
    you want to achieve instead? Try to answer questions like: Where does your
    data come from? Is it XML or HTML? What do you want to do with it?

    This might help:

    http://www.catb.org/~esr/faqs/smart-questions.html

    Stefan
     
    Stefan Behnel, Oct 25, 2009
    #2

  3. elca

    elca Guest

    Hello,
    I'm very sorry.
    First, my source comes from a website that consists mainly of HTML,
    and I want to build a web scraper.
    I found a script on the internet that is supposed to make BeautifulSoup
    and PAMIE work together, but when I run it I get this error:

    AttributeError: PAMIE instance has no attribute 'pageText'
    File "C:\test12.py", line 7, in <module>
    bs = BeautifulSoup(ie.pageText())

    And here is the original source as I found it:

    from BeautifulSoup import BeautifulSoup
    from PAM30 import PAMIE
    url = 'http://www.cnn.com'
    ie = PAMIE(url)
    bs = BeautifulSoup(ie.pageText())

    If possible, I really want to make BeautifulSoup or lxml work together
    with PAMIE.
    Sorry for my bad English.
    Thanks in advance.






     
    elca, Oct 25, 2009
    #3
  4. User

    User Guest

    On 25 Oct 2009, at 07:45 , elca wrote:

    > i want to make web scraper.
    > if possible i really want to make it work together with
    > beautifulsoup or
    > lxml with PAMIE.


    Scraping information from webpages breaks down into two tasks:

    1. Getting the HTML data
    2. Extracting information from the HTML data

    It looks like you want to use Internet Explorer for getting the HTML
    data; is there any reason you can't use a simpler approach like using
    urllib2.urlopen()?

    Once you have the HTML data, you could feed it into BeautifulSoup or
    lxml.

    Mixing up 1 and 2 into a single statement created some confusion for
    you, I think.

    Greetings,
     
    User, Oct 25, 2009
    #4
  5. elca

    elca Guest

    Hello,
    yes, there is a reason why I have to insist on the Internet Explorer
    interface: because of JavaScript, I'm trying to use PAMIE.
    I tried some other solutions such as urlopen and mechanize,
    but with them it is hard to handle JavaScript.
    Can you show me a sample? :)
    For example, I'd like to extract the 'CNN Shop' and 'Site map' text
    at the bottom of the CNN website's page using PAMIE.
    Thanks for your help.



     
    elca, Oct 25, 2009
    #5
  6. User

    User Guest

    On 25 Oct 2009, at 08:06 , elca wrote:

    > because of JavaScript, I'm trying to use PAMIE.


    I see, your problem is not with lxml or BeautifulSoup, but getting the
    raw data in the first place.


    > i want to extract some text in CNN website with 'CNN Shop'
    > 'Site map' in bottom of CNN website page


    What text? Can you give an example? I'd like to be able to reproduce
    it manually in the webbrowser so I get a clear idea what exactly
    you're trying to achieve.

    Greetings,
     
    User, Oct 25, 2009
    #6
  7. elca

    elca Guest

    Hello,
    take www.cnn.com's main page, for example. If you look at its HTML
    source, you can find a line like this:

    http://www.turnerstoreonline.com/ CNN Shop

    Now suppose I want to extract the 'CNN Shop' text from the HTML source,
    and add that kind of function to the following script:

    from BeautifulSoup import BeautifulSoup
    from PAM30 import PAMIE
    from time import sleep

    url = 'http://www.cnn.com'
    ie = PAMIE(url)
    sleep(10)
    bs = BeautifulSoup(ie.getTextArea())
    # from here I want to add a text-extraction function using PAMIE
    # together with lxml or BeautifulSoup

    Thanks for your help.


     
    elca, Oct 25, 2009
    #7
  8. User

    User Guest

    On 25 Oct 2009, at 08:33 , elca wrote:

    > www.cnn.com in main website page.
    > for example ,if you see www.cnn.com's html source, maybe you can
    > find such
    > like line of html source.
    > http://www.turnerstoreonline.com/ CNN Shop
    > and for example if i want to extract 'CNN Shop' text in html source.


    So, if I understand you correctly, you want your program to do the
    following:

    1. Retrieve the http://cnn.com webpage
    2. Look for a link identified by the text "CNN Shop"
    3. Extract the URL for that link.

    The result would be http://www.turnerstoreonline.com

    Is that what you want?

    Greetings,
     
    User, Oct 25, 2009
    #8
  9. elca

    elca Guest

    Hello,
    I'm very sorry for my English.
    Yes, I want to extract the text 'CNN Shop' and the linked page
    'http://www.turnerstoreonline.com'.
    Thanks a lot!



     
    elca, Oct 25, 2009
    #9
  10. Michiel Overtoom

    elca wrote:

    > yes i want to extract this text 'CNN Shop' and linked page
    > 'http://www.turnerstoreonline.com'.


    Well then.
    First, we'll get the page using urllib2:

    doc = urllib2.urlopen("http://www.cnn.com")

    Then we'll feed it into the HTML parser:

    soup = BeautifulSoup(doc)

    Next, we'll look at all the links in the page:

    for a in soup.findAll("a"):

    and when a link has the text 'CNN Shop', we have a hit,
    and print the URL:

        if a.renderContents() == "CNN Shop":
            print a["href"]


    The complete program is thus:

    import urllib2
    from BeautifulSoup import BeautifulSoup

    doc = urllib2.urlopen("http://www.cnn.com")
    soup = BeautifulSoup(doc)
    for a in soup.findAll("a"):
        if a.renderContents() == "CNN Shop":
            print a["href"]


    The example above can be condensed, because BeautifulSoup's find
    function can also look for text:

    print soup.find("a", text="CNN Shop")

    and since that's a navigable string, we can ascend to its parent and
    display the href attribute:

    print soup.find("a", text="CNN Shop").findParent()["href"]

    So eventually the whole program could be collapsed into one line:

    print BeautifulSoup(urllib2.urlopen("http://www.cnn.com")).find("a", text="CNN Shop").findParent()["href"]

    ...but I think this is very ugly!
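    For completeness, since the thread's subject asks about lxml: the same
    link extraction can be sketched with lxml's XPath support. The HTML
    snippet below is inlined as a stand-in for the fetched page (an
    assumption, so the sketch runs without network access):

    ```python
    from lxml import html

    # Inline stand-in for the page source that urllib2 (or PAMIE) would fetch.
    page = ('<html><body>'
            '<a href="http://www.turnerstoreonline.com/">CNN Shop</a>'
            '</body></html>')

    doc = html.fromstring(page)
    # Select the href of the <a> element whose text is exactly "CNN Shop".
    hrefs = doc.xpath('//a[text()="CNN Shop"]/@href')
    print(hrefs[0])   # -> http://www.turnerstoreonline.com/
    ```

    In real use the page string would come from urllib2.urlopen(...).read()
    or from PAMIE's getPageText().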


    > im very sorry my english.


    Your English is quite understandable. The hard part is figuring out what
    exactly you wanted to achieve ;-)

    I have a question too. Why did you think JavaScript was necessary to
    arrive at this result?

    Greetings,
     
    Michiel Overtoom, Oct 25, 2009
    #10
  11. Stefan Behnel

    elca, 25.10.2009 08:46:
    > im very sorry my english.


    It's fairly common in this news-group that people do not have a good level
    of English, so that's perfectly ok. But you should try to provide more
    information in your posts. Be explicit about what you tried and what failed
    (and how!), and provide short code examples and exact copies of failure
    messages whenever possible. That will help others in understanding what is
    going on on your side. Remember that we can't look at your screen, nor read
    your mind.

    Oh, and please don't top-post in replies.

    Stefan
     
    Stefan Behnel, Oct 25, 2009
    #11
  12. elca

    elca Guest

    Hello,
    thanks for your reply.
    Actually, the website I want to parse is in a different language;
    I quoted a common English website to make things easier to understand. :)
    By the way, is it possible to make PAMIE and BeautifulSoup work
    together?
    Thanks a lot.



     
    elca, Oct 25, 2009
    #12
  13. Michiel Overtoom

    elca wrote:

    > actually what i want to parse website is some different language site.


    A different website? What website? What text? Please show your actual
    use case, instead of smokescreens.


    > so i was quote some common english website for easy understand. :)


    And, did you learn something from it? Were you able to apply the
    technique to the other website?


    > by the way, is it possible to use with PAMIE and beautifulsoup work
    > together?


    If you define 'working together' as 'PAMIE produces the HTML text and
    BeautifulSoup parses it', then yes.

    Greetings,

    --
    "The ability of the OSS process to collect and harness
    the collective IQ of thousands of individuals across
    the Internet is simply amazing." - Vinod Valloppillil
    http://www.catb.org/~esr/halloween/halloween4.html
     
    Michiel Overtoom, Oct 25, 2009
    #13
  14. elca

    elca Guest

    Hello,
    here is what I actually want.
    If you run my script, you reach this page:
    'http://news.search.naver.com/search.naver?sm=tab_hty&where=news&query=korea+times&x=0&y=0'
    That is a Korean portal site, where I searched for the keyword 'korea
    times', and I want to scrape the results into a text file named
    'blogscrap_save.txt'.
    If you run the script, you can see the following article:

    "Yesan County: How do you like them apples?
    The Korea Herald |
    carp fishing at the Yedang Reservoir -
    Korea's biggest - taking a nice stroll...
    During the curator's recitation of Yun's life and times as a resistance
    and freedom fighter,
    he would emphasize random ...
    "

    and also articles like the following:
    "
    10,000 Nepalese Diaspora Emerging in Korea
    The Korea Times, World | Oct. 23, 2009 (Fri) 9:31 PM
    Although the Nepalese community in Korea is worker dominated,
    there are... yoga is popular among Nepalese. These festivals are the
    times when expatriate Nepalese feel nostalgic for their... "

    So the actual process to scrape the site is: first search with a
    keyword, then save the resulting articles as plain text.

    I have attached the script I am currently writing, but it does not
    work well; the extraction part is really hard for a novice like me. :)
    Thanks in advance.




    http://www.nabble.com/file/p26046215/untitled-1.py untitled-1.py
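    The save-to-text step described above can be sketched as follows. The
    snippets list is a stand-in (an assumption) for whatever the parser
    extracted, since the Naver result markup isn't shown in the thread:

    ```python
    # -*- coding: utf-8 -*-
    import io

    # Stand-in for the article summaries a parser would extract from the
    # search-result page; the real extraction step is not shown here.
    snippets = [
        u"Yesan County: How do you like them apples?",
        u"10,000 Nepalese Diaspora Emerging in Korea",
    ]

    # Write each scraped snippet as one line of UTF-8 text, which keeps
    # Korean characters intact in 'blogscrap_save.txt'.
    with io.open("blogscrap_save.txt", "w", encoding="utf-8") as f:
        for s in snippets:
            f.write(s + u"\n")
    ```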


     
    elca, Oct 25, 2009
    #14
  15. paul

    paul Guest

    elca schrieb:
    > Hello,

    Hi,

    > following is script source which can beautifulsoup and PAMIE work together.
    > but if i run this script source error was happened.
    >
    > AttributeError: PAMIE instance has no attribute 'pageText'
    > File "C:\test12.py", line 7, in <module>
    > bs = BeautifulSoup(ie.pageText())

    You could execute the script line by line in the python console, then
    after the line "ie = PAMIE(url)" look at the "ie" object with "dir(ie)"
    to check if it really looks like a healthy instance. ...got bored, just
    tried it -- looks like pageText() has been renamed to getPageText().
    Try:
    text = PAMIE('http://www.cnn.com').getPageText()
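    Paul's dir() check can be illustrated offline with a stand-in class,
    since PAMIE itself is a Windows-only, IE-driving package (the Browser
    class below is purely hypothetical):

    ```python
    # dir() lists an object's attributes -- a quick way to spot renamed methods.
    class Browser(object):            # hypothetical stand-in for a PAMIE instance
        def getPageText(self):
            return "<html>...</html>"

    ie = Browser()
    names = [n for n in dir(ie) if not n.startswith("_")]
    print(names)                      # no 'pageText' here, hence the AttributeError
    ```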

    cheers
    Paul

     
    paul, Oct 25, 2009
    #15
  16. elca

    elca Guest

    Hi,
    thanks a lot.
    Studying alone is a tough thing. :)
    How can I improve my skills?


     
    elca, Oct 25, 2009
    #16
  17. paul

    paul Guest

    elca schrieb:
    > Hi,
    > thanks a lot.
    > studying alone is tough thing :)
    > how can i improve my skill...

    1. Stop top-posting.
    2. Read documentation
    3. Use the interactive prompt

    cheers
    Paul

     
    paul, Oct 25, 2009
    #17
  18. elca

    elca Guest

    paul kölle wrote:
    >
    > elca schrieb:
    >> Hi,
    >> thanks a lot.
    >> studying alone is tough thing :)
    >> how can i improve my skill...

    > 1. Stop top-posting.
    > 2. Read documentation
    > 3. Use the interactive prompt
    >
    > cheers
    > Paul


    Hello,
    I'm sorry, I'm also not familiar with newsgroups.
    So is this the bottom-posting position?
    If I'm wrong, please correct me.
    Thanks. In addition, just before you sent this I was testing

    text = PAMIE('http://www.naver.com').getPageText()

    I have a question:
    how can I keep only one window open, instead of opening several?
    The following is my scenario: after opening www.cnn.com I want to go to
    http://www.cnn.com/2009/US/10/24/teen.jane.doe/index.html
    while keeping only one window open.

    text = PAMIE('http://www.cnn.com').getPageText()
    sleep(5)
    text = PAMIE('http://www.cnn.com/2009/US/10/24/teen.jane.doe/index.html')

    Thanks in advance :)


     
    elca, Oct 25, 2009
    #18
  19. Michiel Overtoom

    elca wrote:

    > im sorry ,also im not familiar with newsgroup.


    It's not a newsgroup, but a mailing list. And if you're new to a certain
    community you're not familiar with, it's best to lurk a few days to see
    how it is used.


    > so this position is bottom-posting position?


    It is, but you should also cut away any quoted text that is not directly
    related to the answer.
    Otherwise people have to scroll many screens full of text before they
    can see the answer.


    > how can i keep open only one windows? not open several windows.


    The trick is not to instantiate multiple PAMIE objects, but to create
    one and reuse it.
    Like:

    import time
    import PAM30

    ie = PAM30.PAMIE()

    ie.navigate("http://www.cnn.com")
    text1 = ie.getPageText()

    ie.navigate("http://www.nu.nl")
    text2 = ie.getPageText()

    ie.quit()
    print len(text1), len(text2)


    But still I think it's unnecessary to use Internet Explorer to get
    simple web pages.
    The standard library "urllib2.urlopen()" works just as well, and doesn't
    rely on Internet Explorer to be present.
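    As a minimal sketch of that urllib2 approach (shown with a
    version-agnostic import, and a file:// URL standing in for
    http://www.cnn.com so the sketch runs offline -- both are assumptions
    for illustration):

    ```python
    import os
    import tempfile

    try:
        from urllib2 import urlopen           # Python 2
    except ImportError:
        from urllib.request import urlopen    # Python 3

    # Write a tiny local page so the sketch works without network access;
    # in real use you would pass http://www.cnn.com etc. straight to urlopen().
    path = os.path.join(tempfile.mkdtemp(), "page.html")
    with open(path, "w") as f:
        f.write('<html><body><a href="http://www.turnerstoreonline.com/">'
                'CNN Shop</a></body></html>')

    # urlopen() returns a file-like object; read() gives the raw page bytes.
    data = urlopen("file://" + path).read()
    print(len(data))
    ```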

    Greetings,


    --
    "The ability of the OSS process to collect and harness
    the collective IQ of thousands of individuals across
    the Internet is simply amazing." - Vinod Valloppillil
    http://www.catb.org/~esr/halloween/halloween4.html
     
    Michiel Overtoom, Oct 25, 2009
    #19
  20. Irmen de Jong

    Michiel Overtoom wrote:
    > elca wrote:
    >
    >> im sorry ,also im not familiar with newsgroup.

    >
    > It's not a newsgroup, but a mailing list. And if you're new to a certain
    > community you're not familiar with, it's best to lurk a few days to see
    > how it is used.


    Pot. Kettle. Black.
    comp.lang.python really is a Usenet newsgroup. There is a mailing list
    that mirrors the newsgroup, though.

    -irmen
     
    Irmen de Jong, Oct 25, 2009
    #20