issues with htmlparser.getpos

Discussion in 'Python' started by dysmas, Jul 4, 2007.

  1. dysmas

    dysmas Guest

    Hi,


    I'm having an issue with HTMLParser: the getpos() function sometimes
    returns things like:

    (1, 1247)
    (1, 2114)
    (1, 2168)
    (1, 2228)
    (1, 2295)
    (1, 2382)
    (1, 2441)
    (1, 2963)
    (1, 3040)

    I guess this is because HTMLParser has not correctly parsed the
    newline characters in the string fed to it. Is there a workaround
    for this that avoids checking the string every time I feed it some data?
     
    dysmas, Jul 4, 2007
    #1

  2. dysmas

    Steve Holden Guest

    dysmas wrote:
    > Hi,
    >
    >
    > I'm having an issue with HTMLParser: the getpos() function sometimes
    > returns things like:
    >
    > (1, 1247)
    > (1, 2114)
    > (1, 2168)
    > (1, 2228)
    > (1, 2295)
    > (1, 2382)
    > (1, 2441)
    > (1, 2963)
    > (1, 3040)
    >
    > I guess this is because HTMLParser has not correctly parsed the
    > newline characters in the string fed to it. Is there a workaround
    > for this that avoids checking the string every time I feed it some data?
    >

    Have you verified that these results aren't correct? There is no
    requirement for newlines in HTML, and some computer-generated pages
    don't bother to insert them, in which case every position really is
    on line 1.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden
     
    Steve Holden, Jul 4, 2007
    #2

  3. dysmas

    Guest

    Steve,

    Thanks for the reply.

    There are newlines present. It looks like the files in question are
    from a Mac (my text editor tells me they are UTF-8 and use CR to
    mark newlines).

    Cheers
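
One way to confirm what the text editor reports is to count the line terminators in the raw bytes. This is only a minimal sketch; the helper name count_line_endings is made up for illustration and is not part of any library:

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare CR, and bare LF terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CRLF also contains one CR and one LF, so subtract them out.
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }


if __name__ == "__main__":
    # Sample bytes only; for a real file, read it with open(path, "rb").read().
    sample = b"<html>\r<body>\r</body>\r</html>\r"
    print(count_line_endings(sample))
```

A file saved with classic Mac line endings would show a non-zero "CR" count and zeros for the other two.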
     
    , Jul 4, 2007
    #3
  4. dysmas

    Guest

    On Jul 4, 1:47 pm, wrote:
    > Steve,
    >
    > Thanks for the reply.
    >
    > There are newlines present. It looks like the files in question are
    > from a Mac (my text editor tells me they are UTF-8 and use CR to
    > mark newlines).
    >
    > Cheers


    d0h,

    f = open(this_file,"U")
                       ^^^
                        \ this fixed it

    cheers anyway ;)
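
For anyone finding this later, here is a rough sketch of why universal-newline mode helps. It assumes Python 3's html.parser (the thread predates it and used the Python 2 HTMLParser module), and the names PositionRecorder and tag_positions are invented for the example. The parser's line counter only advances on "\n", so CR-only input is reported as one long line until the newlines are normalised:

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Record (tag, getpos()) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (line number, column offset) of the current tag.
        self.positions.append((tag, self.getpos()))


def tag_positions(markup):
    parser = PositionRecorder()
    parser.feed(markup)
    parser.close()
    return parser.positions


if __name__ == "__main__":
    # Classic Mac line endings: bare carriage returns, no line feeds.
    cr_only = "<html>\r<body>\r<p>hello</p>\r</body>\r</html>\r"
    # What universal-newline reading would hand to the parser instead.
    normalised = cr_only.replace("\r\n", "\n").replace("\r", "\n")

    print(tag_positions(cr_only))     # every tag reported on line 1
    print(tag_positions(normalised))  # line numbers now advance
```

In Python 3 a plain text-mode open(this_file) already translates "\r" and "\r\n" to "\n" by default (newline=None), and the "U" mode flag was removed in Python 3.11, so the explicit "U" is only needed on Python 2.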
     
    , Jul 4, 2007
    #4
