Patch to pydoc (partial) to handle encodings other than ascii

Discussion in 'Python' started by w.m.gardella.sambeth@gmail.com, May 29, 2007.

  1. Guest

    Hello Pythonists:
    I am using SPE as my Python IDE on Windows, with Python 2.5.1 installed
    (official distro). As my mother tongue is Spanish, I had documented
    some modules in it (I know, I should have documented everything in
    English unless I were 110% sure that nobody else would read my docs,
    but they are only for in-house use). When I tried to use the pydoc tab
    that SPE attaches to every source file, I only found a message saying
    that my accented text could not be decoded.

    Browsing SPE's sources, I found that pydoc's HTMLDoc class could
    not handle non-ASCII characters. I then patched the code of the Doc
    class (the parent of HTMLDoc) to look for the encoding declared in the
    source of the module being documented, and (in HTMLDoc) to decode the
    source with it. As the function that writes the HTML file uses the
    same class and choked when writing the file, I re-encode the text
    with the same encoding on writing.

    As I could not find the e-mail address of pydoc's maintainer (the
    source code states that the author is Ka-Ping Yee, but the original
    date is 2001, and I could not find out whether he is still maintaining
    it), I want to make this patch available so that pydoc can be used on
    non-ASCII sources (at least to generate HTML documentation
    programmatically). If the solution is useful (please don't hesitate to
    criticize it), maybe it can be incorporated into a future pydoc version.

    I don't know how to make a patch file (I usually don't do co-op
    programming; I just code as a hobby), and of course I wouldn't even
    think of sending 90 k of code to the newsgroup, so I am sending the
    modified code here, with indications of where to make the modifications:

    After line 323, replace

    if inspect.ismodule(object): return self.docmodule(*args)

    with:

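    if inspect.ismodule(object):
        # getcomments() returns None when the module has no leading comments
        remarks = inspect.getcomments(object) or ''
        # Emacs-style declaration: # -*- coding: <name> -*-
        start = remarks.find(' -*- coding: ') + 13
        if start == 12:
            # Not found (find() returned -1); try the vim-style declaration
            start = remarks.find('# vim:fileencoding=') + 19
            if start == 18:
                # No declaration at all: check for a UTF-8 byte order mark
                if inspect.getsource(object)[:3] == '\xef\xbb\xbf':
                    self.encoding = 'utf_8'
                else:
                    self.encoding = sys.getdefaultencoding()
            else:
                end = remarks.find(' ', start)
                self.encoding = remarks[start:end]
        else:
            end = remarks.find('-*-', start)
            self.encoding = remarks[start:end].strip()
        return self.docmodule(*args)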
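
The detection logic above could also be written as a standalone helper. What follows is only a sketch, and it targets Python 3 rather than the 2.5.1 used in this post. The name detect_source_encoding and its default parameter are hypothetical, and the regular expression follows the declaration pattern described in PEP 263 instead of the two literal find() calls in the patch.

```python
import codecs
import re
import sys

# Declaration pattern from PEP 263: a comment containing "coding:" or "coding=".
# It covers both the Emacs style (-*- coding: x -*-) and the vim style
# (vim:fileencoding=x), because "fileencoding" ends in "coding".
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def detect_source_encoding(source_bytes, default=None):
    """Guess the encoding of Python source given as bytes (hypothetical helper)."""
    # A UTF-8 byte order mark wins over everything else in this sketch.
    if source_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # PEP 263 only honours a declaration on the first or second line.
    for line in source_bytes.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    # Same fallback as the patch: the interpreter's default encoding.
    return default or sys.getdefaultencoding()
```

In Python 3 the standard library function tokenize.detect_encoding does a similar job, so a helper like this would mainly be of interest for understanding the rules.
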

    After line 421 (now line 437 because of the previous insert), insert

    title = title.decode(self.encoding)
    contents = contents.decode(self.encoding)

    And finally replace line 1491 (now 1509):

    file.write(page)

    with:

    file.write(page.encode(html.encoding))
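
The last two changes amount to decoding the byte strings on the way in and encoding the finished page again on the way out. Here is a minimal sketch of that round trip, outside of pydoc and written for Python 3. The functions build_page and write_page are hypothetical stand-ins, not pydoc's own code.

```python
def build_page(title_bytes, contents_bytes, encoding):
    """Decode both parts with the module's encoding and assemble a text page."""
    title = title_bytes.decode(encoding)
    contents = contents_bytes.decode(encoding)
    return (
        "<html><head><title>%s</title></head>"
        "<body>%s</body></html>" % (title, contents)
    )


def write_page(path, page, encoding):
    """Encode the page with the same encoding when writing it to disk."""
    # Binary mode, so that the bytes on disk are exactly what encode() produced.
    with open(path, "wb") as handle:
        handle.write(page.encode(encoding))
```

Using the same encoding in both directions means that the accented characters in the HTML file are stored as the same bytes they had in the source file.
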

    The code doesn't solve the encoding issue on consoles (just try to
    document utf-8 sources and see what funny things appear!), but if the
    approach helps, maybe something can be worked out to use it in a
    general way (I just don't know how to get the console encoding, and I
    don't use consoles most of the time).
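
On the console question, one possible starting point is the encoding attribute of sys.stdout, with locale.getpreferredencoding() as a fallback. This is only a sketch under assumptions, written for Python 3. The helper names console_encoding and encode_for_console are hypothetical, and the attribute can be None in some setups (for example, in Python 2 when output is piped).

```python
import locale
import sys


def console_encoding(stream=None):
    """Best guess at the encoding the console expects (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    # Prefer what the stream reports, then the locale, then plain ASCII.
    return (
        getattr(stream, "encoding", None)
        or locale.getpreferredencoding()
        or "ascii"
    )


def encode_for_console(text, encoding=None):
    """Encode text for the console, replacing characters it cannot show."""
    # "replace" substitutes "?" instead of raising UnicodeEncodeError.
    return text.encode(encoding or console_encoding(), "replace")
```

With the "replace" error handler, characters the console cannot display are shown as question marks instead of aborting the whole documentation run.
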
    Hope that this can help some other non-ASCII user like me.
    Cheers (and sorry for the English).
    Walter Gardella