different string representation (buffer gap)

Discussion in 'Python' started by manders2k, Feb 3, 2004.

  1. manders2k

    manders2k Guest

    Hi all --

    I'm contemplating the idea of writing a simple emacs-like editor in
    Python (for fun and the experience of doing so). In reading through
    Craig Finseth's "The Craft of Text Editing"
    (http://www.finseth.com/~fin/craft/), I've come across the "buffer
    gap" representation for the text data of the buffer. Very briefly, it
    keeps the unallocated memory of the character array at the editing
    point, so that as long as there is memory available, an insert/delete
    at that point is very low (constant) cost. Of course, moving the
    editing point means copying some character data so that the gap moves
    with you, but...
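
A minimal sketch of the buffer gap idea described above, assuming a plain Python list as the character array. The class name `GapBuffer` and its methods are hypothetical, chosen for illustration; this is not code from Finseth's book.

```python
class GapBuffer:
    """Text buffer that keeps its free space (the gap) at the editing point."""

    def __init__(self, text="", gap_size=16):
        self._data = list(text) + [None] * gap_size
        self._gap_start = len(text)        # first free slot
        self._gap_end = len(self._data)    # one past the last free slot

    def __len__(self):
        return len(self._data) - (self._gap_end - self._gap_start)

    def _move_gap(self, pos):
        """Slide the gap so that it starts at logical position pos."""
        if not 0 <= pos <= len(self):
            raise IndexError("position out of range")
        while self._gap_start > pos:       # moving left: copy chars to the far side
            self._gap_start -= 1
            self._gap_end -= 1
            self._data[self._gap_end] = self._data[self._gap_start]
        while self._gap_start < pos:       # moving right: copy chars to the near side
            self._data[self._gap_start] = self._data[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def _grow(self, needed):
        """Widen the gap by at least `needed` slots."""
        extra = max(needed, len(self._data))
        self._data[self._gap_end:self._gap_end] = [None] * extra
        self._gap_end += extra

    def insert(self, pos, text):
        self._move_gap(pos)
        if self._gap_end - self._gap_start < len(text):
            self._grow(len(text))
        for ch in text:                    # fill the gap from its left edge
            self._data[self._gap_start] = ch
            self._gap_start += 1

    def delete(self, pos, count):
        """Delete count characters starting at pos by widening the gap."""
        self._move_gap(pos)
        self._gap_end = min(self._gap_end + count, len(self._data))

    def text(self):
        """Return the contents as an ordinary string (this copies everything)."""
        return "".join(self._data[:self._gap_start] + self._data[self._gap_end:])


buf = GapBuffer("hello world")
buf.insert(5, ",")
print(buf.text())  # hello, world
```

Repeated inserts at the same position only fill gap slots; the copying cost shows up in `_move_gap`, and is proportional to how far the editing point moved.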

    Anyway, I'm wondering what straightforward ways there are to leverage /
    implement this representation in Python. Ideally, it would be great
    if one could use a BufferGap class transparently in all the places
    you'd use a Python string, to use standard regular expressions, for
    example. Glancing quickly at regexmodule.c, and its use of
    PyString_Whatever, I'm not certain this is easy to do efficiently
    (must one copy the buffer's contents into a Python-native string
    before one can use something like a regular expression match on the
    buffer's contents?).

    Anyone have ideas / suggestions on how one would represent an editing
    buffer in a way that would remain most (transparently) compatible with
    the Python standard library string operations (and yet remain
    efficient for editing)? If one is embedding the interpreter in the
    editor (or writing the editor in pure python), and using python for
    editor extensibility, it seems desirable to keep complexity down for
    extension writers, and to allow them to think of the buffer as a
    string.

    Thanks...
    manders2k, Feb 3, 2004
    #1

  2. [snip]

    My advice: find some pre-existing editor component and adapt it to suit
    your needs. Writing an editor component, whether it be for a GUI or
    console, is a pain in the ass.

    If you are looking for a GUI editor component, stick with wxPython and
    the wxStyledTextCtrl; it is a binding to the fabulous Scintilla editing
    component: http://scintilla.org

    I have no advice for a console editor, but I would suggest you take a
    look at curses or ncurses, whichever one is the open source version.

    Once you have that thing embedded in your application, the rest should
    basically write itself. How do I know? Because I wrote an editor with
    it myself: http://pype.sourceforge.net


    - Josiah
    Josiah Carlson, Feb 4, 2004
    #2

  3. Paul Rubin

    Paul Rubin Guest

    manders2k writes:
    > Anyway, I'm wondering what straightforward ways to leverage /
    > implement this representation in Python. Ideally, it would be great
    > if one could use a BufferGap class in all the places you'd use a
    > python string transparently, to use standard regular expressions, for
    > example. Glancing quickly at the regexmodule.c, and its use of
    > PyString_Whatever, I'm not certain this is easy to do efficiently
    > (must one copy the buffer's contents into a Python-native string
    > before one can use something like a regular expression match on the
    > buffer's contents?).


    You can do some of that with the array module, but Python's regexp
    library doesn't give any way to search backwards for a regexp, so that's
    another problem you'll face trying to write an editor in Python.
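
One common workaround for the missing backward search, sketched here as an assumption rather than anything proposed in the thread, is to scan forward up to the editing point and keep the last match. The helper name `search_backward` is hypothetical.

```python
import re


def search_backward(pattern, text, point):
    """Return the last match of pattern that ends at or before point, or None."""
    last = None
    # endpos=point makes the pattern see the text as if it ended at the point.
    for match in re.compile(pattern).finditer(text, 0, point):
        last = match
    return last


m = search_backward(r"\d+", "a1 b22 c333", 8)
print(m.group(), m.start())  # 22 4
```

This scans everything before the point, so the cost grows with the distance from the start of the buffer. It also returns the last of the non-overlapping forward matches, which may differ from what a true reverse regex engine would find.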
    Paul Rubin, Feb 4, 2004
    #3
  4. manders2k

    manders2k Guest

    Josiah Carlson wrote:
    > My advice: find some pre-existing editor component and adapt it to suit
    > your needs. Writing an editor component, whether it be for a GUI or
    > console, is a pain in the ass.


    :) It might be a pain in the ass, but it sounds like the most
    edifying (and probably the most fun) part of the process to me.

    Reading up on what's involved, getting a basic editor put together
    actually sounds quite easy. A few tens of hours of work, maybe.
    Making the editor feature-rich, or extending it in a substantial way,
    well, that doesn't sound hard, just work-intensive.

    Thanks for the pointers though...
    manders2k, Feb 4, 2004
    #4
  5. manders2k

    manders2k Guest

    Paul Rubin wrote:
    > You can do some of that with the array module, but Python's regexp
    > library doesn't give any way to search backwards for a regexp, so that's
    > another problem you'll face trying to write an editor in Python.


    Yeah, I have a feeling that it might be easier to code up the buffer
    in C/C++, and embed it in the interpreter. I'm not sure how much of a
    performance bottleneck having this very low-level component written in
    python will be on modern machines; probably not such a big deal.
    Writing a buffer class and fiddling with pointers and whatnot actually
    sounds easier to do in C++ than in emulating this style of thing in
    Python (then again, I'm a heck of a lot more comfortable with C++ than
    Python at this point, so that might not speak to the difficulty of the
    task).

    What I guess I wish were the case is that I could implement the
    "string interface" on my BufferGap, so that everywhere that Python (at
    the C API level) expects a string, a BufferGap could be used instead.
    That way, all the libraries that inspect and operate on strings would
    work transparently, without having to be recoded (copy / paste, end up
    with a lot of mostly identical, redundant code) to operate on this
    other string representation. Maybe this just isn't possible with the
    current C-Python implementation. I suspect it would be with many
    possible C++-Python implementations, but we don't have one of those
    lying around so...

    I'm pretty sure that for modest size buffers (even a megabyte of
    text), copying the contents of the buffer into a python-string
    representation before operating on it with python-libraries would be
    transparently fast. It just seems...wasteful, and potentially very
    bad news if someone ever tried to do a regex search on a buffer that
    occupied more than half of the physical memory of the machine or
    somesuch.
    manders2k, Feb 4, 2004
    #5
  6. Neil Hodgson

    Neil Hodgson Guest

    manders2k:

    > I'm not sure how much of a
    > performance bottleneck having this very low-level component written in
    > python will be on modern machines; probably not such a big deal.


    The performance bottleneck in split buffers is often the cost of copying
    array ranges. I once wrote a patch for Python's array class to provide
    copying within an array but the patch contents didn't make it to SourceForge
    and I haven't had time to follow it up.
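
For illustration, copying a range within an array can be written with slice assignment in current Python. This is only a sketch: the helper `move_range` is hypothetical and is not the patch mentioned above. The right-hand slice builds a temporary copy of the source range, so overlapping ranges are safe, but it is not an in-place move.

```python
from array import array


def move_range(buf, src, dst, count):
    """Copy count items from buf[src:src+count] to buf[dst:dst+count]."""
    if count < 0 or min(src, dst) < 0 or max(src, dst) + count > len(buf):
        raise IndexError("range out of bounds")
    # Equal-length slice assignment overwrites in place without resizing buf.
    buf[dst:dst + count] = buf[src:src + count]


buf = array("B", b"abcXYZ---")
move_range(buf, 3, 6, 3)
print(buf.tobytes())  # b'abcXYZXYZ'
```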

    http://mail.python.org/pipermail/patches/2003-April/012043.html

    > Writing a buffer class and fiddling with pointers and whatnot actually
    > sounds easier to do in C++ than in emulating this style of thing in
    > Python (then again, I'm a heck of a lot more comfortable with C++ than
    > Python at this point, so that might not speak to the difficulty of the
    > task).


    Split buffers don't need to use pointers. I have written several split
    buffer implementations including

    * the implementation in Scintilla (scintilla/src/CellBuffer.[h,cxx])
    http://cvs.sourceforge.net/viewcvs.py/scintilla/scintilla/

    * a templated C++ implementation
    http://mailman.lyra.org/pipermail/scintilla-interest/2002-March/000903.html

    * a generic implementation that is part of my SinkWorld project written in a
    subset of C++ that can be automatically translated into Java or C#
    http://cvs.sourceforge.net/viewcvs.py/scintilla/sinkworld/

    Also in SinkWorld is a split-buffer-based data structure called lv
    (in lv.h) for partitioning a document into segments such as lines.
    While the line starts could be stored in a standard split buffer,
    inserting text would then require adjusting every following line
    start position. To avoid this, there is also a 'step': all positions
    after the step position have the step value added to their stored
    values. The step is moved to the position where text is being
    inserted or deleted but, due to locality of modification, the move is
    mostly short.
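
The step idea can be sketched roughly as follows. This is an illustration only, not the SinkWorld code: it assumes a plain Python list instead of a split buffer, handles only length changes inside a line (no new or removed lines), and the names `LineStarts`, `start_of` and `shift_after` are hypothetical.

```python
class LineStarts:
    """Line start positions with a movable 'step' that defers bulk updates."""

    def __init__(self, text):
        self._starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == "\n"]
        self._step_index = len(self._starts)  # entries from here on are offset
        self._step_value = 0                  # amount added to those entries

    def start_of(self, line):
        value = self._starts[line]
        if line >= self._step_index:
            value += self._step_value
        return value

    def _move_step(self, index):
        """Move the step to index, fixing up only the entries it passes over."""
        if index > self._step_index:
            for i in range(self._step_index, index):
                self._starts[i] += self._step_value
        else:
            for i in range(index, self._step_index):
                self._starts[i] -= self._step_value
        self._step_index = index

    def shift_after(self, line, delta):
        """Record delta characters inserted (or deleted, if negative) in line."""
        self._move_step(line + 1)
        self._step_value += delta


lines = LineStarts("ab\ncd\nef\n")
lines.shift_after(0, 2)                       # two characters typed on line 0
print([lines.start_of(i) for i in range(4)])  # [0, 5, 8, 11]
```

Successive edits on the same or a nearby line only touch the few entries between the old and new step positions, rather than every following line start.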

    > What I guess I wish were the case is that I could implement the
    > "string interface" on my BufferGap, so that everywhere that Python (at
    > the C API level) expects a string, a BufferGap could be used instead.


    IIRC, at one stage there was explicit support in Python (perhaps in the
    buffer class) for multiple segment buffers but it was never used so has
    probably rotted.

    > That way, all the libraries that inspect and operate on strings would
    > work transparently, without having to be recoded (copy / paste, end up
    > with a lot of mostly identical, redundant code) to operate on this
    > other string representation.


    I'd like to see this implemented and have been meaning to look into it
    myself.

    Neil
    Neil Hodgson, Feb 5, 2004
    #6
  7. Michael Spencer

    "manders2k" wrote:

    [snip]
    > What I guess I wish were the case is that I could implement the
    > "string interface" on my BufferGap, so that everywhere that Python (at
    > the C API level) expects a string, a BufferGap could be used instead.
    > That way, all the libraries that inspect and operate on strings would
    > work transparently, without having to be recoded (copy / paste, end up
    > with a lot of mostly identical, redundant code) to operate on this
    > other string representation. Maybe this just isn't possible with the
    > current C-Python implementation.


    Have you looked at the mmap standard module?
    http://www.python.org/doc/current/lib/module-mmap.html
    "Memory-mapped file objects behave like both strings and like file objects.
    Unlike normal string objects, however, these are mutable. You can use mmap
    objects in most places where strings are expected; for example, you can use
    the re module to search through a memory-mapped file"
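
A small sketch of that usage. Note that the quoted documentation describes Python 2; in current Python 3, mmap objects behave like mutable byte sequences, so the pattern has to be a bytes pattern. The function name `demo` is hypothetical, and an anonymous mapping is assumed so that no file is needed.

```python
import mmap
import re


def demo():
    text = b"alpha beta gamma"
    with mmap.mmap(-1, len(text)) as buf:   # -1 requests an anonymous mapping
        buf[:] = text
        first = re.search(rb"b\w+", buf).span()      # search the mapping directly
        buf[6:10] = b"BETA"                          # same-length edit in place
        second = re.search(rb"[A-Z]+", buf).group()
        return first, second, buf[:]


print(demo())  # ((6, 10), b'BETA', b'alpha BETA gamma')
```

Slice assignment on a mapping must keep the length unchanged, so this gives mutability and regex access but not cheap insertion or deletion by itself.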

    Michael
    Michael Spencer, Feb 11, 2004
    #7