Looking for a design pattern

mathieu · Aug 10, 2009

Hi there,

Let say I am writing a word processor. It would be a bad design to
consider that a document is a std::vector of instances of a class
MyChar ? Where a MyChar would look like:

struct MyChar
{
char TheChar;
bool Bold;
bool Italic;
unsigned char Color[3];
};

This design is hardcoded (what if I need another attribute later on)
and it would be very difficult to look for a specify array of char. It
would also eat a lot of memory...
Is this something describe in the literature ? If yes, where can I
find documentation ?

Thanks,

Alf P. Steinbach · Aug 10, 2009

* mathieu:

Hi there,

Let say I am writing a word processor. It would be a bad design to
consider that a document is a std::vector of instances of a class
MyChar ? Where a MyChar would look like:

struct MyChar
{
char TheChar;
bool Bold;
bool Italic;
unsigned char Color[3];
};

This design is hardcoded (what if I need another attribute later on)
and it would be very difficult to look for a specify array of char. It
would also eat a lot of memory...
Is this something describe in the literature ? If yes, where can I
find documentation ?

FlyWeight pattern.

Cheers & hth.,

- Alf

mathieu · Aug 10, 2009

* mathieu:

Hi there,

Click to expand...

Let say I am writing a word processor. It would be a bad design to
consider that a document is a std::vector of instances of a class
MyChar ? Where a MyChar would look like:

Click to expand...

struct MyChar
{
char TheChar;
bool Bold;
bool Italic;
unsigned char Color[3];
};

Click to expand...

This design is hardcoded (what if I need another attribute later on)
and it would be very difficult to look for a specify array of char. It
would also eat a lot of memory...
Is this something describe in the literature ? If yes, where can I
find documentation ?

Click to expand...

FlyWeight pattern.

I cannot believe :

1. I got an answer in a couple of minutes,
2. You managed to decipher the description of my issue,
3. There is actually a pattern very well detailed, and with a very
intuitive name

Thanks Alf !

Michael Doubez · Aug 10, 2009

Let say I am writing a word processor. It would be a bad design to
consider that a document is a std::vector of instances of a class
MyChar ? Where a MyChar would look like:

struct MyChar
{
char TheChar;
bool Bold;
bool Italic;
unsigned char Color[3];
};

This design is hardcoded (what if I need another attribute later on)
and it would be very difficult to look for a specify array of char. It
would also eat a lot of memory...
Is this something describe in the literature ? If yes, where can I
find documentation ?

There are many ways to tackle this problem and they depend on your
needs.
You could have a look into some open source projects which do what you
want and see how they did it.

Googling for 'how to design a text editor' also gives interesting
results.

Please come back when you have a C++ language question.

Jerry Coffin · Aug 10, 2009

Hi there,

Let say I am writing a word processor. It would be a bad design to
consider that a document is a std::vector of instances of a class
MyChar ? Where a MyChar would look like:

struct MyChar
{
char TheChar;
bool Bold;
bool Italic;
unsigned char Color[3];
};

This design is hardcoded (what if I need another attribute later on)
and it would be very difficult to look for a specify array of char. It
would also eat a lot of memory...
Is this something describe in the literature ? If yes, where can I
find documentation ?

FlyWeight has already been mentioned, but it's really only part of
the answer.

IMO, you should take a look at the general design of CSS and/or ODF.
They're obviously not identical, but they have some characteristics
on common, and give an idea of how this general type of problem has
been solved (with a reasonable degree of success, I might add)
already.

The general idea can be described fairly simply though: a couple of
extra levels of indirection, at least one of them explicitly visible
to the user. Specifically, you want to allow the user to put together
a "style sheet" that describes characteristics of some text (font,
color, spacing, etc.) then you want to let them apply that style to
lots of different text. The user can then format most of his/her
document(s) based on a logical style rather than always separately
editing the individual attributes. For example, you could start by
stealing a few "styles" from HTML, and having H1, H2, H3 and H4.

Much like a browser, give each of these a default style -- but let
the user edit styles, so if the user wants H1 to be 32 point Bodoni
bold, and H2 to be 27.5 point Castellar MT black, s/he can do that.
Then when they change their mind and decide those should both be
Arial bold, in 30 and 20 points respectively, they can easily do that
-- and when they change the style, every instance of H1 or H2
throughout their document will automatically change together.

Likewise, of course, you want to let the user define new styles, so
the styles can and will reflect the logical structure of the document
they're creating, in the way they think about things. The style one
user calls "chemical formula" might happen to have identical
attributes to another user's "algorithm", but they're still separate
styles that happen (at least at some particular moment) to look the
same.

Alf P. Steinbach · Aug 10, 2009

* Michael Doubez:

Let say I am writing a word processor. It would be a bad design to
consider that a document is a std::vector of instances of a class
MyChar ? Where a MyChar would look like:

struct MyChar
{
char TheChar;
bool Bold;
bool Italic;
unsigned char Color[3];
};

Click to expand...

Inserting and deleting in a vector<> for an editor is already a
problem.

No, it's the preferred and fastestest for an editor. The idiomatic choice for
those who have made some editors from scratch (not just using an edit
control/widget). Look up "cursor gap" in Wikipedia or wherever.

For a word processor it's a different matter.

Cheers & hth.,

- Alf

Michael DOUBEZ · Aug 10, 2009

Alf P. Steinbach a écrit :

* Michael Doubez:

Let say I am writing a word processor. It would be a bad design to
consider that a document is a std::vector of instances of a class
MyChar ? Where a MyChar would look like:

struct MyChar
{
char TheChar;
bool Bold;
bool Italic;
unsigned char Color[3];
};

Click to expand...

Inserting and deleting in a vector<> for an editor is already a
problem.

Click to expand...

No, it's the preferred and fastestest for an editor.

I thought it was the scratch buffer (like in ed or vi).

The idiomatic
choice for those who have made some editors from scratch (not just using
an edit control/widget).Look up "cursor gap" in Wikipedia or wherever.

I had a look on wikipedia ... as clear as the bottom of my socks. But it
looks interesting. Definitely will get a look at "The Craft of Text
Editing" (if I can find a working link).

Jerry Coffin · Aug 10, 2009

[ ... ]

I had a look on wikipedia ... as clear as the bottom of my socks.

The idea is pretty simple, really. When you're editing a text file,
you separate the file into two pieces: everything that's before the
cursor, and everything that's after the cursor.

Those two pieces are place into memory with the part that's before
the cursor at the very beginning of the buffer, and the part that's
after the cursor at the very end of the buffer. In between the two,
you have a "gap" of unused memory space. When/if the user enters new
text, you add the characters into that memory space (for overwrite
mode, you write at the end of the gap instead of the beginning).

When the user moves the cursor, you copy text from one side of the
gap to the other. Since the user usually only moves the cursor in
small increments (word, line, possibly page) it's pretty easy to copy
the text around quickly enough to keep up (with a little care, you
could keep up even on 8-bit processors). With this design, the single
most difficult situation to handle quickly is a goto line N, which
involves not only scanning through the data to find the Nth line, but
also copying data around to put the gap at the right place. Unless
files get really huge, though, it's still not much of a problem.

Michael Doubez · Aug 11, 2009

[ ... ]

I had a look on wikipedia ... as clear as the bottom of my socks.

Click to expand...

The idea is pretty simple, really. When you're editing a text file,
you separate the file into two pieces: everything that's before the
cursor, and everything that's after the cursor.

That I understood but what do you gain except an amortized copy ?
I guess it is memory cost effective.

Those two pieces are place into memory with the part that's before
the cursor at the very beginning of the buffer, and the part that's
after the cursor at the very end of the buffer. In between the two,
you have a "gap" of unused memory space. When/if the user enters new
text, you add the characters into that memory space (for overwrite
mode, you write at the end of the gap instead of the beginning).

That has the sake of simplicity but if you maintain a list of
modification (like rope or another list structure with modifications
stored in a scratch buffer) until next write to file then insertion
and deletion seems cheap to me.

I have never tried to write a text editor but this approach appeals me
more than gap buffer.

When the user moves the cursor, you copy text from one side of the
gap to the other. Since the user usually only moves the cursor in
small increments (word, line, possibly page)

I don't know what you call small increment but, on a daily basis, I
move between marks, end/beginning of line, next/previous char/word/
symbol ... I move much more than I write; I expect the gap is created
on the first modification.

Don't mistake me. I don't discuss the usefulness of the technique but
I try to understand the forces that apply to such a design.

Alf P. Steinbach · Aug 11, 2009

* Michael Doubez:

[ ... ]

The idiomatic
choice for those who have made some editors from scratch (not just using
an edit control/widget).Look up "cursor gap" in Wikipedia or wherever.
I had a look on wikipedia ... as clear as the bottom of my socks.

Click to expand...

The idea is pretty simple, really. When you're editing a text file,
you separate the file into two pieces: everything that's before the
cursor, and everything that's after the cursor.

Click to expand...

That I understood but what do you gain except an amortized copy ?
I guess it is memory cost effective.

It's a *simple* structure that yields O(1) insertion and deletion at the cursor
position and O(1) random access read and write, where the former is the same as
with a linked list, and the latter is much better than a linked list.

The main cost is that, without reallocation and copying of all the data, the
maximum size of the buffer is determined up front, and it uses that much memory
regardless of the amount of data (for an editor, text), so no, it isn't
especially memory cost effective.

Of course, in C++ any complexity for a more complicated structure can be hidden
under a simpler programmatic interface. So I guess that my comment reflected
that since C++ displaced C for such tasks, implementing editors from scratch
hasn't been that much in vogue. And the only such editor I made that I still
have the source code for, on magnetic tape, an editor for the HP3000 from 198x!,
wasn't written in C but in Pascal, and it didn't use cursor gap but a more
sophisticated list of blocks, sort of like a C++ std::deque implementation.

Cheers,

- Alf

James Kanze · Aug 11, 2009

Let say I am writing a word processor. It would be a bad
design to consider that a document is a std::vector of
instances of a class MyChar ? Where a MyChar would look
like:
struct MyChar
{
char TheChar;
bool Bold;
bool Italic;
unsigned char Color[3];
};

Click to expand...

Inserting and deleting in a vector<> for an editor is already
a problem.

Maybe. The classical solution has always been to keep the free
space at the position of the cursor, rather than at the end, but
I do remember some measurements, made a long time ago, which
showed that copying even a MB of characters wasn't that
expensive.

Maybe. Alf's already mentionned the flyweight pattern; more
classically, character attributes tend to affect blocks of
characters, and not really be needed for all editing tasks, so
the usual solution is to maintain them separately, over ranges,
rather than single positions. Say in a structure something like
std::map< position, attributes >. (If you need to find the
attributes for a specific character, lower_bound with the
character's position should do the trick. On the other hand,
insertions and deletions can be tricky, since unless you can do
so without modifying the position key, you'll have to update all
of the map entries following the affected position. The systems
I'm aware of that use this strategy use two dimensional
positionning---line and column---so that normally, very few
entries have to be updated.)

You can also use a discriminated union with either character
data or a pointer to the (new) attributes in the same array. To
find the attributes for a character, scan backwards until you
hit an entry which contains or points to attributes. (If you
don't mind some tricky programming, there are bits free in
something like:
union Character
{
uint32_t codePoint ;
Attributes const* attributes ;
} ;
which could be used. Unicode only requires 21 bits, and on most
systems, an Attributes* must be aligned---on a lot of systems,
like Windows or Linux, you're even guaranteed that a certain
number of the upper bits won't all be zero (and on all of the 32
bit Unix systems I know, it would be simple to write an
allocator/garbage collector for Attributes which would ensure
that the top bit of the address is always set).

If you want to keep the data with the character, then the
solution should still involve a pointer, so that the size and
structure of Attributes can easily evolve:
struct Character
{
uint32_t codePoint ;
Attributes const* attributes ;
} ;
(This is probably the easiest solution to implement, so should
probably be chosen until it is clear that it causes problems.)

Unless you're keeping the attributes in a separately keyed
structure, you'll want garbage collection. Managing their
memory by hand is too painful, and replacing the raw pointer
with a smart pointer here will kill you when it comes to
performance (and can't be done at all with the union
solution).

James Kanze · Aug 11, 2009

Alf P. Steinbach a écrit :

* Michael Doubez:

Let say I am writing a word processor. It would be a bad design to
consider that a document is a std::vector of instances of a class
MyChar ? Where a MyChar would look like:
struct MyChar
{
char TheChar;
bool Bold;
bool Italic;
unsigned char Color[3];
};
Inserting and deleting in a vector<> for an editor is already a
problem.

Click to expand...

No, it's the preferred and fastestest for an editor.

Click to expand...

There is no "preferred and fastest" solution.

I thought it was the scratch buffer (like in ed or vi).

It depends on the editor idiom. Ed and vi are line oriented,
and keep their data in a two dimensional structure, something
like an std::vector< std::string > or and std::vector<
std::vector< char > >. Most text processors are not line
oriented; line breaks are decided when formatting. The editors
in such programs are probably more like emacs, internally, using
some sort of gap buffer. The choice of the preferred idiom
depends on the set of commands the editor understands.

Note that most of the statements you will find concerning
performance in editors (including those I make

) are based on
measurements made a long, long time ago. The one time I
suggested using a gap buffer to a collegue, he actually measured
the difference compared to using the equivalent of std::vector<
char >, inserting characters at the beginning of a very large
block of text, and found that the std::vector< char > solution
was largely sufficient with regards to performance. (And that
was around 20 years ago. The "common knowledge" concerning how
to organize a buffer in an editor is a lot older than that,
however.)

I had a look on wikipedia ... as clear as the bottom of my
socks. But it looks interesting. Definitely will get a look at
"The Craft of Text Editing" (if I can find a working link).

You might be interested in the editor in Kernighan and Plauger's
_Software Tools in Pascal_. I think it's pretty close to what
is used in ed (and vi, and probably vim as well). (The book
itself is, despite its age, required reading for anyone wanting
to write programs professionally. And don't let the Pascal in
the title throw you---the authors have managed to write C code
in Pascal, and I've used the book as a textbook for a course in
C---today, of course, I'd teach C++ and use Stroustrup's
_Programming Principles and Practice Using C++_, which is
definitely the best book I've seen for teaching programming.)

James Kanze · Aug 11, 2009

[ ... ]

The idiomatic choice for those who have made some
editors from scratch (not just using an edit
control/widget).Look up "cursor gap" in Wikipedia or
wherever.
I had a look on wikipedia ... as clear as the bottom of my socks.

Click to expand...

The idea is pretty simple, really. When you're editing a
text file, you separate the file into two pieces: everything
that's before the cursor, and everything that's after the
cursor.

Click to expand...

Click to expand...

That I understood but what do you gain except an amortized
copy ? I guess it is memory cost effective.

Click to expand...

You don't get a large copy on inserting near the beginning of
the buffer. It was an important issue 30 or 40 years ago, when
using the equivalent of std::vector< char >, an insertion near
the beginning of a large body of text visibly slowed things
down. Measurements made by a collegue roughly 20 years ago
showed that it wasn't an issue then; unless machines have gotten
slower since then, it isn't an issue today.

That has the sake of simplicity but if you maintain a list of
modification (like rope or another list structure with
modifications stored in a scratch buffer) until next write to
file then insertion and deletion seems cheap to me.

Click to expand...

Boehm designed Rope expressedly for this type of use. It also
handles spilling to disk gracefully, since you merge and spill
blocks, keeping the merged block header with the reference to
the disk image. (When spilling a gap buffer, it's usual to
maintain two separate spill zones, at each end, and maintain one
of them in reverse order.)

I'm not sure that spilling is an important issue today; if the
text doesn't fit in memory, it seems reasonable to require the
user to split it up into several files. Back when a process
could only have 64 KB of memory, however, efficient spilling was
an important design issue.

I have never tried to write a text editor but this approach
appeals me more than gap buffer.

Click to expand...

I don't know what you call small increment but, on a daily
basis, I move between marks, end/beginning of line,
next/previous char/word/ symbol ... I move much more than I
write; I expect the gap is created on the first modification.

Click to expand...

The gap is always there. Of course, when the cursor is one end
of the file, the amount of text on the other side of the gap is
0. (When you start up the editor, most editors put the cursor
at the beginning of the file, which means all of the text is in
the end buffer, and the gap is at the start.)

Don't mistake me. I don't discuss the usefulness of the
technique but I try to understand the forces that apply to
such a design.

Click to expand...

Today, nothing. Even in the older times, I think that the line
oriented approach used by vi was probably preferable (but then,
I've never cared much for emacs---and this really is an emacs
vs. vi argument). Today, just use std::vector< char > or
std::vector< std::vector< char > > until the profiler says
otherwise, wrapping it in a class which presents a minimal
interface to the rest of the code, to facilitate changing it if
that moment ever comes.

Click to expand...

James Kanze · Aug 11, 2009

* Michael Doubez:

[ ... ]
The idiomatic choice for those who have made some editors
from scratch (not just using an edit control/widget).Look
up "cursor gap" in Wikipedia or wherever.
I had a look on wikipedia ... as clear as the bottom of my
socks.
The idea is pretty simple, really. When you're editing a
text file, you separate the file into two pieces:
everything that's before the cursor, and everything that's
after the cursor.

Click to expand...

That I understood but what do you gain except an amortized
copy ? I guess it is memory cost effective.

Click to expand...

Click to expand...

It's a *simple* structure that yields O(1) insertion and
deletion at the cursor position and O(1) random access read
and write, where the former is the same as with a linked list,
and the latter is much better than a linked list.

Click to expand...

But nobody uses a linked list. And it's O(n) for random cursor
positionning, or going to a specific line. It is, in fact, O(n)
when you go to the end of the file, which is a frequent
operation (since in many cases, that's where you're inserting
text). And I don't know how it works in today's environment,
when you have several windows open on the file, each with its
own cursor. (I think, but I'm very far from sure---maybe
someone who has the sources to emacs handy could check---that
emacs moves the cursor in the file to correspond to its position
in the active window. So that if you have one window open with
the local class definitions, at the top of the file, and another
where you're coding the implementations, at the bottom, each
time you move the cursor between the windows, you copy most of
the text.)

For a program editor, I'd definitely use something line
oriented, along the lines of vi, if only because one of the most
common operations is going to a specific line, with the line
number coming from an error message. The fact that emacs does
this almost instantly is, in fact, proof that on today's
machines, it really doesn't matter---just use

The main cost is that, without reallocation and copying of all
the data, the maximum size of the buffer is determined up
front, and it uses that much memory regardless of the amount
of data (for an editor, text), so no, it isn't especially
memory cost effective.

Click to expand...

The real cost is when you start using OS's which penalize
programs for excessive memory use. On the earliest version of
Unix I used (no virtual memory), when the OS had to swap out a
process, it choose the biggest, because that recovered the most
memory. (Back then, vi was the biggest program on the machine.
And everytime someone started a compile, vi stopped reacting for
three to five seconds.) Under Linux, you have to worry about
the famous OOM killer, which logically might behave
similarly---at the worst, the fact that you allocate a lot of
memory without touching it immediately does create a situation
where the OOM killer will be released. (Under Windows or
Solaris, it just works, thank you, and there's no OOM killer to
be worried about.)

Of course, in C++ any complexity for a more complicated
structure can be hidden under a simpler programmatic
interface. So I guess that my comment reflected that since C++
displaced C for such tasks, implementing editors from scratch
hasn't been that much in vogue. And the only such editor I
made that I still have the source code for, on magnetic tape,
an editor for the HP3000 from 198x!, wasn't written in C but
in Pascal, and it didn't use cursor gap but a more
sophisticated list of blocks, sort of like a C++ std::deque
implementation.

Click to expand...

Sounds like vi.

Click to expand...

Click to expand...

James Kanze · Aug 11, 2009

Let say I am writing a word processor. It would be a bad
design to consider that a document is a std::vector of
instances of a class MyChar ? Where a MyChar would look
like:
struct MyChar
{
char TheChar;
bool Bold;
bool Italic;
unsigned char Color[3];
};
This design is hardcoded (what if I need another attribute
later on) and it would be very difficult to look for a
specify array of char. It would also eat a lot of memory...
Is this something describe in the literature ? If yes, where
can I find documentation ?

Click to expand...

FlyWeight has already been mentioned, but it's really only
part of the answer.

IMO, you should take a look at the general design of CSS
and/or ODF. They're obviously not identical, but they have
some characteristics on common, and give an idea of how this
general type of problem has been solved (with a reasonable
degree of success, I might add) already.

Those are both external formats, using pure text (for
portability and ease of transfer), which implies a lot of
additional constraints.

The general idea can be described fairly simply though: a
couple of extra levels of indirection, at least one of them
explicitly visible to the user. Specifically, you want to
allow the user to put together a "style sheet" that describes
characteristics of some text (font, color, spacing, etc.) then
you want to let them apply that style to lots of different
text. The user can then format most of his/her document(s)
based on a logical style rather than always separately editing
the individual attributes. For example, you could start by
stealing a few "styles" from HTML, and having H1, H2, H3 and
H4.

That's actually a very good idea. I think it concerns user
interface more than internal representation, but it probably
doesn't make sense to invest too much effort on internal
representation until you've designed the user interface---as you
say, supporting this will probably involve some additional
levels of indirection.

Alf P. Steinbach · Aug 11, 2009

* James Kanze:

* Michael Doubez:

[ ... ]
The idiomatic choice for those who have made some editors
from scratch (not just using an edit control/widget).Look
up "cursor gap" in Wikipedia or wherever.
I had a look on wikipedia ... as clear as the bottom of my
socks.
The idea is pretty simple, really. When you're editing a
text file, you separate the file into two pieces:
everything that's before the cursor, and everything that's
after the cursor.
That I understood but what do you gain except an amortized
copy ? I guess it is memory cost effective.

Click to expand...

Click to expand...

It's a *simple* structure that yields O(1) insertion and
deletion at the cursor position and O(1) random access read
and write, where the former is the same as with a linked list,
and the latter is much better than a linked list.

Click to expand...

But nobody uses a linked list.

Click to expand...

Not sure about that.

And it's O(n) for random cursor
positionning, or going to a specific line.

Click to expand...

Well, you got that right.

Making that operation O(1) generally goes at the cost of O(n) insertion (e.g.
inserting a line). Old Basic implementations provided a possibility to work
around that (although I don't think they actually exploited the possibility they
created) by having a line number explicitly specified for, that is within, each
line. Then if you exhausted the "gaps" that you'd inserted in the line numbering
you had to "renumber" the lines.

Compromise is O(log n) for random cursor positioning.

It is, in fact, O(n)
when you go to the end of the file, which is a frequent
operation (since in many cases, that's where you're inserting
text).

Click to expand...

Above is incorrect. Cursor gap is O(1) for moving to other end. With any
reasonable implementation, since going to the ends are frequent operations.

And I don't know how it works in today's environment,
when you have several windows open on the file, each with its
own cursor. (I think, but I'm very far from sure---maybe
someone who has the sources to emacs handy could check---that
emacs moves the cursor in the file to correspond to its position
in the active window. So that if you have one window open with
the local class definitions, at the top of the file, and another
where you're coding the implementations, at the bottom, each
time you move the cursor between the windows, you copy most of
the text.)

Click to expand...

I've never seen or made a cursor gap structure that supported multiple cursors.
Possibly it can be done for small number of cursors. I'm not sure.

For a program editor, I'd definitely use something line
oriented, along the lines of vi, if only because one of the most
common operations is going to a specific line, with the line
number coming from an error message. The fact that emacs does
this almost instantly is, in fact, proof that on today's
machines, it really doesn't matter---just use
std::vector< char >, and be done with it.

Click to expand...

I think you mean std::vector< std::wstring >. And that's OK, at least with a COW
string implementation or C++0x move semantics, or a with any half-decent DIY
immutable string class. But as I recall, the makers of the main free C# IDE
started out with that but had to switch to a more sophisticated scheme (they
wrote a book about it, available free online).

The real cost is when you start using OS's which penalize
programs for excessive memory use. On the earliest version of
Unix I used (no virtual memory), when the OS had to swap out a
process, it choose the biggest, because that recovered the most
memory. (Back then, vi was the biggest program on the machine.
And everytime someone started a compile, vi stopped reacting for
three to five seconds.) Under Linux, you have to worry about
the famous OOM killer, which logically might behave
similarly---at the worst, the fact that you allocate a lot of
memory without touching it immediately does create a situation
where the OOM killer will be released. (Under Windows or
Solaris, it just works, thank you, and there's no OOM killer to
be worried about.)

Sounds like vi.

Click to expand...

No, it didn't have modes, except overwrite/insert.

It was also one of my first small-language implementations, because I "had" to
invent a language to store configuration of the terminal function keys. The HP
terminals had very nice function keys, with small displays at the bottom of the
screen showing the current assignment of any function key. Oh it was nice.

Cheers,

- Alf

Click to expand...

Robert Hairgrove · Aug 11, 2009

James said:
emacs moves the cursor in the file to correspond to its position
in the active window. So that if you have one window open with
the local class definitions, at the top of the file, and another
where you're coding the implementations, at the bottom, each
time you move the cursor between the windows, you copy most of
the text.)

EMACS = Eight Megabytes Are Continually Swapping ...

Jerry Coffin · Aug 11, 2009

[ ... ]

That has the sake of simplicity but if you maintain a list of
modification (like rope or another list structure with modifications
stored in a scratch buffer) until next write to file then insertion
and deletion seems cheap to me.

I have never tried to write a text editor but this approach appeals
me more than gap buffer.

I can't quite understand why. Offhand, I can only think of one
operation (pasting a large chunk of text) that it _might_ make faster
(if you could just link in the pasted block instead of copying it).
In general, it sounds to me like a lose-lose situation -- slower
operation, and more complex implementation.

I don't know what you call small increment but, on a daily basis, I
move between marks, end/beginning of line, next/previous char/word/
symbol ... I move much more than I write;

The only of those that's of any concern at all is moving between
marks. To move between marks, you copy all the text between the marks
from one side of the gap to the other. With marks more than a few
tens of megabytes apart, it's going to be difficult to implement that
in the 100 ms (or so) to be perceived as "instant". A couple hundred
ms isn't too terrible, but much beyond that can start to bother the
user.

The rest are simply too small to care about. Figure a line is <80
characters, usually being moved around in the cache. Such a movement
is usually going to be in the sub-microsecond range.

I expect the gap is created
on the first modification.

The gap is always at the cursor, from the time the file is loaded.

In the end editing/word processing is interactive -- a (very) soft
real-time process. In most cases, what matters is fast _enough_
response; faster doesn't hurt, but rarely matters much either.

James Kanze · Aug 12, 2009

* James Kanze: [...]

But nobody uses a linked list.

Click to expand...

Not sure about that.

OK. I've never seen it

.

Above is incorrect. Cursor gap is O(1) for moving to other
end. With any reasonable implementation, since going to the
ends are frequent operations.

It's certainly O(n) with the classical implementation of cursor
gap. Say, the one in emacs (or at least, the one in emacs 15-20
years ago, which was the last time I looked at the source).

[...]

I've never seen or made a cursor gap structure that supported
multiple cursors. Possibly it can be done for small number of
cursors. I'm not sure.

In a modern editor, you do have several cursors. More
precisely, a modern editor will separate the model (the buffer)
and the view (the window), with the possibility of having
several views in the same model. Logically, the cursor is part
of the view, although it can be complicated, since it is also
used by the model in most commands. (I'd still keep it in the
view, and only pass it down to the model when a command which
needed it, like insert, was issued. Maybe that's how emacs and
others using a cursor gap buffer work today: they only move the
cursor in the buffer when needed---when a view passes them a
command with the cursor in it.)

I think you mean std::vector< std::wstring >.

Maybe. The original vi definitely used something more like
vector< string >

.

Actually, for a program editor, I'd probably go with line
oriented and UTF-8 internally. So std::vector< std::string >,

where Line is a structure said:
And that's OK, at least with a COW string implementation or
C++0x move semantics, or a with any half-decent DIY immutable
string class.

Or you could use a vector< Line* >

.

Alf P. Steinbach · Aug 12, 2009

* James Kanze:

* James Kanze: [...]

It's a *simple* structure that yields O(1) insertion and
deletion at the cursor position and O(1) random access read
and write, where the former is the same as with a linked list,
and the latter is much better than a linked list.
But nobody uses a linked list.

Click to expand...

Click to expand...

Not sure about that.

Click to expand...

OK. I've never seen it.

Above is incorrect. Cursor gap is O(1) for moving to other
end. With any reasonable implementation, since going to the
ends are frequent operations.

Click to expand...

It's certainly O(n) with the classical implementation of cursor
gap. Say, the one in emacs (or at least, the one in emacs 15-20
years ago, which was the last time I looked at the source).

You might as well say that quicksort is O(n^3) average time because some novice
has managed to implement something he calls quicksort with cubic average time.

[...]

I've never seen or made a cursor gap structure that supported
multiple cursors. Possibly it can be done for small number of
cursors. I'm not sure.

Click to expand...

In a modern editor, you do have several cursors.

Yeah, but except for collaborative editors like the Mac one, only one at a time.

More
precisely, a modern editor will separate the model (the buffer)
and the view (the window), with the possibility of having
several views in the same model. Logically, the cursor is part
of the view, although it can be complicated, since it is also
used by the model in most commands. (I'd still keep it in the
view, and only pass it down to the model when a command which
needed it, like insert, was issued. Maybe that's how emacs and
others using a cursor gap buffer work today: they only move the
cursor in the buffer when needed---when a view passes them a
command with the cursor in it.)

The cursors and information about which one's active is to me obviously part of
the model used to present the GUI (note: it will be backed by other models).

It puts the knowledge where it's needed to avoid complication and unnecessary
communication between parts of the program.

It also allows reinstating the cursor positions when you reload the file, if the
file has associated information (e.g. in an editor "project"), and although I
think of the previous paragraph as most important, someone not very familiar
with MVC may just consider this simple rule-of-thumb: what do you want to save.

I suspect that the tendency to think of views as irrelevant passive consumers of
data from a model, not introducing data requirements themselves, stem from some
general feeling of inferiority and imperfectness, that the old Smalltalk MVC was
godlike perfect, an unachievable goal that one should nevertheless try to aim
at. Trying to emulate something one doesn't understand often leads to such
bizarre ideas. Conventions adopted as sort of laws, with no real understanding.

The important point to understand here is that MVC or just a model-view is
intended to simplify. If it turns out to complicate, then something's wrong.

[snip]

Cheers,

- Alf

Privacy Shield A Clean, C++ Win32 Tool for Temporarily Masking Windows	4	Mar 25, 2026
Rich Text Format (RTF) Document Builder in C++: Code and Features	0	Sep 28, 2025
Looking for a design pattern	3	Apr 30, 2005
What pattern am I looking for?	3	Mar 4, 2010
design pattern for a file converter...	14	Dec 9, 2010
Secure Keyboard v2.0 Modern C++ Virtual Keyboard for Windows (Glassmorphism UI, Clipboard Auto-Clear)	0	Mar 26, 2026
design pattern for defining a relational operator	2	Jun 4, 2007
Looking for a pattern	16	Oct 31, 2007

Looking for a design pattern

mathieu

Alf P. Steinbach

mathieu

Michael Doubez

Jerry Coffin

Alf P. Steinbach

Michael DOUBEZ

Jerry Coffin

Michael Doubez

Alf P. Steinbach

James Kanze

James Kanze

James Kanze

James Kanze

James Kanze

Alf P. Steinbach

Robert Hairgrove

Jerry Coffin

James Kanze

Alf P. Steinbach

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads