Help with naming convention requested

James Kanze

I have a particular case where I'm having a problem deciding on
a good naming convention. Basically, I have several classes
which are split in two: there is a base class, which is a POD
(designed to support static initialization) and has only const
functions; and a derived class, which has the usual
constructors, and various non-const functions. A good example
of this is SetOfCharacter for UTF8: the base class has functions
like:
    template< typename ForwardIterator >
    bool isSet( ForwardIterator begin,
                ForwardIterator end ) const ;
(where the iterators designate a sequence of UTF-8 characters,
and the function returns true if the first character in the
sequence is in the set), and a function:
    void dumpAsCpp( std::ostream& dest ) const ;
which outputs a declaration of the class with aggregate
initialization (with all of the necessary sub-tables in an
anonymous namespace). The derived class has all of the
classical constructors, and the non-const operators, which build
and modify the data structure. The separation is essential
because it allows such objects to be statically initialized and
thus avoids order of initialization problems. It is also useful
in a few cases for performance reasons: one application uses
several thousand such objects, and dynamic construction can take
a significant amount of time.
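
To make the split concrete, here is a minimal sketch of the idea. It is not the actual SetOfCharacter code: it assumes a fixed 128-bit ASCII bitmap instead of the UTF-8 tables, and the names contains, asciiDigits, and the two-argument form of dumpAsCpp are hypothetical. Because the bitmap has a fixed size, this derived class needs no destructor, unlike the real one described above.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified POD base: public data, no constructors,
// only const member functions, so it can be aggregate-initialized.
struct BasicSetOfCharacter
{
    unsigned long bits[4];   // assumption: 32 bits used per element, ASCII only

    bool contains(unsigned char ch) const
    {
        return ch < 128 && ((bits[ch / 32] >> (ch % 32)) & 1UL) != 0;
    }

    // Writes a C++ definition that recreates this set by aggregate
    // initialization. The real function takes only the stream.
    void dumpAsCpp(std::ostream& dest, std::string const& name) const
    {
        std::ostringstream text;   // local stream, so dest's format flags stay untouched
        text << "BasicSetOfCharacter const " << name << " = { { ";
        for (int i = 0; i < 4; ++i)
            text << (i == 0 ? "" : ", ") << "0x" << std::hex << bits[i] << "UL";
        text << " } };\n";
        dest << text.str();
    }
};

// Derived class: constructors and non-const operations live here.
class SetOfCharacter : public BasicSetOfCharacter
{
public:
    SetOfCharacter()
    {
        for (int i = 0; i < 4; ++i)
            bits[i] = 0;
    }

    explicit SetOfCharacter(BasicSetOfCharacter const& other)
        : BasicSetOfCharacter(other)
    {
    }

    SetOfCharacter& operator+=(unsigned char ch)
    {
        assert(ch < 128);   // this sketch only covers ASCII
        bits[ch / 32] |= 1UL << (ch % 32);
        return *this;
    }
};

// Statically initialized constant: '0'..'9' are codes 48..57, which are
// bits 16..25 of the second element. No constructor runs at startup.
BasicSetOfCharacter const asciiDigits = { { 0x0UL, 0x03FF0000UL, 0x0UL, 0x0UL } };
```

The point of the sketch is only that the constant needs no code to run before main, while client code can still build and modify its own sets through the derived class.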

For the moment, I've named the classes: BasicSetOfCharacter (POD
base class) and SetOfCharacter (derived class). I'm not
particularly happy with this, since it means that functions
taking a const reference really have to take a
BasicSetOfCharacter const&. Other alternatives I've considered
are ConstSetOfCharacter/SetOfCharacter and
SetOfCharacter/DynamicSetOfCharacter, but I'm open to other
suggestions as well.

I'm just curious as to what other programmers think. What would
be a good naming convention for this sort of thing? (I'd also
consider a good solution which doesn't involve two different
classes, but I don't think it's possible, since one has to be a
POD, and the other must at the very least have a destructor,
since the data structure must be built up dynamically, and I'm
not guaranteed to have garbage collection available.)
 
Greg Herlihy

> For the moment, I've named the classes: BasicSetOfCharacter (POD
> base class) and SetOfCharacter (derived class).  I'm not
> particularly happy with this, since it means that functions
> taking a const reference really have to take a
> BasicSetOfCharacter const&.  Other alternatives I've considered
> are ConstSetOfCharacter/SetOfCharacter and
> SetOfCharacter/DynamicSetOfCharacter, but I'm open to other
> suggestions as well.

I think that "CharacterSetBase" and "CharacterSet" (for the base and
derived classes respectively) would sound more natural and would
better convey the class' purpose. After all, a character set can be
any arbitrary collection of characters, it does not need to match any
of the "standard" character sets - although, of course, it might.

Moreover, you could also convey the const/non-const difference between
the classes with typedefs:

typedef CharacterSetBase * ConstCharacterSetRef;
typedef CharacterSet * CharacterSetRef; // or MutableCharacterSetRef

I would favor leaving out "dynamic" or "mutable" (since the absence of
"Const" should be enough). Note also that - although the two
CharacterSet typedefs are pointer types - the "Ref" in the typedef
name means that variables of this type are not expected to be null
(otherwise, I would use "Ptr" instead of "Ref").
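
A rough sketch of how such typedefs might be used follows. It makes several assumptions that are not in the post: the classes are reduced to a 128-bit ASCII bitmap, the helpers containsAll and addAll are hypothetical, and the first typedef adds a const qualifier (the typedef as posted has none) so that it can also point at a const, statically initialized set.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two classes, ASCII only.
struct CharacterSetBase
{
    unsigned long bits[4];   // assumption: 32 bits used per element

    bool contains(unsigned char ch) const
    {
        return ch < 128 && ((bits[ch / 32] >> (ch % 32)) & 1UL) != 0;
    }
};

class CharacterSet : public CharacterSetBase
{
public:
    CharacterSet() : CharacterSetBase() {}   // value-initializes the bitmap to zero

    void insert(unsigned char ch)
    {
        assert(ch < 128);
        bits[ch / 32] |= 1UL << (ch % 32);
    }
};

// "Ref" signals that a null pointer is not expected.
// Assumption: const added here; the post declares a plain pointer.
typedef CharacterSetBase const* ConstCharacterSetRef;
typedef CharacterSet*           CharacterSetRef;

// Read-only access: works for any set, static or dynamic.
bool containsAll(ConstCharacterSetRef set, char const* text)
{
    assert(set != 0);
    for (; *text != '\0'; ++text)
        if (!set->contains(static_cast<unsigned char>(*text)))
            return false;
    return true;
}

// Mutating access: only the derived class qualifies.
void addAll(CharacterSetRef set, char const* text)
{
    assert(set != 0);
    for (; *text != '\0'; ++text)
        set->insert(static_cast<unsigned char>(*text));
}
```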

Greg
 
James Kanze

> * James Kanze:
> Consider using singletons, or completely rethinking that design.

How can singletons help, given that I have many, many instances
of the type?

> Although that was in Java, I was once bitten by someone else's
> decision to make static a lot of data and functionality that
> was logically global single instance.

In this case, I'm not working in Java. I'm working in C++. In
the first version, I didn't use static data, and the program
took close to a half an hour to start. For a program that's
typically invoked just to process a few lines, that's simply not
acceptable.

(The program is a small processor, something like AWK, but a lot
simpler. The SetOfCharacter, and the related StateTable, are
used in regular expressions which are in turn used to tokenize.
The half an hour included parsing the regular expressions, etc.
And since the regular expressions are in fact constants---what
constitutes a token never changes---the obvious solution was to
make the entire parsed regular expression statically initialized.)

> I'd still like to bite that person back :), but I don't
> recall who it was.

> Several thousand function calls => microseconds => significant
> at startup?

It was more complicated than that, but the start up was on the
order of a half an hour. That is significant.

> Since you're aiming for POD'ness, how about including "POD" in
> the name?

Because the POD'ness is a means to an end, and not a goal in
itself. And because it's the base class, and for most client
code, it's the class they should be using. In many ways, I
really think that SetOfCharacter should be the name of the base
class, with maybe MutableSetOfCharacter the derived class.
 
James Kanze

> I think that "CharacterSetBase" and "CharacterSet" (for the base and
> derived classes respectively) would sound more natural and would
> better convey the class' purpose. After all, a character set can be
> any arbitrary collection of characters, it does not need to match any
> of the "standard" character sets - although, of course, it might.

Exactly. I think you've understood what I'm working at. I
don't want to have to build up "[:alpha:]" at each execution; in
fact, practically, I can't, because I build it by reading and
parsing UnicodeData.txt. So it should be a statically
initialized constant. (In the actual application, a lot of
special sets can be as well, but [:alpha:] is a good example,
because it must be initialized externally.) On the other hand,
client code can (and in some cases does) create its own
SetOfCharacter, using the classical set operations.

I am currently using BasicSetOfCharacter and SetOfCharacter.
What I don't like about it is that most client code doesn't
modify its argument, and you want to be able to pass it things
like CharacterClass::alpha (the set for [:alpha:]). I'm thus
drawn to the idea that SetOfCharacter should be the non-mutable
base class, with something special to indicate mutability. On
the other hand...
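
The concern can be sketched as follows. This is only an illustration under stated assumptions: the two classes are reduced to a 128-bit ASCII bitmap, CharacterClass::alpha covers ASCII letters only (the real set is generated from UnicodeData.txt), and countMembers is a hypothetical client function. What matters is the parameter type: to accept both the static constant and a client-built set, the function has to name the base class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical POD base, ASCII only.
struct BasicSetOfCharacter
{
    unsigned long bits[4];   // assumption: 32 bits used per element

    bool contains(unsigned char ch) const
    {
        return ch < 128 && ((bits[ch / 32] >> (ch % 32)) & 1UL) != 0;
    }
};

// Hypothetical mutable derived class.
class SetOfCharacter : public BasicSetOfCharacter
{
public:
    SetOfCharacter() : BasicSetOfCharacter() {}   // zeroes the bitmap

    SetOfCharacter& operator+=(unsigned char ch)
    {
        assert(ch < 128);
        bits[ch / 32] |= 1UL << (ch % 32);
        return *this;
    }
};

namespace CharacterClass
{
    // Assumption: ASCII letters only. 'A'..'Z' are bits 1..26 of the
    // third element, 'a'..'z' are bits 1..26 of the fourth.
    BasicSetOfCharacter const alpha = { { 0x0UL, 0x0UL, 0x07FFFFFEUL, 0x07FFFFFEUL } };
}

// Typical client code: it reads the set but never modifies it, so it
// takes the base class by const reference.
std::size_t countMembers(BasicSetOfCharacter const& set, std::string const& text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
        if (set.contains(static_cast<unsigned char>(text[i])))
            ++count;
    return count;
}
```

Both countMembers(CharacterClass::alpha, text) and a call with a SetOfCharacter built at run time compile against the same signature, which is why the name of the base class is the one most client code ends up spelling out.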

With regards to CharacterSet vs. SetOfCharacter: the second
sounds more natural to me. Both Character and Set are nouns (in
this case, at least), and it seems more natural to join nouns
with a preposition than to just juxtapose them. But this could
just be me---it's been over 35 years since I last regularly
spoke English, and it's probable that I've lost a lot of my feel
for the language. (Now that you mention it, English is a
Germanic language, and in German, I'd definitely call it
Zeichenmenge, and not MengeVonZeichen.)

> Moreover, you could also convey the const/non-const difference between
> the classes with typedefs:
>
> typedef CharacterSetBase * ConstCharacterSetRef;
> typedef CharacterSet * CharacterSetRef; // or MutableCharacterSetRef
>
> I would favor leaving out "dynamic" or "mutable" (since the absence of
> "Const" should be enough). Note also that - although the two
> CharacterSet typedefs are pointer types - the "Ref" in the typedef
> name means that variables of this type are not expected to be null
> (otherwise, I would use "Ptr" instead of "Ref").

I'm not sure about the typedefs, but I think you've raised a
significant point: in C++, mutability is implicit, and we mark
constness. If I base my naming on the language, the names
should be ConstSetOfCharacter and SetOfCharacter.
 
James Kanze

> * James Kanze:
> Given that you really need to have a great many separately
> named instances, you can still use the main singleton
> implementation ideas to avoid order of initialization
> problems; of course they'll not be singletons then but so
> what. :)

That still wouldn't solve the performance problem.
[snip]
>> I'm working in C++. In the first version, I didn't use
>> static data, and the program took close to a half an hour to
>> start. For a program that's typically invoked just to
>> process a few lines, that's simply not acceptable.

> Assuming you're now down to something more reasonable, like
> 0.1 seconds, that's a factor of 18 000 and sounds like a bug,
> not the result of added function call overhead.

Who's talking about function call overhead? These instances are
used in a regular expression based tokenizer. If I use dynamic
initialization, then I need to parse the full regular
expression, building most of them in dynamically allocated
memory. And on my machine, here, that takes almost a half an
hour. Actually, I have modified the strategy somewhat, and the
time is coming down. But it is still measured in minutes (more
than 10), whereas loading a statically initialized table takes
practically 0 time.
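
For readers unfamiliar with the technique, here is a small sketch of what a statically initialized table looks like. It is not the real StateTable: the layout, the three character classes, and the names tokenTable, classify, and longestToken are all assumptions made for illustration. The table recognizes either an identifier (a letter followed by letters or digits) or an integer (one or more digits), and it is a constant aggregate, so no parsing or allocation happens at startup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum { stateCount = 3, classCount = 3 };
enum CharClass { ccDigit = 0, ccLetter = 1, ccOther = 2 };

// Hypothetical POD table: next[state][class] is the following state,
// or -1 when no transition exists.
struct StateTable
{
    signed char next[stateCount][classCount];
    bool        accepting[stateCount];
};

// State 0: start, state 1: inside an integer, state 2: inside an identifier.
StateTable const tokenTable = {
    { {  1,  2, -1 },
      {  1, -1, -1 },
      {  2,  2, -1 } },
    { false, true, true }
};

CharClass classify(char ch)
{
    if (ch >= '0' && ch <= '9')
        return ccDigit;
    if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z'))
        return ccLetter;
    return ccOther;
}

// Returns the length of the longest token at the start of text,
// or 0 if no token starts there.
std::size_t longestToken(StateTable const& table, std::string const& text)
{
    int state = 0;
    std::size_t best = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        assert(state >= 0 && state < stateCount);
        state = table.next[state][classify(text[i])];
        if (state < 0)
            break;
        if (table.accepting[state])
            best = i + 1;
    }
    return best;
}
```

In the dynamic version, the equivalent of tokenTable would have to be rebuilt from the regular expressions on every run; here the cost is whatever it takes to load the program image.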
[snip]
>> Because the POD'ness is a means to an end, and not a goal in
>> itself. And because it's the base class, and for most client
>> code, it's the class they should be using. In many ways, I
>> really think that SetOfCharacter should be the name of the base
>> class, with maybe MutableSetOfCharacter the derived class.

> Ah, well, I disagree, for two reasons. First, if my window
> abstraction is a means to earn money in order to not suffer
> from lacking such, and thus be happy, well I don't name it
> BeHappy. Even if that's the final end it is a means for. :)

Agreed. Otherwise, all of our classes would be named
MakeMoney :). But you don't have to go to the opposite extreme.
In context, the abstraction that the client code has to deal
with is SetOfCharacter.

> And second, I like names to be such that when I see the name I
> get a correct impression, or at least not a misleading one,
> about what that name names. And what you have been talking
> most about for this class is POD'ness, that's apparently the
> most important aspect and the reason the class exists at all,
> so if I were to choose a name I'd include that most important
> aspect, not hide it!

The POD'ness is an important issue for performance reasons, and
it is the major motivation for creating two classes.
Nevertheless, for the client code, it's irrelevant (except for
performance issues); the client code is dealing with a
SetOfCharacter; in one case, a SetOfCharacter which can be
modified, and in the other, one which cannot be modified.
 
