ISO (In Search Of): Studies of underscores vs MixedCase in Ada or C++


Andy Glew

I am in search of any rigorous,
scientific, academic or industrial studies
comparing naming conventions in
C++ or similar languages such as
Ada:

Specifically, are names formed with
underscores more or less readable
than names formed with MixedCase
StudlyCaps camelCase?

....and similarly, any measurements
of programmer productivity, bug rate,
etc.; although IMHO readability matters
most.


* Religion - NOT?!

I understand that this is a religious issue
for many programmers, an issue of programming style.
I am not interested in a religious war.
I obviously have my own opinion, but I am
open to scientific evidence.


* Ada Studies?

I thought that I had seen studies like
this in some of the early design documents
for Ada, but I have not been able to find
such references on the web. Which is not
entirely surprising, since Ada was designed
prior to the web.

The Ada 83 and 95 Quality Guidelines recommend
underscores to improve readability, but provide
no source justifying this statement.


* What such studies might look like

Simple readability and recall:
- present a test subject with
a list of compound words
formed with underscores and with mixed case
- remove the list, and ask the test subject
to write it down from memory
- score on accuracy
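Here is a minimal C++ sketch of the scoring step. It assumes "accuracy" means the fraction of presented names that the subject writes back exactly, ignoring order. The name `recall_accuracy` is hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical scoring rule: the fraction of presented names that the
// subject wrote back exactly (same letters, case and underscores).
// Order is ignored; writing a name twice earns no extra credit.
double recall_accuracy(const std::vector<std::string>& presented,
                       const std::vector<std::string>& recalled) {
    if (presented.empty()) return 0.0;
    const std::set<std::string> written(recalled.begin(), recalled.end());
    std::size_t hits = 0;
    for (const auto& name : presented) {
        if (written.count(name) != 0) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(presented.size());
}
```

Computing this separately for the underscore list and the MixedCase list gives one score per condition per subject. Exact matching counts "Maxvalue" written for "MaxValue" as a miss. A real study would have to decide whether to score word content and separator style separately.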

Program debugging:
- present programs that are otherwise identical,
differing only in their use of underscores/MixedCase
to test subject programmers (e.g. a CS class)
- program has a known bug
- ask test subjects to find bug
- score on accuracy locating bug

Cruel TA study:
- Two sections of a CS class
- Enforce a different naming standard in each section:
underscores vs MixedCase
- Pose a programming problem
- Score according to success
completing assignment

Empirical:
- Given version control databases
of large programs, some written in underscore
style, others in MixedCase
- Total bug rates normalized by LOC, name count, etc.
- OR: count only bugs that can be attributed
(after inspection of checkins) to misnamed variables

For that matter, I would be interested in any surveys
folks may have done that count projects and their
coding standards, possibly weighted, drawn from sources such as:
- open source (e.g. sourceforge)
- industrial
- textbooks, weighted by sales
- websites of coding standards, weighted by Google score...
Such a survey would be less convincing than a rigorous study, though.


* Explanation of Newsgroups Chosen

I hope it is obvious why I have chosen these
newsgroups to post this search to:

comp.software-eng, comp.programming,
- an issue of software engineering
comp.lang.c++,
- the language I am most interested in
comp.lang.ada
- because I vaguely recall historical work
 

Attila Feher

Andy said:
I am in search of any rigorous,
scientific, academic or industrial studies
comparing naming conventions in
C++ or similar languages such as
[SNIP]

The underscore convention also works in case-insensitive languages.

The InnerCaps convention fails to handle all-caps words (acronyms), as in
SMTPTCPIPConnection. The usual solution is to write them incorrectly, as
SmtpTcpIpConnection.
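The acronym problem can be made concrete with a minimal C++ sketch that splits an InnerCaps name into words using case changes alone. The boundary rule is an assumption. A word starts at a capital that follows a lowercase letter or digit. It also starts at a capital that follows a run of capitals but is itself followed by a lowercase letter. The name `split_inner_caps` is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split an InnerCaps identifier into words using only case changes.
std::vector<std::string> split_inner_caps(const std::string& name) {
    std::vector<std::string> words;
    std::string current;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (std::isupper(c) && !current.empty()) {
            const unsigned char prev = static_cast<unsigned char>(name[i - 1]);
            // "smtpTcp": a capital after a lowercase letter or digit.
            const bool after_lower = std::islower(prev) || std::isdigit(prev);
            // "IPConnection": the C starts a word because the next
            // character is lowercase.
            const bool starts_word_after_caps =
                std::isupper(prev) && i + 1 < name.size() &&
                std::islower(static_cast<unsigned char>(name[i + 1]));
            if (after_lower || starts_word_after_caps) {
                words.push_back(current);
                current.clear();
            }
        }
        current += name[i];
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

With these rules, SMTPTCPIPConnection splits into "SMTPTCPIP" and "Connection". Nothing in the case pattern says where SMTP ends and TCP begins. SmtpTcpIpConnection splits cleanly into four words. An underscored name such as SMTP_TCP_IP_Connection carries its boundaries explicitly.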

The underscore convention tends to make lines longer, which can have a bad
effect on readability.

IMO it is a personal preference issue, and also an issue of what fonts and
development environment are in use.

IMO if one has to select *one* convention for a whole company using many
languages, then only the underscore one stands. With InnerCaps there is a
possibility of creating hard-to-find name collisions, especially in languages
where the type of a variable can change at runtime by a simple assignment.
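Here is one way such a collision can arise in a case-insensitive language. Folding case erases a MixedCase name's word boundaries, while an underscored name keeps them. The following is a minimal C++ sketch, assuming simple ASCII case folding. The helpers `fold_case` and `collide` are hypothetical, and NowHere/NoWhere is an invented example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Fold an identifier to lowercase, as a case-insensitive language would.
std::string fold_case(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// True if two distinct spellings denote the same identifier after folding.
bool collide(const std::string& a, const std::string& b) {
    return a != b && fold_case(a) == fold_case(b);
}
```

"NowHere" and "NoWhere" are the same identifier once case is folded, whereas "now_here" and "no_where" stay distinct.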
 

Jakob Bieling

[snip]
Specifically, are names formed with
underscores more or less readable
than names formed with MixedCase
StudlyCaps camelCase?

Write a large text (several lines) in mixed case and the same again
with underscores. Then give them to people to read and ask which they find
easier to read. I would not be surprised if the majority favoured the text
with underscores.

[snip]
The Ada 83 and 95 Quality Guidelines recommend
underscores to improve readability, but provide
no source justifying this statement.

The underscore can easily be viewed as a space which separates the words,
whereas mixed case does not provide a separation like that, but rather a
'large' here-comes-a-new-word mark (i.e. the capital letter). The problem I
see with this: non-capital letters can be 'large' as well. Just have a look
at the 't', 'h', etc., which, IMO, does not make reading a mixed-case text
easier.

Personally, I prefer underscore for the reason above.

Just my .02c
 

Matt Gregory

Jakob said:
The underscore can easily be viewed as a space which separates the words,
whereas mixed case does not provide a separation like that, but rather a
'large' here-comes-a-new-word mark (i.e. the capital letter). The problem I
see with this: non-capital letters can be 'large' as well. Just have a look
at the 't', 'h', etc., which, IMO, does not make reading a mixed-case text
easier.

I think we just need a programming font that has half-sized underscores
in front of all the capital letters. That would solve all these problems.
I personally don't like typing underscores, but I agree they are more
readable. Emacs does have a view-camel-cased-identifiers-as-underscored
mode, so that's a step in the right direction.
 

Ludovic Brenta

Personally I prefer underscores, too, and for that reason I really
like Emacs' glasses-mode. So, use whatever you want, *I* will always
see underscores :)
 

Steve

I think a more relevant test would be to give two versions of the same code,
one with underscores, one with mixed casing, to different groups of
programmers to analyze. Include a quiz asking questions about the code.
See which version results in more correct answers, and which version
gets to the answers more quickly.

Steve
(The Duck)

[snip]
 

Frank J. Lhota

Underscores are basically a way to provide spaces in an identifier. Since
identifiers are generally phrases (noun phrases for objects, verb phrases
for procedures) and phrases often consist of more than one word, I find the
use of underscores to be quite natural.

The opposing argument is that underscores are too large, and that a case
change is a more readable way to show how an identifier divides into words.
To me, the upper / lower case method of delineating the words in an
identifier has always looked like the transcript of a very fast talker.
Yes, you can make out the words, but just barely. Moreover, the use of
letter case to delineate words prohibits any other use of letter case. It
rules out using all caps for a certain category of identifiers, for example.

There is an easy way to test which convention is more readable. Here is one
of Shakespeare's sonnets rendered in the mixed case format:

FromFairestCreaturesWeDesireIncrease,
ThatTherebyBeautysRoseMightNeverDie,
ButAsTheRiperShouldByTimeDecease,
HisTenderHeirMightBearHisMemory:
ButThouContractedToThineOwnBrightEyes,
FeedstThyLightsFlameWithSelfSubstantialFuel,
MakingAFamineWhereAbundanceLies,
ThySelfThyFoeToThySweetSelfTooCruel:
ThouThatArtNowTheWorldsFreshOrnament,
AndOnlyHeraldToTheGaudySpring,
WithinThineOwnBudBuriestThyContent,
AndTenderChurlMakstWasteInNiggarding:
PityTheWorldOrElseThisGluttonBe,
ToEatTheWorldsDueByTheGraveAndThee

It may be a matter of taste, but I certainly found the original sonnet to be
more readable and more beautiful.
 

Randy King

<snip> op <snip>

This is a somewhat off-topic post, but the OP did ask the question about
readability.

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer
in waht orredr the ltteers in a wrod are, the olny iprmoetnt tihng is
taht the frist and lsat ltteer be at the rghit pclae. The rset can be a
total mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae
the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a
wlohe. Aolbsulty amzanig huh?
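The transformation the quoted text describes can be sketched in C++. It keeps the first and last letter of each word and shuffles the letters in between. This minimal sketch assumes a word is a maximal run of ASCII letters. It uses a seeded std::mt19937 so that a given seed always produces the same scramble. The name `scramble_interior` is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <random>
#include <string>

// Shuffle the interior letters of every word, keeping the first and
// last letter of each word (and all non-letters) in place.
std::string scramble_interior(std::string text, unsigned seed) {
    std::mt19937 rng(seed);
    std::size_t i = 0;
    while (i < text.size()) {
        if (!std::isalpha(static_cast<unsigned char>(text[i]))) { ++i; continue; }
        std::size_t end = i;
        while (end < text.size() && std::isalpha(static_cast<unsigned char>(text[end]))) ++end;
        // The word occupies [i, end); only words of 4+ letters have
        // two or more interior letters to shuffle.
        if (end - i > 3) std::shuffle(text.begin() + i + 1, text.begin() + end - 1, rng);
        i = end;
    }
    return text;
}
```

Only words of four or more letters can change, and punctuation and spacing are left untouched.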
 

Hyman Rosen

Randy said:
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer
in waht orredr the ltteers in a wrod are, the olny iprmoetnt tihng is
taht the frist and lsat ltteer be at the rghit pclae. The rset can be a
total mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae
the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a
wlohe. Aolbsulty amzanig huh?

"Anidroccg to crad cniyrrag lcitsiugnis planoissefors at an uemannd,
utisreviny in Bsitirh Cibmuloa, and crartnoy to the duoibus cmials
of the ueticnd rcraeseh, a slpmie, macinahcel ioisrevnn of ianretnl
cretcarahs araepps sneiciffut to csufnoe the eadyrevy oekoolnr."
 

Matt Gregory

I said:
I think we just need a programming font that has half-sized underscores
in front of all the capital letters. That would solve all these problems.

Never mind, that was a terrible idea. It was almost good though.
 

Jack Klein

Andy said:
I am in search of any rigorous,
scientific, academic or industrial studies
comparing naming conventions in
C++ or similar languages such as
Ada:

Specifically, are names formed with
underscores more or less readable
than names formed with MixedCase
StudlyCaps camelCase?

My team is currently working under this guideline as a compromise:

Function names must be CamelMode, but optionally underscores are
allowed, e.g. Camel_Mode.

....or should I say "compromised" guidelines?
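A guideline like this can be checked mechanically. The following C++ sketch encodes one possible reading. A name is one or more segments, each starting with a capital letter and containing only letters and digits, optionally joined by single underscores. The pattern and the name `follows_team_guideline` are assumptions, not the team's actual rule.

```cpp
#include <regex>
#include <string>

// One reading of "CamelMode, optionally with underscores": capitalized
// alphanumeric segments joined by single underscores.
bool follows_team_guideline(const std::string& function_name) {
    static const std::regex pattern("[A-Z][A-Za-z0-9]*(_[A-Z][A-Za-z0-9]*)*");
    return std::regex_match(function_name, pattern);
}
```

Under this reading, CamelMode, Camel_Mode and InitiateFMS_Executive all pass, while camel_mode and Camel__Mode do not. Whether a lowercase segment such as Camel_mode should be allowed is a detail the guideline as quoted does not settle.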

Interestingly I see a lot of programmers who prefer CamelMode for
function names, yet prefer under_scores in variable names. In every
single case where I have checked, the programmer has done at least
some coding for Windows and its Pascal, BASIC, etc., API. And in
every single case they claim that is not where their style came from.
Go figure.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
 

Programmer Dude

Jack said:
Interestingly I see a lot of programmers who prefer CamelMode for
function names, yet prefer under_scores in variable names. In every
single case where I have checked, the programmer has done at least
some coding for Windows and its Pascal, BASIC, etc., API. And in
every single case they claim that is not where their style came from.

I've tried just about every combination over the years. At one
point it was underscores in function names, not in data names.
OOP added enough other basic types of things that it got hard to have
a style for each. Currently, I use lower_case_with_underscores
for local names and CamelCaseMode for functions/methods and
for global data.

I'm considering switching to Mixed_Case_With_Underscores for
global data. In fact, with the fairly recent addition of
several new languages to my tool kit, it's probably time to
once again re-think my whole naming convention thing.
 

Mike Smith

Hyman said:
"Anidroccg to crad cniyrrag lcitsiugnis planoissefors at an uemannd,
utisreviny in Bsitirh Cibmuloa, and crartnoy to the duoibus cmials
of the ueticnd rcraeseh, a slpmie, macinahcel ioisrevnn of ianretnl
cretcarahs araepps sneiciffut to csufnoe the eadyrevy oekoolnr."

Yes, it's possible to take it *too* far. But I *was* able to read the
quoted text at maybe half the speed at which I could have read it if it
were spelled correctly. And the text in Randy King's post is even more
readable than that - I can read it at almost full speed.
 

tmoran

Matt Gregory said:
I think we just need a programming font that has half-sized underscores

If you want to get into fonts etc., look at "Human Factors and Typography
for More Readable Programs", (c) 1990 ACM Press, ISBN 0-201-10745-7
(It doesn't appear to address naming questions, however.)
 

Michael Feathers

Matt Gregory said:
problems.

Never mind, that was a terrible idea. It was almost good though.


Let's see, what if an IDE had a toggle which converted identifier names back
and forth on demand, flagging any clashes. ;-)
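Taken half-seriously, such a toggle needs two conversions and a clash check. The following is a minimal C++ sketch, assuming camelCase on one side and lower_case_with_underscores on the other. All three function names are hypothetical. A real IDE would also have to handle scope, leading underscores and digits.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// camelCase -> lower_case_with_underscores: insert '_' before a capital
// that follows a lowercase letter or digit, then lowercase everything.
std::string to_underscored(const std::string& name) {
    std::string out;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (i > 0 && std::isupper(c)) {
            const unsigned char prev = static_cast<unsigned char>(name[i - 1]);
            if (std::islower(prev) || std::isdigit(prev)) out += '_';
        }
        out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// lower_case_with_underscores -> camelCase: drop each '_' and
// capitalize the character after it.
std::string to_camel(const std::string& name) {
    std::string out;
    bool upper_next = false;
    for (char ch : name) {
        if (ch == '_') { upper_next = true; continue; }
        const unsigned char c = static_cast<unsigned char>(ch);
        out += static_cast<char>(upper_next ? std::toupper(c) : c);
        upper_next = false;
    }
    return out;
}

// Group names by their underscored form; any group with more than
// one member is a clash the toggle should flag.
std::map<std::string, std::vector<std::string>>
find_clashes(const std::vector<std::string>& names) {
    std::map<std::string, std::vector<std::string>> by_target;
    for (const auto& n : names) by_target[to_underscored(n)].push_back(n);
    std::map<std::string, std::vector<std::string>> clashes;
    for (const auto& [target, sources] : by_target) {
        if (sources.size() > 1) clashes[target] = sources;
    }
    return clashes;
}
```

For example, parseURL and parseUrl both become parse_url, so find_clashes reports them. Converting back yields parseUrl, so acronym capitalization does not survive a round trip.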
 

Hyman Rosen

Mike said:
Yes, it's possible to take it *too* far. But I *was* able to read the
quoted text at maybe half the speed at which I could have read it if it
were spelled correctly. And the text in Randy King's post is even more
readable than that - I can read it at almost full speed.

Which clearly means that the first/last letter thing isn't the
only factor in comprehension.
 

Default User

Mike said:
Yes, it's possible to take it *too* far. But I *was* able to read the
quoted text at maybe half the speed at which I could have read it if it
were spelled correctly. And the text in Randy King's post is even more
readable than that - I can read it at almost full speed.


That's because it's not well scrambled at all. Examine the larger words:
they almost all have large unchanged or barely changed segments. Most of
the time double-letter combos are kept together, and there is very little
reversal of segments. I think the given example (I've received it many
times) does not provide much evidence for the contention at all.



Brian Rodenborn
 

Default User

Jack said:
Function names must be CamelMode, but optionally underscores are
allowed, e.g. Camel_Mode.


We are allowed underscores when acronyms appear in the name.

InitiateFMS_Executive();



Brian Rodenborn
 

Arthur J. O'Dwyer

Default User said:
That's because it's not well scrambled at all. Examine the larger words:
they almost all have large unchanged or barely changed segments. Most of
the time double-letter combos are kept together, and there is very little
reversal of segments. I think the given example (I've received it many
times) does not provide much evidence for the contention at all.

On the other hand, the thing which turned out to be confusing me the
most in Hyman's scrambled text was the typo (the comma after "unnamed").
Once I learned to ignore that, and take the rest of the grammar with a
grain of salt (the phrase including the word "uncited" also gave me
problems), it was fairly straight sailing.
At least, it was straight sailing until about half-way through, at
which point my brain kicked in and I rezilaed waht mohted was bnieg
uesd to otacsufbe the iaudividnl wdros -- at taht pniot I jsut setratd
rnidaeg tehm bdrawkcas.
Perhaps an interesting experiment would be to compare the relative
effects of ioisrevnn, aaabehiilopttzn, roandm sirnlcmabg, and radonm
dpraigh scamrbnlig. But that's not really topical here (wherever
"here" is).

-Arthur
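Two of the deterministic variants Arthur lists can be generated by applying a different operation to each word's interior letters: reversal (ioisrevnn) and alphabetical sorting (aaabehiilopttzn). The following is a minimal C++ sketch, assuming words are maximal runs of ASCII letters. The function names are hypothetical, and the random variants are left out.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Apply `op` to the letters strictly between the first and last letter
// of every word, where a word is a maximal run of ASCII letters.
template <typename Op>
std::string transform_interiors(std::string text, Op op) {
    std::size_t i = 0;
    while (i < text.size()) {
        if (!std::isalpha(static_cast<unsigned char>(text[i]))) { ++i; continue; }
        std::size_t end = i;
        while (end < text.size() && std::isalpha(static_cast<unsigned char>(text[end]))) ++end;
        if (end - i > 3) op(text.begin() + i + 1, text.begin() + end - 1);
        i = end;
    }
    return text;
}

// "inversion" -> "ioisrevnn"
std::string invert_interiors(const std::string& text) {
    return transform_interiors(text, [](auto b, auto e) { std::reverse(b, e); });
}

// "alphabetization" -> "aaabehiilopttzn"
std::string sort_interiors(const std::string& text) {
    return transform_interiors(text, [](auto b, auto e) { std::sort(b, e); });
}
```

For instance, invert_interiors("I realized what method") returns "I rezilaed waht mohted". This is the reversal scheme used in Hyman's text and in Arthur's own reply.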
 

Dennis Lee Bieber

Matt Gregory fed this fish to the penguins on Friday 26 September 2003 12:11 am:
I think we just need a programming font that has half-sized underscores
in front of all the capital letters. That would solve all these problems.
I personally don't like typing underscores, but I agree they are more
readable. Emacs does have a view-camel-cased-identifiers-as-underscored
mode, so that's a step in the right direction.

Well, we could all revert to a language with a parser like classical
FORTRAN -- where whitespace in identifiers was ignored...
 
