Stylistic note on loops

Robert Klemme

I'm doing some text parsing and I'd appreciate some input on code styles
for loops.

A lot of the processing I'm currently doing involves finding a
particular character, and then doing something with the text up to that
point. For example:

int i;
for( i = 1; Character.isLetter( s.charAt( i) ); i++ )
{}
// do something with i and s here

This marches ahead until it finds a character that isn't a "letter",
then, with i set to the offset of the last letter+1, is able to do
something with that group of letters in string s.
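
One thing worth noting about the loop as posted: it has no length check, so if the string ends while still inside a run of letters, charAt() throws StringIndexOutOfBoundsException. A minimal sketch of a bounded version follows; the class name, method name and sample input are made up for illustration and are not from the original post.

```java
class LetterRun {
    /**
     * Returns the index just past the run of letters that starts at 'from',
     * or 'from' itself if the character at that position is not a letter.
     */
    static int endOfLetters(String s, int from) {
        int i = from;
        // The length check comes first, so charAt() is never called out of range.
        while (i < s.length() && Character.isLetter(s.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String s = "#abc123";             // hypothetical input
        int end = endOfLetters(s, 1);     // start at 1, as in the loop above
        System.out.println(s.substring(1, end));
    }
}
```

Moving the scan into a named method also sidesteps the empty-body question, since the increment becomes the loop body.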

Did you consider using regular expressions? I do not know what your
inputs look like and what processing you have to do with this. But if
you need to take the string apart in more ways you could start with

package parsing;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LetterParsing {

    /** Pick a better name. */
    private static final Pattern PAT = Pattern.compile("\\A.(\\p{Alpha}*)");

    public static void main(String[] args) {
        for (final String s : args) {
            final Matcher m = PAT.matcher(s);

            if (m.find()) {
                final String letters = m.group(1);
                System.out.println("Found letters: '" + letters + "'");
            } else {
                // no match
                System.out.println("Did not find any letters in '" + s + "'");
            }
        }
    }

}

Note, I let the regexp start with a dot because your indexes start at 1!

Kind regards

robert
 
Martin Gregorie

If you do serious Win32 API/MFC/ALT programming, then you better stick
to MS coding convention.

A lot of that stuff goes way beyond the C and C++ standards and is
truly MS land.

Agreed. I dislike their so-called Hungarian notation intensely. I find
that jumble of crap at the front, e.g. psaGetValues(), obscures the name
and, worse, if refactoring a design requires you to change the type of a
variable or the return value from a function you have to change its name
as well!

I think the usual K&R naming standards as codified and explained in "The
Practice of Programming" by Kernighan and Pike are fine for C in non-MS
shops and work acceptably in Java as well without clashing with accepted
Java practice. It contains Java code examples as well as C, C++, awk and
Perl.

BTW, I'd recommend reading this book to people learning to program in
Java. Its sections on designing code for ease of testing and debugging
are excellent and so are its comments about optimisation and use of
profilers to determine what sections of a program are worth optimising.
 
Martin Gregorie

The main point in coding convention is consistency.

You pick a convention and stick to it.

IME one of the worst crimes a programmer can commit is to modify an
existing program without sticking to the style it was originally written
in. It doesn't matter how much you hate the original style - use it and
follow its naming convention whatever that might be!

The effects of making changes in your pet style when that's at odds with the
original coding style are:

- everybody who works on the program in future will think you're a prat.

- if the program was already badly coded you'll just turn it into
a complete dog's dinner.
 
Eric Sosman

Developers are humans.

Even if they know things, then they can still get it
wrong.

Competent programmers should certainly know the
semantics of that semicolon, but they can still
miss it when reading the code.

They can get it wrong even if they read it and understand
it correctly. If I'm reading someone else's code and see an
oddly-positioned semicolon and understand perfectly well what
it signifies and what the program will do, I still may wonder:
"Did the original author really *mean* to do that, or did his
cat wander across the keyboard at just the wrong moment?" What
should have been a routine reading turns into an archaeological
excavation; the code is understandable but not readable.

I've seen this happen (admittedly not in Java, but that's
probably just happenstance). Once upon a time, having some spare
cycles, I decided to crank the compiler warning levels up as high
as they'd go while dumping my then company's three million lines
of C through it, just to see what might drop from the branches.
I got what you'd expect: Lots and lots of "venial" warnings, a few
obvious mistakes with obvious fixes -- and about half a dozen cases
that were quite clearly wrong, but where more than one "obvious" fix
was possible. Which fix? Well, what was the author's intention?

Two of the errors were in code whose authors were still available;
the others were "orphans" in subsystems I wasn't familiar with. Lots
of slow digging ensued ...

The moral: It is not enough that you write correct code; you
must also write clear code.
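
For a Java flavour of the oddly-positioned semicolon discussed above, here is a small sketch (class and method names are invented for the example). Both methods compile; only one does what a casual reader would assume.

```java
class StraySemicolon {
    /** The ';' after the header is the whole loop body; the block runs once. */
    static int withStraySemicolon(int n) {
        int count = 0;
        for (int i = 0; i < n; i++);   // empty statement: this is the loop body
        {
            count++;                   // an ordinary block, executed exactly once
        }
        return count;
    }

    /** What the author most likely meant: the block is the loop body. */
    static int asProbablyIntended(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(withStraySemicolon(5));   // prints 1
        System.out.println(asProbablyIntended(5));   // prints 5
    }
}
```

A reader who spots the first form still cannot tell from the code alone which behaviour was wanted, which is the archaeology problem described above.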
 
markspace

Did you consider using regular expressions? I do not know what your
inputs look like and what processing you have to do with this. But if
you need to take the string apart in more ways you could start with


I did, but I consider using regex for simple text parsing (i.e., looking
for a single character delimiter) to be pernicious. Regex is slow to
execute and harder to read than a simple loop. Therefore if I can just
code something up "by eye" I do. If I can't map it out in my head
easily then I consider Regex.
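
As an illustration of the "simple text parsing" case (a single-character delimiter), something like the following needs neither a hand-written loop nor a regex. This is only a sketch; the class name, method name and sample string are assumptions made up for the example.

```java
class DelimiterSplit {
    /** Returns the text before the first 'delim', or all of 's' if 'delim' is absent. */
    static String head(String s, char delim) {
        int pos = s.indexOf(delim);            // -1 when the delimiter is not found
        return pos < 0 ? s : s.substring(0, pos);
    }

    public static void main(String[] args) {
        System.out.println(head("key=value", '='));   // prints key
        System.out.println(head("novalue", '='));     // prints novalue
    }
}
```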
 
BGB / cr88192

Martin Gregorie said:
IME one of the worst crimes a programmer can commit is to modify an
existing program without sticking to the style it was originally written
in. It doesn't matter how much you hate the original style - use it and
follow its naming convention whatever that might be!

The effects of making changes in your pet style when that's at odds with the
original coding style are:

- everybody who works on the program in future will think you're a prat.

- if the program was already badly coded you'll just turn it into
a complete dog's dinner.

this is the one argument here I agree with...


it doesn't really matter what some authority says the coding styles should
be (Sun or MS or others), and one's personal use of style should be flexible
(style is largely immaterial anyways).

this then leaves meshing with the existing code, and this is one of the
major ways where stylistic flexibility helps:
so one can write code which meshes with the code which is already there.


for writing one's own code, things are a little more flexible, but I
disagree that there is or should be any single authoritative style, and
in fact still believe that the optimal way to format something depends
somewhat on the factors surrounding the particular piece of code in question
(including more fluid matters, such as its overall design and style...).

for example, regions of code written in a more OO style will be better
suited to certain formatting practices, and regions of code written in a
more FP style will be better suited to others...


the main goal though is about minimizing costs, which may mean following
common conventions in many/most cases, disregarding them in some others, and
trying to style-match with the existing coding practices when working on or
maintaining existing code (as breaking the existing style makes things ugly,
regardless of whatever style one is using).

altering style in ways which break code is something to be extra avoided
IMO, even if one doesn't like the original style all that much...
 
Arne Vajhøj

Agreed. I dislike their so-called Hungarian notation intensely. I find
that jumble of crap at the front, e.g. psaGetValues(), obscures the name
and, worse, if refactoring a design requires you to change the type of a
variable or the return value from a function you have to change its name
as well!

I don't like it either.

But for better or worse, that is what MS unmanaged C/C++
developers have to live with.

Arne
 
Arne Vajhøj

IME one of the worst crimes a programmer can commit is to modify an
existing program without sticking to the style it was originally written
in. It doesn't matter how much you hate the original style - use it and
follow its naming convention whatever that might be!

The effects of making changes in your pet style when that's at odds with the
original coding style are:

- everybody who works on the program in future will think you're a prat.

- if the program was already badly coded you'll just turn it into
a complete dog's dinner.

Or the next maintainer will miss something critical, because
he assumed that everything was written in a certain style.

Arne
 
Arne Vajhøj

and, in the name of consistency, one can stick to the existing practices...

now, if one goes and looks at the Apache class library, one will see that
they don't exactly strictly follow the style conventions in the tutorial
either...

Apache projects require the code to follow their specified
coding standard.

Some Apache projects follow the standard Java coding standard
completely.

Some of them follow it with a few changes. The most common change
is the location of the {.

Arne
 
BGB / cr88192

Arne Vajhøj said:
Apache projects require the code to follow their specified
coding standard.

Some Apache projects follow the standard Java coding standard
completely.

Some of them follow it with a few changes. The most common change
is the location of the {.

yes, but it is not the exact same convention from the link elsewhere in the
thread, which sort of invalidates the whole "one true convention" argument,
in favor of a "code in this project should use 'this' particular
convention"...

things are moderately limited though by factors, such as how far one can go
before code starts looking weird or nasty, which usually takes a bit more
than matters of where exactly one puts their braces, or if there is
whitespace around operators, ...
 
Martin Gregorie

I don't like it either.

But for better or worse, that is what MS unmanaged C/C++ developers
have to live with.

Indeed, but it's yet another reason I don't write code in that environment.

BTW, what do you make of this situation? It was the start of a major C
project with a project manager, a technical manager (TM) and me as the
design authority. My design and project-specific infrastructure
specifications (e.g. a set of common supporting libraries to handle
intra-process connections and an active data dictionary used to transform
a set of incoming data formats to common internal formats) had been signed
off. The TM then wrote a spec mandating K&R naming and coding standards
and went off on leave for two weeks.

Meanwhile two of us got stuck in and wrote and tested the supporting
libraries to the standards that he'd released.

At this point the TM returned, tore up his original coding standards and
re-issued them mandating 'Hungarian notation'. His next action was to
tell me that the support libraries must be rewritten in Hungarian
notation.

By now the rest of the programmers were on board and we had a challenging
deadline to meet, so I told him to get stuffed and was able to make this
stick due to looming deadlines. All the code I wrote both then and
subsequently (under even tighter deadlines) adhered to the original
standards.

I maintain he was right out of order changing his issued standards so
radically. What do you guys think?
 
Jim Janney

Tom Anderson said:
Seriously? We use nothing but where i work. In java and XML, at least -
for some reason, in JSP (well, HTML, really) we tend to use two spaces. I
think because JSP/HTML tends to have much more deeply nested
constructs.

I think our use of tabs must stem from the fact that Eclipse, by
default, uses tabs for indentation. They must have a reason to do
that.

tom

The trouble with tabs is that everyone has a different idea of what
the spacing should be, and they rarely bother to document it because
it's so obviously the one true way. This is manageable in a smallish
office environment but for distributed projects it turns into a
complete mess. I've seen source files where different sections
assumed different tab widths, so that no matter how you configured
your editor some parts of it would still look wrong. So some projects
ban the use of tabs completely [1].

Eclipse insists on indenting Java code anyway (and on breaking lines
in weird places, but that's another story) so using tabs doesn't
really offer any convenience.

[1] http://www.sfr-fresh.com/unix/misc/unzip552.tar.gz:a/unzip-5.52/proginfo/ZipPorts
(look for NO FEELTHY TABS)
 
Tom Anderson

The trouble with tabs is that everyone has a different idea of what the
spacing should be, and they rarely bother to document it because it's so
obviously the one true way. This is manageable in a smallish office
environment but for distributed projects it turns into a complete mess.
I've seen source files where different sections assumed different tab
widths, so that no matter how you configured your editor some parts of
it would still look wrong.

'Wrong' how? Whatever the tab width, things indented with the same number
of tabs will line up, and that's what matters. Sometimes that indentation
might be more, sometimes less. How does that matter?

Mixing tabs and spaces will wreck you, of course, but everybody knows not
to do that.

Eclipse insists on indenting Java code anyway (and on breaking lines
in weird places, but that's another story)

We add a 0 to the end of the default line lengths, so it effectively never
breaks lines. We have wide monitors, and if a line needs breaking, a
programmer will break it.

so using tabs doesn't really offer any convenience.

True.

tom
 
Stefan Ram

Leif Roar Moldskred said:
An "/* Empty */" is more akin to a "[sic]" than a full stop,
though. How do you feel about "[sic]"?

»[sic!]« helps when used with wrong (and possibly
unintended) spellings/uses/words:

»He wrote "I used this diagrama[sic!] to obtain the values".«
»He wrote "I rode[sic!] this diagram to obtain the values".«

But it would not make sense with correct
spellings/uses/words, even when somewhat unexpected:

»( (...)
the voice of your eyes is deeper than all roses)
nobody, not even the rain, has such small hands«

(E. E. Cummings)

Adding any »[sic!]«s there would hurt the text.

Also, »[sic!]« is never added by the original author of a
text, but used by others, when quoting text of someone else.
 
Lew

Stefan said:
     NB: [sic] THE PERIOD AT THE END OF THE PREVIOUS SENTENCE WAS
     WRITTEN INTENTIONALLY TO MARK THE END OF THE SENTENCE.
          SEE:http://en.wikipedia.org/wiki/Full_stop
An "/* Empty */" is more akin to a "[sic]" than a full stop,
though. How do you feel about "[sic]"?

Similarly, you don't normally need to initialize member variables to
zero-like ('null', 'false', 0, 0L, etc.) values, and in fact will
incur a tiny performance hit if you do, but if and when you do it
emphasizes by way of literate programming that the initial value has
semantic significance beyond "I'm not yet used."

Thus:

public class Foo
{
    private Bar bar; // just a normal let-the-system-initialize-it
    private Qux qux = null; // emphasize that algorithm depends on initial 'null' state
    private boolean condn = false; // emphasize that 'condn' starts out 'false'
    ...
}

Both of the indicated explicit initializations to the default value
will result in the member variable being assigned the default initial
value twice, once by default and the second time by the explicit
initialization.
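
The second assignment is normally invisible, but it can be observed. The sketch below (all class names invented for the example) relies on the fact that field initializers run after the superclass constructor returns, so an explicit '= null' overwrites whatever an overridden method stored during super construction, while a field with no initializer keeps it.

```java
class InitOrder {
    static class Base {
        Base() {
            init();   // runs the subclass override before subclass initializers execute
        }

        void init() { }
    }

    static class Derived extends Base {
        String implicit;          // default-initialized only
        String explicit = null;   // default-initialized, then assigned null a second time

        @Override
        void init() {
            implicit = "set in init()";
            explicit = "set in init()";
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.implicit);   // prints set in init()
        System.out.println(d.explicit);   // prints null
    }
}
```

Calling overridable methods from a constructor is questionable in its own right; it is used here only to make the ordering visible.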
 
Arne Vajhøj

'Wrong' how? Whatever the tab width, things indented with the same
number of tabs will line up, and that's what matters. Sometimes that
indentation might be more, sometimes less. How does that matter?

Try to get it to work with labels and with stuff that require multiple
column style alignments. Not possible.
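
One way to read the column-alignment objection: when tabs are used between tokens rather than only at the start of a line, whether the columns line up depends on the viewer's tab width. The sketch below expands tabs at two widths to show this. The class name, method name and sample lines are assumptions for illustration, not anything from the posts above.

```java
class TabWidths {
    /** Expands each tab to spaces, advancing to the next multiple of tabWidth. */
    static String expand(String line, int tabWidth) {
        StringBuilder out = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == '\t') {
                // A tab always moves at least one column forward.
                do {
                    out.append(' ');
                } while (out.length() % tabWidth != 0);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two declarations whose trailing comments are meant to share a column.
        String[] lines = { "int\tx;\t// short", "String\tname;\t// longer" };
        for (int width : new int[] { 8, 4 }) {
            System.out.println("tab width " + width + ":");
            for (String line : lines) {
                String expanded = expand(line, width);
                System.out.println(expanded + "   <- comment at column " + expanded.indexOf("//"));
            }
        }
    }
}
```

With these sample lines the comments share column 16 at width 8, but land on columns 8 and 16 at width 4.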

Mixing tabs and spaces will wreck you, of course, but everybody knows
not to do that.

Not everyone.

Arne
 
ClassCastException

Tom Anderson said:
Seriously? We use nothing but where i work. In java and XML, at least -
for some reason, in JSP (well, HTML, really) we tend to use two spaces. I
think because JSP/HTML tends to have much more deeply nested
constructs.

I think our use of tabs must stem from the fact that Eclipse, by
default, uses tabs for indentation. They must have a reason to do that.

tom

The trouble with tabs is that everyone has a different idea of what the
spacing should be, and they rarely bother to document it because it's so
obviously the one true way. This is manageable in a smallish office
environment but for distributed projects it turns into a complete mess.
I've seen source files where different sections assumed different tab
widths, so that no matter how you configured your editor some parts of
it would still look wrong. So some projects ban the use of tabs
completely [1].

Nah. The trouble with tabs is that tabs very obviously should always be
exactly 8 spaces, but 8 is way too deep an indent. ;-)
 
James Dow Allen

... some compilers will warn about unused local
variables, which is personally a bit annoying.

I once used a "language" where some unused variables, believe it or not,
caused Fatal Errors!! I even complained about it on Usenet in
Message <[email protected]> :

http://groups.google.com/group/comp.unix.wizards/msg/3774732a57ef112d?hl=en&dmode=source

(People e-mailed me in response to that post, saying they'd printed
it and taped it to the wall!)

I prefer
while (*p++ = *q++) {
}

James Dow Allen
 
Malcolm McLean

but, there are many other cases where one will end up using loops like this.

for(i=0;i<N;i++)
    /* do work here */

or

for(i=0;array[i] != sentinel;i++)
    /* do work here */

is my preferred method, because it doesn't corrupt the pointers. In
the olden days indexing used to be a little bit slower because the
compiler would create an extra variable and offset calculation.
However that's no longer much of a concern.
 
Donkey Hottie

Why? That's ridiculous.

The point of having no {} is that you realise that all the *serious*
stuff is going on inside the while loop statement itself.

I prefer

while (*p++ = *q++)
;

The empty statement where the single statement always goes. Cannot be
seen as a typo.
 
