multi-line Strings

Arne Vajhøj · Dec 18, 2012

Off-the-top-of-my-head (all classes and methods are imaginary):

Regexp alpha = Regexp.fromList(java.lang.text.portable.Alphas);
alpha = alpha.or('_');
Regexp num = Regexp.fromList(java.lang.text.portable.Digits);
Regexp alphanum = alpha.or(num);
Regexp identifier = alpha.followedBy(alphanum.repeated());

I think that is what is widely known in the .NET world
as a fluent API.
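
A minimal sketch of how such a fluent API might sit on top of java.util.regex. The class Rx and its methods (range, literal, or, followedBy, repeated, matches) are hypothetical names invented here for illustration; only Pattern is a real JDK class.

```java
import java.util.regex.Pattern;

// Hypothetical fluent wrapper over java.util.regex; not a JDK or library class.
final class Rx {
    private final String source; // a self-contained regex fragment

    private Rx(String source) {
        this.source = source;
    }

    // Matches any single character between from and to, inclusive.
    // Assumes plain letters or digits, which need no escaping inside [...].
    static Rx range(char from, char to) {
        return new Rx("[" + from + "-" + to + "]");
    }

    // Matches exactly the given character, quoted so it is never read as regex syntax.
    static Rx literal(char c) {
        return new Rx(Pattern.quote(String.valueOf(c)));
    }

    Rx or(Rx other) {
        return new Rx("(?:" + source + "|" + other.source + ")");
    }

    Rx followedBy(Rx next) {
        return new Rx("(?:" + source + next.source + ")");
    }

    // Zero or more repetitions of this expression.
    Rx repeated() {
        return new Rx("(?:" + source + ")*");
    }

    // True if the whole input matches this expression.
    boolean matches(String input) {
        return Pattern.matches(source, input);
    }
}

final class RxDemo {
    // Builds the "identifier" expression in the same order as the imaginary code.
    static Rx identifier() {
        Rx alpha = Rx.range('A', 'Z').or(Rx.range('a', 'z')).or(Rx.literal('_'));
        Rx num = Rx.range('0', '9');
        Rx alphanum = alpha.or(num);
        return alpha.followedBy(alphanum.repeated());
    }
}
```

Every combinator wraps its result in a non-capturing group, so fragments can be combined freely without precedence surprises.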

Arne

Arved Sandstrom · Dec 19, 2012

On 12/18/2012 6:19 PM, Arved Sandstrom wrote: [ SNIP ]

Let's take another almost absurd example, the volume of a sphere. Apart
from providing a named constant for pi, who is going to do that for the 4
or the two 3's? Well, from the sounds of it Peter would, but who else?

Point being, for any number of algorithms the numbers are _not_ magic.
They are documented formulae. If it is known what the formula is that is
being expressed in code (including my simplistic "doubleInteger"
method), it serves no purpose to supply named constants for the numeric
literals.
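
As an illustration of the sphere example, a minimal Java sketch in which the method name and a comment carry the meaning and the 4 and the 3 stay as literals. The class and method names are assumptions made for this example.

```java
final class Sphere {
    /** Volume of a sphere: V = (4/3) * pi * r^3. */
    static double volume(double radius) {
        // 4.0 / 3.0 keeps the division in floating point; 4 / 3 would be integer division.
        return 4.0 / 3.0 * Math.PI * radius * radius * radius;
    }
}
```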

It does happen that values are obvious in a given context and will never
have to be changed.

My experience though is that "obvious" varies quite a bit between
people, geographic locations, time, etc.

So if there is just the slightest doubt, then I suggest using a
named constant.

The cost is very small and maybe it will benefit the maintenance
programmer sitting with the code in 15 years.

Arne

Your suggestions - and indeed this is part of my overall argument - rely
on the ability to supply a *good* named constant. I think all of us
would agree that that is important, just as it is for variable and
method names.

What I am saying is that I believe there are plenty of situations where
that is difficult (or nearly impossible) to do. And a lousy name for a
named constant is worse than using the literal.

I'm actually a pretty reasonable guy. I pull strings out into XML config
files and .properties files, and I've been known to use named constants
for numeric literals too. I just don't do it when it makes no sense.

Nice point btw about being careful about changing the value of a named
constant willy-nilly w/o inspecting all the usage sites, as it may break
business rules.

AHS

BGB · Dec 19, 2012

I think that is what is widely known in the .NET world
as a fluent API.

better term maybe than "big pile o' nasty...".

yes, regex syntax could be nicer, but probably not by making it into a
big pile of API calls.

maybe something more EBNF-like can be used, like say:
// SyntaxPattern and SyntaxParser are hypothetical classes
SyntaxPattern pat = new SyntaxPattern(
    "alpha = ('A'-'Z') | ('a'-'z');",
    "alpha2 = alpha | '_';",
    "hexalpha = ('A'-'F') | ('a'-'f');",
    "num = ('0'-'9');",
    "hexnum = num | hexalpha;",
    "alphanum = alpha2 | num;",
    "basenumber = num+;",
    "realnumber = basenumber '.' basenumber ['e' basenumber ];",
    "hexnumber = '0x' hexnum+;",
    "integer = basenumber | hexnumber;",
    "identifier = alpha2 alphanum*;",
    ...);

StringReader strr = new StringReader("foo 999 bar69");
String tok;
....
if(pat.match(strr, "identifier"))
{
    tok=pat.readNext(strr, "identifier");
    ...
}

or:
tok=pat.tryMatchRead(strr, "identifier");
if(tok!=null)
{
    ...
}

or:
SyntaxParser parse = new SyntaxParser(strr, pat);
tok=parse.tryMatchRead("identifier");
if(tok!=null)
{
    ...
}
tok=parse.tryMatchRead("integer");
if(tok!=null)
{
    ...
}

granted, yes, all this is probably something a bit different than using
regexes, but oh well.
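
For comparison, a rough sketch of the same try-match-then-read loop using only the real java.util.regex classes. The class TokenDemo, its rule names and its tryMatch/tokenize methods are made up for this example; the "identifier" and "integer" rules are hand-translated from the grammar above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenDemo {
    // Named rules, hand-translated from the "identifier" and "integer" productions.
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z_0-9]*");
    static final Pattern INTEGER = Pattern.compile("0x[0-9A-Fa-f]+|[0-9]+");
    static final Pattern SPACE = Pattern.compile("\\s+");

    // Returns the text matched by rule starting exactly at pos, or null if it does not match there.
    static String tryMatch(Pattern rule, CharSequence input, int pos) {
        Matcher m = rule.matcher(input).region(pos, input.length());
        return m.lookingAt() ? m.group() : null;
    }

    // Splits the input into "kind:text" tokens, skipping whitespace.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String tok;
            if ((tok = tryMatch(SPACE, input, pos)) != null) {
                // whitespace separates tokens and is not reported
            } else if ((tok = tryMatch(IDENTIFIER, input, pos)) != null) {
                out.add("identifier:" + tok);
            } else if ((tok = tryMatch(INTEGER, input, pos)) != null) {
                out.add("integer:" + tok);
            } else {
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            pos += tok.length(); // every rule matches at least one character
        }
        return out;
    }
}
```

With the sample input "foo 999 bar69", tokenize returns identifier:foo, integer:999 and identifier:bar69.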

or something...

Gene Wirchenko · Dec 19, 2012

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of.

His examples are exceptions to the rule, not demonstrations of a good rule.

Another example is the old-style and somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Writing code like that is silly. You should use a proper "round()" method
that takes as input the number of decimal places to use.

Why? You are just going to have to raise 10 to the power of 2.
Not surprisingly, that is 100.
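
To make that concrete, here is a minimal sketch of a round() helper that takes the number of decimal places; it still computes the power of ten internally. The class and method names are assumptions for illustration, and the helper shares the usual binary floating-point caveats of the "*100, /100" idiom.

```java
final class Rounder {
    // Rounds value to the given number of decimal places (places >= 0 assumed).
    static double round(double value, int places) {
        double factor = Math.pow(10, places); // places = 2 gives 100.0
        return Math.round(value * factor) / factor;
    }
}
```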

In either case, arguably the number of decimal places to round by should be
declared itself as a constant. Writing "100" by itself tells the reader of
the code nothing about why one is rounding to two decimal places, versus
some other number. This is true whether you use a more descriptive, more
functional "round()" method or go with the "*100, /100" approach.

It is a common idiom. Outside computing, multiplying by 100 by
moving the decimal point two places to the left is understood by many,
including elementary school children.

I did not get this right. It should be right and "right". Move
the decimal point two places to the right. It is still true about the
elementary school children.

Sincerely,

Gene Wirchenko

Lew · Dec 19, 2012

Arved said:
Nice point btw about being careful about changing the value of a named
constant willy-nilly w/o inspecting all the usage sites, as it may break
business rules.

That would depend on whether the clients recompile after the change.

It is quite conceivable to have a scenario where some artifacts still contain the
old value of the constant while others the new.
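
This scenario follows from how javac treats compile-time constants: a static final primitive or String initialized with a constant expression is copied into each class that uses it. A minimal sketch, with hypothetical class and field names:

```java
final class Limits {
    // Compile-time constant: javac copies the value 1000 into every class that uses it,
    // so clients keep the old value until they are recompiled.
    static final int MAX_ROWS = 1000;

    // Initialized by a method call, so it is not a constant expression;
    // clients read this field at run time and see a change without recompiling.
    static final int MAX_ROWS_LATE_BOUND = Integer.parseInt("1000");
}

final class Client {
    static int inlined() {
        return Limits.MAX_ROWS; // compiled as the literal 1000
    }

    static int lookedUp() {
        return Limits.MAX_ROWS_LATE_BOUND; // compiled as a field access
    }
}
```

If Limits alone is changed and recompiled, inlined() keeps returning the old value while lookedUp() returns the new one.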

Arne Vajhøj · Dec 19, 2012

better term maybe than "big pile o' nasty...".

yes, regex syntax could be nicer, but probably not by making it into a
big pile of API calls.

It is very popular in some .NET circles.

Surprisingly the Wikipedia article
http://en.wikipedia.org/wiki/Fluent_interface
does not even cover .NET.

Arne

Arved Sandstrom · Dec 19, 2012

None taken

I think the point that I was trying to get across is not about a "better
syntax" for regular expressions, but about moving that kind of logic out of
syntax altogether and into programmer-domain objects and code.

So -- for me -- I don't /care/ whether it is less readable than the traditional
ex/awk/sed-style regexp syntax. Naturally, I'd prefer the code to be
reasonably readable, and I'll certainly concede that the example code I came up
with is none too expressive. But I'm willing to put up with that given the
limitations of Java.

(Though maybe the sort of approach to designing DSLs in terms of "plain Java"
shown in jMock, in particular the paper:
Evolving an Embedded Domain-Specific Language in Java
Steve Freeman, Nat Pryce
http://jmock.org/articles.html
could be adapted to improve the feel of the objects-only approach that I'm
advocating. Hmm...)

As far as syntax goes, things that work for me are Linq (most any
flavour) or Scala Squeryl. Nothing's perfect, but Linq for XML or
Squeryl achieve a nice mix of readability, conciseness and safety.
That's just me - I know folks that can't wrap their heads around
syntaxes like that.

I'll be honest - I have little interest in trying to improve Java along
these lines anymore. I like Java for GP programming, but for certain
things I'll solve the problem in another JVM language and call that code
from Java.

Largely agreed. (Actually agreed entirely with the exception of the
"fragmentation" thing.)

Well, that's an entire other debate, fragmentation caused by OOP.

My contrasting view is that regular language grammars and parsers can be
expressed directly in object language, which gives all sorts of advantages. So
they /ought/ to be Java but (in the current state of affairs) are not.

Oh well...

-- chirs

AHS

Arne Vajhøj · Dec 19, 2012

This is all sound and very sensible.

But... I'm amused to note that these conditions also apply to identifiers used
in the code too. Say I have a class named MyClass, and a method
MyClass.myMethod(). It is obviously possible for the name "MyClass" to change,
similarly for methods (in fact -- for me -- class and method name changes are
the rule rather than the exception). So, in the spirit of this discussion,
shouldn't I put the class and method names into a config file somewhere (or a
"meta-config" file), and only refer to the class and method via some sort of
access to the config data?

<class:122> something = new <class:122>();
something.<method:382365>(true);

;-)

Funny.

It would not make it more readable. It would make
it less readable.

And changing the name is not the same problem as
changing a literal, because an inconsistent change
will result in a compiler error.

Arne

Arne Vajhøj · Dec 19, 2012

I like to add "for the same reason" to that check, just as a general
warning against "factor cramming" (that is, when you cram together two
factors of the program into one piece of code because the factors are
superficially similar even though they are different semantically or
in nature.)

Yes.

It has to be the same value by definition, not just the same value
in the current case.

Arne

Arne Vajhøj · Dec 19, 2012

[...]
If you are multiplying by a constant, there's a reason. Often, for example,
you are converting units (hours per day, days per week, etc., following
your "date conversions" theme). The conversion itself is the correct name
(e.g. "hoursPerDay", "daysPerWeek", etc.) in those examples. Similar logic
can be applied to other values.

public static final int MILLIMETERS_PER_METER = 1000;
public static final int MILLIGRAMS_PER_GRAM = 1000;
public static final int MILLIAMPERES_PER_AMPERE = 1000;
public static final int MILLISECONDS_PER_SECOND = 1000;

Great aids to understanding, I'm sure. (And stop calling me Millie!)

I guess your intent is to be sarcastic. But personally, I think using a
named constant does in fact make it clearer.

I would not be a stickler for making those named constants, and of course
have used literal values like those in code without naming them. But using
them as a named constant would in fact make the code clearer.

And of course, constants such as 100 or 400 (such as Arved mentioned) are
even less obvious.

As I said, there may be exceptions. SI units are notoriously easy to
convert, so as long as your variables and methods are well-named (i.e. are
clear about the units they represent), a named constant may not be called
for when converting. So, fine...if you like, there's one of your exceptions.

That doesn't take away from the more general validity of my comments.

Pete

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of. Another example is the old-style and
somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Really?

FACTOR_FOR_ROUNDING_TO_TWO_DECIMALS seems like a name to me.

If it was named FACTOR_FOR_ROUNDING it could even be changed to
accommodate other roundings than to 0.01.

I would never use the first.

I am not even sure that I would use the last, but the last
is not that bad.

Arne

Arne Vajhøj · Dec 19, 2012

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of.

His examples are exceptions to the rule, not demonstrations of a good rule.

Another example is the old-style and somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Writing code like that is silly. You should use a proper "round()" method
that takes as input the number of decimal places to use.

That only moves the problem.

But sometimes moving it brings it close to a method signature that
will document it.

But it is not making the problem go away.

Arne

Arne Vajhøj · Dec 19, 2012

As for the 100 or 400, I was thinking for example of the Gregorian day
to Julian day number formula
(http://en.wikipedia.org/wiki/Julian..._Gregorian_calendar_date_to_Julian_Day_Number).
You can see how many constants are involved here, including 100 and 400.

I would use named constants for precisely ZERO (0) of these numbers.
Mainly because the Java or Perl or C# version of the method for doing
this conversion will have a descriptive name, I will have it commented
and include a link to a decent explanation perhaps.

Comments should be considered 2nd best to having it in the
code itself.
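
A sketch of what the approach described in the quoted text could look like in Java: the literals stay as they are, and the method name plus a comment carry the explanation. The class and method names are assumptions; the arithmetic is the integer formula given on the Wikipedia page linked above and is meant for ordinary positive Gregorian years.

```java
final class JulianDay {
    /**
     * Converts a Gregorian calendar date to a Julian Day Number using the
     * integer formula from the Wikipedia "Julian day" article.
     * All divisions are integer divisions; month is 1..12.
     */
    static int fromGregorian(int year, int month, int day) {
        int a = (14 - month) / 12;   // 1 for January and February, otherwise 0
        int y = year + 4800 - a;     // year count with the year starting in March
        int m = month + 12 * a - 3;  // 0 = March ... 11 = February
        return day + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045;
    }
}
```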

Arne

Arne Vajhøj · Dec 19, 2012

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of.

His examples are exceptions to the rule, not demonstrations of a good
rule.

Another example is the old-style and somewhat language-agnostic
pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Writing code like that is silly. You should use a proper "round()" method
that takes as input the number of decimal places to use.

As "silly" as it may be, it's a known technique for rounding to a
specified increment - see
http://en.wikipedia.org/wiki/Rounding#Rounding_to_a_specified_increment.

I didn't trot out the technique as an endorsement, it's an example.

Forget how silly the technique is, give me a great constant name that
replaces "100." in that formula.

In either case, arguably the number of decimal places to round by should be
declared itself as a constant. Writing "100" by itself tells the reader of
the code nothing about why one is rounding to two decimal places, versus
some other number. This is true whether you use a more descriptive, more
functional "round()" method or go with the "*100, /100" approach.

Here's a thought - maybe _why_ you are rounding to 2 decimal places is
because you want to round to 2 decimal places. No reasonable named
constant is going to tell you that business requirement #17 required
that the rounding be the same as an existing Excel table from
spreadsheet such-and-such.

I didn't just pull that out of a hat either - I've had to do exactly
that the past few months. If you can come up with a named constant that
doesn't look like a paragraph that can express motives like that, please
trot out an example.

Using FACTOR_FOR_ROUNDING, and documenting the business rule where the
constant is defined, would be *a* way of doing it.

Arne

Arne Vajhøj · Dec 19, 2012

But for all the various string syntaxes that Perl supports, it's still
missing a sane multiline string syntax.

Does it?
[...]
$str=<<MULTI;
a line
another line
yet another line
one more line
MULTI

Note that I wrote "sane". Here documents aren't sane. They cannot be
indented with the rest of the code,

If people want a multi-line as-is syntax, then that is a requirement
not a problem.
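
For comparison on the Java side, a minimal sketch of a multi-line value that can be indented along with the surrounding code. It assumes Java 8 or later for String.join, and the class name is made up. Unlike the Perl here document above, the result has no trailing newline.

```java
final class MultiLine {
    // Each source line is its own literal, so the indentation of the code
    // does not leak into the resulting string.
    static final String TEXT = String.join("\n",
            "a line",
            "another line",
            "yet another line",
            "one more line");
}
```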

Arne

Arved Sandstrom · Dec 19, 2012

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of. Another example is the old-style and
somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Really?

FACTOR_FOR_ROUNDING_TO_TWO_DECIMALS seems like a name to me.

If it was named FACTOR_FOR_ROUNDING it could even be changed to
accommodate other roundings than to 0.01.

I would never use the first.

I am not even sure that I would use the last, but the last
is not that bad.

Arne

If you absolutely had to have a name I suppose that's about as good as
it gets. But assuming that you're using this technique, you might in one
spot want to round to 2 places, in another to 1 or 4 (not far-fetched
actually). If you *are* using named constants, I think you'd have to go
with your first suggestion.

AHS

Arne Vajhøj · Dec 19, 2012

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of. Another example is the old-style and
somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Really?

FACTOR_FOR_ROUNDING_TO_TWO_DECIMALS seems like a name to me.

If it was named FACTOR_FOR_ROUNDING it could even be changed to
accommodate other roundings than to 0.01.

I would never use the first.

I am not even sure that I would use the last, but the last
is not that bad.

If you absolutely had to have a name I suppose that's about as good as
it gets. But assuming that you're using this technique, you might in one
spot want to round to 2 places, in another to 1 or 4 (not far-fetched
actually). If you *are* using named constants, I think you'd have to go
with your first suggestion.

If one needs to do multiple roundings, then multiple constants would
be needed.

The first is still not good.

The extended name should be related to the purpose not the factor
to handle the case where a rounding is changed for some calculations.

FACTOR_TO_ROUND_DOLLAR_AMOUNT
FACTOR_TO_ROUND_DISTANCE_MEASUREMENT
etc.
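
A small sketch of how purpose-named factors might be used. The values (two decimals for dollars, one for distances) and the helper roundWith are assumptions chosen only for illustration.

```java
final class Rounding {
    // Named for what is being rounded, not for the value, so a factor can
    // change for one kind of calculation without touching the other.
    static final double FACTOR_TO_ROUND_DOLLAR_AMOUNT = 100.0;       // assumed: cents
    static final double FACTOR_TO_ROUND_DISTANCE_MEASUREMENT = 10.0; // assumed: one decimal

    // The classic multiply, round, divide idiom with the factor passed in.
    static double roundWith(double value, double factor) {
        return Math.round(value * factor) / factor;
    }

    static double roundDollars(double amount) {
        return roundWith(amount, FACTOR_TO_ROUND_DOLLAR_AMOUNT);
    }

    static double roundDistance(double distance) {
        return roundWith(distance, FACTOR_TO_ROUND_DISTANCE_MEASUREMENT);
    }
}
```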

Arne

Lew · Dec 20, 2012

Arne said:
The extended name should be related to the purpose not the factor

This is best practice, and more consistently followed in proportion to experience in
the field, I assess.

to handle the case where a rounding is changed for some calculations.

FACTOR_TO_ROUND_DOLLAR_AMOUNT
FACTOR_TO_ROUND_DISTANCE_MEASUREMENT
etc.

These names serve your pedagogical purpose, but simultaneously serve to
illustrate that there's an element of style involved, not just engineering.

To my eye those names are longer than need be. I cannot and will not claim any
objective validity to that assessment, but I would in a project push for more elegant
names.

I don't necessarily mean abbreviated. Elegance is an ineffable match of form to
function to aesthetic expression. Sometimes terse, sometimes verbose, elegance
adapts to the circumstance and objectives of the moment.

My preference in this instance for the proffered scenarios would be more like

DOLLAR_ROUNDER
DISTANCE_ROUNDER

My reasoning involves degree of information conveyed relative to length and subitizability
of morpheme count.

But my action is based on a more intuitive sense, arguably one that subsumes the
rationalized basis.

Arved Sandstrom · Dec 20, 2012

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of. Another example is the old-style and
somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Really?

FACTOR_FOR_ROUNDING_TO_TWO_DECIMALS seems like a name to me.

If it was named FACTOR_FOR_ROUNDING it could even be changed to
accommodate other roundings than to 0.01.

I would never use the first.

I am not even sure that I would use the last, but the last
is not that bad.

If you absolutely had to have a name I suppose that's about as good as
it gets. But assuming that you're using this technique, you might in one
spot want to round to 2 places, in another to 1 or 4 (not far-fetched
actually). If you *are* using named constants, I think you'd have to go
with your first suggestion.

If one needs to do multiple roundings, then multiple constants would
be needed.

The first is still not good.

The extended name should be related to the purpose not the factor
to handle the case where a rounding is changed for some calculations.

FACTOR_TO_ROUND_DOLLAR_AMOUNT
FACTOR_TO_ROUND_DISTANCE_MEASUREMENT
etc.

Arne

One of the points I am trying to get across, Arne, is that (1) you may
not know any of that, or (2) there may be no higher purpose than an
arbitrary choice.

As far as #2 goes, that was sort of my Excel spreadsheet scenario. This
is from real modernization and integration work I'm doing - in various
reports and spreadsheets that we are looking to replace, percentages may
be displayed let's say to one or two places. There is no reason given
for this choice, the raw numbers involved would support more places
actually. Since these percentages are used in various Excel tables and
charts for executive reporting, probably the *only* reason years and
years ago was that that many decimal places looked "good".

This is not an unusual scenario: display rounding of numbers with no
natural "right" number of decimal places. You can, say, go to 4 or 5
places by significant digits. But the table or chart gets too busy,
people object, so you round to 1 or 2 decimal places - for the decent
reason of effective presentation of information.

AHS

Gene Wirchenko · Dec 20, 2012

[snip]

FACTOR_FOR_ROUNDING_TO_TWO_DECIMALS seems like a name to me.

If it was named FACTOR_FOR_ROUNDING it could even be changed to
accommodate other roundings than to 0.01.

I would never use the first.

I am not even sure that I would use the last, but the last
is not that bad.

Both are rather long. Get a few of them in a statement, and you
are probably looking at a second line (assuming an 80 or so character
limit).

Names that are long can bloat statements horribly.

Sincerely,

Gene Wirchenko

BGB · Dec 21, 2012

It is very popular in some .NET circles.

Surprisingly the Wikipedia article
http://en.wikipedia.org/wiki/Fluent_interface
does not even cover .NET.

fair enough...

I have sometimes gone the other way though, finding a dedicated textual
representation to be more compact and easier to work with than an API,
but either way.

I guess it depends though...
