multi-line Strings


Arne Vajhøj

Off the top of my head (all classes and methods are imaginary):

Regexp alpha = Regexp.fromList(java.lang.text.portable.Alphas);
alpha = alpha.or('_');
Regexp num = Regexp.fromList(java.lang.text.portable.Digits);
Regexp alphanum = alpha.or(num);
Regexp identifier = alpha.followedBy(alphanum.repeated());

I think that is what is widely known in the .NET world
as a fluent API.
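A rough sketch of how such a fluent API could be layered over `java.util.regex`, assuming Java 8 or later. `Rx` and its methods (`charClass`, `literal`, `or`, `followedBy`, `repeated`, `compile`) are hypothetical names chosen to mirror the imaginary example above, and the imaginary `Alphas`/`Digits` lists are replaced by plain ASCII ranges:

```java
import java.util.regex.Pattern;

/** Hypothetical fluent wrapper that builds a java.util.regex pattern string step by step. */
final class Rx {
    private final String src;

    private Rx(String src) { this.src = src; }

    /** A character class such as "a-zA-Z" or "0-9". */
    static Rx charClass(String ranges) { return new Rx("[" + ranges + "]"); }

    /** One literal character, quoted so it is never read as a metacharacter. */
    static Rx literal(char c) { return new Rx(Pattern.quote(String.valueOf(c))); }

    /** Either this pattern or the other one. */
    Rx or(Rx other) { return new Rx("(?:" + src + "|" + other.src + ")"); }

    /** This pattern immediately followed by the other one. */
    Rx followedBy(Rx other) { return new Rx(src + other.src); }

    /** Zero or more repetitions of this pattern. */
    Rx repeated() { return new Rx("(?:" + src + ")*"); }

    Pattern compile() { return Pattern.compile(src); }
}

final class FluentRegexDemo {
    private FluentRegexDemo() {}

    /** The identifier example, with ASCII ranges standing in for the imaginary lists. */
    static Pattern identifier() {
        Rx alpha = Rx.charClass("a-zA-Z").or(Rx.literal('_'));
        Rx num = Rx.charClass("0-9");
        Rx alphanum = alpha.or(num);
        return alpha.followedBy(alphanum.repeated()).compile();
    }
}
```

Each call only wraps the accumulated pattern text in a non-capturing group, so the end result is an ordinary `Pattern`: `"bar69"` and `"_x1"` match `FluentRegexDemo.identifier()`, while `"9foo"` does not.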

Arne
 

Arved Sandstrom

On 12/18/2012 6:19 PM, Arved Sandstrom wrote: [ SNIP ]
Let's take another almost absurd example, the volume of a sphere. Apart
from providing a named constant for pi, who is going to do that for the 4
or the two 3's? Well, from the sounds of it Peter would, but who else?

Point being, for any number of algorithms the numbers are _not_ magic.
They are documented formulae. If it is known what the formula is that is
being expressed in code (including my simplistic "doubleInteger"
method), it serves no purpose to supply named constants for the numeric
literals.
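To make the sphere example concrete, a minimal Java sketch (the class and method names are illustrative): the 4 and the 3 stay as literals from the textbook formula, and only pi comes from the JDK's `Math.PI`.

```java
final class Sphere {
    private Sphere() {}

    /**
     * Volume of a sphere, V = (4/3) * pi * r^3.
     * The 4 and 3 come straight from the documented formula, so they stay literals;
     * only pi uses the JDK's named constant.
     */
    static double volume(double radius) {
        // 4.0 / 3.0 rather than 4 / 3: with int operands Java would truncate to 1.
        return 4.0 / 3.0 * Math.PI * radius * radius * radius;
    }
}
```

One pitfall is worth noting: written as `4 / 3 * Math.PI * ...`, Java evaluates `4 / 3` in integer arithmetic and gets 1, so the literals need to be `double`.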

It does happen that values are obvious in a given context and will never
have to be changed.

My experience though is that "obvious" varies quite a bit between
people, geographic locations, time, etc.

So if there is just the slightest doubt, then I suggest using a
named constant.

The cost is very small and maybe it will benefit the maintenance
programmer sitting with the code in 15 years.

Arne
Your suggestions - and indeed this is part of my overall argument - rely
on the ability to supply a *good* named constant. I think we all of us
would agree that that is important, just as it is for variable and
method names.

What I am saying is that I believe there are plenty of situations where
that is difficult (or nearly impossible) to do. And a lousy name for a
named constant is worse than using the literal.

I'm actually a pretty reasonable guy. I pull strings out into XML config
files and .properties files, and I've been known to use named constants
for numeric literals too. I just don't do it when it makes no sense.

Nice point btw about being careful about changing the value of a named
constant willy-nilly w/o inspecting all the usage sites, as it may break
business rules.

AHS
 

BGB

I think that is what is widely known in the .NET world
as a fluent API.

better term maybe than "big pile o' nasty...".

yes, regex syntax could be nicer, but probably not by making it into a
big pile of API calls.


maybe something more EBNF-like can be used, like say:
SyntaxPattern pat = new SyntaxPattern(
"alpha = ('A'-'Z') | ('a'-'z');",
"alpha2 = alpha | '_';",
"hexalpha = ('A'-'F') | ('a'-'f');",
"num = ('0'-'9');",
"hexnum = num | hexalpha;",
"alphanum = alpha2 | num;",
"basenumber = num+;",
"realnumber = basenumber '.' basenumber ['e' basenumber ];",
"hexnumber = '0x' hexnum+;",
"integer = basenumber | hexnumber;",
"identifier = alpha2 alphanum*;",
...);

StringReader strr = new StringReader("foo 999 bar69");
String tok;
....
if(pat.match(strr, "identifier"))
{
tok=pat.readNext(strr, "identifier");
...
}

or:
tok=pat.tryMatchRead(strr, "identifier");
if(tok!=null)
{
...
}

or:
SyntaxParser parse = new SyntaxParser(strr, pat);
tok=parse.tryMatchRead("identifier");
if(tok!=null)
{
...
}
tok=parse.tryMatchRead("integer");
if(tok!=null)
{
...
}


granted, yes, all this is probably something a bit different than using
regexes, but oh well.
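A loose approximation of the `tryMatchRead` style, assuming each named rule is hand-translated into a `java.util.regex` pattern rather than parsed from the EBNF-like text (so it is closer to "regexes with names" than to the grammar idea). `RegexSyntaxParser`, `sampleRules` and `tryMatchRead` are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-backed stand-in for the SyntaxParser idea; rules are named regexes, not EBNF. */
final class RegexSyntaxParser {
    private final String input;
    private final Map<String, Pattern> rules;
    private int pos = 0; // cursor into input

    RegexSyntaxParser(String input, Map<String, Pattern> rules) {
        this.input = input;
        this.rules = rules;
    }

    /** Two of the rules above, hand-translated to java.util.regex syntax. */
    static Map<String, Pattern> sampleRules() {
        Map<String, Pattern> r = new LinkedHashMap<>();
        r.put("identifier", Pattern.compile("[A-Za-z_][A-Za-z_0-9]*"));
        // Hex alternative first, otherwise "0x1F" would stop after the "0".
        r.put("integer", Pattern.compile("0x[0-9A-Fa-f]+|[0-9]+"));
        return r;
    }

    /** Skips whitespace, then consumes and returns the text matching the named rule, or null. */
    String tryMatchRead(String ruleName) {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        Pattern p = rules.get(ruleName);
        if (p == null) {
            throw new IllegalArgumentException("unknown rule: " + ruleName);
        }
        // region + lookingAt anchors the match at the cursor without copying substrings.
        Matcher m = p.matcher(input).region(pos, input.length());
        if (!m.lookingAt()) {
            return null; // cursor stays at the start of the next token, so another rule can be tried
        }
        pos = m.end();
        return m.group();
    }
}
```

With input `"foo 999 bar69"`, successive calls `tryMatchRead("identifier")`, `tryMatchRead("identifier")`, `tryMatchRead("integer")` and `tryMatchRead("identifier")` return `"foo"`, `null`, `"999"` and `"bar69"`.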


or something...
 

Gene Wirchenko

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of.

His examples are exceptions to the rule, not demonstrations of a good rule.
Another example is the old-style and somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Writing code like that is silly. You should use a proper "round()" method
that takes as input the number of decimal places to use.
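A minimal sketch of such a method, assuming `double` inputs and the same multiply/round/divide idiom (the class name `Rounding` is illustrative). It also shows the point made in reply: the method still computes 10 raised to the number of places internally.

```java
final class Rounding {
    private Rounding() {}

    /**
     * Rounds d to the given number of decimal places with the multiply/round/divide idiom.
     * The factor is still 10^places; the method only moves the literal into a parameter.
     * Caveat: binary doubles mean some halfway cases (e.g. 1.005) do not round as expected.
     */
    static double round(double d, int places) {
        double factor = Math.pow(10, places);
        return Math.round(d * factor) / factor;
    }
}
```

`Rounding.round(3.14159, 2)` gives `3.14`. Where exact decimal behaviour matters (money, for instance), `BigDecimal.setScale(places, RoundingMode.HALF_UP)` avoids the binary-fraction surprise mentioned in the comment.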

Why? You are just going to have to raise 10 to the power of 2.
Not surprisingly, that is 100.
In either case, arguably the number of decimal places to round by should be
declared itself as a constant. Writing "100" by itself tells the reader of
the code nothing about why one is rounding to two decimal places, versus
some other number. This is true whether you use a more descriptive, more
functional "round()" method or go with the "*100, /100" approach.

It is a common idiom. Outside computing, multiplying by 100 by
moving the decimal point two places to the left is understood by many,
including elementary school children.

I did not get this right. It should be right and "right". Move
the decimal point two places to the right. It is still true about the
elementary school children.

Sincerely,

Gene Wirchenko
 

Lew

Arved said:
Nice point btw about being careful about changing the value of a named
constant willy-nilly w/o inspecting all the usage sites, as it may break
business rules.

That would depend on whether the clients recompile after the change.

It is quite conceivable to have a scenario where some artifacts still contain the
old value of the constant while others contain the new one.
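A small illustration of that point, with hypothetical names. Under the binary-compatibility rules in chapter 13 of the Java Language Specification, a `static final` field of primitive or `String` type initialised with a constant expression is a constant variable, and javac copies its value into every class that references it:

```java
final class Limits {
    private Limits() {}

    // Constant variable (static final + constant expression): javac copies 10 into
    // every client class file, so a client that is not recompiled keeps the old value.
    public static final int MAX_RETRIES = 10;

    // Initialised through a method call, so it is NOT a constant expression:
    // clients read the field at run time and see a new value once Limits is redeployed.
    public static final int MAX_RETRIES_LIVE = notInlined(10);

    private static int notInlined(int value) {
        return value;
    }
}
```

The trade-off: the second field is not a compile-time constant, so it cannot be used where one is required, such as a `case` label or an annotation element.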
 

Arved Sandstrom

None taken


I think the point that I was trying to get across is not about a "better
syntax" for regular expressions, but about moving that kind of logic out of
syntax altogether and into programmer-domain objects and code.

So -- for me -- I don't /care/ whether it is less readable than the traditional
ex/awk/sed-style regexp syntax. Naturally, I'd prefer the code to be
reasonably readable, and I'll certainly concede that the example code I came up
with is none too expressive. But I'm willing to put up with that given the
limitations of Java.

(Though maybe the sort of approach to designing DSLs in terms of "plain Java"
shown in jMock, in particular the paper:
Evolving an Embedded Domain-Specific Language in Java
Steve Freeman, Nat Pryce
http://jmock.org/articles.html
could be adapted to improve the feel of the objects-only approach that I'm
advocating. Hmm...)

As far as syntax goes, things that work for me are Linq (most any
flavour) or Scala Squeryl. Nothing's perfect, but Linq for XML or
Squeryl achieve a nice mix of readability, conciseness and safety.
That's just me - I know folks that can't wrap their heads around
syntaxes like that.

I'll be honest - I have little interest in trying to improve Java along
these lines anymore. I like Java for GP programming, but for certain
things I'll solve the problem in another JVM language and call that code
from Java.
Largely agreed. (Actually agreed entirely with the exception of the
"fragmentation" thing.)

Well, that's an entire other debate, fragmentation caused by OOP. :)
My contrasting view is that regular language grammars and parsers can be
expressed directly in object language, which gives all sorts of advantages. So
they /ought/ to be Java, but (in the current state of affairs) are not.

Oh well...

-- chris
AHS
 

Arne Vajhøj

This is all sound and very sensible.

But... I'm amused to note that these conditions also apply to identifiers used
in the code too. Say I have a class name MyClass, and methods
MyClass.myMethod(). It is obviously possible for the name "MyClass" to change,
similarly for methods (in fact -- for me -- class and method name changes are
the rule rather than the exception). So, in the spirit of this discussion,
shouldn't I put the class and method names into a config file (or
"meta-config" file) somewhere, and only refer to the class and method via some
sort of access to the config data?

<class:122> something = new <class:122>();
something.<method:382365>(true);

;-)

Funny.

It would not make it more readable. It would make
it less readable.

And changing the name is not the same problem as
changing a literal, because an inconsistent change
will result in a compiler error.

Arne
 

Arne Vajhøj

I like to add "for the same reason" to that check, just as a general
warning against "factor cramming" (that is, when you cram together two
factors of the program into one piece of code because the factors are
superficially similar even though they are different semantically or
in nature.)

Yes.

It has to have the same value by definition, not just the same value
in the current case.
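One way to picture the "same value for the same reason" check, with hypothetical constants: both are 100, but only one of them is 100 by definition, so they stay separate rather than being crammed into one.

```java
final class Thresholds {
    private Thresholds() {}

    // Both happen to be 100 today, but for unrelated reasons; merging them into one
    // constant would couple two rules that may change independently.
    static final int MAX_ATTENDEES_PER_ROOM = 100; // hypothetical business limit, may change
    static final int PERCENT_SCALE = 100;          // fixed by the definition of "percent"

    static double toPercent(double fraction) {
        return fraction * PERCENT_SCALE;
    }

    static boolean roomIsFull(int attendees) {
        return attendees >= MAX_ATTENDEES_PER_ROOM;
    }
}
```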

Arne
 

Arne Vajhøj

[...]
If you are multiplying by a constant, there's a reason. Often, for
example, you are converting units (hours per day, days per week, etc.,
following your "date conversions" theme). The conversion itself is the
correct name (e.g. "hoursPerDay", "daysPerWeek", etc.) in those examples.
Similar logic can be applied to other values.

public static final int MILLIMETERS_PER_METER = 1000;
public static final int MILLIGRAMS_PER_GRAM = 1000;
public static final int MILLIAMPERES_PER_AMPERE = 1000;
public static final int MILLISECONDS_PER_SECOND = 1000;

Great aids to understanding, I'm sure. (And stop calling me Millie!)

I guess your intent is to be sarcastic. But personally, I think using a
named constant does in fact make it clearer.

I would not be a stickler for making those named constants, and of course
have used literal values like those in code without naming them. But
using them as a named constant would in fact make the code clearer.

And of course, constants such as 100 or 400 (such as Arved mentioned) are
even less obvious.

As I said, there may be exceptions. SI units are notoriously easy to
convert, so as long as your variables and methods are well-named (i.e.
are clear about the units they represent), a named constant may not be
called for when converting. So, fine...if you like, there's one of your
exceptions.
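For comparison, a small sketch with hypothetical names apart from the JDK's `java.util.concurrent.TimeUnit`: a named constant next to the library call that already encodes the same conversion, where the method name itself carries the units.

```java
import java.util.concurrent.TimeUnit;

final class Durations {
    private Durations() {}

    // Named constant, as suggested in the thread.
    static final long MILLISECONDS_PER_SECOND = 1000L;

    static long secondsToMillisWithConstant(long seconds) {
        return seconds * MILLISECONDS_PER_SECOND;
    }

    // The JDK already encodes this conversion; the call site names both units.
    static long secondsToMillisWithTimeUnit(long seconds) {
        return TimeUnit.SECONDS.toMillis(seconds);
    }
}
```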

That doesn't take away from the more general validity of my comments.

Pete
Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of. Another example is the old-style and
somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Really?

FACTOR_FOR_ROUNDING_TO_TWO_DECIMALS seems like a name to me.

If it was named FACTOR_FOR_ROUNDING it could even be changed to
accommodate other roundings than to 0.01.

I would never use the first.

I am not even sure that I would use the last, but the last
is not that bad.

Arne
 

Arne Vajhøj

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of.

His examples are exceptions to the rule, not demonstrations of a good rule.
Another example is the old-style and somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Writing code like that is silly. You should use a proper "round()" method
that takes as input the number of decimal places to use.

That only moves the problem.

But sometimes moving it brings it close to a method signature that
will document.

But it is not making the problem go away.

Arne
 

Arne Vajhøj

As for the 100 or 400, I was thinking for example of the Gregorian day
to Julian day number formula
(http://en.wikipedia.org/wiki/Julian..._Gregorian_calendar_date_to_Julian_Day_Number).
You can see how many constants are involved here, including 100 and 400.

I would use named constants for precisely ZERO (0) of these numbers.
Mainly because the Java or Perl or C# version of the method for doing
this conversion will have a descriptive name, I will have it commented
and include a link to a decent explanation perhaps.
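A sketch of what that might look like in Java, assuming the proleptic Gregorian calendar and astronomical years from -4799 onwards. It uses a common integer-arithmetic form of the conversion, not necessarily the exact expression on the linked page; the method name and comments carry the meaning, and the 4, 100 and 400 stay literal:

```java
final class JulianDay {
    private JulianDay() {}

    /**
     * Julian Day Number of a proleptic Gregorian calendar date.
     * 4, 100 and 400 are the Gregorian leap-year rules, (153 * m + 2) / 5 spreads the
     * month lengths of a March-based year, and 32045 aligns the count with the JDN epoch.
     */
    static long julianDayNumber(int year, int month, int day) {
        int a = (14 - month) / 12;   // 1 for January and February, else 0
        long y = year + 4800L - a;   // March-based year, offset so it stays non-negative
        int m = month + 12 * a - 3;  // March = 0 ... February = 11
        return day + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045;
    }
}
```

`julianDayNumber(2000, 1, 1)` returns 2451545, which agrees with `LocalDate.of(2000, 1, 1).getLong(JulianFields.JULIAN_DAY)` from `java.time`; in current Java that JDK field is arguably preferable to hand-rolled arithmetic.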

Comments should be considered 2nd best to having it in the
code itself.

Arne
 

Arne Vajhøj

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of.

His examples are exceptions to the rule, not demonstrations of a good rule.
Another example is the old-style and somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Writing code like that is silly. You should use a proper "round()" method
that takes as input the number of decimal places to use.

As "silly" as it may be, it's a known technique for rounding to a
specified increment - see
http://en.wikipedia.org/wiki/Rounding#Rounding_to_a_specified_increment.
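The general form of that technique rounds to the nearest multiple of an increment m by computing round(x / m) * m; `round(d*100.)/100.` is the m = 0.01 case written with its reciprocal. A minimal sketch (the name is hypothetical):

```java
final class IncrementRounding {
    private IncrementRounding() {}

    /**
     * Rounds x to the nearest multiple of increment: round(x / m) * m.
     * increment = 0.01 gives the same idea as round(d*100.)/100.;
     * 0.5 or 5 work the same way.
     */
    static double roundToIncrement(double x, double increment) {
        return Math.round(x / increment) * increment;
    }
}
```

Increments such as 0.05 have no exact binary representation, so the result may carry a tiny representation error; `BigDecimal` is the usual remedy when that matters.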

I didn't trot out the technique as an endorsement, it's an example.

Forget how silly the technique is, give me a great constant name that
replaces "100." in that formula.
In either case, arguably the number of decimal places to round by should be
declared itself as a constant. Writing "100" by itself tells the reader of
the code nothing about why one is rounding to two decimal places, versus
some other number. This is true whether you use a more descriptive, more
functional "round()" method or go with the "*100, /100" approach.

Here's a thought - maybe _why_ you are rounding to 2 decimal places is
because you want to round to 2 decimal places. No reasonable named
constant is going to tell you that business requirement #17 required
that the rounding be the same as an existing Excel table from
spreadsheet such-and-such.

I didn't just pull that out of a hat either - I've had to do exactly
that the past few months. If you can come up with a named constant that
doesn't look like a paragraph that can express motives like that, please
trot out an example.

FACTOR_FOR_ROUNDING and document the business rule where the constant
is would be *a* way of doing it.
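A minimal sketch of that suggestion, with the business rule recorded where the constant is declared. The requirement number and the Excel report come from the scenario quoted above; the class and method names are hypothetical:

```java
final class ReportRounding {
    private ReportRounding() {}

    /**
     * Scale factor for the multiply/round/divide idiom.
     * Business rule (business requirement #17 in the scenario above): percentages are
     * rounded to two decimal places to match the existing Excel table they replace.
     * Change it only together with that report.
     */
    static final double FACTOR_FOR_ROUNDING = 100.0;

    static double roundPercentage(double pct) {
        return Math.round(pct * FACTOR_FOR_ROUNDING) / FACTOR_FOR_ROUNDING;
    }
}
```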

Arne
 

Arne Vajhøj

But for all various string syntaxes that Perl supports, it's still
missing a sane multiline string syntax.

Does it?
[...]
$str=<<MULTI;
a line
another line
yet another line
one more line
MULTI

Note that I wrote "sane". Here documents aren't sane. They cannot be
indented with the rest of the code,

If people want a multi-line as-is syntax, then that is a requirement,
not a problem.
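For comparison on the Java side, a sketch assuming Java 15 or later, where text blocks let a multi-line literal be indented with the surrounding code: the column of the closing `"""` determines how much leading whitespace is stripped. (Perl itself later gained indented here-docs, `<<~MULTI`, in 5.26.) Class and method names are illustrative:

```java
final class MultiLine {
    private MultiLine() {}

    /** Java 15+ text block; incidental indentation up to the closing delimiter is removed. */
    static String textBlock() {
        return """
                a line
                another line
                yet another line
                one more line
                """;
    }

    /** Pre-Java-15 equivalent: join the lines explicitly. */
    static String joined() {
        return String.join("\n", "a line", "another line", "yet another line", "one more line") + "\n";
    }
}
```

Both methods return the same four lines, each ending in a newline, with no leading spaces.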

Arne
 
A

Arved Sandstrom

On Mon, 17 Dec 2012 20:25:56 -0500, Eric Sosman wrote:

[...]
If you are multiplying by a constant, there's a reason. Often, for
example, you are converting units (hours per day, days per week, etc.,
following your "date conversions" theme). The conversion itself is the
correct name (e.g. "hoursPerDay", "daysPerWeek", etc.) in those examples.
Similar logic can be applied to other values.

public static final int MILLIMETERS_PER_METER = 1000;
public static final int MILLIGRAMS_PER_GRAM = 1000;
public static final int MILLIAMPERES_PER_AMPERE = 1000;
public static final int MILLISECONDS_PER_SECOND = 1000;

Great aids to understanding, I'm sure. (And stop calling me Millie!)

I guess your intent is to be sarcastic. But personally, I think using a
named constant does in fact make it clearer.

I would not be a stickler for making those named constants, and of course
have used literal values like those in code without naming them. But
using them as a named constant would in fact make the code clearer.

And of course, constants such as 100 or 400 (such as Arved mentioned) are
even less obvious.

As I said, there may be exceptions. SI units are notoriously easy to
convert, so as long as your variables and methods are well-named (i.e.
are clear about the units they represent), a named constant may not be
called for when converting. So, fine...if you like, there's one of your
exceptions.

That doesn't take away from the more general validity of my comments.

Pete
Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of. Another example is the old-style and
somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Really?

FACTOR_FOR_ROUNDING_TO_TWO_DECIMALS seems like a name to me.

If it was named FACTOR_FOR_ROUNDING it could even be changed to
accommodate other roundings than to 0.01.

I would never use the first.

I am not even sure that I would use the last, but the last
is not that bad.

Arne
If you absolutely had to have a name I suppose that's about as good as
it gets. But assuming that you're using this technique you might in one
spot want to round to 2 places, in another to 1 or 4 (not far-fetched
actually), if you *are* using named constants I think you'd have to go
with your first suggestion.

AHS
 

Arne Vajhøj

On 12/17/2012 10:13 PM, Peter Duniho wrote:
On Mon, 17 Dec 2012 20:25:56 -0500, Eric Sosman wrote:

[...]
If you are multiplying by a constant, there's a reason. Often, for
example, you are converting units (hours per day, days per week, etc.,
following your "date conversions" theme). The conversion itself is the
correct name (e.g. "hoursPerDay", "daysPerWeek", etc.) in those examples.
Similar logic can be applied to other values.

public static final int MILLIMETERS_PER_METER = 1000;
public static final int MILLIGRAMS_PER_GRAM = 1000;
public static final int MILLIAMPERES_PER_AMPERE = 1000;
public static final int MILLISECONDS_PER_SECOND = 1000;

Great aids to understanding, I'm sure. (And stop calling me Millie!)

I guess your intent is to be sarcastic. But personally, I think using a
named constant does in fact make it clearer.

I would not be a stickler for making those named constants, and of course
have used literal values like those in code without naming them. But
using them as a named constant would in fact make the code clearer.

And of course, constants such as 100 or 400 (such as Arved mentioned) are
even less obvious.

As I said, there may be exceptions. SI units are notoriously easy to
convert, so as long as your variables and methods are well-named (i.e.
are clear about the units they represent), a named constant may not be
called for when converting. So, fine...if you like, there's one of your
exceptions.

That doesn't take away from the more general validity of my comments.

Pete

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of. Another example is the old-style and
somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Really?

FACTOR_FOR_ROUNDING_TO_TWO_DECIMALS seems like a name to me.

If it was named FACTOR_FOR_ROUNDING it could even be changed to
accommodate other roundings than to 0.01.

I would never use the first.

I am not even sure that I would use the last, but the last
is not that bad.
If you absolutely had to have a name I suppose that's about as good as
it gets. But assuming that you're using this technique you might in one
spot want to round to 2 places, in another to 1 or 4 (not far-fetched
actually), if you *are* using named constants I think you'd have to go
with your first suggestion.

If one needs to do multiple roundings, then multiple constants would
be needed.

The first is still not good.

The extended name should be related to the purpose, not the factor,
to handle the case where a rounding is changed for some calculations.

FACTOR_TO_ROUND_DOLLAR_AMOUNT
FACTOR_TO_ROUND_DISTANCE_MEASUREMENT
etc.

Arne
 

Lew

Arne said:
The extended name should be related to the purpose not the factor

This is best practice, and more consistently followed in proportion to experience in
the field, I assess.
to handle the case where a rounding is changed for some calculations.

FACTOR_TO_ROUND_DOLLAR_AMOUNT
FACTOR_TO_ROUND_DISTANCE_MEASUREMENT
etc.

These names serve your pedagogical purpose, but simultaneously serve to
illustrate that there's an element of style involved, not just engineering.

To my eye those names are longer than need be. I cannot and will not claim any
objective validity to that assessment, but I would in a project push for more elegant
names.

I don't necessarily mean abbreviated. Elegance is an ineffable match of form to
function to aesthetic expression. Sometimes terse, sometimes verbose, elegance
adapts to the circumstance and objectives of the moment.

My preference in this instance for the proffered scenarios would be more like

DOLLAR_ROUNDER
DISTANCE_ROUNDER

My reasoning involves degree of information conveyed relative to length and subitizability
of morpheme count.

But my action is based on a more intuitive sense, arguably one that subsumes the
rationalized basis.
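One possible reading of those names, sketched with hypothetical code: each constant is a ready-made rounding function named for what it rounds, so the dollar and distance rules can diverge without touching each other. The one-decimal distance rule is an assumption for illustration:

```java
import java.util.function.DoubleUnaryOperator;

final class Rounders {
    private Rounders() {}

    /** Builds a rounder for the given number of decimal places (multiply/round/divide). */
    private static DoubleUnaryOperator toPlaces(int places) {
        double factor = Math.pow(10, places);
        return x -> Math.round(x * factor) / factor;
    }

    // Named for the purpose, not the factor: each can change without touching the other.
    static final DoubleUnaryOperator DOLLAR_ROUNDER = toPlaces(2);   // whole cents
    static final DoubleUnaryOperator DISTANCE_ROUNDER = toPlaces(1); // hypothetical rule
}
```

Call sites then read as `Rounders.DOLLAR_ROUNDER.applyAsDouble(amount)`, which keeps the length of the name roughly where the thread's discussion of long names would like it.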
 

Arved Sandstrom

On 12/18/2012 5:34 AM, Arved Sandstrom wrote:
On 12/17/2012 10:13 PM, Peter Duniho wrote:
On Mon, 17 Dec 2012 20:25:56 -0500, Eric Sosman wrote:

[...]
If you are multiplying by a constant, there's a reason. Often, for
example, you are converting units (hours per day, days per week, etc.,
following your "date conversions" theme). The conversion itself is the
correct name (e.g. "hoursPerDay", "daysPerWeek", etc.) in those examples.
Similar logic can be applied to other values.

public static final int MILLIMETERS_PER_METER = 1000;
public static final int MILLIGRAMS_PER_GRAM = 1000;
public static final int MILLIAMPERES_PER_AMPERE = 1000;
public static final int MILLISECONDS_PER_SECOND = 1000;

Great aids to understanding, I'm sure. (And stop calling me Millie!)

I guess your intent is to be sarcastic. But personally, I think using a
named constant does in fact make it clearer.

I would not be a stickler for making those named constants, and of course
have used literal values like those in code without naming them. But
using them as a named constant would in fact make the code clearer.

And of course, constants such as 100 or 400 (such as Arved mentioned) are
even less obvious.

As I said, there may be exceptions. SI units are notoriously easy to
convert, so as long as your variables and methods are well-named (i.e.
are clear about the units they represent), a named constant may not be
called for when converting. So, fine...if you like, there's one of your
exceptions.

That doesn't take away from the more general validity of my comments.

Pete

Pete, I know where you're coming from, but Eric more accurately captured
the scenarios I was thinking of. Another example is the old-style and
somewhat language-agnostic pseudocode

round(d*100.)/100.

for rounding to 2 decimal places (substitute other powers of ten for
rounding to less or more decimal places). [*] Here there is no meaning
for those constants other than TEN or a HUNDRED.

Really?

FACTOR_FOR_ROUNDING_TO_TWO_DECIMALS seems like a name to me.

If it was named FACTOR_FOR_ROUNDING it could even be changed to
accommodate other roundings than to 0.01.

I would never use the first.

I am not even sure that I would use the last, but the last
is not that bad.
If you absolutely had to have a name I suppose that's about as good as
it gets. But assuming that you're using this technique you might in one
spot want to round to 2 places, in another to 1 or 4 (not far-fetched
actually), if you *are* using named constants I think you'd have to go
with your first suggestion.

If one needs to do multiple roundings, then multiple constants would
be needed.

The first is still not good.

The extended name should be related to the purpose, not the factor,
to handle the case where a rounding is changed for some calculations.

FACTOR_TO_ROUND_DOLLAR_AMOUNT
FACTOR_TO_ROUND_DISTANCE_MEASUREMENT
etc.

Arne
One of the points I am trying to get across, Arne, is that (1) you may
not know any of that, or (2) there may be no higher purpose than an
arbitrary choice.

As far as #2 goes, that was sort of my Excel spreadsheet scenario. This
is from real modernization and integration work I'm doing - in various
reports and spreadsheets that we are looking to replace, percentages may
be displayed let's say to one or two places. There is no reason given
for this choice, the raw numbers involved would support more places
actually. Since these percentages are used in various Excel tables and
charts for executive reporting, probably the *only* reason years and
years ago is that so many decimal places looked "good".

This is not an unusual scenario: display rounding of numbers with no
natural "right" number of decimal places. You can, say, go to 4 or 5
places by significant digits. But the table or chart gets too busy,
people object, so you round to 1 or 2 decimal places - for the decent
reason of effective presentation of information.
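A sketch of display-only rounding for that scenario, assuming the value is kept at full precision and only the rendered text is rounded. The constant name and its documented reason are hypothetical:

```java
import java.util.Locale;

final class PercentDisplay {
    private PercentDisplay() {}

    /**
     * Decimal places shown in executive charts. No deeper rule behind the value:
     * it matches what the legacy spreadsheets displayed, for readability.
     */
    static final int PERCENT_DISPLAY_PLACES = 1;

    /** Rounds only the rendered text; the stored value keeps its full precision. */
    static String format(double percent) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default.
        return String.format(Locale.ROOT, "%." + PERCENT_DISPLAY_PLACES + "f%%", percent);
    }
}
```

Keeping the rounding in the presentation layer means a later request for "two places in this chart" touches one formatting constant rather than any calculation.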

AHS
 

Gene Wirchenko

[snip]
FACTOR_FOR_ROUNDING_TO_TWO_DECIMALS seems like a name to me.

If it was named FACTOR_FOR_ROUNDING it could even be changed to
accommodate other roundings than to 0.01.

I would never use the first.

I am not even sure that I would use the last, but the last
is not that bad.

Both are rather long. Get a few of them in a statement, and you
are probably looking at a second line (assuming an 80 or so character
limit).

Names that are long can bloat statements horribly.

Sincerely,

Gene Wirchenko
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,014
Latest member
BiancaFix3

Latest Threads

Top