multi-line Strings

Joshua Cranmer

if the bulk of the string literals are things internal to the program
(rather than intended for an end user), then it makes little sense to
move them to external resources (IME, most string literals tend to be
program internal anyways, with human-readable messages few and far
between, and most of these in-turn being internal debugging messages).

You must not work with large user-facing applications then. :) My
practice is very nearly the opposite--most string literals are either
involved with debugging to log files, keys to preferences/other
configuration, or keys to human readable messages. The latter two
classes are things that tend to be grouped outside of the program itself
for simple reasons of reducing management complexity (Clang even uses an
external file for its command line arguments, kind of [1], despite not
doing any localization of strings).

with user-readable strings, the program could still be developed under a
policy like "if you need the messages in a language you can read, either
learn English (or Japanese or Chinese or similar) or get a dictionary",
so making them external may not make much sense in this case.

Even if you don't need to provide translated messages, there is benefit
to centralizing program messages in external files. Ensuring consistency
is one key benefit that I can think of.

even with language-specific strings, unless using magic numbers, a
string may still be needed to refer to them.

And a constant String is often used instead of copy-pasting the literal
around.
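
For what it's worth, the "keys to human readable messages" idea plus a constant String for each key might look roughly like the sketch below. The class name Messages, the key names and the message texts are all made up for illustration, and the table is inlined as a string only so that the example runs on its own; in practice it would live in an external .properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Properties;

public class Messages {
    // Hypothetical key constants, so call sites never repeat the literal.
    public static final String KEY_FILE_NOT_FOUND = "error.fileNotFound";
    public static final String KEY_SAVED = "status.saved";

    private final Properties table = new Properties();

    // The text would normally be read from an external file; it is passed
    // in as a string here only to keep the sketch self-contained.
    public Messages(String propertiesText) {
        try {
            table.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("could not parse message table", e);
        }
    }

    // Look up the pattern by key and fill in the {0}, {1}, ... placeholders.
    public String format(String key, Object... args) {
        String pattern = table.getProperty(key);
        if (pattern == null) {
            return "!" + key + "!"; // visible marker for a missing key
        }
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        Messages m = new Messages(
            "error.fileNotFound=Cannot open {0}\n" +
            "status.saved=Saved {0} as {1}\n");
        System.out.println(m.format(KEY_FILE_NOT_FOUND, "shader.glsl"));
        System.out.println(m.format(KEY_SAVED, "shader.glsl", "backup.glsl"));
    }
}
```

The wording then lives in one table, which is where the consistency benefit comes from, while the code only ever mentions the key constants.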

[1] The "kind of" is that this is turned into compiled code by a build step.
 
BGB

if the bulk of the string literals are things internal to the program
(rather than intended for an end user), then it makes little sense to
move them to external resources (IME, most string literals tend to be
program internal anyways, with human-readable messages few and far
between, and most of these in-turn being internal debugging messages).

You must not work with large user-facing applications then. :) My
practice is very nearly the opposite--most string literals are either
involved with debugging to log files, keys to preferences/other
configuration, or keys to human readable messages. The latter two
classes are things that tend to be grouped outside of the program itself
for simple reasons of reducing management complexity (Clang even uses an
external file for its command line arguments, kind of [1], despite not
doing any localization of strings).

I am mostly working on 3D stuff... (mostly a game, but also some 3D
tools, ...).

its "user facing" side is mostly the 3D renderer, sound mixing, and
user-input handling (mouse, keyboard actions, keyboard shortcuts, ...).

text isn't really a big part of the normal experience, nor much
information presented as natural-language text (there is a fair amount
more in terms of variables and formatted numerical output, but this
isn't really the same).


most of the code in the project, however, is internal infrastructural
code, most not really having much direct user interaction.


basically, it is a project which is slightly over 1 Mloc (1 million
lines of code). a few of the bigger chunks here are mostly stuff related
to my scripting language, and also the 3D renderer. together, they make
up a large percentage of the total codebase, followed roughly by the
"server end" in 3rd place (the server-end is what holds most of the
"gameplay logic", like physics code, weapons and items logic, enemy-AI
logic / behaviors, and so on...).


currently, there is very little in terms of a GUI (traditional GUI
elements are almost completely absent from the program).

there is an interactive console, though, which functions in a manner
vaguely similar to the Linux shell interface (type commands, see
results). typically, commands are terse names, and don't usually
generate much printed output. these commands are implemented
in-program, and operate in terms of a program-local virtual-filesystem.
where relevant, commands have similar names and behaviors to their Linux
analogues (cd, ls, cat, pwd, ...). (there is little obvious reason why
anyone would want to change these, though; 'cd' or 'ls' should probably
be fairly universal independent of language?...).

there is an in-program text-editor though, which provides an interface
partway between MS-Edit and Vim (cosmetically, it looks a little more
like MS-Edit, but handles user input in many ways a little more like
Vim, with ALT-';' switching to the command-entry prompt, ...).

some parts of the engine are controlled by "cvars" though, which
function in a manner vaguely similar to environment variables.


or, IOW, there is lots of stuff going on, and lots of stuff for the user
to interact with, just relatively little where textual feedback is
really called for (at least much beyond debug messages). (and,
presumably, normal users/players shouldn't normally be messing around in
the console anyways, apart from maybe to enter cheat-codes).

Even if you don't need to provide translated messages, there is benefit
to centralizing program messages in external files. Ensuring consistency
is one key benefit that I can think of.

could be, but it doesn't really tend to be a big use case.
most of what is printed, is usually an indication of where the message
is being printed from (function/method names and similar), and a terse
description of the event, and usually a few items giving the values of
relevant arguments or variables.

given most of this isn't really intended for end users, it doesn't
really make much sense for translation.

granted, a person could translate any voice-acted dialogue, which would
probably be a bigger use-case for translation I think, but at the
moment, there isn't a whole lot of this either (that is actually
relevant to gameplay).

even with language-specific strings, unless using magic numbers, a
string may still be needed to refer to them.

And a constant String is often used instead of copy-pasting the literal
around.

[1] The "kind of" is that this is turned into compiled code by a build
step.

could be, depends on whether the literal is one-off, or used more than
once...
 
BGB

I am not so happy about special additions to language to handle
special cases.

But then I am not a language person either ...

:)

a lot depends on language use-case and design philosophy...
one man's useless is another man's vital...



for example, in my case I personally have relatively little need for
date-handling or code to help with monetary calculations...

but, I have more extensive use-cases for things like vector, quaternion,
and matrix math (as well as good old math-functions, like
sin/cos/atan2/sqrt/...).

so, for example, one language designer more aiming for business uses
might be like "why don't I make dates and money features be built into
the language?...", and I might be more like "why not make vector-math
and math-functions be built in?".


another person might really want built-in regexes, due to doing a lot of
text-processing.

....


then, another area might be "language minimalism" vs "throwing in
whatever might be potentially useful". one person might avoid adding any
feature unless it is painfully needed, and another person might just
throw in features "because they can" (especially features which don't
cost much to implement).

....


all these things leading to variations in the "style" of the language.

The regex syntax itself is not exactly a good example of readability.

yeah.

I guess it is more about being "common" than "readable".

for example, I initially didn't really want to add it to my language,
because it was pretty ugly, but ECMA-262 did include it as part of the
language description.



FWIW, I also added the "Type<T>" generic syntax as well (as part of the
parser), even if thus far, generics aren't actually implemented (it is
one thing to add parser support, and another to actually make it do
something...).


then there are a few features which are supported, but are on the "I
don't know what their future status will be" list.

one example, is supporting more conventional declaration syntax, in
contrast to the usual JS / AS declaration syntax.

like, currently, a person can type either:
"var a:int[];" or "int[] a;".

the uncertainty is mostly along the lines of "how much sense does it
make to have a language based on JS, but not use JS declaration syntax?...".

well, nevermind:
"int[256] arr;" which is equivalent to:
"var arr:int[256];"
or:
"int[] arr=new int[256];"
or:
"var arr:int[]=new int[256];"
....

I also considered before possibly allowing for:
"var:int[256] arr;"
but, there is no precedent for such a syntax...


such is the great fun (and uncertainty) of language-design.
 
Arne Vajhøj

This, I think, is the point. We don't need a special String syntax to fix the
problem with regexps -- we'd be much better off with a fixed regexp syntax.
Something OO. And (since this is Java) I don't think that we need be afraid of
something verbose.

Off-the-top-of-my-head (all classes and methods are imaginary):

Regexp alpha = Regexp.fromList(java.lang.text.portable.Alphas);
alpha = alpha.or('_');
Regexp num = Regexp.fromList(java.lang.text.portable.Digits);
Regexp alphanum = alpha.or(num);
Regexp identifier = alpha.followedBy(alphanum.repeated());

Naturally, I'd prefer something a /bit/ less verbose, but Java won't support
that. But even with the verbosity, I think my version is /far/ better. It
puts the composition of regexps into the programmer's hands which means that it
can be approached like any other complex programming task. Quoting/escaping
problems go away. Grouping (bracketing) problems go away (and become
decoupled from the backreference concept). Comments become trivially easy to
add. Various kinds of abstraction and reuse are possible.
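
To make the imaginary API above a bit more concrete, here is a minimal sketch of how such a composable wrapper could sit on top of java.util.regex. The class Rx and its methods (charClass, literal, or, followedBy, repeated) are hypothetical names invented for this example, not an existing library; only Pattern.quote and Pattern.compile are real JDK calls.

```java
import java.util.regex.Pattern;

// Hypothetical immutable wrapper: every instance holds a regex fragment
// that is already safe to embed in a larger expression.
public final class Rx {
    private final String source;

    private Rx(String source) {
        this.source = source;
    }

    // A character class given as raw class content, e.g. "A-Za-z".
    public static Rx charClass(String content) {
        return new Rx("[" + content + "]");
    }

    // A literal string; Pattern.quote takes care of the escaping.
    public static Rx literal(String text) {
        return new Rx(Pattern.quote(text));
    }

    // Alternation, wrapped in a non-capturing group.
    public Rx or(Rx other) {
        return new Rx("(?:" + source + "|" + other.source + ")");
    }

    // Concatenation, wrapped in a non-capturing group.
    public Rx followedBy(Rx next) {
        return new Rx("(?:" + source + next.source + ")");
    }

    // Zero or more repetitions.
    public Rx repeated() {
        return new Rx("(?:" + source + ")*");
    }

    public Pattern compile() {
        return Pattern.compile(source);
    }

    public static void main(String[] args) {
        Rx alpha = Rx.charClass("A-Za-z").or(Rx.literal("_"));
        Rx num = Rx.charClass("0-9");
        Rx alphanum = alpha.or(num);
        Rx identifier = alpha.followedBy(alphanum.repeated());

        Pattern p = identifier.compile();
        System.out.println(p.matcher("_tmp42").matches()); // true
        System.out.println(p.matcher("9lives").matches()); // false
    }
}
```

Because every combinator wraps its result in a non-capturing group, grouping never interferes with backreference numbering, which is the decoupling described above.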

I would assume that it has been tried many times to come up with
a nicer syntax for regex. Tried without success.

It is not a problem for the simple regexes, but the complex ones
are tricky.

P.S. Mind you: my /real/ opinion is that regular expressions have no place in
production code except in the construction of scanners (for which a more
directly-applicable implementation than standalone regexps is helpful).
Regexps are for users to enter, or go into configuration data. At least the
suggestion above has the advantage -- from my point of view -- that regexps no
longer look like "quick and easy" fixes to problems, and maybe the programmer
would think more about whether they /actually/ solve [all of] the problem at
hand.

Regex is widely used for general data validations. From form input
validation in web apps to XML schema definitions.

Arne
 
Peter J. Holzer

The regex syntax itself is not exactly a good example of readability.

True, but there are ways to improve it. For example, Perl has a variant
Regexp syntax (indicated with the /x flag) where whitespace (including
newlines) is ignored and comments are allowed.

Together with variable substitution, even complex regexps can be quite
readable. For example compare this:

my $param = qr{ [-a-z]+ = " [^"]* " }x;
my $start_tag = qr{ < [a-z]+ (?: \s+ $param )* \s* /? > }x;
my $end_tag = qr{ </ [a-z]+ > }x;
my $comment = qr{ <!-- .*? --> }sx;

my $pcdata = qr{ [^<]*? }x;

my $link = qr{
    <a (?: \s+ $param )* \s* >
    (?:
        $start_tag | $end_tag | $comment | $pcdata
    ) *?
    </a>
}x;

with this:

<a(?:\s+(?:[-a-z]+="[^"]*"))*\s*>(?:(?:<[a-z]+(?:\s+(?:[-a-z]+="[^"]*"))*\s*/?>)|(?:</[a-z]+>)|(?^s:<!--.*?-->)|(?:[^<]*?))*?</a>

Oh, and you may notice the use of qr{} instead of // as delimiters.
The possibility to use alternate start and end delimiter of *any* string
(not just regexps) is quite a nifty feature and often removes the need
for escapes.

But for all the various string syntaxes that Perl supports, it's still
missing a sane multiline string syntax.

hp
 
Lew

Right now, I have a mess like this:
private final String mLomoishShader =

That constant variable should be named in all uppercase letters with
underscores, per the Java coding conventions.
 
Lew

Arne said:
The two main reasons to move literals to constants are:
* safer change of value, because changing the constant changes it everywhere
* better documentation by using a descriptive name

I would add a third - even though a constant be used but once, be it even
'private', its declaration up top as a constant variable can make it easier
to maintain over time.

So a constant like '0' doesn't really qualify by either of Arne's standards
nor by mine. But a constant like:
"precision mediump float;\n" +
"uniform sampler2D tex_sampler_0;\n" +
"uniform vec2 seed;\n" +
"uniform float stepsizeX;\n" +
"uniform float stepsizeY;\n" +
"uniform float stepsize;\n" + ...

is all too likely to change over time. Its burial deep in code as a
literal would make it hard to maintain, whereas its declaration as a
constant variable simplifies locating it for update, and improves
readability per Arne's second criterion.

I believe one would get a decent indication of whether
to use a constant or not.

Regardless of how you come down in one particular case or another,
the perspective that Arne suggests emphasizes readability and
maintainability. If you intelligently apply these principles you will
not err by much.

Does the same literal occur more than once where it must be the
same value?

Would the code be more readable by using a descriptive name
instead of a literal?

Would it be easier to maintain changes over time as a constant variable?
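
As a rough illustration of the point about the shader text, the literal could be pulled up into a named constant along these lines. The class name ShaderSource and the accessor are made up for the example, and String.join (Java 8 and later) is just one way to avoid the "\n" + noise; only the lines quoted above are included, the rest of the shader is not reproduced here.

```java
public class ShaderSource {
    // Only the lines quoted in the thread; the remainder of the shader
    // is deliberately left out of this sketch.
    private static final String LOMOISH_SHADER = String.join("\n",
        "precision mediump float;",
        "uniform sampler2D tex_sampler_0;",
        "uniform vec2 seed;",
        "uniform float stepsizeX;",
        "uniform float stepsizeY;",
        "uniform float stepsize;",
        ""); // trailing empty element gives the last line its newline

    public static String lomoishShader() {
        return LOMOISH_SHADER;
    }

    public static void main(String[] args) {
        System.out.print(lomoishShader());
    }
}
```

Declared at the top of the class, the text is easy to find when it changes, and the name documents what the string is for.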
 
Robert Klemme

True, but there are ways to improve it. For example, Perl has a variant
Regexp syntax (indicated with the /x flag) where whitespace (including
newlines) is ignored and comments are allowed.

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#COMMENTS
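
A small sketch of what that flag buys you in Java, loosely mirroring the start-tag part of the Perl example. The class and constant names are invented for illustration; Pattern.COMMENTS itself is the real JDK flag from the link above.

```java
import java.util.regex.Pattern;

public class CommentedRegex {
    // Attribute sub-pattern, reused below much like $param in the Perl version.
    static final String PARAM = "[-a-z]+ = \" [^\"]* \"";

    // With Pattern.COMMENTS, whitespace in the pattern is ignored and
    // '#' starts a comment that runs to the end of the line.
    static final Pattern START_TAG = Pattern.compile(
        "< [a-z]+                  # element name\n" +
        "(?: \\s+ " + PARAM + " )* # zero or more attributes\n" +
        "\\s* /? >                 # optional self-closing slash\n",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        System.out.println(START_TAG.matcher("<a href=\"x\">").matches()); // true
        System.out.println(START_TAG.matcher("<br/>").matches());          // true
        System.out.println(START_TAG.matcher("<a href=x>").matches());     // false
    }
}
```

The string concatenation and doubled backslashes are still there, so this helps with layout and comments but not with the quoting problem that started the thread.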

But for all the various string syntaxes that Perl supports, it's still
missing a sane multiline string syntax.

Does it?

$ perl x.pl
a line
another line
yet another line
one more line
$ cat x.pl

$str=<<MULTI;
a line
another line
yet another line
one more line
MULTI

print($str);

Cheers

robert
 
BGB

Does it?

$ perl x.pl
a line
another line
yet another line
one more line
$ cat x.pl

$str=<<MULTI;
a line
another line
yet another line
one more line
MULTI

print($str);

I had before imagined the possibility of something like:
#<<identifier; ... identifier

IOW:
str = #<<EOF;
line 1
line 2
line 3
EOF;

but, never really added this, as heredoc syntax is kind of ugly IMO...
 
BGB

[...]
I wonder, in general, where the line should be drawn? Java coding
guidelines recommend that 1 and -1 can be used as literals, but other
integer constants should be defined as a "constant" by the programmer.

Java coding guidelines suggest -1,0,1 can be literals,
but only in `for' loops. Use them elsewhere, or use those
values in any type other than `int', and you're supposed
to use a `static final'. That is, the guidelines frown
on `q = 1.0 - p;' and even on `System.exit(0);'.

What utter nonsense!

It could probably have been done better.

:)

yeah...


better reason IMO to more follow the rule of "do what makes sense".
like, adherence to rules for rules sake leads to all manner of absurdity.

granted, yes, sometimes there are "bigger things" at stake by following
or disregarding rules (like, moral ethics or the law), in which case, it
is more a matter of "follow this rule, or bad things will result".

actually, a little pet theory here is that "pretty much everything"
mostly boils down to cost/benefit tradeoffs anyways... like, egoism +
cost/benefit -> rules (both ethical and legal, as well as policies,
practices, and conventions). a person may benefit mostly by following
these rules (at least so far as they align with one's benefit).

note that not all rules are good though, many are instead the result of
random people's opinions, and legalism... a good rule results from the
inherent tradeoffs of a situation, and a bad rule results from
"interpreting" statements based simply on what the words seem to be saying
(and all the stuff that goes with it: some people really liking their
fine points of grammar and pulling out the dictionary to defend their
arguments).

(probably enough said here, don't need to wander off too far...).

Nobody is perfect.

and probably also JNI...

but, yeah, unsigned byte makes more sense, and for a signed byte, there
can be a type like, say: sbyte.


then again, the lack of unsigned types in general is also a little
annoying (and presumably it wouldn't have been *that* complicated to
support them either, but whatever...). (the only notable difference at
the VM level would likely have been needing to supply an unsigned divide
operator somewhere...).
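
For reference, the usual workaround for the missing unsigned byte is a mask, and Java 8 and later added static helpers on Integer for unsigned arithmetic instead of new types. A minimal sketch (the class and method names UnsignedDemo and toUnsigned are just for illustration; Integer.divideUnsigned and Integer.toUnsignedString are real JDK methods):

```java
public class UnsignedDemo {
    // Widen a signed Java byte to its unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF0;              // stored as -16
        System.out.println(raw);             // -16
        System.out.println(toUnsigned(raw)); // 240

        int big = 0xFFFFFFFE;                // -2 signed, 4294967294 unsigned
        System.out.println(big / 2);                        // -1
        System.out.println(Integer.divideUnsigned(big, 2)); // 2147483647
        System.out.println(Integer.toUnsignedString(big));  // 4294967294
    }
}
```

Divide is the operation that differs between the signed and unsigned views in this example, which matches the guess about what the VM would have needed.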
 
Robert Klemme

I had before imagined the possibility of something like:
#<<identifier; ... identifier

IOW:
str = #<<EOF;
line 1
line 2
line 3
EOF;

but, never really added this, as heredoc syntax is kind of ugly IMO...

I don't really see the difference - or the improvement. You just added
a hash and a semicolon.

Cheers

robert
 
Peter J. Holzer

But for all the various string syntaxes that Perl supports, it's still
missing a sane multiline string syntax.

Does it?
[...]
$str=<<MULTI;
a line
another line
yet another line
one more line
MULTI

Note that I wrote "sane". Here documents aren't sane. They cannot be
indented with the rest of the code, so something like

sub print_message {
    my ($verbose) = @_;

    if ($verbose) {
        print <<EOS
This is a
very long message.

It goes on
for ever
and ever.
EOS
    } else {
        print <<EOS
This is a shorter message.
But it is still too long.
EOS
    }
}

not only looks daft, it also makes it hard to follow the flow of the
program.

And I'm not even talking about stuff like

print <<S1, 5, <<S2, "\n";
one
S1
two
S2

which is the same as
print "one\n", 5, "two\n", "\n";
for those who don't know Perl.

A saner variant of here documents is used in a little-known language
called SPL[1], where you can specify an 'indentation character' together
with the terminator. So the example above would look like:

method print_message(verbose) {

    if (verbose) {
        print <<EOS:
            :This is a
            :very long message.
            :
            :It goes on
            :for ever
            :and ever.
        EOS;
    } else {
        print <<EOS:
            :This is a shorter message.
            :But it is still too long.
        EOS;
    }
}

Much better.

(yes, I know various ways to get a similar effect in Perl - but they all
include processing the string at run time - or a source filter).
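
The run-time variant of that trick is easy enough to sketch in Java as well. The helper below is hypothetical (stripMargin is a name borrowed for illustration, not a JDK method): it removes leading indentation up to and including a margin character, and leaves lines without a marker alone.

```java
public class Margin {
    // Hypothetical helper: for each line, if only whitespace precedes the
    // margin character, drop that indentation and the marker itself.
    static String stripMargin(String text, char margin) {
        String[] lines = text.split("\n", -1); // -1 keeps trailing empty lines
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int at = line.indexOf(margin);
            boolean onlyIndentBefore =
                at >= 0 && line.substring(0, at).trim().isEmpty();
            out.append(onlyIndentBefore ? line.substring(at + 1) : line);
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String msg = stripMargin(
            "        :This is a shorter message.\n" +
            "        :But it is still too long.\n", ':');
        System.out.print(msg);
    }
}
```

As noted above, this costs a pass over the string at run time, which is exactly what a compile-time syntax like SPL's avoids.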

The YAML[2] indentation rules also look ok to me and might serve as a
basis for multiline strings in a programming language.

hp


[1] http://www.clifford.at/spl/
[2] http://yaml.org/
 
Jim Janney

Chris Uppal said:
Arne said:
The regex syntax itself is not exactly a good example of readability.

This, I think, is the point. We don't need a special String syntax to fix the
problem with regexps -- we'd be much better off with a fixed regexp syntax.
Something OO. And (since this is Java) I don't think that we need be afraid of
something verbose.

Off-the-top-of-my-head (all classes and methods are imaginary):

Regexp alpha = Regexp.fromList(java.lang.text.portable.Alphas);
alpha = alpha.or('_');
Regexp num = Regexp.fromList(java.lang.text.portable.Digits);
Regexp alphanum = alpha.or(num);
Regexp identifier = alpha.followedBy(alphanum.repeated());

Naturally, I'd prefer something a /bit/ less verbose, but Java won't support
that. But even with the verbosity, I think my version is /far/ better. It
puts the composition of regexps into the programmer's hands which means that it
can be approached like any other complex programming task. Quoting/escaping
problems go away. Grouping (bracketing) problems go away (and become
decoupled from the backreference concept). Comments become trivially easy to
add. Various kinds of abstraction and reuse are possible.

-- chris

P.S. Mind you: my /real/ opinion is that regular expressions have no place in
production code except in the construction of scanners (for which a more
directly-applicable implementation than standalone regexps is helpful).
Regexps are for users to enter, or go into configuration data. At least the
suggestion above has the advantage -- from my point of view -- that regexps no
longer look like "quick and easy" fixes to problems, and maybe the programmer
would think more about whether they /actually/ solve [all of] the problem at
hand.

P.P.S UK post codes...

For me the native regexp syntax usually becomes unmanageable at about
the same time that I'm ready to decide that regexps aren't the best
approach anyway. But there are some Java libraries for building complex
regexps. A quick Google search turns up this one

http://reggert.github.com/reb4j/

I thought I remembered another one, but I can't find it now.
 
BGB

I don't really see the difference - or the improvement. You just added
a hash and a semicolon.

the '#' is mostly to help avoid syntactic ambiguity (and also to help
visually provide something for the '<<' to "go into"), and the final
semicolon is a statement terminator (it is not actually part of the
string, but can help tell the parser "hey, this statement has ended").
 
Gene Wirchenko

On Sat, 15 Dec 2012 11:54:15 -0000, "Chris Uppal"

[snip]
P.S. Mind you: my /real/ opinion is that regular expressions have no place in
production code except in the construction of scanners (for which a more
directly-applicable implementation than standalone regexps is helpful).

I find them useful for validation of data format. Apart from
that, I have little to no use for them.

I do not like them where I have to explain why something is
wrong. For that, I will use a state machine with multiple error
states.
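
A minimal sketch of that idea, assuming the thing being validated is an optionally signed decimal integer (the format, the class name IntegerValidator and the message texts are all invented for this example). A regex such as [+-]?[0-9]+ would only answer yes or no; here each failing transition reports what was wrong and where. For brevity the error outcomes are returned as messages rather than modelled as separate enum states.

```java
public class IntegerValidator {
    enum State { START, AFTER_SIGN, IN_DIGITS }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns null when the input is a valid optionally signed integer,
    // otherwise a message saying what went wrong and at which position.
    static String explain(String input) {
        State state = State.START;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case START:
                    if (c == '+' || c == '-') {
                        state = State.AFTER_SIGN;
                    } else if (isAsciiDigit(c)) {
                        state = State.IN_DIGITS;
                    } else {
                        return "unexpected '" + c + "' at position " + i
                            + ", expected sign or digit";
                    }
                    break;
                case AFTER_SIGN:
                case IN_DIGITS:
                    if (isAsciiDigit(c)) {
                        state = State.IN_DIGITS;
                    } else {
                        return "unexpected '" + c + "' at position " + i
                            + ", expected digit";
                    }
                    break;
            }
        }
        // End of input: only IN_DIGITS is an accepting state.
        switch (state) {
            case START:      return "input is empty";
            case AFTER_SIGN: return "sign is not followed by any digits";
            default:         return null;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] { "-42", "", "+", "4x2" }) {
            String problem = explain(s);
            System.out.println("\"" + s + "\": " + (problem == null ? "ok" : problem));
        }
    }
}
```
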
Regexps are for users to enter, or go into configuration data. At least the
suggestion above has the advantage -- from my point of view -- that regexps no
longer look like "quick and easy" fixes to problems, and maybe the programmer
would think more about whether they /actually/ solve [all of] the problem at
hand.

I find that as soon as a regex becomes a bit hairy that that is
about the point where I want to break it up. I prefer short regexes
with a bit of control code.

P.P.S UK post codes...

As an example of what?

Sincerely,

Gene Wirchenko
 
Gene Wirchenko

This is my point! The line *is* blurry! And I'm not sure if any hard
and fast rules can be made. Even generalities are somewhat hard to talk
about authoritatively.

Quite. rec.arts.sf.written frequently has discussions,
sometimes heated, on the boundary between science fiction and fantasy
or said:
(On 0, well n+0 is a bit gauche, yes? But I'll admit that things like

Not necessarily, but almost certainly. (I might do it for
formatting or to make the point that I had considered what the value
should be were it at all in doubt. This is rather rare.)

array indexes or sub-string offsets, yes 0 as a literal is allowed by
Oracle's guidelines, and useful.)

Or a sum variable's initialisation.

The whole method is longer, and also broken up into three methods and
one private class. I think it's worth looking at. There was obviously
a deliberate effort to break-down the procedure into functional units,
which I think helps the readability of the code as much as anything.
(But also makes character literals more readable as a result.)

I also try to keep my statements on one line unless it is
something like a long output statement or a procedure call where the
complexity is not there.

Several years ago, there was a post to, I think,
comp.lang.c.moderated where the OP asked how to optimise two
statements. I looked at them and did not see a way. Some time later,
someone posted a different version of the code.

The first version had long variable names, and each of the two
statements took two lines. The new version had shorter variable
names, and each statement fit on one line. The second version was
*much* more readable.

Unfortunately, the website with JDK source code isn't showing up on my
Google searches, so I can't make a link.

I will take your word for it. I can see how it would easily work
out as you stated.

Sincerely,

Gene Wirchenko
 
Arved Sandstrom

Arne said:
The regex syntax itself is not exactly a good example of readability.

This, I think, is the point. We don't need a special String syntax to fix the
problem with regexps -- we'd be much better off with a fixed regexp syntax.
Something OO. And (since this is Java) I don't think that we need be afraid of
something verbose.

Off-the-top-of-my-head (all classes and methods are imaginary):

Regexp alpha = Regexp.fromList(java.lang.text.portable.Alphas);
alpha = alpha.or('_');
Regexp num = Regexp.fromList(java.lang.text.portable.Digits);
Regexp alphanum = alpha.or(num);
Regexp identifier = alpha.followedBy(alphanum.repeated());

Naturally, I'd prefer something a /bit/ less verbose, but Java won't support
that. But even with the verbosity, I think my version is /far/ better. It
puts the composition of regexps into the programmer's hands which means that it
can be approached like any other complex programming task. Quoting/escaping
problems go away. Grouping (bracketing) problems go away (and become
decoupled from the backreference concept). Comments become trivially easy to
add. Various kinds of abstraction and reuse are possible.

-- chris

P.S. Mind you: my /real/ opinion is that regular expressions have no place in
production code except in the construction of scanners (for which a more
directly-applicable implementation than standalone regexps is helpful).
Regexps are for users to enter, or go into configuration data. At least the
suggestion above has the advantage -- from my point of view -- that regexps no
longer look like "quick and easy" fixes to problems, and maybe the programmer
would think more about whether they /actually/ solve [all of] the problem at
hand.

P.P.S UK post codes...

No offense, Chris, but personally I find your syntax about as hard to
follow as the JPA Criteria API. Which latter I refuse to use, even
though I seriously dislike silent JPA provider failures when a JPQL
string is wrong.

I don't develop my regular expressions in Java. I work them up on the
command line using grep or sed, or in a good editor like Sublime Text.
As others have also said, at the point where an RE is getting ridiculous
even without Java escaping, I'll simplify with other forms of processing
- these most likely in Java.

I think verbose is *bad*. Anything that adds to it is bad. My opinion,
others may certainly (vociferously) disagree. OO can already suffer from
fragmentation, where logic that solves an immediate problem is found in
many different spots; that problem is exacerbated when any given chunk
of code is appreciably larger than it needs to be because of extra
verbosity. Java is already bad enough in this regard - let's not make it
worse.

Regular expressions are *not* Java, and IMHO they are about as readable
for what they are intended for as anything else that people could come
up with. I don't myself think of them as a quick or easy fix to anything
- I consider the development of a useful RE to be a mini-program project
that may merit several hours. *If* the original problem rates it.

AHS
 
Arved Sandstrom

I wonder, in general, where the line should be drawn? Java coding
guidelines recommend that 1 and -1 can be used as literals, but other
integer constants should be defined as a "constant" by the programmer.
[ SNIP ]

If I am multiplying by 2, or 10, or 1000, or using 100 or 400 when doing
some forms of date conversions, almost all the time the context will
make it clear what that constant is. If I were to religiously follow the
coding guidelines, in no small number of cases I'd have to define
constants that were called TWO or THOUSAND...which is sort of stupid.

AHS
 
Eric Sosman

I wonder, in general, where the line should be drawn? Java coding
guidelines recommend that 1 and -1 can be used as literals, but other
integer constants should be defined as a "constant" by the programmer.
[ SNIP ]

If I am multiplying by 2, or 10, or 1000, or using 100 or 400 when doing
some forms of date conversions, almost all the time the context will
make it clear what that constant is. If I were to religiously follow the
coding guidelines, in no small number of cases I'd have to define
constants that were called TWO or THOUSAND...which is sort of stupid.

Naming constants to be the same as the name of the value they represent,
yes...that's stupid.

But nothing about your example suggests that's actually how the constants
should be named in the scenarios you describe.

If you are multiplying by a constant, there's a reason. Often, for example,
you are converting units (hours per day, days per week, etc., following
your "date conversions" theme). The conversion itself is the correct name
(e.g. "hoursPerDay", "daysPerWeek", etc.) in those examples. Similar logic
can be applied to other values.

public static final int MILLIMETERS_PER_METER = 1000;
public static final int MILLIGRAMS_PER_GRAM = 1000;
public static final int MILLIAMPERES_PER_AMPERE = 1000;
public static final int MILLISECONDS_PER_SECOND = 1000;

Great aids to understanding, I'm sure. (And stop calling me Millie!)
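
One possible middle ground between bare literals and one constant per unit is to name the conversion where that really adds information, and to let a library type carry the factor where one exists. A small sketch, with the class Conversions, the constant HOURS_PER_DAY and the method daysToHours invented for illustration; java.util.concurrent.TimeUnit is the real JDK enum.

```java
import java.util.concurrent.TimeUnit;

public class Conversions {
    // Named after the conversion it performs, not after its value.
    static final int HOURS_PER_DAY = 24;

    static long daysToHours(long days) {
        return days * HOURS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println(daysToHours(3));               // 72
        // TimeUnit already knows the factors, so no constant is needed.
        System.out.println(TimeUnit.DAYS.toHours(3));     // 72
        System.out.println(TimeUnit.SECONDS.toMillis(5)); // 5000
    }
}
```
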
 
