multi-line Strings


bob smith

Should something be added to the Java language to make multi-line Strings more clear?

Maybe like what PHP has?

Right now, I have a mess like this:

private final String mLomoishShader =
"precision mediump float;\n" +
"uniform sampler2D tex_sampler_0;\n" +
"uniform vec2 seed;\n" +
"uniform float stepsizeX;\n" +
"uniform float stepsizeY;\n" +
"uniform float stepsize;\n" +
"uniform vec2 scale;\n" +
"uniform float inv_max_dist;\n" +
"varying vec2 v_texcoord;\n" +
"float rand(vec2 loc) {\n" +
" float theta1 = dot(loc, vec2(0.9898, 0.233));\n" +
" float theta2 = dot(loc, vec2(12.0, 78.0));\n" +
" float value = cos(theta1) * sin(theta2) + sin(theta1) * cos(theta2);\n" +
// keep value of part1 in range: (2^-14 to 2^14).
" float temp = mod(197.0 * value, 1.0) + value;\n" +
" float part1 = mod(220.0 * temp, 1.0) + temp;\n" +
" float part2 = value * 0.5453;\n" +
" float part3 = cos(theta1 + theta2) * 0.43758;\n" +
" return fract(part1 + part2 + part3);\n" +
"}\n" +
"void main() {\n" +
// sharpen
" vec3 nbr_color = vec3(0.0, 0.0, 0.0);\n" +
" vec2 coord;\n" +
" vec4 color = texture2D(tex_sampler_0, v_texcoord);\n" +
" coord.x = v_texcoord.x - 0.5 * stepsizeX;\n" +
" coord.y = v_texcoord.y - stepsizeY;\n" +
" nbr_color += texture2D(tex_sampler_0, coord).rgb - color.rgb;\n" +
" coord.x = v_texcoord.x - stepsizeX;\n" +
" coord.y = v_texcoord.y + 0.5 * stepsizeY;\n" +
" nbr_color += texture2D(tex_sampler_0, coord).rgb - color.rgb;\n" +
" coord.x = v_texcoord.x + stepsizeX;\n" +
" coord.y = v_texcoord.y - 0.5 * stepsizeY;\n" +
" nbr_color += texture2D(tex_sampler_0, coord).rgb - color.rgb;\n" +
" coord.x = v_texcoord.x + stepsizeX;\n" +
" coord.y = v_texcoord.y + 0.5 * stepsizeY;\n" +
" nbr_color += texture2D(tex_sampler_0, coord).rgb - color.rgb;\n" +
" vec3 s_color = vec3(color.rgb + 0.3 * nbr_color);\n" +
// cross process
" vec3 c_color = vec3(0.0, 0.0, 0.0);\n" +
" float value;\n" +
" if (s_color.r < 0.5) {\n" +
" value = s_color.r;\n" +
" } else {\n" +
" value = 1.0 - s_color.r;\n" +
" }\n" +
" float red = 4.0 * value * value * value;\n" +
" if (s_color.r < 0.5) {\n" +
" c_color.r = red;\n" +
" } else {\n" +
" c_color.r = 1.0 - red;\n" +
" }\n" +
" if (s_color.g < 0.5) {\n" +
" value = s_color.g;\n" +
" } else {\n" +
" value = 1.0 - s_color.g;\n" +
" }\n" +
" float green = 2.0 * value * value;\n" +
" if (s_color.g < 0.5) {\n" +
" c_color.g = green;\n" +
" } else {\n" +
" c_color.g = 1.0 - green;\n" +
" }\n" +
" c_color.b = s_color.b * 0.5 + 0.25;\n" +
// blackwhite
" float dither = rand(v_texcoord + seed);\n" +
" vec3 xform = clamp((c_color.rgb - 0.15) * 1.53846, 0.0, 1.0);\n" +
" vec3 temp = clamp((color.rgb + stepsize - 0.15) * 1.53846, 0.0, 1.0);\n" +
" vec3 bw_color = clamp(xform + (temp - xform) * (dither - 0.5), 0.0, 1.0);\n" +
// vignette
" coord = v_texcoord - vec2(0.5, 0.5);\n" +
" float dist = length(coord * scale);\n" +
" float lumen = 0.85 / (1.0 + exp((dist * inv_max_dist - 0.73) * 20.0)) + 0.15;\n" +
" gl_FragColor = vec4(bw_color * lumen, color.a);\n" +
"}\n";
 

Arne Vajhøj

Should something be added to the Java language to make multi-line Strings more clear?

Maybe like what PHP has?

Right now, I have a mess like this:

private final String mLomoishShader =
"precision mediump float;\n" +
"uniform sampler2D tex_sampler_0;\n" +

It could be added.

PHP has it. C# has it.

But I would not consider it a high priority.

It is most useful for demo code.

In real code, large chunks of text would usually
be stored externally (file, DB, etc.), not embedded in
the code.

Arne
 

Daniel Pitts

It could be added.

PHP has it. C# has it.

But I would not consider it a high priority.

It is most useful for demo code.

In real code, large chunks of text would usually
be stored externally (file, DB, etc.), not embedded in
the code.

+1

That definitely looks like it should be in a separate file. If the
shader is closely related to the class which contains that declaration,
I would look into using Class.getResourceAsStream() to load it.

If the class itself can be made to use any shader, I would externalize
it entirely, passing the shader text as a construction parameter perhaps.
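
A minimal sketch of the resource approach, assuming the shader text is saved next to the class in a file on the classpath. The class name ShaderResources and the file name used in the note below are hypothetical; only Class.getResourceAsStream() comes from the suggestion above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class ShaderResources {
    private ShaderResources() {}

    /** Reads the whole stream as UTF-8 text and closes it. */
    static String readAll(InputStream in) {
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads a text resource that is located relative to the given class. */
    static String load(Class<?> owner, String resourceName) {
        InputStream in = owner.getResourceAsStream(resourceName);
        if (in == null) {
            // getResourceAsStream returns null when the resource is missing.
            throw new IllegalArgumentException("Resource not found: " + resourceName);
        }
        return readAll(in);
    }
}
```

With something like this, the field could be initialized as ShaderResources.load(getClass(), "lomoish.frag"), where "lomoish.frag" is an assumed file name, and the GLSL source stays readable in its own file with no quotes, "\n" or "+" around each line.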
 

markspace

It could be added....

But I would not consider it a high priority.

It is most useful for demo code.

In real code, large chunks of text would usually
be stored externally (file, DB, etc.), not embedded in
the code.

I agree. I was going to suggest that bob use a resource, in fact, which
is of course an external file.

<http://docs.oracle.com/javase/7/docs/technotes/guides/lang/resources.html#class>

I think what I'd like more than multi-line support is support for
strings without escape sequences. Like:

String regex = """\s[0-9](\.|\*)[_a-zA-Z]\w""";

is a lot more readable than trying to mentally decode all of the double
slashes that regex in Java frequently requires.

(That regex does nothing useful, btw; it's just an example.)
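
For comparison, here is a small sketch of how that same example regex has to be written as an ordinary Java string literal. The class and method names are made up for illustration; the pattern is the one from the post above.

```java
import java.util.regex.Pattern;

// Illustrative only: the regex from the post, written as a normal Java literal.
final class RegexEscapeDemo {
    private RegexEscapeDemo() {}

    // The regex as a regex reference would show it:
    //   \s[0-9](\.|\*)[_a-zA-Z]\w
    // In a Java string literal each backslash must itself be escaped.
    static final String REGEX = "\\s[0-9](\\.|\\*)[_a-zA-Z]\\w";

    /** Returns true if the whole input matches the example regex. */
    static boolean matches(String input) {
        return Pattern.matches(REGEX, input);
    }

    public static void main(String[] args) {
        // Printing the literal shows the single-backslash form the regex engine sees.
        System.out.println(REGEX);
        System.out.println(matches(" 5.ab"));
    }
}
```

The five doubled backslashes are exactly the noise a non-escaping literal would remove.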
 

Daniel Pitts

It could be added....

But I would not consider it a high priority.

It is most useful for demo code.

In real code, large chunks of text would usually
be stored externally (file, DB, etc.), not embedded in
the code.

I agree. I was going to suggest that bob use a resource, in fact, which
is of course an external file.

<http://docs.oracle.com/javase/7/docs/technotes/guides/lang/resources.html#class>


I think what I'd like more than multi-line support is support for
strings without escape sequences. Like:

String regex = """\s[0-9](\.|\*)[_a-zA-Z]\w""";

is a lot more readable than trying to mentally decode all of the double
slashes that regex in Java frequently requires.

(That regex does nothing useful, btw; it's just an example.)

IntelliJ IDEA actually has a way to open just the expression in its own
edit window. The expression in the edit window is in the expression
language (regex in this case), so no Java escaping is necessary. You
can then modify it, and IDEA will add the appropriate escaping back in.

They support this for many languages in many contexts. It's a pretty
nifty feature IMHO.
 

markspace

IntelliJ IDEA actually has a way to open just the expression in its own
edit window.


That is indeed a nifty feature. However I believe in principle a
programming language should be readable without something having to
translate it for you. Java regex fails that test often enough that I
think a non-escaped string constant would be a benefit overall.
 

Eric Sosman

That is indeed a nifty feature. However I believe in principle a
programming language should be readable without something having to
translate it for you. Java regex fails that test often enough that I
think a non-escaped string constant would be a benefit overall.

FORTRAN solved this problem half a century ago:

5HHELLO
  ^^^^^
  here's the string

13HHELLO, WORLD.
   ^^^^^^^^^^^^^
   here's the string

3H1233H456
  ^^^  ^^^
  here are the strings

8H1233H456
  ^^^^^^^^
  here's the string

12H    1233H456
   ^^^^^^^^^^^^
   here's the string

There was, of course, a certain amount of tedium (not to mention
opportunity for error) in manually counting each string, but if
Those Thrilling Days Of Yesteryear are what you crave ...
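
To make the counting rule concrete, here is a rough Java sketch that splits text written in that count-then-H style into its strings. It only mirrors the examples above (a decimal count, the letter H, then exactly that many characters); it is not a Fortran lexer, and the class name is invented.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative parser for back-to-back counted strings such as 3H1233H456.
final class Hollerith {
    private Hollerith() {}

    static List<String> parse(String source) {
        List<String> strings = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            int start = pos;
            // Read the decimal character count.
            while (pos < source.length()
                    && source.charAt(pos) >= '0' && source.charAt(pos) <= '9') {
                pos++;
            }
            if (pos == start || pos >= source.length() || source.charAt(pos) != 'H') {
                throw new IllegalArgumentException("Expected <count>H at index " + start);
            }
            int count = Integer.parseInt(source.substring(start, pos));
            pos++; // skip the H
            if (count > source.length() - pos) {
                throw new IllegalArgumentException("Count " + count + " runs past the end");
            }
            // The payload is taken verbatim: no character in it is special.
            strings.add(source.substring(pos, pos + count));
            pos += count;
        }
        return strings;
    }

    public static void main(String[] args) {
        System.out.println(parse("3H1233H456")); // two strings
        System.out.println(parse("8H1233H456")); // one string containing "3H"
    }
}
```

Nothing inside the payload ever needs escaping, which is the attraction; the price is that a wrong count silently changes where the string ends.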
 

BGB

It could be added.

PHP has it. C# has it.

likewise, Lua has it, and it is a common extension to JavaScript, ...


in my language, there are several ways of doing it:
var str=
"first line
second line
third line
";

var str=
"""first line
second line
third line
""";

var str=
<[[first line
second line
third line]]>;

where the 3rd form may be nested, but is otherwise about the same as the
second form.

the main difference between the first form and the latter forms is that
the first form still interprets \ escapes, whereas the latter forms
don't use \ escapes.

note, it is also possible to use \ to avoid encoding newlines as well:
"first-word \
second-word"
where the \ will eat the newline and any following whitespace.


another difference has to do with maximum string length allowed in the
parser:
the first form is currently limited to around 4096 characters per string
constant;
the latter forms handle strings of up to 1MB IIRC (it is either a 1MB
max, or a 1MB initial/expandable buffer, I forget).

But I would not consider it a high priority.

It is most useful for demo code.

In real code, large chunks of text would usually
be stored externally (file, DB, etc.), not embedded in
the code.

a lot depends.


at least off in C land, I was using large arrays of strings to represent
"archives" for embedding collections of smaller resource files into C code.

technically, this was kind of "really annoying and sucked".


a little later on, this was replaced by embedding a WAD-based archive
format (ExWAD) into the PE/COFF (Windows EXE/DLL) and ELF (Linux
binary/shared-object) binaries. the WAD is historically related to the
id Software WAD formats (primarily Quake/Half-Life WAD2), but is not
exactly the same (different archive header, larger directory entries,
hierarchical, supports Deflate, ...).

mostly this is used for embedding metadata, script-code, or resource
data into program binaries (typically post-link). (for example: sticking
reflection metadata into a DLL so the script VM can use it easier,
stuffing the class-library into the relevant VM DLLs, ...).


functionally, though, this isn't too much different from embedding
resource files into the JAR for Java code.

the main "obvious" difference is mostly that programs like WinZip and
similar won't be "clever" and assume that it is an archive, mostly
because they have no understanding of ExWAD.


note that most normal (plain data) data resources have their own files
or are stored in ZIP-based "pk" files though (for example:
"resource/data001.pk" for game data), vs say, "resource/gamex86.dll"
which may contain both Win32 / x86-specific native code, as well as a
lot of scripting-language code and similar.


or such...
 

Arne Vajhøj

FORTRAN solved this problem half a century ago:

5HHELLO
  ^^^^^
  here's the string

13HHELLO, WORLD.
   ^^^^^^^^^^^^^
   here's the string

3H1233H456
  ^^^  ^^^
  here are the strings

8H1233H456
  ^^^^^^^^
  here's the string

12H    1233H456
   ^^^^^^^^^^^^
   here's the string

There was, of course, a certain amount of tedium (not to mention
opportunity for error) in manually counting each string, but if
Those Thrilling Days Of Yesteryear are what you crave ...

If Java were going to implement it, then I think the C# way
would be preferable to the Fortran way.

Arne

PS: And for those who do not know C#: C# has "" strings
with \ as the escape, like Java, but also has @"" strings, where
\ is not an escape and line breaks are allowed.
 

markspace

PS: And for those who do not know C#: C# has "" strings
with \ as the escape, like Java, but also has @"" strings, where
\ is not an escape and line breaks are allowed.


And that's what I was trying to imply with my triple quotes. I don't
know C# so I'm not aware of their syntax conventions. However, anything
at all works for me, as long as it's readable. (Manually counting the
characters in a string, not so much.)
 

Eric Sosman

[...]
PS: And for those who do not know C#: C# has "" strings
with \ as the escape, like Java, but also has @"" strings, where
\ is not an escape and line breaks are allowed.

As one of "those," and curious: Can a @"" string have an
embedded " character?

@""Escapes? We don' need no steenkin' escapes!" he snarled."
 

markspace

[...]
PS: And for those who do not know C#: C# has "" strings
with \ as the escape, like Java, but also has @"" strings, where
\ is not an escape and line breaks are allowed.

As one of "those," and curious: Can a @"" string have an
embedded " character?

@""Escapes? We don' need no steenkin' escapes!" he snarled."

That's why I like triple quotes. Single and double embedded quotes are
ok. In fact I'd provide an alternate syntax that harkened back to the
Unix shell 'here document':

String s = <<< ident """A string with "s in it.""" ident <<<;

Now you can adapt the closing delimiter so it doesn't duplicate any
substring portion of your constant. No escapes are ever required this
way. Even triple quotes can be embedded arbitrarily.
 

Arne Vajhøj

[...]
PS: And for those who do not know C#: C# has "" strings
with \ as the escape, like Java, but also has @"" strings, where
\ is not an escape and line breaks are allowed.

As one of "those," and curious: Can a @"" string have an
embedded " character?

Yes.

An " inside @"" is encoded as "".

Arne
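
The doubling rule can be sketched in Java as a small decoder for a C#-style verbatim literal. This is an illustration of the rule as described above, not the C# compiler's actual lexer, and the class name is invented.

```java
// Illustrative decoder for text written as @"..." with "" standing for one quote.
final class VerbatimLiteral {
    private VerbatimLiteral() {}

    static String decode(String literal) {
        if (literal.length() < 3 || !literal.startsWith("@\"") || !literal.endsWith("\"")) {
            throw new IllegalArgumentException("Not of the form @\"...\"");
        }
        String body = literal.substring(2, literal.length() - 1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '"') {
                // Inside the body a quote is only legal as one half of a doubled pair.
                if (i + 1 >= body.length() || body.charAt(i + 1) != '"') {
                    throw new IllegalArgumentException("Unpaired quote in body at index " + i);
                }
                i++; // consume the second quote of the pair
            }
            out.append(c); // backslashes and line breaks pass through unchanged
        }
        return out.toString();
    }
}
```

Under this rule the snarled example would have to be written with every embedded quote doubled; as posted, with single embedded quotes, the decoder rejects it.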
 

Eric Sosman

[...]
PS: And for those who do not know C#: C# has "" strings
with \ as the escape, like Java, but also has @"" strings, where
\ is not an escape and line breaks are allowed.

As one of "those," and curious: Can a @"" string have an
embedded " character?

Yes.

An " inside @"" is encoded as "".

Aha! Another FORTRAN legacy! As of FORTRAN IV you could
write 'I''M HERE' instead of 8HI'M HERE, which most people
considered a great advance -- in the late 1960's.

My point, of course, is that there's still an escape mechanism
at work. It's a different mechanism, yes, but it still has the
What You See Ain't What You Get problem this thread has been
complaining about. And here's a funny thing about inventing an
escape mechanism: Even if the special character sequences were
surpassingly uninteresting and spectacularly rare before being
adopted as escapes, their very adoption makes them suddenly
interesting and much more common. You'll find yourself wanting
to write a regex that looks for "" inside a @"..." string, and
you'll get something like

@"@""([^""]*""""")*[^""]*"""

... leaving you pretty much where you started, just with a new
suit of clothes on the Emperor. Also, we still need to produce

"\u0281 is the IPA voiced uvular fricative"

... on input systems that cannot generate the IPA voiced uvular
fricative all by themselves.

Source has syntax -- at this level we usually speak of "lexing,"
but a lexer is really just a parser optimized to recognize a simple
syntax. A big job of the lexer is to distinguish metacharacters
from payload characters, and if every character could potentially
appear as payload there has to be some kind of convention to
discriminate the different usages. Those conventions mean that
WYSAWYG will inevitably occur, to a greater extent or a lesser.

It's unfortunate that both Java and regex use \ so heavily,
because it leads to a lot of escaping-of-escapes and harms
readability. But why should it be a given that Java's literals
should be different to avoid conflict with regex syntax? Why
not change the regex syntax instead, and use, say, ~ for the
role now taken by \? It might improve regexes to the point
where they're merely unreadable, instead of intolerable. ;-)
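
As a thought experiment only, the ~ idea can be tried today with a tiny translator in front of java.util.regex. The notation is an assumption made for this sketch: ~ takes the role of \, ~~ stands for a literal tilde, and a backslash is treated as ordinary payload. TildeRegex is an invented name.

```java
import java.util.regex.Pattern;

// Illustrative translator from a hypothetical ~-escaped regex notation
// to the standard backslash notation that java.util.regex expects.
final class TildeRegex {
    private TildeRegex() {}

    static String toStandard(String tilde) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tilde.length(); i++) {
            char c = tilde.charAt(i);
            if (c == '\\') {
                out.append("\\\\"); // backslash is plain payload here, so escape it
            } else if (c != '~') {
                out.append(c);
            } else if (i + 1 < tilde.length() && tilde.charAt(i + 1) == '~') {
                out.append('~'); // ~~ means one literal tilde
                i++;
            } else {
                out.append('\\'); // a single ~ plays the role of the backslash
            }
        }
        return out.toString();
    }

    static Pattern compile(String tilde) {
        return Pattern.compile(toStandard(tilde));
    }

    public static void main(String[] args) {
        // No doubled backslashes are needed in the Java literal.
        System.out.println(toStandard("~s[0-9](~.|~*)[_a-zA-Z]~w"));
    }
}
```

The literal tilde now needs doubling, which is the point made above: whichever character is chosen as the escape becomes the one that has to be escaped.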
 

markspace

I've always liked the Awk and Perl default convention of delimiting
regexes with slashes: /regex/ - if their compilers can deal with this
cleanly, the Java compiler could surely do the same.

Perl, especially, and "cleanly" don't belong in the same sentence. Or
paragraph. Or solar system.
 

markspace

Yes, couldn't agree more. The only languages I've used that approach the
ugliness of Perl are Python (its object construction and handling are

Good, it's not just me that dislikes Python.
 

Arne Vajhøj

On 12/10/2012 3:08 PM, Arne Vajhøj wrote:
[...]
PS: And for those who do not know C#: C# has "" strings
with \ as the escape, like Java, but also has @"" strings, where
\ is not an escape and line breaks are allowed.

As one of "those," and curious: Can a @"" string have an
embedded " character?

Yes.

An " inside @"" is encoded as "".

Aha! Another FORTRAN legacy! As of FORTRAN IV you could
write 'I''M HERE' instead of 8HI'M HERE, which most people
considered a great advance -- in the late 1960's.

Doubling is also used in various Pascal, Basic, and SQL dialects.

My guess is that doubling is more common than escaping
in non-C-family languages.

My point, of course, is that there's still an escape mechanism
at work. It's a different mechanism, yes, but it still has the
What You See Ain't What You Get problem this thread has been
complaining about. And here's a funny thing about inventing an
escape mechanism: Even if the special character sequences were
surpassingly uninteresting and spectacularly rare before being
adopted as escapes, their very adoption makes them suddenly
interesting and much more common. You'll find yourself wanting
to write a regex that looks for "" inside a @"..." string, and
you'll get something like

@"@""([^""]*""""")*[^""]*"""

... leaving you pretty much where you started, just with a new
suit of clothes on the Emperor.

The doubling mechanism is used only for the string encloser character,
while true escape is used for many other characters as well.

So the doubling mechanism should result in fewer problems than
true escape.

Furthermore the suggestion was not to replace the current mechanism
but to supplement it. Which means that one can still pick the current
form if one thinks that it is more readable for some cases.

Also, we still need to produce

"\u0281 is the IPA voiced uvular fricative"

... on input systems that cannot generate the IPA voiced uvular
fricative all by themselves.

CHAR(0x0281) // 'is the IPA voiced uvular fricative'

or something similar works in other languages.
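
In Java terms, a rough equivalent of that CHAR(...) idea is to build the character from its numeric code point instead of escaping it inside the literal. The helper name below is made up; Character.toChars is the standard library call.

```java
// Illustrative helper: build a string from a code point instead of an escape.
final class CodePoints {
    private CodePoints() {}

    /** Returns the string consisting of the single given Unicode code point. */
    static String fromCodePoint(int codePoint) {
        // Character.toChars also handles supplementary code points (two chars).
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x0281) + " is the IPA voiced uvular fricative";
        System.out.println(s.length());
    }
}
```

This keeps the literal itself free of escapes, at the cost of a concatenation.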

Arne
 

Arne Vajhøj

I've always liked the Awk and Perl default convention of delimiting
regexes with slashes: /regex/ - if their compilers can deal with this
cleanly, the Java compiler could surely do the same.

That would require regex to become part of the language
syntax.

Arne
 

Arne Vajhøj

Good, it's not just me that dislikes Python.

There are probably thousands and thousands.

But I am not among them. I think Python is OK. I would
not use it for the same tasks as Java, but still.

Arne
 

BGB

I've always liked the Awk and Perl default convention of delimiting
regexes with slashes: /regex/ - if their compilers can deal with this
cleanly, the Java compiler could surely do the same.

FWIW, my language also inherited this syntax as well (from ECMAScript),
though the regex is essentially otherwise just a variant of a string.

var str = /[0-9]([0-9]|[A-F]|[a-f])+/;
 
