create a string of <n> equal chars <c>

Kevin McMurtrie

Daniel Pitts said:
Roedy Green said:
On 13 Jul 2010 15:01:35 GMT, Andreas Leitgeb
someone who said :

It seems so basic that I can't believe such a feature wasn't in
the standard library:

it is part of the common11 tools for JDK 1.1+
http://mindprod.com/products1.html#COMMON11

The method is called StringTools.rep

/**
 * Produce a String of a given repeating character.
 *
 * @param c the character to repeat
 * @param count the number of times to repeat
 *
 * @return String, e.g. rep('*',4) returns "****"
 * @noinspection WeakerAccess,SameParameterValue
 */
public static String rep( char c, int count )
{
    if ( c == ' ' && count <= SOMESPACES.length() )
    {
        return SOMESPACES.substring( 0, count );
    }
    char[] s = new char[count];
    for ( int i = 0; i < count; i++ )
    {
        s[ i ] = c;
    }
    return new String( s ).intern();
}

/**
 * used to efficiently generate Strings of spaces of varying length
 */
private static final String SOMESPACES = " ";

Why use intern() on the second case? It has always been undocumented
where the pool storage is and what the cost of using it is. The only
time I use that method is when generating keys for a Properties class.

Why even use it there? I don't think I've ever seen a legitimate case
for using intern(). The *closest* I've seen to a valid use is someone
who wanted to use it for synchronization based on a String key.

Sample code:

import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;

public class Foo
{
    private static void runLookupTest (final Properties p)
    {
        int c= 0;
        for (int i= 0; i < 10000000; ++i)
        {
            c+= (p.get("org.company.services.WidgetFactory.implementor") != null) ? 1 : 0;
            c+= (p.get("com.company.util.Scheduler.useTimer") != null) ? 1 : 0;
            c+= (p.get("com.company.imaging.ThumbRender.filter") != null) ? 1 : 0;
        }
        if (c != 30000000)
            throw new RuntimeException ("Bad test: " + c);
    }

    private static void runTestCycle (boolean warmup) throws IOException
    {
        //Create Properties with string keys that are not internalized
        final Properties p= new Properties();
        p.load(new StringReader( "org.company.services.WidgetFactory.implementor=1000\n"
            + "com.company.util.Scheduler.useTimer=true\n"
            + "com.company.imaging.ThumbRender.filter=lanczos3\n"));

        //Internalized lookup of non-internalized keys
        final long start1= System.nanoTime();
        runLookupTest(p);
        final long end1= System.nanoTime();

        //Run keys through String.intern()
        final Properties temp= (Properties)p.clone();
        p.clear(); //put() doesn't overwrite an existing key
        for (Map.Entry<Object, Object> e : temp.entrySet())
            p.put(String.valueOf(e.getKey()).intern(), e.getValue());

        //Internalized lookup of internalized keys
        final long start2= System.nanoTime();
        runLookupTest(p);
        final long end2= System.nanoTime();

        if (warmup)
        {
            System.out.println("Warmup");
        }
        else
        {
            System.out.println("Not internalized: "
                + (end1 - start1)/1000000f + " ms");
            System.out.println("Internalized: "
                + (end2 - start2)/1000000f + " ms");
        }
        System.out.println();
    }

    public static void main (final String args[]) throws Exception
    {
        runTestCycle(true);
        runTestCycle(false);
        runTestCycle(false);
        runTestCycle(false);
    }
}



==============================================
Warmup

Not internalized: 1725.81 ms
Internalized: 794.556 ms

Not internalized: 1725.5449 ms
Internalized: 795.296 ms

Not internalized: 1725.703 ms
Internalized: 794.618 ms
 
Arved Sandstrom

Tom said:
The standard library is already freak huge. If it had every item which
anyone couldn't believe wasn't in the standard library, it could be a
very leviathan, a behemoth, a colossus of codes.

What i think we could do with is a sort of Greater Standard Library.
Some way of giving certain packages official blessing, so that if you
built code on top of them, you'd have a reasonable expectation that
someone out there who wanted to run your code would already have them.
This already exists de facto for certain things - log4j, quite a bit of
Apache Commons, JUnit, and so on, but it could be useful to put it on a
more formal, although not completely formal, basis. We could tie this up
with a process for merging different libraries approaching the same
problem, reviewing and improving and integrating things, etc. We would
end up with something like a more comprehensive standard library, but
without the mandarins at Sunacle (or JCP Towers - shudder) having to
decree it.

Not that i'm volunteering to organise this, of course.

tom

Not the worst idea in the world. I suspect that the list of third-party
JARs for very many good-sized reasonably general-purpose applications
looks substantially the same; grab the third-party JAR list for a few
thousand or a few tens of thousands of realistic apps, store it in an
associative array by number of occurrences, and promulgate the list of
Top 10 or Top 20 as your Greater Standard Library (GSL).

Having such a thing might then help address Andreas' problem, which is
that of finding a given functionality; the existence of a GSL would make
it more clear than it is now that the JDK falls flat in certain areas,
and the types of JARs in the GSL would illustrate *where* the JDK falls
flat.

An even better metric than simple third-party JAR lists in applications
would at least check to see that the JARs are actually in use, and then
possibly how much.

On a related note I think you'd then find that most of the functionality
in such a GSL ought to have been part of the JDK a long time ago. I have
no problem with the JDK people being conservative (and often
retro-active) when it comes to creating new APIs (although it seems to
me that they could have usefully been much more conservative at times),
but OTOH some stuff is so obvious that I fault them for not having
created those libraries a long time ago. For example, Apache Commons
Lang StringUtils is loaded with so much obvious stuff that gets used all
the time; would it have been so difficult for the geniuses at Sun to
write something like that way back when?

I have to admit that I also find it slightly irritating when one's own
libraries are subsequently rolled up by official and semi-official (*)
API introductions/improvements/upgrades. As a consultant specializing in
maintenance of J2EE applications I see an appreciable percentage of code
made redundant or obsolete by what are fairly obvious API releases and
upgrades by Sun or other major library writers. I don't fault upgrades
or additions to APIs that are based on required marketplace exposure to
garner feedback; I mean rather corrections to deficient initial releases
or after-the-fact initial releases.

Before anyone points out that I'm whining, let me point out that many
working programmers have neither the time (nor often official approval)
to polish up their own libraries and release them to the adoring public.
I've also found that there aren't really all *that* many NIMBY
programmers out there; most would happily use a fine third-party library
rather than write their own code. One largish application that I worked
with extensively in the recent past was written by a good core team,
quite a few years ago, and it's replete with ?Utils classes, almost all
of which express functionality that was not available 5-7 years ago; at
least 75% of that functionality is now available in official or
semi-official libraries or their improved (read fixed) later versions.
Bit late IMHO...since none of it was all that esoteric.

AHS

* semi-official: unofficial but very common
 
Daniel Pitts

So you save 1 second reading 10 million properties, when a program
probably reads fewer than 1000 properties. So, during the execution of
the program, you have saved some nanoseconds. Good for you.
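
To put rough numbers on that estimate, here is a back-of-envelope sketch using the first timed cycle reported above (1725.81 ms versus 794.556 ms). Note that the benchmark loop performs three get() calls per iteration, so the run covers 30 million lookups, not 10 million. The class and method names are hypothetical, and the result only describes that one machine and JVM.

```java
final class LookupSavings {
    private LookupSavings() {}

    /** Nanoseconds saved per lookup, given two total timings in milliseconds. */
    static double savedNanosPerLookup(double plainMs, double internedMs, long lookups) {
        return (plainMs - internedMs) * 1_000_000.0 / lookups;
    }

    public static void main(String[] args) {
        // Timings copied from the first non-warmup cycle of the posted output.
        double perLookup = savedNanosPerLookup(1725.81, 794.556, 30_000_000L);
        System.out.printf("Saved per lookup: %.2f ns%n", perLookup);
        // Scale to the "fewer than 1000 properties" case mentioned in the post.
        System.out.printf("Saved over 1000 lookups: %.2f microseconds%n",
                perLookup * 1000 / 1000.0);
    }
}
```

On those figures the saving is roughly 31 ns per lookup, so about 31 microseconds across 1000 lookups.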
 
Roedy Green

So you save 1 second reading 10 million properties, when a program
probably reads fewer than 1000 properties. So, during the execution of
the program, you have saved some nanoseconds. Good for you.

The question was which was faster. Kevin did a test and discovered
intern was 54% faster, the opposite of what most people expected. I
think that is a quite interesting discovery. Further, Kevin made no
recommendations on how to code. I think you are just being a
shithead, embarrassed at making the wrong prediction. Stop looking for
excuses to put other people down.
 
Daniel Pitts

> The question was which was faster.

The question, which you have conveniently not quoted, was why bother
using it there:

> Why even use it there? I don't think I've ever seen a legitimate
> case for using intern().

> Kevin did a test and discovered
> intern was 54% faster, the opposite of what most people expected. I
> think that is a quite interesting discovery.

Perhaps, but its usefulness has yet to be demonstrated to me.

> Further, Kevin made no
> recommendations on how to code.

Which is unfortunate, because my question was exactly that: why use
intern() in the case of properties? It adds complexity and gains little
benefit.

> I think you are just being a
> shithead, embarrassed at making the wrong prediction.

Wow, Roedy, while I don't always agree with you, I always try to be
polite when explaining my point of view. I made no prediction
whatsoever, only questioned the reasoning of using intern() for
Properties.get().

> Stop looking for
> excuses to put other people down.

Perhaps my sarcasm was a bit harsh; however, I didn't intend to put
anyone down.

"Premature optimization is the root of all evil." -- Donald Knuth.

My point is only that this is a premature optimization. The result is
interesting, agreed, but I still wouldn't use intern() in production code.
 
Wojtek

Roedy Green wrote :
for some reason this is displaying improperly. That should be about 40
spaces long.

Are you using an HTML editor? HTML eats any spaces after the first one.
Or you can use a non-breaking space (&nbsp;) to force extra displayable
spaces.
 
Arne Vajhøj

Perl even has its own operator for it ('a' x 3 - or so),

I don't know about Perl, but Python allows:

'a' * 3

But my post about language features was only counting
pad methods of the string class.

but BASIC does not have anything like it

VB.NET has it but probably not the older flavors.

Arne
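
For comparison with the Python one-liner, a minimal Java sketch of the same operation follows. It is not Roedy's StringTools code: the class name RepeatSketch is hypothetical, Arrays.fill stands in for the hand-written loop, and the intern() call is deliberately left out. On Java 11 and later, String.repeat(int) covers the String case directly, e.g. "a".repeat(3).

```java
import java.util.Arrays;

final class RepeatSketch {
    private RepeatSketch() {}

    /** Returns a String made of count copies of c, e.g. rep('*', 4) gives "****". */
    static String rep(char c, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0: " + count);
        }
        char[] chars = new char[count];
        Arrays.fill(chars, c);     // fills every slot with c
        return new String(chars);  // ordinary String, not interned
    }

    public static void main(String[] args) {
        System.out.println(rep('*', 4));
        System.out.println("[" + rep(' ', 3) + "]");
    }
}
```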
 
Arne Vajhøj

I cannot come up with a better solution now, but would suggest
to hide this implementation behind an interface, so that you can
easily replace this implementation in all your projects, once this
is better supported in Java. Like,

interface/class MyStringUtils
{ /** Appends multiple copies of padCharacter to the source, so that
      the result has length as its length when measured in Unicode code points. */
  public java.lang.String pad
  ( java.lang.String source, java.lang.String padCharacter, int length );
  ... }

Then, go for the implementation that is most readable/maintainable first,
and only optimize it for run-time speed, /if/ this was shown to be
necessary.

Putting it in a reusable class makes a lot of sense.

I don't think an interface makes sense for something as low level
as this.

Arne
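
As an illustration of the reusable-class variant, here is a minimal sketch of the quoted pad() signature as a static utility method. The behaviour for edge cases is an assumption, since the quoted Javadoc does not specify it: a source that is already long enough is returned unchanged, and padCharacter must be exactly one code point.

```java
final class MyStringUtils {
    private MyStringUtils() {}

    /**
     * Appends copies of padCharacter to source until the result is
     * length code points long. Assumption: longer sources are returned as-is.
     */
    static String pad(String source, String padCharacter, int length) {
        if (padCharacter.codePointCount(0, padCharacter.length()) != 1) {
            throw new IllegalArgumentException("padCharacter must be exactly one code point");
        }
        // Count code points, not chars, so surrogate pairs count once.
        int current = source.codePointCount(0, source.length());
        StringBuilder sb = new StringBuilder(source);
        for (int i = current; i < length; i++) {
            sb.append(padCharacter);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(pad("ab", ".", 5));
    }
}
```

Starting with the readable loop and replacing it later only if profiling calls for it matches the advice quoted above.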
 
markspace

Roedy said:
The question was which was faster. Kevin did a test and discovered
intern was 54% faster,


But Daniel is still correct. What Kevin produced was a micro-benchmark,
not a full app. Optimize to the actual app, not to some idealized
version of a small part of it.
 
Kevin McMurtrie

markspace said:
But Daniel is still correct. What Kevin produced was a micro-benchmark,
not a full app. Optimize to the actual app, not to some idealized
version of a small part of it.

Wow, so much analysis of me and not the code! Somebody could have fired
up an IDE and figured it out in less than a minute. The code simply
demonstrates taking advantage of an optimized path in the Sun Map
implementations.

Properties loaded from a file or a stream have keys that are not
internalized. This is the common code path when calling get("Some
constant"):

String.hashCode ()
Indexed array read
hash comparison
object reference comparison
String.equals()


Rebuild the Properties with internalized keys. The common code path is
reduced to this:

String.hashCode ()
Indexed array read
hash comparison
object reference comparison

Most calls to String.equals() are eliminated because of reference
equality.

This works for all Sun Map classes but Properties are where the usage
conditions are most likely to make it work. Internalizing strings in
general is not productive.
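
The reference-equality shortcut described above can be seen in isolation with a small sketch. The sameKey method is a simplified stand-in for the key test in a hash map, not the actual JDK source, and the class name is hypothetical.

```java
final class InternShortcut {
    private InternShortcut() {}

    /** Simplified key test: cheap reference comparison first, equals() only as a fallback. */
    static boolean sameKey(Object stored, Object probe) {
        return stored == probe || probe.equals(stored);
    }

    public static void main(String[] args) {
        String literal = "com.company.util.Scheduler.useTimer"; // string literals are interned
        // A distinct String object with the same contents, like a key read from a stream.
        String loaded = new String(literal.toCharArray());

        System.out.println(loaded == literal);          // false: different objects
        System.out.println(loaded.intern() == literal); // true: same pooled instance
        System.out.println(sameKey(loaded, literal));   // true, but only via equals()
    }
}
```

With an interned stored key and a literal probe, the first half of the test succeeds and equals() is never called.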
 
markspace

Kevin said:
Wow, so much analysis of me and not the code!


Er, no, just the code, which is a micro benchmark. I don't doubt that
it's really faster, I just think that in a much larger app, the overhead
of property lookup would be effectively reduced to noise, or less. File
IO, network IO or some much larger data structure would most likely be
the bottleneck in a large app.
 
Kevin McMurtrie

markspace said:
Er, no, just the code, which is a micro benchmark. I don't doubt that
it's really faster, I just think that in a much larger app, the overhead
of property lookup would be effectively reduced to noise, or less. File
IO, network IO or some much larger data structure would most likely be
the bottleneck in a large app.

Of course it depends on the app. If you have a lot of dynamic
configuration options (DAL properties, remote service addresses, feature
switches, algorithm switches, internationalization, etc.) in a Map then
those few lines of code are worth the coding effort. It's something
that would be more likely on an enterprise web server where shutting
down for adjustments isn't an option.
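
A minimal sketch of what "those few lines" might look like when the configuration lives in a plain Map follows. The class and method names are hypothetical, and the property key is borrowed from the benchmark. Note that stringPropertyNames() also includes keys from any default Properties, which may or may not be wanted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class InternedConfig {
    private InternedConfig() {}

    /** Copies the string-valued entries into a new Map whose keys are interned. */
    static Map<String, String> withInternedKeys(Properties source) {
        Map<String, String> result = new HashMap<>();
        for (String name : source.stringPropertyNames()) {
            result.put(name.intern(), source.getProperty(name));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader("com.company.util.Scheduler.useTimer=true\n"));
        Map<String, String> config = withInternedKeys(p);
        // The literal below is the same object as the stored key.
        System.out.println(config.get("com.company.util.Scheduler.useTimer"));
    }
}
```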
 
