Passing a Method Name to a Method, Redux

Lew

No. How could it?

"-server

"JVMs based on Sun's Hotspot technology initially compile class methods with a
low optimization level. These JVMs use a simple complier [sic] and an
optimizing JIT compiler. Normally the simple JIT compiler is used. However you
can use this option to make the optimizing compiler the one that is used. This
change will significantly increases [sic] the performance of the server but
the server takes longer to warm up when the optimizing compiler is used."

<http://publib.boulder.ibm.com/infoc...ere.express.doc/info/exp/ae/tprf_tunejvm.html>
 
Gene Wirchenko

You have a couple of problems with your code, one organizational and the
other in understanding the efficiencies.

The organizational one relates to the idea that you'll just toss your
tests away. Don't ever do that! The test code is part of the project,

You are misunderstanding. The test code that I am referring to
is proof-of-concept code to test ideas, *not* my test cases.

and should remain with it. Test code is also put under code control,
and managed along with the projects. It's important because every time
you want to change your parser, you'll need to re-run the tests to make
sure everything is working.

Are you using an IDE? Most will auto-generate a test framework for you.
It's very handy and you should be doing this regardless of how you write
code. The IDE just makes it easier.


The other thing, efficiency, I'll show you right now. The
organizational stuff is actually probably a bigger deal, but I think
you'll be happy to see how to make code faster.

This line here is the biggest offender.


This is super inefficient inside a loop. To do this, the system has to
create a new string with one extra character, and then toss away the old
string. Making a new object and tossing an old one is bound to slow you
down.

final public void parse() {
    // Create the builder once, outside the loop.
    StringBuilder sb1 = new StringBuilder( 255 );
    for( int xScan = 0; xScan < TimingTesting.cParseString.length(); xScan++ ) {
        char c = TimingTesting.cParseString.charAt( xScan );
        if( find( c ) ) {
            sb1.append( c );
        }
    }
    String ... = sb1.toString();
}

Here's my adaptation of your loop. Notice I make a StringBuilder once,
outside the loop, and call append() inside the loop, which is much much
faster. Then I call toString once outside the loop again, so I only
create a new String once, not each time inside the loop. Try to
refactor your code to do this, it will make it much faster.
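
To make the difference concrete, here is a minimal sketch contrasting the two approaches. The class name `ConcatDemo`, the method names, and the `isIdentChar` helper are hypothetical stand-ins (the `find()` used above is not shown in the thread), so treat this as an illustration rather than the actual benchmark code.

```java
// Hypothetical demo; all names here are illustrative, not from the original code.
class ConcatDemo {

    // Stand-in for find(): accepts ASCII letters, digits and '_'.
    static boolean isIdentChar(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z') || c == '_';
    }

    // Slow variant: each += allocates a new String and copies the old one.
    static String filterWithConcat(String input) {
        String result = "";
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isIdentChar(c)) {
                result += c;
            }
        }
        return result;
    }

    // Faster variant: one StringBuilder, one toString() at the end.
    static String filterWithBuilder(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isIdentChar(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "//identifier//IDENTIFIER//a_b_c abc1234b5%$__dbl";
        // Both variants produce the same text; only the allocation pattern differs.
        System.out.println(filterWithConcat(input));
        System.out.println(filterWithBuilder(input));
    }
}
```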

I have heard about the String/StringBuilder dichotomy. I will be
addressing it.

One last thing for now: on splitting a string into tokens, look at this:

String[] tok = TimingTesting.cParseString.split( "[^a-zA-Z0-9]+" );
System.out.println( Arrays.toString( tok ) );

But I do not want to do that. I am writing a preprocessor to
process files like:
***** Start of Test File *****
* testin.dat
* Test Input File for Preprocessor
* Last Modification: 2011-06-16
*
* This is VFP code.

$idchars ABC 1 2 A
$idchars
$quotes "" '' [] ~
$rem testin2.dat contains the definitions of STARTTEXT and ENDTEXT.
$rem
$include "testin2.dat"
$include testin2.dat
$include ~Atestin.datA
$include "testin2.dat"X
$include ~Atestin.datAX

$define FROM 1
$define TO 10
set talk off

? "STARTTEXT"

for i=FROM to TO
? i
endfor

? ENDTEXT

return
$undef FROM
$undef TO
***** End of Test File *****

Sincerely,

Gene Wirchenko
 
Gene Wirchenko

[snip]
I am writing a *simple* parser. It is not for grovelling over
Java code. It is for a preprocessor for SQL Server for better code
management. I mean for it to be fairly language-agnostic.

What does "simple" mean? JavaCC is the parser generator that I'm most familiar
with. <http://javacc.java.net/>

There are preprocessor commands. For them, the first character
of the line is "$". All of the other lines are text to be processed.
There is one level of string substitution.

Think simple version of the C preprocessor.
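
As a rough illustration of that description, here is a minimal sketch of such a line-oriented preprocessor. It handles only `$define` and `$undef`, ignores every other `$` command, does no quote handling, and treats identifiers as runs of ASCII letters, digits, and underscores. The class name `MiniPreprocessor` and everything about its behavior beyond the three sentences above are assumptions, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: only $define and $undef are acted upon.
class MiniPreprocessor {
    private final Map<String, String> macros = new HashMap<>();

    // Returns the text lines with one level of substitution applied.
    List<String> process(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("$")) {
                handleCommand(line);
            } else {
                out.add(substitute(line));
            }
        }
        return out;
    }

    private void handleCommand(String line) {
        // Split into at most three parts: command, name, replacement text.
        String[] parts = line.trim().split("\\s+", 3);
        if (parts[0].equals("$define") && parts.length >= 2) {
            macros.put(parts[1], parts.length == 3 ? parts[2] : "");
        } else if (parts[0].equals("$undef") && parts.length >= 2) {
            macros.remove(parts[1]);
        }
        // Any other command (such as $rem) is ignored in this sketch.
    }

    // Replaces each identifier that names a macro. The replacement text
    // is not rescanned, so there is exactly one level of substitution.
    private String substitute(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (isIdentChar(c)) {
                int start = i;
                while (i < line.length() && isIdentChar(line.charAt(i))) {
                    i++;
                }
                String ident = line.substring(start, i);
                sb.append(macros.getOrDefault(ident, ident));
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    private static boolean isIdentChar(char c) {
        return c == '_' || (c >= '0' && c <= '9')
                || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "$define FROM 1", "$define TO 10",
                "for i=FROM to TO", "$undef FROM", "? FROM");
        for (String s : new MiniPreprocessor().process(input)) {
            System.out.println(s);
        }
    }
}
```

Macro names are matched case-sensitively here, so the keyword `to` in the loop line is left alone while `TO` is replaced.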

Sincerely,

Gene Wirchenko
 
markspace

There are preprocessor commands. For them, the first character
of the line is "$". All of the other lines are text to be processed.
There is one level of string substitution.


A parser generator makes parsers for you. You give it a syntax (like
"starts with $") and it makes the parser. Generally very efficient as
it will do optimizations in the code that are hard to find.
 
markspace

You are misunderstanding. The test code that I am referring to
is proof-of-concept code to test ideas, *not* my test cases.

I don't see what you are doing then. Is the code you showed us not your
"test/proof of concept?"

First, a proof of concept is a little silly for a parser. Of course
it's feasible. And then you are running timing tests on it. Should
you be doing timing tests on something you intend to throw away? Profile
the real code!

You seem to be making a lot of extra work for yourself, or at least
confusing the heck out of us.

But I do not want to do that. I am writing a preprocessor to
process files like:

You should be using input like what you showed to test then. The test
code you showed us won't parse this correctly.
 
Stefan Ram

markspace said:
A parser generator makes parsers for you. You give it a syntax (like
"starts with $") and it makes the parser. Generally very efficient as
it will do optimizations in the code that are hard to find.

Seems as if writing a grammar is more difficult for some
people than writing a parser, possibly because it requires
abstraction.

The previous poster claimed, however, that his grammar (at
least: the set of characters allowed in an identifier) would
change at run time. This would mean that there is no static
grammar (at least for the identifier symbol).
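
One way to accommodate an identifier character set that changes at run time is a lookup table that is rebuilt whenever the set changes. A minimal sketch, assuming the new characters arrive as a plain string (the exact semantics of the `$idchars` line in the test file are not given in the thread); the class and method names are hypothetical:

```java
import java.util.BitSet;

// Hypothetical sketch of an identifier-character set that can change at run time.
class IdentCharTable {
    // One bit per possible char value.
    private final BitSet table = new BitSet(Character.MAX_VALUE + 1);

    IdentCharTable(String initialChars) {
        setIdentChars(initialChars);
    }

    // Replaces the whole set: each char of the argument becomes an identifier char.
    void setIdentChars(String chars) {
        table.clear();
        for (int i = 0; i < chars.length(); i++) {
            table.set(chars.charAt(i));
        }
    }

    // Direct bit lookup: no boxing and no searching.
    boolean isIdentChar(char c) {
        return table.get(c);
    }

    public static void main(String[] args) {
        IdentCharTable t = new IdentCharTable("abc_");
        System.out.println(t.isIdentChar('a'));  // true
        System.out.println(t.isIdentChar('$'));  // false
        t.setIdentChars("abc_$");                // the set changes mid-run
        System.out.println(t.isIdentChar('$'));  // true
    }
}
```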
 
Stefan Ram

Joshua Cranmer said:
Then why not use the C preprocessor?

The previous poster claimed that the set of characters that
can be used within an identifier is supposed to change
during the execution of his parser. This is not possible
with the C preprocessor whose set of identifier characters
is { "_", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
"k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v",
"w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H",
"I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
"U", "V", "W", "X", "Y", "Z", "0", "1", "2", "3", "4", "5",
"6", "7", "8", "9" }.
 
Gene Wirchenko

I don't see what you are doing then. Is the code you showed us not your
"test/proof of concept?"

It is.

First, a proof of concept is a little silly for a parser. Of course
it's feasible. And then you are running timing tests on it. Should
you be doing timing tests on something you intend to throw away? Profile
the real code!

I wanted to know which way to jump on character identification. I
figured that sequential would be bad, though it is not as bad as I
thought. I did not know which of a binary search on a String or a
TreeSet search would be faster. That was why I wrote the test.

You seem to be making a lot of extra work for yourself, or at least
confusing the heck out of us.

I am an experienced programmer, but I am not so experienced with
Java. I am trying to remedy the latter.

On the confusion, I see that a number of people have
micro-optimised for my code. I am looking at a bigger picture.

You should be using input like what you showed to test then. The test
code you showed us won't parse this correctly.

Of course not. The test code is just for dealing with
identifiers, and it is a simplified version to boot.

Sincerely,

Gene Wirchenko
 
blmblm

Good links. Thank you.

But my question was really about method calling to
SequentialSearch(), BinarySearch(), and TreesetSearch().

Interesting how the discussion has veered off onto other subjects --
but hey, this is Usenet!

But you did get at least two replies focusing on reducing the
amount of code duplication, one from markspace and one from me [*].
It would be nice to know whether you found them useful.

[*] Message-ID: <[email protected]>
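
For reference, the usual way to avoid duplicating the loop is to pass the search routine itself as an argument. A minimal sketch, assuming Java 8 or later (method references did not exist when this thread was written; an interface implemented by an anonymous class does the same job on older versions). The `CharTest` interface, the `countMatches` method, and the lower-camel-case routine names are hypothetical:

```java
// Hypothetical sketch: one loop, parameterized by the search routine.
class SearchHarness {

    // Single-method interface, so a method reference or lambda fits.
    interface CharTest {
        boolean test(char c);
    }

    static final String IDENT_CHARS =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

    static boolean sequentialSearch(char c) {
        for (int i = 0; i < IDENT_CHARS.length(); i++) {
            if (IDENT_CHARS.charAt(i) == c) {
                return true;
            }
        }
        return false;
    }

    static boolean indexOfSearch(char c) {
        return IDENT_CHARS.indexOf(c) >= 0;
    }

    // The routine under test is passed in rather than copied into the loop.
    static int countMatches(String input, CharTest search) {
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            if (search.test(input.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "//identifier//IDENTIFIER//a_b_c";
        System.out.println(countMatches(input, SearchHarness::sequentialSearch));
        System.out.println(countMatches(input, SearchHarness::indexOfSearch));
    }
}
```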
 
markspace

I am an experienced programmer, but I am not so experienced with
Java. I am trying to remedy the latter.


Honestly, what you've done so far is pretty crazy. I figured you were a
2nd year student who got in over his head on a personal project, or a
homework problem. No sane programmer would try to search for characters
the way you are.

If you want to tell us what your actual experience is, it might help.
I'm guessing your experience isn't actually programming, maybe HTML and
Flash or something. But I don't want to get into an argument here so if
you don't want to tell us then it's ok to drop it.

On the confusion, I see that a number of people have
micro-optimised for my code. I am looking at a bigger picture.


It's hard to give you a big picture. Well, a couple have tried. JavaCC
or parboiled or similar compiler-compiler would help you the most. But
besides that you've given us only a very micro example. We can't do
anything else with the code other than micro-optimize.

And on my system using a StringBuilder instead of '+' yields a better
than 3x speed up. Times go from 22 seconds to around 6. I don't really
call that micro.

Of course not. The test code is just for dealing with
identifiers, and it is a simplified version to boot.


Part of the problem is that you aren't really testing those three search
routines. You're testing other things like string concatenation,
because those are dominating the running time of your tests.

I'd start over. Download an IDE like NetBeans. Start a new project
with a single class with just the search routines (shown below). Use
Tools -> Create Unit Tests. That will at least generate a saner
framework for you to put your tests in. The code does need testing, and
you can make a couple of tests do timing for you. I think it'll help you
think about the problem more clearly too; things were kinda messy in there.

import java.util.SortedSet;
import java.util.TreeSet;

class TimingTesting
{

    static String cParseString =
        "//identifier//IDENTIFIER//a_b_c "
        +"abc1234b5%$__dbl;one;two;three;END";
    static String IdentChars =
        "0123456789" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "_"
        + "abcdefghijklmnopqrstuvwxyz"; // sorted order!
    static SortedSet<Character> IdentCharsSet
        = new TreeSet<Character>();
    static int nRepetitions = 1000000;

    // Fill the set from IdentChars; while the set is empty,
    // TreesetSearch() returns false for every character.
    static {
        for( int i = 0; i < IdentChars.length(); i++ ) {
            IdentCharsSet.add( IdentChars.charAt( i ) );
        }
    }


    // Just these three methods and no more!!!

    // Linear scan over IdentChars; stops at the first match.
    static boolean SequentialSearch(
        char CurrChar )
    {
        boolean fFound = false;
        for( int i = 0; i < IdentChars.length() && !fFound; i++ ) {
            fFound = IdentChars.charAt( i ) == CurrChar;
        }
        return fFound;
    }

    // Binary search; relies on IdentChars being in sorted order.
    static boolean BinarySearch(
        char CurrChar )
    {
        int xLow = 0;
        int xHigh = IdentChars.length() - 1;
        int xTry;
        while( xLow <= xHigh ) {
            xTry = ( xLow + xHigh ) / 2;
            if( CurrChar == IdentChars.charAt( xTry ) ) {
                return true;
            }
            if( CurrChar < IdentChars.charAt( xTry ) ) {
                xHigh = xTry - 1;
            } else {
                xLow = xTry + 1;
            }
        }
        return false;
    }

    // Set lookup; CurrChar is autoboxed to a Character on each call.
    static boolean TreesetSearch(
        char CurrChar )
    {
        return IdentCharsSet.contains( CurrChar );
    }

}
 
Gene Wirchenko

Honestly, what you've done so far is pretty crazy. I figured you were a
2nd year student who got in over his head on a personal project, or a
homework problem. No sane programmer would try to search for characters
the way you are.

You do not know my requirements, but you claim I am not sane?

<plonk>

[snip]

Sincerely,

Gene Wirchenko
 
blmblm

Yes. I wanted a simple method call in the parser so I could
cut-and-paste. I did not know if I would need more than one call. I
am going to go with a Treeset so I will not have a separate method in
the implementation.

Another "no separate method" approach would be to use the String
class's indexOf method:

    static boolean StringLibSearch
        (
         char CurrChar
        )
        {
            return IdentChars.indexOf(CurrChar) >= 0;
        }

I added this to your benchmark suite and found it to give performance
comparable to the TreeSet implementation (indeed, usually it was a
bit faster). The overhead of building the TreeSet probably doesn't
matter in the grand scheme of things, and probably it also doesn't
matter a lot that every call to the TreeSet's "contains" method
(AFAIK) has to convert a character primitive to a Character object,
but -- <shrug>.

But if you're going to use a Set, why a TreeSet? As best I can tell,
you don't use/need the sorted-ness it provides. Just out of curiosity,
I also added to your benchmark suite something that declares the set
as a Set and creates it as an instance of HashSet, and the resulting
code was noticeably faster than any of the other alternatives.
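
A minimal sketch of the Set/HashSet variant described above. The class name `HashSetSearch` and the member names are hypothetical; the identifier characters are copied from the benchmark's `IdentChars` string.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the HashSet-based lookup.
class HashSetSearch {
    static final String IDENT_CHARS =
            "0123456789" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "_"
            + "abcdefghijklmnopqrstuvwxyz";

    // Declared as Set, created as HashSet: contains() needs no ordering.
    static final Set<Character> IDENT_CHARS_SET = new HashSet<>();

    static {
        for (int i = 0; i < IDENT_CHARS.length(); i++) {
            IDENT_CHARS_SET.add(IDENT_CHARS.charAt(i));
        }
    }

    static boolean hashSetSearch(char c) {
        // The char is autoboxed to a Character on each call.
        return IDENT_CHARS_SET.contains(c);
    }

    public static void main(String[] args) {
        System.out.println(hashSetSearch('_'));
        System.out.println(hashSetSearch('%'));
    }
}
```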


And finally, I wondered how all of these methods compared to
something using regular expressions (the java.util.regex classes), so
I tried that too, replacing your whole parse code with the following:

    import java.util.regex.*;

    // ....

    static Pattern IdentRegexPattern=Pattern.compile("[" + IdentChars + "]+");

    // ....

    // code to be called repeatedly from timing loop
    static void ParseRegex()
    {
        Matcher IdentMatcher = IdentRegexPattern.matcher(cParseString);
        String sIdent;
        while (IdentMatcher.find())
        {
            sIdent = IdentMatcher.group();
            if (nRepetitions==1)
                System.out.println(sIdent);
        }
    }

    // ....

This was a clear winner (with regard to performance) on the system
where I measured performance, *unless* I ran the tests with the
"-server" flag, in which case it took second place, behind the
HashSet-based approach. As I understand things, though, the
"-server" flag results in the compiler doing more to try to optimize
the code, including being more aggressive about eliminating dead
code, so I'm not entirely confident about the results I'm getting
being meaningful.
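
One common precaution against dead-code elimination in a hand-rolled benchmark is to make every iteration contribute to a value that is used afterwards. A minimal sketch under that assumption; the class name, the method name, and the checksum approach are hypothetical and not part of the benchmark suite discussed here, and the character class is written out rather than built from `IdentChars`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the timed work feeds a checksum that is printed.
class RegexTiming {
    static final Pattern IDENT = Pattern.compile("[0-9A-Z_a-z]+");

    // Returns the total length of all identifiers found,
    // so each call has a result the caller can consume.
    static int parseRegex(String input) {
        Matcher m = IDENT.matcher(input);
        int total = 0;
        while (m.find()) {
            total += m.group().length();
        }
        return total;
    }

    public static void main(String[] args) {
        String input = "//identifier//IDENTIFIER//a_b_c "
                + "abc1234b5%$__dbl;one;two;three;END";
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 1000000; i++) {
            checksum += parseRegex(input);  // result is consumed, not discarded
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        // Printing the checksum keeps the loop's work observable.
        System.out.println("checksum=" + checksum + " time=" + elapsedMs + " ms");
    }
}
```

This does not address warm-up or other JIT effects; a dedicated harness such as JMH is the more rigorous route.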

(Probably your actual code needs to do something other than
finding and printing identifiers, so the above code would need
some adjustment. Still, if you like regular expressions, it's
another possibility, maybe .... )

[ snip ]
 
lewbloch

Your points are excellent, but the ongoing violations of the naming
conventions are making my brain hurt. Can't we please revert to
conformant names in our replies at least?
 
blmblm

[ snip ]

Your points are excellent, but the ongoing violations of the naming
conventions are making my brain hurt. Can't we please revert to
conformant names in our replies at least?


Well .... I guess I figure it's a choice between two things that
seem desirable -- (1) working in with the conventions of the code
I'm modifying and (2) applying the conventions used by most Java
programmers. I chose the former, though it rather makes my brain
hurt as well. :)
 
Arne Vajhøj

I could not find one that would run standalone on my system.

Almost all C compilers have a way to do only preprocessing.

GCC, MS, DEC/CPQ/HP etc. do.

Arne
 
Gene Wirchenko

Almost all C compilers have a way to do only preprocessing.

GCC, MS, DEC/CPQ/HP etc. do.

I did not want a C compiler. I simply wanted a preprocessor.

Sincerely,

Gene Wirchenko
 
lewbloch

I did not want a C compiler. I simply wanted a preprocessor.

Awwwww. You get a free C compiler that gives you the preprocessor you
want, and you're complaining that it gives you more than you want?
Wow. That's pretty petulant.
 
