Optimizing my algorithm?

L

luser- -droog

I'm working on a redesign of my postscript interpreter; and I'm
trying to upgrade each module as I lock it in. Gprof runs of my
current program (code.google.com/p/xpost) constantly list
opexec() as the worst offender. And it really is a mess.

It attempts to perform the tasks of selecting the appropriate
form of polymorphic operators, checking the types on the
stack, and calling the resulting operator function through a
variable-argument switch-rig.


/* Execute an operator by opcode.
Examines stack to match against the patterns in signature.
Calls operator function with arguments.
if no signatures for the opcode match,
it issues an error corresponding to the last failed check
(either stackunderflow or typecheck)
*/
void opexec (state *st, size opcode) {
oper op = optab[opcode];
int i, j;
bool pass;
object *siq = tos; /* stack-segment in question */
int err = typecheck;

current_op = opcode;
if (op.n == 0) error(st,unregistered);

for (i=0; i<op.n; i++) { /* try each signature */
pass = true;
current_op_sig = i;
siq = tos-op.sig.n;
if (tos-os < op.sig.n) { err = stackunderflow;
continue; }
for (j=0; j<op.sig.n; j++) { /* check types */
/* j indexes both siq[] and op.sig.t[] :
the stack and the pattern, respectively. */

/* anytype matches anything */
if (op.sig.t[j] == anytype) continue;

/* exact match */
if (siq[j].tag == op.sig.t[j]) continue;

/* the numbertype pattern matches integers or reals */
if ( op.sig.t[j] == numbertype
&& ( siq[j].tag == integertype
|| siq[j].tag == realtype) ) continue;

/* the floattype pattern causes automatic coercion
of integers to reals, ie. ints float up */
if ( op.sig.t[j] == floattype ) {
if (siq[j].tag == integertype) promote(siq[j]);
if (siq[j].tag == realtype) continue;
}

/* the proctype matches executable arrays */
if (op.sig.t[j] == proctype
&& siq[j].tag == arraytype
&& siq[j].flags.lit == 0) continue;

/* all these possibilities failed */
pass = false;
err = typecheck;
break;
}
if (pass) goto call;
}
tos = siq; /* pop the stack to the 'stack in question' */
error(st,err); /* pop before error since error will adjust the
stack */
return;

call: /* pass args bottom-up */
tos = siq; /* pop the stack to the 'stack in question' */
/* room for output? */
if (tos-os + op.sig.out > OSSIZE) error(st,stackoverflow);
switch (op.sig.n) {
case 0: op.sig.fp(st); break;
case 1: op.sig.fp(st, siq[0]); break;
case 2: op.sig.fp(st, siq[0], siq[1]); break;
case 3: op.sig.fp(st, siq[0], siq[1], siq[2]); break;
case 4: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3]);
break;
case 5: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3],
siq[4]); break;
case 6: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3],
siq[4], siq[5]); break;
default: error(st,unregistered);
}
}

So I had this great idea to do bitmasks and talked it up in clp
http://groups.google.com/group/comp.lang.postscript/browse_thread/thread/e639638484f4fa56#
but I never get much feedback there on this sort of thing.

I think I can replace my type arrays (op.sig[].t[]) with a
big integer filled with byte masks and I can accept or
reject a candidate function with an & and a ==.
But a problem arises with the "anytype" pattern.
The two options I conceive are
* multiple patterns checking each bit of just the relevant
nibble of the byte.
* devoting an entire bit to valid/invalid.

And it seems I'm pouting about these choices.
I don't like either one and I can't think of anything else.

I suppose I haven't even really formulated a question here.

Thoughts/Ramblings/Anecdotes anyone?
 
W

Willem

luser- -droog wrote:
) I'm working on a redesign of my postscript interpreter; and I'm
) trying to upgrade each module as I lock it in. Gprof runs of my
) current program (code.google.com/p/xpost) constantly list
) opexec() as the worst offender. And it really is a mess.
)
) It attempts to perform the tasks of selecting the appropriate
) form of polymorphic operators, checking the types on the
) stack, and calling the resulting operator function through a
) variable-argument switch-rig.
)
)
) /* Execute an operator by opcode.
) Examines stack to match against the patterns in signature.
) Calls operator function with arguments.
) if no signatures for the opcode match,
) it issues an error corresponding to the last failed check
) (either stackunderflow or typecheck)
) */
) void opexec (state *st, size opcode) {
) oper op = optab[opcode];
) int i, j;
) bool pass;
) object *siq = tos; /* stack-segment in question */
) int err = typecheck;
)
) current_op = opcode;
) if (op.n == 0) error(st,unregistered);
)
) for (i=0; i<op.n; i++) { /* try each signature */
) pass = true;
) current_op_sig = i;
) siq = tos-op.sig.n;
) if (tos-os < op.sig.n) { err = stackunderflow;
) continue; }
) for (j=0; j<op.sig.n; j++) { /* check types */
) /* j indexes both siq[] and op.sig.t[] :
) the stack and the pattern, respectively. */
)
) /* anytype matches anything */
) if (op.sig.t[j] == anytype) continue;
)
) /* exact match */
) if (siq[j].tag == op.sig.t[j]) continue;
)
) /* the numbertype pattern matches integers or reals */
) if ( op.sig.t[j] == numbertype
) && ( siq[j].tag == integertype
) || siq[j].tag == realtype) ) continue;
)
) /* the floattype pattern causes automatic coercion
) of integers to reals, ie. ints float up */
) if ( op.sig.t[j] == floattype ) {
) if (siq[j].tag == integertype) promote(siq[j]);
) if (siq[j].tag == realtype) continue;
) }
)
) /* the proctype matches executable arrays */
) if (op.sig.t[j] == proctype
) && siq[j].tag == arraytype
) && siq[j].flags.lit == 0) continue;
)
) /* all these possibilities failed */
) pass = false;
) err = typecheck;
) break;
) }
) if (pass) goto call;
) }
) tos = siq; /* pop the stack to the 'stack in question' */
) error(st,err); /* pop before error since error will adjust the
) stack */
) return;
)
) call: /* pass args bottom-up */
) tos = siq; /* pop the stack to the 'stack in question' */
) /* room for output? */
) if (tos-os + op.sig.out > OSSIZE) error(st,stackoverflow);
) switch (op.sig.n) {
) case 0: op.sig.fp(st); break;
) case 1: op.sig.fp(st, siq[0]); break;
) case 2: op.sig.fp(st, siq[0], siq[1]); break;
) case 3: op.sig.fp(st, siq[0], siq[1], siq[2]); break;
) case 4: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3]);
) break;
) case 5: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3],
) siq[4]); break;
) case 6: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3],
) siq[4], siq[5]); break;
) default: error(st,unregistered);
) }
) }
)
) So I had this great idea to do bitmasks and talked it up in clp
) http://groups.google.com/group/comp.lang.postscript/browse_thread/thread/e639638484f4fa56#
) but I never get much feedback there on this sort of thing.
)
) I think I can replace my type arrays (op.sig[].t[]) with a
) big integer filled with byte masks and I can accept or
) reject a candidate function with an & and a ==.
) But a problem arises with the "anytype" pattern.
) The two options I conceive are
) * multiple patterns checking each bit of just the relevant
) nibble of the byte.
) * devoting an entire bit to valid/invalid.
)
) And it seems I'm pouting about these choices.
) I don't like either one and I can't think of anything else.
)
) I suppose I haven't even really formulated a question here.
)
) Thoughts/Ramblings/Anecdotes anyone?

Looks like there's a maximum of 6 arguments to check, and 3 possible
types to check against? That's just 18 bits, isn't it?

I'm not sure what you envisioned with bytemasks, but this sounds lke you
could do it with a single bitmask: simply build up a bitstring with each
of the types on the stack, and compare it to the bitmask of each op.
'anytype' is then, simply, all (three?) bits set. And 'numbertype' has
bits set on both 'int' and 'real'.

(Finding out if promotions are needed involves masking out all 'int'
argument types, shifting those to the 'real' position and then masking
against the signature bits, leaving just the 'int->float' bits)

You could also tweak it to check, say, just four arguments, and then go
into a special check for ops that need more than four arguments, which
may or may not speed up things, because you need to check fewer args
for common cases.

Depending on how many polymorphic ops there are, you could do a linear
search or perhaps some kind of trie approach.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
W

Willem

Willem wrote:
) Looks like there's a maximum of 6 arguments to check, and 3 possible
) types to check against? That's just 18 bits, isn't it?
)
) I'm not sure what you envisioned with bytemasks, but this sounds lke you
) could do it with a single bitmask: simply build up a bitstring with each
) of the types on the stack, and compare it to the bitmask of each op.
) 'anytype' is then, simply, all (three?) bits set. And 'numbertype' has
) bits set on both 'int' and 'real'.

I just realised that if you turn it into a rejection mask, then you don't
even need the '==' operation.

You do need a separate 'promotiontype' bitmask, because there is both
'numbertype' and 'floattype' which accept ints and reals.
Or, you could just code the promotion into each function and make it
'numbertype'.

) Depending on how many polymorphic ops there are, you could do a linear
) search or perhaps some kind of trie approach.

Offhand, I think you won't see benefits of a trie approach unless there
are more than about a dozen different signatures to check, perhaps even
a lot more.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
B

BartC

It attempts to perform the tasks of selecting the appropriate
form of polymorphic operators, checking the types on the
stack, and calling the resulting operator function through a
variable-argument switch-rig.
void opexec (state *st, size opcode) {
I think I can replace my type arrays (op.sig[].t[]) with a
big integer filled with byte masks and I can accept or
reject a candidate function with an & and a ==.
But a problem arises with the "anytype" pattern.
The two options I conceive are
* multiple patterns checking each bit of just the relevant
nibble of the byte.
* devoting an entire bit to valid/invalid.

And it seems I'm pouting about these choices.
I don't like either one and I can't think of anything else.

I suppose I haven't even really formulated a question here.

Thoughts/Ramblings/Anecdotes anyone?

You have up to six operands/parameters for each op? I take it these aren't
the normal C-style operators then, but Postscript functions too?

How many signatures does a typical op have? Would each signature (for that
op) have the same number of arguments?

What proportion of ops (both in static number, and the ones actually
executed) have more than one or two arguments?

And, how many different types are there? (BTW what's the difference between
floattype and realtype?)

This sort of stuff can be streamlined by using tables, but that's difficult
with up to six operands. But you can use separate code for the cases where
there is only one operand, or only two.

You also seem to be doing operand promotion in the loop, before you know
whether this particular signature will match; an integer promoted to real
now, might not then match a integer sig in the next loop. (Perhaps you
needn't bother doing this sort of promotion here at all; it can be done
inside the op-function.)

(I've tried to relate this to what I do, with type-dispatching on two
operands, for an interpreter (dispatching on a single operand is trivial).

Each type is represented by a number, and every likely combination of two
types is given a 'typesig' (another small number). A 2D matrix is used to
convert the two types of the operands on the stack, to a single typesig.
Each binary operator in the language uses a linear table of function
pointers, indexed by the typesig code.

Where an illegal combinations of operand types is used, this maps to a
special typesig, which calls an error function. In C, the whole thing
roughly looks like this (for an example operator 'add'):

add_fntable[sigmatrix[x->tag][y->tag]]();

If the types were few, then add_fntable could itself be a matrix:
add_fntable[x->tag][y->tag]. But there are no loops.

Your scheme has patterns such as 'anytype'; I wouldn't bother even looking
at it. Just make sure it ends up calling a suitable handler.

So if there was an operator OP(anytype,anytype), then I would make sure that
every value of OP_fntable[x->tag][y->tag] contained the address of the
implementation of op(anytype,anytype).

All this does mean that you'd need to treat 1, 2 or N operands separately,
but if you want it fast..)
 
B

BGB

I'm working on a redesign of my postscript interpreter; and I'm
trying to upgrade each module as I lock it in. Gprof runs of my
current program (code.google.com/p/xpost) constantly list
opexec() as the worst offender. And it really is a mess.

snip...


I suppose I haven't even really formulated a question here.

Thoughts/Ramblings/Anecdotes anyone?

maybe add a second stage:
first stage processes input, and maybe builds opcodes as a list or
table, and resolves all operations to function pointers;
second stage executes code simply by calling function pointers.

argument and type-specialization can be handled by selecting an
appropriate function pointer, which will (general) omit all of the
dispatch-related logic.


in my current script VM's interpreter, similar is already used:
the input (for this stage), is VM bytecode;
it parses the bytecode into a linked-list structure;
it figures out which opcode functions to call, and places them into the
list nodes.


this way, much of the execution is simply a loop:
cur=ctx->thop;
if(cur)
{
while(cur && !ctx->rs)
{ cur=cur->fcn(ctx, cur); }
ctx->thop=cur;
if(cur)ctx->ip=cur->ip;
return(ctx->rs);
}
(this is a slightly older version, but is clearer).

"ctx->rs" is basically a "return state" value, or essentially an
"interrupt" (when it is non-zero, it indicates a state which needs to be
handled by the calling logic, which is typically things like function
calls/returns, and operations which would block).

"ctx->ip" is basically the bytecode IP address.

in all, performance seems reasonably good (although granted, still a bit
short of native speeds).


I think I was getting a 2-3x slowdown (vs C) with similar logic in some
specially written tests (involving performing math operations and
similar). my actual interpreter is a bit slower though (closer to a
75-100x slowdown, for test like "fib()" and "bubble sort"), but is not
nearly so highly micro-optimized.

the specialized tests assumed static types (they were written assuming a
new VM architecture), whereas most of my interpreter logic is still
dynamically typed (I ended up later adding this to my preexisting VM,
replacing the use of dispatching via a giant switch table).

ironically, this was more done for "flexibility" than for performance,
but performance did improve some as well.


most of the time, however, goes into other things (not particularly
related to instruction dispatch), and optimization effort has been
sparse as it is "generally fast enough" for what I am using it for (in
actual use, it is often relatively faster, as dedicated C code does most
of the "heavy lifting").

ironically, the major time-waster at the moment is the logic for
getting/setting local variables.


or such...
 
L

luser- -droog

luser- -droog wrote:

) I'm working on a redesign of my postscript interpreter; and I'm
) trying to upgrade each module as I lock it in. Gprof runs of my
) current program (code.google.com/p/xpost) constantly list
) opexec() as the worst offender. And it really is a mess.
)
) It attempts to perform the tasks of selecting the appropriate
) form of polymorphic operators, checking the types on the
) stack, and calling the resulting operator function through a
) variable-argument switch-rig.
)
)
)     /* Execute an operator by opcode.
)        Examines stack to match against the patterns in signature.
)        Calls operator function with arguments.
)        if no signatures for the opcode match,
)        it issues an error corresponding to the last failed check
)        (either stackunderflow or typecheck)
)      */
)     void opexec (state *st, size opcode) {
)         oper op = optab[opcode];
)         int i, j;
)         bool pass;
)         object *siq = tos; /* stack-segment in question */
)         int err = typecheck;
)
)         current_op = opcode;
)         if (op.n == 0) error(st,unregistered);
)
)         for (i=0; i<op.n; i++) { /* try each signature */
)             pass = true;
)             current_op_sig = i;
)             siq = tos-op.sig.n;
)             if (tos-os < op.sig.n) { err = stackunderflow;
) continue; }
)             for (j=0; j<op.sig.n; j++) { /* check types */
)                 /* j indexes both siq[] and op.sig.t[] :
)                    the stack and the pattern, respectively. */
)
)                 /* anytype matches anything */
)                 if (op.sig.t[j] == anytype) continue;
)
)                 /* exact match */
)                 if (siq[j].tag == op.sig.t[j]) continue;
)
)                 /* the numbertype pattern matches integers or reals */
)                 if ( op.sig.t[j] == numbertype
)                   && ( siq[j].tag == integertype
)                       || siq[j].tag == realtype) ) continue;
)
)                 /* the floattype pattern causes automatic coercion
)                    of integers to reals, ie. ints float up */
)                 if ( op.sig.t[j] == floattype ) {
)                     if (siq[j].tag == integertype) promote(siq[j]);
)                     if (siq[j].tag == realtype)continue;
)                 }
)
)                 /* the proctype matches executable arrays */
)                 if (op.sig.t[j] == proctype
)                   && siq[j].tag == arraytype
)                   && siq[j].flags.lit == 0) continue;
)
)                 /* all these possibilities failed */
)                 pass = false;
)                 err = typecheck;
)                 break;
)             }
)             if (pass) goto call;
)         }
)         tos = siq; /* pop the stack to the 'stack in question' */
)         error(st,err); /* pop before error since error will adjust the
) stack */
)         return;
)
)     call: /* pass args bottom-up */
)         tos = siq; /* pop the stack to the 'stack in question' */
)         /* room for output? */
)         if (tos-os + op.sig.out > OSSIZE) error(st,stackoverflow);
)         switch (op.sig.n) {
)             case 0: op.sig.fp(st); break;
)             case 1: op.sig.fp(st, siq[0]); break;
)             case 2: op.sig.fp(st, siq[0], siq[1]); break;
)             case 3: op.sig.fp(st, siq[0], siq[1], siq[2]); break;
)             case 4: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3]);
) break;
)             case 5: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3],
) siq[4]); break;
)             case 6: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3],
) siq[4], siq[5]); break;
)             default: error(st,unregistered);
)         }
)     }
)
) So I had this great idea to do bitmasks and talked it up in clp
)http://groups.google.com/group/comp.lang.postscript/browse_thread/thr...
) but I never get much feedback there on this sort of thing.
)
) I think I can replace my type arrays (op.sig[].t[]) with a
) big integer filled with byte masks and I can accept or
) reject a candidate function with an & and a ==.
) But a problem arises with the "anytype" pattern.
) The two options I conceive are
) * multiple patterns checking each bit of just the relevant
)   nibble of the byte.
) * devoting an entire bit to valid/invalid.
)
) And it seems I'm pouting about these choices.
) I don't like either one and I can't think of anything else.
)
) I suppose I haven't even really formulated a question here.
)
) Thoughts/Ramblings/Anecdotes anyone?

Looks like there's a maximum of 6 arguments to check, and 3 possible
types to check against?  That's just 18 bits, isn't it?


That's the kind of important information I omitted! :(
There are 19 types listed in the PRLM (of which I only use 13,
presently)

_Simple_ _Composite_
boolean array
fontID(*not used) condition(*)
integer dictionary
mark file
name gstate(*)
null lock(*)
operator packedarray(*)
real string
save

So in addition to matching specific types (like `string index int -put-
`)
There are "patterns" of types, like `array index any -put-`
or `dict any any -put-`. And I've got a C-function that performs
the type-specific operation with a mangled name like SIIput, YIAput,
DAAput.
 
L

luser- -droog

It attempts to perform the tasks of selecting the appropriate
form of polymorphic operators, checking the types on the
stack, and calling the resulting operator function through a
variable-argument switch-rig.
   void opexec (state *st, size opcode) {

I think I can replace my type arrays (op.sig[].t[]) with a
big integer filled with byte masks and I can accept or
reject a candidate function with an & and a ==.
But a problem arises with the "anytype" pattern.
The two options I conceive are
* multiple patterns checking each bit of just the relevant
 nibble of the byte.
* devoting an entire bit to valid/invalid.
And it seems I'm pouting about these choices.
I don't like either one and I can't think of anything else.
I suppose I haven't even really formulated a question here.
Thoughts/Ramblings/Anecdotes anyone?

You have up to six operands/parameters for each op? I take it these aren't
the normal C-style operators then, but Postscript functions too?

Correct. 6 is the most I've needed so far.
How many signatures does a typical op have? Would each signature (for that
op) have the same number of arguments?

There are different commonalities in different areas.
Math ops are usually 1 or 2. Array and Dict and String
are usually 2 or 3. Graphics ops are usually multiples of 2.
What proportion of ops (both in static number, and the ones actually
executed) have more than one or two arguments?

And, how many different types are there? (BTW what's the difference between
floattype and realtype?)

Floattype is the "automatic promotion" from int to real.
Whereas numbertype is either int or real.
This sort of stuff can be streamlined by using tables, but that's difficult
with up to six operands. But you can use separate code for the cases where
there is only one operand, or only two.

You also seem to be doing operand promotion in the loop, before you know
whether this particular signature will match; an integer promoted to real
now, might not then match a integer sig in the next loop. (Perhaps you
needn't bother doing this sort of promotion here at all; it can be done
inside the op-function.)

Yes, that does help simplify.

(I've tried to relate this to what I do, with type-dispatching on two
operands, for an interpreter (dispatching on a single operand is trivial)..

Each type is represented by a number, and every likely combination of two
types is given a 'typesig' (another small number). A 2D matrix is used to
convert the two types of the operands on the stack, to a single typesig.
Each binary operator in the language uses a linear table of function
pointers, indexed by the typesig code.

Where an illegal combinations of operand types is used, this maps to a
special typesig, which calls an error function. In C, the whole thing
roughly looks like this (for an example operator 'add'):

  add_fntable[sigmatrix[x->tag][y->tag]]();

If the types were few, then add_fntable could itself be a matrix:
add_fntable[x->tag][y->tag]. But there are no loops.

Your scheme has patterns such as 'anytype'; I wouldn't bother even looking
at it. Just make sure it ends up calling a suitable handler.

So if there was an operator OP(anytype,anytype), then I would make sure that
every value of OP_fntable[x->tag][y->tag] contained the address of the
implementation of op(anytype,anytype).

All this does mean that you'd need to treat 1, 2 or N operands separately,
but if you want it fast..)
 
B

BartC

luser- -droog said:
That's the kind of important information I omitted! :(
There are 19 types listed in the PRLM (of which I only use 13,
presently)

_Simple_ _Composite_
boolean array
fontID(*not used) condition(*)
integer dictionary
mark file
name gstate(*)
null lock(*)
operator packedarray(*)
real string
save

So in addition to matching specific types (like `string index int -put-
`)
There are "patterns" of types, like `array index any -put-`
or `dict any any -put-`. And I've got a C-function that performs
the type-specific operation with a mangled name like SIIput, YIAput,
DAAput.

So, about 20 types. From 1 to 6 operands/operator (or parameters/function).

We don't yet know how many different operators (and maybe the user (ie.
programmer) can add more, so that could rule out certain more static
solutions.). Nor do we know the typical proportion of operators with 1 or 2
operands (so if it was 90%, then that could certainly be streamlined).

But do we know what proportion of runtime is spent inside opexec()? Is that
just disentangling the parameters, or does it include eventually calling the
operator in question? If the latter, then you need to isolate the runtime
spent inside opexec() itself. If it is a small part of overall runtime, then
it might not be worth doing anything too complicated.

There are a few small things that can be looked at:

o Have separate, dedicated code to deal with one and two-operand cases. Then
you don't have the loop overheads, and don't need to use ,[j] indexing.

o As it is now, you do a check for a perfect match on parameter j, then do
more complicated checks, then do j+1. With the dedicated one- and
two-operand code especially, it might be worth doing a perfect check for all
parameters (and perhaps on all signatures), before the detailed checks. It
depends on the proportion of lookups that will likely be perfect matches
anyway.

o You're doing lots of double-indexing inside an inner loop. While the
compiler should optimise this stuff, it looks inefficient!

o Since you're accessing the same siq[] tags several times in each outer
loop, it might be worth extracting the tags once at the start (again, this
will be easier if you have dedicated code for 1 or 2 operands, perhaps even
for all N operands).

o How many operators have 'anytype' for all operands? It might be worth
marking such an operator, then no checks need to be done at all!

o It might also be worth marking operators where
numbertype/floattype/proctype don't appear at all; then these will only need
an 'anytype' match' or a perfect match, allowing more streamlining.

Etc. But if this doesn't make a difference, and the body of opexec() is
still the main bottleneck, then some more different approaches might be
needed. Also, forget Gprof, build-in your own profiling code; build
histograms of each operator, and how many times it's called, a histogram of
how many of each operand count is used (eg. 10% 1-operand, 80% 2-operand,
that sort of thing).

Willem suggested using bitmasks to help in matching. With 6 operands, of 5
bits each to encode each of 19 types, that fits neatly into 32 bits, and
might be worth trying. *But* this information must be already prepacked
(don't bother doing it inside opexec()). You then have to create the same
packing for the actual operands, and do it at the start of opexec() (and try
and avoid loops). This will help detecting perfect matches when there are
many operands.
 
L

luser- -droog

luser- -droog wrote:
) I'm working on a redesign of my postscript interpreter; and I'm
) trying to upgrade each module as I lock it in. Gprof runs of my
) current program (code.google.com/p/xpost) constantly list
) opexec() as the worst offender. And it really is a mess.
)
) It attempts to perform the tasks of selecting the appropriate
) form of polymorphic operators, checking the types on the
) stack, and calling the resulting operator function through a
) variable-argument switch-rig.
)
)
)     /* Execute an operator by opcode.
)        Examines stack to match against the patterns in signature.
)        Calls operator function with arguments.
)        if no signatures for the opcode match,
)        it issues an error corresponding to the last failed check
)        (either stackunderflow or typecheck)
)      */
)     void opexec (state *st, size opcode) {
)         oper op = optab[opcode];
)         int i, j;
)         bool pass;
)         object *siq = tos; /* stack-segment in question */
)         int err = typecheck;
)
)         current_op = opcode;
)         if (op.n == 0) error(st,unregistered);
)
)         for (i=0; i<op.n; i++) { /* try each signature */
)             pass = true;
)             current_op_sig = i;
)             siq = tos-op.sig.n;
)             if (tos-os < op.sig.n) { err = stackunderflow;
) continue; }
)             for (j=0; j<op.sig.n; j++) { /* check types */
)                 /* j indexes both siq[] and op.sig..t[] :
)                    the stack and the pattern, respectively. */
)
)                 /* anytype matches anything */
)                 if (op.sig.t[j] == anytype) continue;
)
)                 /* exact match */
)                 if (siq[j].tag == op.sig.t[j])continue;
)
)                 /* the numbertype pattern matches integers or reals */
)                 if ( op.sig.t[j] == numbertype
)                   && ( siq[j].tag == integertype
)                       || siq[j].tag == realtype) ) continue;
)
)                 /* the floattype pattern causes automatic coercion
)                    of integers to reals, ie. intsfloat up */
)                 if ( op.sig.t[j] == floattype ) {
)                     if (siq[j].tag == integertype) promote(siq[j]);
)                     if (siq[j].tag == realtype) continue;
)                 }
)
)                 /* the proctype matches executable arrays */
)                 if (op.sig.t[j] == proctype
)                   && siq[j].tag == arraytype
)                   && siq[j].flags.lit == 0) continue;
)
)                 /* all these possibilities failed */
)                 pass = false;
)                 err = typecheck;
)                 break;
)             }
)             if (pass) goto call;
)         }
)         tos = siq; /* pop the stack to the 'stack in question' */
)         error(st,err); /* pop before error since error will adjust the
) stack */
)         return;
)
)     call: /* pass args bottom-up */
)         tos = siq; /* pop the stack to the 'stack in question' */
)         /* room for output? */
)         if (tos-os + op.sig.out > OSSIZE) error(st,stackoverflow);
)         switch (op.sig.n) {
)             case 0: op.sig.fp(st); break;
)             case 1: op.sig.fp(st, siq[0]); break;
)             case 2: op.sig.fp(st, siq[0], siq[1]); break;
)             case 3: op.sig.fp(st, siq[0], siq[1], siq[2]); break;
)             case 4: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3]);
) break;
)             case 5: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3],
) siq[4]); break;
)             case 6: op.sig.fp(st, siq[0], siq[1], siq[2], siq[3],
) siq[4], siq[5]); break;
)             default: error(st,unregistered);
)         }
)     }
)
) So I had this great idea to do bitmasks and talked it up in clp
)http://groups.google.com/group/comp.lang.postscript/browse_thread/thr....
) but I never get much feedback there on this sort of thing.
)
) I think I can replace my type arrays (op.sig[].t[]) with a
) big integer filled with byte masks and I can accept or
) reject a candidate function with an & and a ==.
) But a problem arises with the "anytype" pattern.
) The two options I conceive are
) * multiple patterns checking each bit of just the relevant
)   nibble of the byte.
) * devoting an entire bit to valid/invalid.
)
) And it seems I'm pouting about these choices.
) I don't like either one and I can't think of anything else.
)
) I suppose I haven't even really formulated a question here.
)
) Thoughts/Ramblings/Anecdotes anyone?

Looks like there's a maximum of 6 arguments to check, and 3 possible
types to check against?  That's just 18 bits, isn't it?

That's the kind of important information I omitted! :(
There are 19 types listed in the PRLM (of which I only use 13,
presently)

_Simple_          _Composite_
boolean           array
fontID(*not used) condition(*)
integer           dictionary
mark              file
name              gstate(*)
null              lock(*)
operator          packedarray(*)
real              string
save

So in addition to matching specific types (like `string index int -put-
`)
There are "patterns" of types,


[sorry, my brain jumped tracks here.]

.... patterns like 'proc' which is an array with the executable bit.
And there are polymorphic operators ...
like `array index any -put-`
or `dict any any -put-`. And I've got a C-function that performs
the type-specific operation with a mangled name like SIIput, YIAput,
DAAput.

And there are a few oddballs like 'copy' which takes one integer OR
a pair of matching composites (dict dict, string string).

My thought was that I could pack 8 byte-sized tags into a 64-bit int,
and avoid counting the stack all the time as a bonus. The portion
off the bottom of the stack would be represented in the stack "digest"
as all-bits-zero (an invalid object), and the mask for the operator
signature would be zero beyond the interesting part. So checking
a signature would go something like:

digest = getdigest(); // one loop to take the "snapshot"
if ((digest & op[opcode].sig[j].mask) == op[opcode].sig[j].match)
opcall(opcode, j);

One of the motives for this "mask & match" idea was to do patterns by
modifying the mask. If integertype is 6 and realtype is 7, then
mask=6, match=6 should snag both.
 
L

luser- -droog

maybe add a second stage:
first stage processes input, and maybe builds opcodes as a list or
table, and resolves all operations to function pointers;
second stage executes code simply by calling function pointers.

argument and type-specialization can be handled by selecting an
appropriate function pointer, which will (general) omit all of the
dispatch-related logic.

Unfortunately, this just isn't possible with postscript.
There's no way to predict what an operator will find until
it happens. Because you can extract bound operators out
of procedure bodies and they must work the same as if
mentioned by name.

So I can't just factor-out the dispatching.
 
L

luser- -droog

That's the kind of important information I omitted! :(
There are 19 types listed in the PRLM (of which I only use 13,
presently)
_Simple_          _Composite_
boolean           array
fontID(*not used) condition(*)
integer           dictionary
mark              file
name              gstate(*)
null              lock(*)
operator          packedarray(*)
real              string
save
So in addition to matching specific types (like `string index int -put-
`)
There are "patterns" of types, like `array index any -put-`
or `dict any any -put-`. And I've got a C-function that performs
the type-specific operation with a mangled name like SIIput, YIAput,
DAAput.

So, about 20 types. From 1 to 6 operands/operator (or parameters/function).

We don't yet know how many different operators (and maybe the user (ie.
programmer) can add more, so that could rule out certain more static
solutions.). Nor do we know the typical proportion of operators with 1 or2
operands (so if it was 90%, then that could certainly be streamlined).

But do we know what proportion of runtime is spent inside opexec()? Is that
just disentangling the parameters, or does it include eventually calling the
operator in question? If the latter, then you need to isolate the runtime
spent inside opexec() itself. If it is a small part of overall runtime, then
it might not be worth doing anything too complicated.

There are a few small things that can be looked at:

o Have separate, dedicated code to deal with one and two-operand cases. Then
you don't have the loop overheads, and don't need to use ,[j] indexing..

o As it is now, you do a check for a perfect match on parameter j, then do
more complicated checks, then do j+1. With the dedicated one- and
two-operand code especially, it might be worth doing a perfect check for all
parameters (and perhaps on all signatures), before the detailed checks. It
depends on the proportion of lookups that will likely be perfect matches
anyway.

o You're doing lots of double-indexing inside an inner loop. While the
compiler should optimise this stuff, it looks inefficient!

o Since you're accessing the same siq[] tags several times in each outer
loop, it might be worth extracting the tags once at the start (again, this
will be easier if you have dedicated code for 1 or 2 operands, perhaps even
for all N operands).

o How many operators have 'anytype' for all operands? It might be worth
marking such an operator, then no checks need to be done at all!

o It might also be worth marking operators where
numbertype/floattype/proctype don't appear at all; then these will only need
an 'anytype' match' or a perfect match, allowing more streamlining.

Etc. But if this doesn't make a difference, and the body of opexec() is
still the main bottleneck, then some more different approaches might be
needed. Also, forget Gprof, build-in your own profiling code; build
histograms of each operator, and how many times it's called, a histogram of
how many of each operand count is used (eg. 10% 1-operand, 80% 2-operand,
that sort of thing).

Willem suggested using bitmasks to help in matching. With 6 operands, of 5
bits each to encode each of 19 types, that fits neatly into 32 bits, and
might be worth trying. *But* this information must be already prepacked
(don't bother doing it inside opexec()). You then have to create the same
packing for the actual operands, and do it at the start of opexec() (and try
and avoid loops). This will help detecting perfect matches when there are
many operands.



Thanks, Bart! I expect to keep returning to this message to tease out
its
wisdom like a Vedanta Sutra.

I suspect you've already answered any questions I may have, I just
need to
"do the work" now. Thanks a bunch!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top