S
Skybuck Flying
Hi,
I needed a method to determine if a point was on a line segment in 2D. So I
googled for some help and so far I have evaluated two methods.
The first method was only a formula, the second method was a piece of C code
which turned out to be incorrect and incomplete but by modifieing it would
still be usuable.
The first method was this piece of text:
"
if x is between StartX and EndX or if y is between StartY and EndY, then the
point (x,y) is on the line.
Another way is the parametric form of a line,ie:
X = StartX + t0 * (EndX - StartX)
Y = StartY + t1 * (EndY - StartY)
calculate t0 and t1, if t0 and t1 are equal, then the point is on the line
if the parameter (t0 = t1) is between 0 and 1, the the point is
on the segment.
"
The second method was this piece of text:
"
Very simple,
First test if the x-value of the point is within the interval StartX,EndX.
If this is not true the point is not on the line.
int on_line(x, y, StartX, StartY, EndX, EndY)
{ int dx, dy, dx1, dy1;
if (x<StartX || x>EndX) return 0;
dx = EndX-StartX;
dy = EndY-StartY;
dx1 = x-StartX;
dy1 = y-StartY;
return (dx*dy1 == dy*dx1);
}
"
So after reading the first method I needed to be able to calculate T0 and T1
since all other variables are known the formula's can be rewritten as to
calculate T0 and T1.
Changing the formula's to get the final formula:
Step1 original:
X = StartX + t0 * (EndX - StartX)
Y = StartY + t1 * (EndY - StartY)
Step2 delta's introduced:
DeltaX = EndX - StartX
DeltaY = EndY - Starty
X = StartX + t0 * (DeltaX)
Y = StartY + t1 * (DeltaY)
Step3 switch
StartX + t0 * (DeltaX) = X
StartY + t1 * (DeltaY) = Y
Step4 start to isolate t0 and t1 by moving StartX and StartY to the
otherside
(thanks to my math teacher for those usefull lessons =D)
t0 * (DeltaX) = X - StartX
t1 * (DeltaY) = Y - StartY
Step5 isolate to and t1 further by moving delta's to the other side
(thanks to my math teacher again for those usefull lessons =D)
t0 = (X - StartX)/DeltaX
t1 = (Y - StartY)/DeltaY
If you look at the original formula you will notice how t0 and t1 are sort
of distances on the line segment itself ranging from 0.0 to 1.0 (like a
percentage).
The second method looks a little bit similair but works totally different.
It multiplies the delta's with the line segment with the delta's with the
start and point. Which kinda looks like a dot product.. I dont know what
that's all about
Maybe this is just another way of calculating how much
distance each component has. (There are two components x and y, to be on the
line they both need to have the same distance from the start point for
example
ofcourse this could mean a circle... but since you working with
delta's there is only one possible direction)
First of all the C method is incorrect unless the coordinates are pre-sorted
from left to right. If the coordinates are not sorted, for example
(100,10)-(10,50) (going from right to left) the method will simply exit
because of the first if statement which is just wrong. So this is easily
fixed by removing that if statement.
However I also consider the C method to be incomplete. It's fine if one
wants to calculate if a point is on an infinite line. It's impossible to
tell if the original poster wanted a line segment or just a line... he does
mention an start and end point... For my purposes however I need to be able
to tell if a point is on a line segment. So this C method needs to be
modified/expanded to accomadate for it.
The fascination starts with the C method looking very fast. Just a few lines
of code and a multiplication.
It also looks a bit dangerous because of the multiplication which could
overflow... but all in all not to bad/shabby and definetly worth a try.
The first method looks kinda big and slower because of more variables/code
and divisions etc, but it does look complete and stable and robust.
So now for a reality check.
I have implemented both versions in pascal/delphi which I believe to produce
correct results everytime.
The question is which method is faster. I would place my money on the first
method. Simply because the code is shorter and it has to do less floating
point comparisions and less jumps than the second method.
The C method (second method) actually needs a bounding box to be constructed
to be usefull (for line segment tests). The bounding box is necessary to
determine if the point is inside the line segment. Therefore I believe the C
method in this case to be slower. It requires more instructions, more
comparisions and more jumps.
I think I have done a fair job of optimizing both implementations by trying
to reduce the number of comparisions, jumps and optimizing for branch
prediction (most likely case fall through etc).
I spent lot's of time optimizing method1 to reduce the number of compares by
first checking one
delta before checking the next etc.. that simply saved one or two compares
and jumps compared
to previous versions.
I also spent some time optimizing method2 but probably a little bit less
time. I inlined the bounding box calculation and I also postponed it until
after the if statement etc as to not execute if the if statement is false.
Finally the point needs to be checked to be inside or outside the bounding
box. Checking for outside is easier/shorter for multiple bounding so I coded
it like that. However in this case there is only one bounding box so in this
case it doesn't matter and it could also be checked to be inside. The
generated assembler would however be the same so it doesn't matter.
For my purposes the diagonal case will be the most likely and occuring case
so it's most interesting to see which method would be faster.
The first method requires 53 instructions for the diagonal case (with point
on line)
The second method requires 81 instructions for the diagonal case (with point
on line)
Both methods require the same ammounts of pushes onto the stack so those
pushes have not been counted.
The current implementations look like this:
v3 is the first method
v5 is the second method
While writing this post I discovered a flaw in v3, fixed it, and even
optimized it.
I think v3 should now be much faster for case horizontal and case vertical
than v5.
So the main focus is on the diagonal case.
My question to you the delphi, electronics and computer architecture
community is:
Which code do you think is faster ?
Can you explain it purely based on theory without doing any performance
measurements ?
For example: number of instructions, instruction cycles, branch prediction,
pipe line optimizations, pairing, etc.
I shall provide you with the current pascal/delphi and the assembler code:
If you think you really are an expert in the optimization
field/instruction/hardware/assembler field then next question will be
ofcourse:
Can you optimize these methods or come up with a faster method ?
Quite frankly I am obsessed with this code. It's such a trivial code that I
dont want my PC to be spending to much time on it... currently this code is
not being used in any algorithms. I could use alternative information from a
line segment intersection test algorithm which can already determine if one
of the 4 points is on a line segment yes/no etc... but that would make the
routine's header much more complex and maybe I dont want that at first at
least.... so I might choose to use a routine like below to keep things a bit
more simple and pragmatic... etc.
So after each line segment intersection test this routine would also need to
be called, so it would be called a lot, so I dont want it to be a lot of
overhead if you know what I mean
And even if this code will never be used it's still quite fascinating and
educational to see/learn how floating point optimizations work etc... that's
the main reason why I am investigating this and what you to have a look at
this because I am puzzled =D
Ok now the subjective stuff:
My gutt feeling says v3 is faster than v5 because v3 uses less compares and
less jumps which should be good for branch prediction etc.. v3 also has less
instructions.
However v3 uses two divisions (fdiv) which might be a lot slower than
multiplications (fmul) which version v5 uses two of. However my gutt feeling
says v5 has the bounding box thing going on etc so it needs to do more
stuff/instructions which could make it slow
Add the unknown factors like branch prediction, pipe lining, pairing and god
knows what else and you start to see the doubt hehehehehe.
Can you sort it out ?
=D
Or do you need performance measurements like me ?
And let's be honest about that... How are we going to do accurate
performance measurements ?
Lot's of context switching, multi threading, harddisk activity going on, on
windows xp, so I am not too sure if that is reliable but ok
Here is the pascal/delphi code:
function point_on_line_segment_in_2d_v3( StartX, StartY : double;
EndX, EndY : double;
PointX, PointY : double ) : boolean;
var
vDeltaX : double;
vDeltaY : double;
vDistanceXcomponent : double;
vDistanceYcomponent : double;
begin
// default is false
result := false;
vDeltaX := EndX - StartX;
vDeltaY := EndY - StartY;
// if vDeltaX is not zero then that means the line is horizontal or
diagonal
if (vDeltaX <> 0) then
begin
// if vDeltaY is not zero then that means the line is diagonal
if (vDeltaY <> 0) then
begin
// line is diagonal
// calculate both components and look if they are the same
// if yes then see if they both lie between 0 and 1.
vDistanceXcomponent := (PointX - StartX) / vDeltaX;
vDistanceYcomponent := (PointY - StartY) / vDeltaY;
if (vDistanceXcomponent = vDistanceYcomponent) then
begin
if (vDistanceXcomponent>0) and (vDistanceXcomponent<1) then
begin
result := true;
end;
end;
end else
// else it means it is horizontal
begin
// line is horizontal
// check if the point is at the same y position as the horizontal line
if (PointY = StartY) then
begin
if StartX < EndX then
begin
if (PointX > StartX) and (PointX < EndX) then
begin
result := true;
end;
end else
begin
if (PointX > EndX) and (PointX < StartX) then
begin
result := true;
end;
end;
// not necessary simply check if between startx and endx or
// vica versa depending on if endx is greater then startx etc
// since fdiv seems to be slow let's do it with some compares
// see above
{
// calculate and look at the horizontal component if it lies between 0
and 1.
vDistanceXcomponent := (PointX - StartX) / vDeltaX;
if (vDistanceXComponent>0) and (vDistanceXComponent<1) then
begin
result := true;
end;
}
end;
end;
end else
// else that means the line is vertical or a point
// if vDeltaY is not zero then that means the line is vertical
if (vDeltaY <> 0) then
begin
// line is vertical
// check if point is at same x position as the vertical line.
if (PointX = StartX) then
begin
if StartY < EndY then
begin
if (PointY > StartY) and (PointY < EndY) then
begin
result := true;
end;
end else
begin
if (PointY > EndY) and (PointY < StartY) then
begin
result := true;
end;
end;
{
// calculate and see if the vertical component lies between 0 and 1
vDistanceYcomponent := (PointY - StartY) / vDeltaY;
if (vDistanceYComponent>0) and (vDistanceYComponent<1) then
begin
result := true;
end;
}
end;
// not necessary to do ith with divisions etc.
end;
// else the line is not a line but a point so exit
end;
function point_on_line_segment_in_2d_v5( StartX, StartY : double;
EndX, EndY : double;
PointX, PointY : double ) : boolean;
var
vDeltaX : double;
vDeltaY : double;
vDistanceX : double;
vDistanceY : double;
vBoundingBoxX1 : double;
vBoundingBoxY1 : double;
vBoundingBoxX2 : double;
vBoundingBoxY2 : double;
begin
result := false;
vDeltaX := EndX - StartX;
vDeltaY := EndY - StartY;
vDistanceX := PointX - StartX;
vDistanceY := PointY - StartY;
if (vDeltaX * vDistanceY) = (vDeltaY * vDistanceX) then
begin
if (StartX<EndX) then
begin
vBoundingBoxX1 := StartX;
vBoundingBoxX2 := EndX;
end else
begin
vBoundingBoxX1 := EndX;
vBoundingBoxX2 := StartX;
end;
if (StartY<EndY) then
begin
vBoundingBoxY1 := StartY;
vBoundingBoxY2 := EndY;
end else
begin
vBoundingBoxY1 := EndY;
vBoundingBoxY2 := StartY;
end;
// I'll check if the point is outside to determine the overlap etc.
// which is consistent which how it's done in the line segment
intersection/overlap
// test where two bounding boxes need to be checked for overlap.
// determining if they are outside is much easier and shorter then
checking if each
// point is inside
// so let's not be stupid and stick with it
// determining overlap should be done this way:
// checking for outside or on it if not then assume it's inside.
if (PointX <= vBoundingBoxX1) or (PointX >= vBoundingBoxX2) or
(PointY <= vBoundingBoxY1) or (PointY >= vBoundingBoxY2) then exit;
result := true;
// which is the same as:
// assembler generated is exactly the same if I am not mistaken
// this code looks cleaner tough...
// if (PointX > vBoundingBoxX1) and (PointX < vBoundingBoxX2) and
// (PointY > vBoundingBoxY1) and (PointY < vBoundingBoxY2) then
// begin
// result := true;
// end;
end;
end;
And here is some simple test code:
var
a1x, a1y,
a2x, a2y,
px, py : double;
begin
a1x := 10;
a1y := 10;
a2x := 100;
a2y := 100;
px := 50;
py := 50;
if point_on_line_segment_in_2d_v3( a1x, a1y,
a2x, a2y,
px, py ) then
begin
ShowMessage('yes');
end else
begin
ShowMessage('no');
end;
if point_on_line_segment_in_2d_v5( a1x, a1y,
a2x, a2y,
px, py ) then
begin
ShowMessage('yes');
end else
begin
ShowMessage('no');
end;
Here is the assembler code. (since delphi's cpu window does not allow easy
copy/paste this code has been taking from w32dasm disassembler, which
shouldn't matter. Actually the disassembly is better since it shows exactly
where jumps jump too
(Taken from the "release" version without debugging information... I think
the assembler is the same for the debugging version so it doesnt matter
(?)):
// assembler for v3:
// the call:
004523DE FF742404 push [esp+04]
:004523E2 FF742404 push [esp+04]
:004523E6 FF742414 push [esp+14]
:004523EA FF742414 push [esp+14]
:004523EE FF742424 push [esp+24]
:004523F2 FF742424 push [esp+24]
:004523F6 FF742434 push [esp+34]
:004523FA FF742434 push [esp+34]
:004523FE FF742444 push [esp+44]
:00452402 FF742444 push [esp+44]
:00452406 FF742454 push [esp+54]
:0045240A FF742454 push [esp+54]
:0045240E E839FDFFFF call 0045214C
:00452413 84C0 test al, al
:00452415 740C je 00452423
// the routine:
:0045214C 55 push ebp
:0045214D 8BEC mov ebp, esp
:0045214F 83C4E0 add esp, FFFFFFE0
:00452152 33D2 xor edx, edx
:00452154 DD4520 fld qword ptr [ebp+20]
:00452157 DC6530 fsub qword ptr [ebp+30]
:0045215A DD5DF8 fstp qword ptr [ebp-08]
:0045215D 9B wait
:0045215E DD4518 fld qword ptr [ebp+18]
:00452161 DC6528 fsub qword ptr [ebp+28]
:00452164 DD5DF0 fstp qword ptr [ebp-10]
:00452167 9B wait
:00452168 DD45F8 fld qword ptr [ebp-08]
:0045216B D81D88224500 fcomp dword ptr [00452288]
:00452171 DFE0 fstsw ax
:00452173 9E sahf
:00452174 0F84B0000000 je 0045222A
:0045217A DD45F0 fld qword ptr [ebp-10]
:0045217D D81D88224500 fcomp dword ptr [00452288]
:00452183 DFE0 fstsw ax
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00452114(C)
|
:00452185 9E sahf
:00452186 7454 je 004521DC
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00452112(C)
|
:00452188 DD4510 fld qword ptr [ebp+10]
:0045218B DC6530 fsub qword ptr [ebp+30]
:0045218E DC75F8 fdiv qword ptr [ebp-08]
:00452191 DD5DE8 fstp qword ptr [ebp-18]
:00452194 9B wait
:00452195 DD4508 fld qword ptr [ebp+08]
:00452198 DC6528 fsub qword ptr [ebp+28]
:0045219B DC75F0 fdiv qword ptr [ebp-10]
:0045219E DD5DE0 fstp qword ptr [ebp-20]
:004521A1 9B wait
:004521A2 DD45E8 fld qword ptr [ebp-18]
:004521A5 DC5DE0 fcomp qword ptr [ebp-20]
:004521A8 DFE0 fstsw ax
:004521AA 9E sahf
:004521AB 0F85CF000000 jne 00452280
:004521B1 DD45E8 fld qword ptr [ebp-18]
:004521B4 D81D88224500 fcomp dword ptr [00452288]
:004521BA DFE0 fstsw ax
:004521BC 9E sahf
:004521BD 0F86BD000000 jbe 00452280
:004521C3 DD45E8 fld qword ptr [ebp-18]
:004521C6 D81D8C224500 fcomp dword ptr [0045228C]
:004521CC DFE0 fstsw ax
:004521CE 9E sahf
:004521CF 0F83AB000000 jnb 00452280
:004521D5 B201 mov dl, 01
:004521D7 E9A4000000 jmp 00452280
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00452186(C)
|
:004521DC DD4508 fld qword ptr [ebp+08]
:004521DF DC5D28 fcomp qword ptr [ebp+28]
:004521E2 DFE0 fstsw ax
:004521E4 9E sahf
:004521E5 0F8595000000 jne 00452280
:004521EB DD4530 fld qword ptr [ebp+30]
:004521EE DC5D20 fcomp qword ptr [ebp+20]
:004521F1 DFE0 fstsw ax
:004521F3 9E sahf
:004521F4 731A jnb 00452210
:004521F6 DD4510 fld qword ptr [ebp+10]
:004521F9 DC5D30 fcomp qword ptr [ebp+30]
:004521FC DFE0 fstsw ax
:004521FE 9E sahf
:004521FF 767F jbe 00452280
:00452201 DD4510 fld qword ptr [ebp+10]
:00452204 DC5D20 fcomp qword ptr [ebp+20]
:00452207 DFE0 fstsw ax
:00452209 9E sahf
:0045220A 7374 jnb 00452280
:0045220C B201 mov dl, 01
:0045220E EB70 jmp 00452280
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004521F4(C)
|
:00452210 DD4510 fld qword ptr [ebp+10]
:00452213 DC5D20 fcomp qword ptr [ebp+20]
:00452216 DFE0 fstsw ax
:00452218 9E sahf
:00452219 7665 jbe 00452280
:0045221B DD4510 fld qword ptr [ebp+10]
:0045221E DC5D30 fcomp qword ptr [ebp+30]
:00452221 DFE0 fstsw ax
:00452223 9E sahf
:00452224 735A jnb 00452280
:00452226 B201 mov dl, 01
:00452228 EB56 jmp 00452280
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00452174(C)
|
:0045222A DD45F0 fld qword ptr [ebp-10]
:0045222D D81D88224500 fcomp dword ptr [00452288]
:00452233 DFE0 fstsw ax
:00452235 9E sahf
:00452236 7448 je 00452280
:00452238 DD4510 fld qword ptr [ebp+10]
:0045223B DC5D30 fcomp qword ptr [ebp+30]
:0045223E DFE0 fstsw ax
:00452240 9E sahf
:00452241 753D jne 00452280
:00452243 DD4528 fld qword ptr [ebp+28]
:00452246 DC5D18 fcomp qword ptr [ebp+18]
:00452249 DFE0 fstsw ax
:0045224B 9E sahf
:0045224C 731A jnb 00452268
:0045224E DD4508 fld qword ptr [ebp+08]
:00452251 DC5D28 fcomp qword ptr [ebp+28]
:00452254 DFE0 fstsw ax
:00452256 9E sahf
:00452257 7627 jbe 00452280
:00452259 DD4508 fld qword ptr [ebp+08]
:0045225C DC5D18 fcomp qword ptr [ebp+18]
:0045225F DFE0 fstsw ax
:00452261 9E sahf
:00452262 731C jnb 00452280
:00452264 B201 mov dl, 01
:00452266 EB18 jmp 00452280
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:0045224C(C)
|
:00452268 DD4508 fld qword ptr [ebp+08]
:0045226B DC5D18 fcomp qword ptr [ebp+18]
:0045226E DFE0 fstsw ax
:00452270 9E sahf
:00452271 760D jbe 00452280
:00452273 DD4508 fld qword ptr [ebp+08]
:00452276 DC5D28 fcomp qword ptr [ebp+28]
:00452279 DFE0 fstsw ax
:0045227B 9E sahf
:0045227C 7302 jnb 00452280
:0045227E B201 mov dl, 01
* Referenced by a (U)nconditional or (C)onditional Jump at Addresses:
|:004521AB(C), :004521BD(C), :004521CF(C), :004521D7(U), :004521E5(C)
|:004521FF(C), :0045220A(C), :0045220E(U), :00452219(C), :00452224(C)
|:00452228(U), :00452236(C), :00452241(C), :00452257(C), :00452262(C)
|:00452266(U), :00452271(C), :0045227C(C)
|
:00452280 8BC2 mov eax, edx
:00452282 8BE5 mov esp, ebp
:00452284 5D pop ebp
:00452285 C23000 ret 0030
// assembler for v5:
// the call:
:0045242D FF742404 push [esp+04]
:00452431 FF742404 push [esp+04]
:00452435 FF742414 push [esp+14]
:00452439 FF742414 push [esp+14]
:0045243D FF742424 push [esp+24]
:00452441 FF742424 push [esp+24]
:00452445 FF742434 push [esp+34]
:00452449 FF742434 push [esp+34]
:0045244D FF742444 push [esp+44]
:00452451 FF742444 push [esp+44]
:00452455 FF742454 push [esp+54]
:00452459 FF742454 push [esp+54]
:0045245D E82EFEFFFF call 00452290
:00452462 84C0 test al, al
:00452464 740C je 00452472
// the routine:
* Referenced by a CALL at Address:
|:0045245D
|
:00452290 55 push ebp
:00452291 8BEC mov ebp, esp
:00452293 83C4C0 add esp, FFFFFFC0
:00452296 33D2 xor edx, edx
:00452298 DD4520 fld qword ptr [ebp+20]
:0045229B DC6530 fsub qword ptr [ebp+30]
:0045229E DD5DF8 fstp qword ptr [ebp-08]
:004522A1 9B wait
:004522A2 DD4518 fld qword ptr [ebp+18]
:004522A5 DC6528 fsub qword ptr [ebp+28]
:004522A8 DD5DF0 fstp qword ptr [ebp-10]
:004522AB 9B wait
:004522AC DD4510 fld qword ptr [ebp+10]
:004522AF DC6530 fsub qword ptr [ebp+30]
:004522B2 DD5DE8 fstp qword ptr [ebp-18]
:004522B5 9B wait
:004522B6 DD4508 fld qword ptr [ebp+08]
:004522B9 DC6528 fsub qword ptr [ebp+28]
:004522BC DD5DE0 fstp qword ptr [ebp-20]
:004522BF 9B wait
:004522C0 DD45F8 fld qword ptr [ebp-08]
:004522C3 DC4DE0 fmul qword ptr [ebp-20]
:004522C6 DD45F0 fld qword ptr [ebp-10]
:004522C9 DC4DE8 fmul qword ptr [ebp-18]
:004522CC DED9 fcompp
:004522CE DFE0 fstsw ax
:004522D0 9E sahf
:004522D1 0F85A8000000 jne 0045237F
:004522D7 DD4530 fld qword ptr [ebp+30]
:004522DA DC5D20 fcomp qword ptr [ebp+20]
:004522DD DFE0 fstsw ax
:004522DF 9E sahf
:004522E0 731A jnb 004522FC
:004522E2 8B4530 mov eax, dword ptr [ebp+30]
:004522E5 8945D8 mov dword ptr [ebp-28], eax
:004522E8 8B4534 mov eax, dword ptr [ebp+34]
:004522EB 8945DC mov dword ptr [ebp-24], eax
:004522EE 8B4520 mov eax, dword ptr [ebp+20]
:004522F1 8945C8 mov dword ptr [ebp-38], eax
:004522F4 8B4524 mov eax, dword ptr [ebp+24]
:004522F7 8945CC mov dword ptr [ebp-34], eax
:004522FA EB18 jmp 00452314
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004522E0(C)
|
:004522FC 8B4520 mov eax, dword ptr [ebp+20]
:004522FF 8945D8 mov dword ptr [ebp-28], eax
:00452302 8B4524 mov eax, dword ptr [ebp+24]
:00452305 8945DC mov dword ptr [ebp-24], eax
:00452308 8B4530 mov eax, dword ptr [ebp+30]
:0045230B 8945C8 mov dword ptr [ebp-38], eax
:0045230E 8B4534 mov eax, dword ptr [ebp+34]
:00452311 8945CC mov dword ptr [ebp-34], eax
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004522FA(U)
|
:00452314 DD4528 fld qword ptr [ebp+28]
:00452317 DC5D18 fcomp qword ptr [ebp+18]
:0045231A DFE0 fstsw ax
:0045231C 9E sahf
:0045231D 731A jnb 00452339
:0045231F 8B4528 mov eax, dword ptr [ebp+28]
:00452322 8945D0 mov dword ptr [ebp-30], eax
:00452325 8B452C mov eax, dword ptr [ebp+2C]
:00452328 8945D4 mov dword ptr [ebp-2C], eax
:0045232B 8B4518 mov eax, dword ptr [ebp+18]
:0045232E 8945C0 mov dword ptr [ebp-40], eax
:00452331 8B451C mov eax, dword ptr [ebp+1C]
:00452334 8945C4 mov dword ptr [ebp-3C], eax
:00452337 EB18 jmp 00452351
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:0045231D(C)
|
:00452339 8B4518 mov eax, dword ptr [ebp+18]
:0045233C 8945D0 mov dword ptr [ebp-30], eax
:0045233F 8B451C mov eax, dword ptr [ebp+1C]
:00452342 8945D4 mov dword ptr [ebp-2C], eax
:00452345 8B4528 mov eax, dword ptr [ebp+28]
:00452348 8945C0 mov dword ptr [ebp-40], eax
:0045234B 8B452C mov eax, dword ptr [ebp+2C]
:0045234E 8945C4 mov dword ptr [ebp-3C], eax
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00452337(U)
|
:00452351 DD4510 fld qword ptr [ebp+10]
:00452354 DC5DD8 fcomp qword ptr [ebp-28]
:00452357 DFE0 fstsw ax
:00452359 9E sahf
:0045235A 7623 jbe 0045237F
:0045235C DD4510 fld qword ptr [ebp+10]
:0045235F DC5DC8 fcomp qword ptr [ebp-38]
:00452362 DFE0 fstsw ax
:00452364 9E sahf
:00452365 7318 jnb 0045237F
:00452367 DD4508 fld qword ptr [ebp+08]
:0045236A DC5DD0 fcomp qword ptr [ebp-30]
:0045236D DFE0 fstsw ax
:0045236F 9E sahf
:00452370 760D jbe 0045237F
:00452372 DD4508 fld qword ptr [ebp+08]
:00452375 DC5DC0 fcomp qword ptr [ebp-40]
:00452378 DFE0 fstsw ax
:0045237A 9E sahf
:0045237B 7302 jnb 0045237F
:0045237D B201 mov dl, 01
* Referenced by a (U)nconditional or (C)onditional Jump at Addresses:
|:004522D1(C), :0045235A(C), :00452365(C), :00452370(C), :0045237B(C)
|
:0045237F 8BC2 mov eax, edx
:00452381 8BE5 mov esp, ebp
:00452383 5D pop ebp
:00452384 C23000 ret 0030
The assembler is provided in case you dont have any pascal/delphi compilers
at hand
and allows one to easily see what is going on.
I think I have provided enough assembler listing by providing the call and
the routine. There was some extra stuff at the beginning and the end like
dup(0) stuff and nop etc... but I dont know why it's there etc.. it's
probably not needed to understand what is going on with these routines, how
they work and how fast they will work
I am curious if anybody is able to tell anything from these listings
If not then that's ok I can still fall back on performance
testing/measurements =D though looking at it from a theoretical point of
view is very interesting
P.S.: Ok I am obsessed with this code lol I admit =D
Bye,
Skybuck.
I needed a method to determine if a point was on a line segment in 2D. So I
googled for some help and so far I have evaluated two methods.
The first method was only a formula, the second method was a piece of C code
which turned out to be incorrect and incomplete but by modifieing it would
still be usuable.
The first method was this piece of text:
"
if x is between StartX and EndX or if y is between StartY and EndY, then the
point (x,y) is on the line.
Another way is the parametric form of a line,ie:
X = StartX + t0 * (EndX - StartX)
Y = StartY + t1 * (EndY - StartY)
calculate t0 and t1, if t0 and t1 are equal, then the point is on the line
if the parameter (t0 = t1) is between 0 and 1, the the point is
on the segment.
"
The second method was this piece of text:
"
Very simple,
First test if the x-value of the point is within the interval StartX,EndX.
If this is not true the point is not on the line.
int on_line(x, y, StartX, StartY, EndX, EndY)
{ int dx, dy, dx1, dy1;
if (x<StartX || x>EndX) return 0;
dx = EndX-StartX;
dy = EndY-StartY;
dx1 = x-StartX;
dy1 = y-StartY;
return (dx*dy1 == dy*dx1);
}
"
So after reading the first method I needed to be able to calculate T0 and T1
since all other variables are known the formula's can be rewritten as to
calculate T0 and T1.
Changing the formula's to get the final formula:
Step1 original:
X = StartX + t0 * (EndX - StartX)
Y = StartY + t1 * (EndY - StartY)
Step2 delta's introduced:
DeltaX = EndX - StartX
DeltaY = EndY - Starty
X = StartX + t0 * (DeltaX)
Y = StartY + t1 * (DeltaY)
Step3 switch
StartX + t0 * (DeltaX) = X
StartY + t1 * (DeltaY) = Y
Step4 start to isolate t0 and t1 by moving StartX and StartY to the
otherside
(thanks to my math teacher for those usefull lessons =D)
t0 * (DeltaX) = X - StartX
t1 * (DeltaY) = Y - StartY
Step5 isolate to and t1 further by moving delta's to the other side
(thanks to my math teacher again for those usefull lessons =D)
t0 = (X - StartX)/DeltaX
t1 = (Y - StartY)/DeltaY
If you look at the original formula you will notice how t0 and t1 are sort
of distances on the line segment itself ranging from 0.0 to 1.0 (like a
percentage).
The second method looks a little bit similair but works totally different.
It multiplies the delta's with the line segment with the delta's with the
start and point. Which kinda looks like a dot product.. I dont know what
that's all about
distance each component has. (There are two components x and y, to be on the
line they both need to have the same distance from the start point for
example
delta's there is only one possible direction)
First of all the C method is incorrect unless the coordinates are pre-sorted
from left to right. If the coordinates are not sorted, for example
(100,10)-(10,50) (going from right to left) the method will simply exit
because of the first if statement which is just wrong. So this is easily
fixed by removing that if statement.
However I also consider the C method to be incomplete. It's fine if one
wants to calculate if a point is on an infinite line. It's impossible to
tell if the original poster wanted a line segment or just a line... he does
mention an start and end point... For my purposes however I need to be able
to tell if a point is on a line segment. So this C method needs to be
modified/expanded to accomadate for it.
The fascination starts with the C method looking very fast. Just a few lines
of code and a multiplication.
It also looks a bit dangerous because of the multiplication which could
overflow... but all in all not to bad/shabby and definetly worth a try.
The first method looks kinda big and slower because of more variables/code
and divisions etc, but it does look complete and stable and robust.
So now for a reality check.
I have implemented both versions in pascal/delphi which I believe to produce
correct results everytime.
The question is which method is faster. I would place my money on the first
method. Simply because the code is shorter and it has to do less floating
point comparisions and less jumps than the second method.
The C method (second method) actually needs a bounding box to be constructed
to be usefull (for line segment tests). The bounding box is necessary to
determine if the point is inside the line segment. Therefore I believe the C
method in this case to be slower. It requires more instructions, more
comparisions and more jumps.
I think I have done a fair job of optimizing both implementations by trying
to reduce the number of comparisions, jumps and optimizing for branch
prediction (most likely case fall through etc).
I spent lot's of time optimizing method1 to reduce the number of compares by
first checking one
delta before checking the next etc.. that simply saved one or two compares
and jumps compared
to previous versions.
I also spent some time optimizing method2 but probably a little bit less
time. I inlined the bounding box calculation and I also postponed it until
after the if statement etc as to not execute if the if statement is false.
Finally the point needs to be checked to be inside or outside the bounding
box. Checking for outside is easier/shorter for multiple bounding so I coded
it like that. However in this case there is only one bounding box so in this
case it doesn't matter and it could also be checked to be inside. The
generated assembler would however be the same so it doesn't matter.
For my purposes the diagonal case will be the most likely and occuring case
so it's most interesting to see which method would be faster.
The first method requires 53 instructions for the diagonal case (with point
on line)
The second method requires 81 instructions for the diagonal case (with point
on line)
Both methods require the same ammounts of pushes onto the stack so those
pushes have not been counted.
The current implementations look like this:
v3 is the first method
v5 is the second method
While writing this post I discovered a flaw in v3, fixed it, and even
optimized it.
I think v3 should now be much faster for case horizontal and case vertical
than v5.
So the main focus is on the diagonal case.
My question to you the delphi, electronics and computer architecture
community is:
Which code do you think is faster ?
Can you explain it purely based on theory without doing any performance
measurements ?
For example: number of instructions, instruction cycles, branch prediction,
pipe line optimizations, pairing, etc.
I shall provide you with the current pascal/delphi and the assembler code:
If you think you really are an expert in the optimization
field/instruction/hardware/assembler field then next question will be
ofcourse:
Can you optimize these methods or come up with a faster method ?
Quite frankly I am obsessed with this code. It's such a trivial code that I
dont want my PC to be spending to much time on it... currently this code is
not being used in any algorithms. I could use alternative information from a
line segment intersection test algorithm which can already determine if one
of the 4 points is on a line segment yes/no etc... but that would make the
routine's header much more complex and maybe I dont want that at first at
least.... so I might choose to use a routine like below to keep things a bit
more simple and pragmatic... etc.
So after each line segment intersection test this routine would also need to
be called, so it would be called a lot, so I dont want it to be a lot of
overhead if you know what I mean
And even if this code will never be used it's still quite fascinating and
educational to see/learn how floating point optimizations work etc... that's
the main reason why I am investigating this and what you to have a look at
this because I am puzzled =D
Ok now the subjective stuff:
My gutt feeling says v3 is faster than v5 because v3 uses less compares and
less jumps which should be good for branch prediction etc.. v3 also has less
instructions.
However v3 uses two divisions (fdiv) which might be a lot slower than
multiplications (fmul) which version v5 uses two of. However my gutt feeling
says v5 has the bounding box thing going on etc so it needs to do more
stuff/instructions which could make it slow
Add the unknown factors like branch prediction, pipe lining, pairing and god
knows what else and you start to see the doubt hehehehehe.
Can you sort it out ?
Or do you need performance measurements like me ?
And let's be honest about that... How are we going to do accurate
performance measurements ?
Lot's of context switching, multi threading, harddisk activity going on, on
windows xp, so I am not too sure if that is reliable but ok
Here is the pascal/delphi code:
function point_on_line_segment_in_2d_v3( StartX, StartY : double;
EndX, EndY : double;
PointX, PointY : double ) : boolean;
var
vDeltaX : double;
vDeltaY : double;
vDistanceXcomponent : double;
vDistanceYcomponent : double;
begin
// default is false
result := false;
vDeltaX := EndX - StartX;
vDeltaY := EndY - StartY;
// if vDeltaX is not zero then that means the line is horizontal or
diagonal
if (vDeltaX <> 0) then
begin
// if vDeltaY is not zero then that means the line is diagonal
if (vDeltaY <> 0) then
begin
// line is diagonal
// calculate both components and look if they are the same
// if yes then see if they both lie between 0 and 1.
vDistanceXcomponent := (PointX - StartX) / vDeltaX;
vDistanceYcomponent := (PointY - StartY) / vDeltaY;
if (vDistanceXcomponent = vDistanceYcomponent) then
begin
if (vDistanceXcomponent>0) and (vDistanceXcomponent<1) then
begin
result := true;
end;
end;
end else
// else it means it is horizontal
begin
// line is horizontal
// check if the point is at the same y position as the horizontal line
if (PointY = StartY) then
begin
if StartX < EndX then
begin
if (PointX > StartX) and (PointX < EndX) then
begin
result := true;
end;
end else
begin
if (PointX > EndX) and (PointX < StartX) then
begin
result := true;
end;
end;
// not necessary simply check if between startx and endx or
// vica versa depending on if endx is greater then startx etc
// since fdiv seems to be slow let's do it with some compares
// see above
{
// calculate and look at the horizontal component if it lies between 0
and 1.
vDistanceXcomponent := (PointX - StartX) / vDeltaX;
if (vDistanceXComponent>0) and (vDistanceXComponent<1) then
begin
result := true;
end;
}
end;
end;
end else
// else that means the line is vertical or a point
// if vDeltaY is not zero then that means the line is vertical
if (vDeltaY <> 0) then
begin
// line is vertical
// check if point is at same x position as the vertical line.
if (PointX = StartX) then
begin
if StartY < EndY then
begin
if (PointY > StartY) and (PointY < EndY) then
begin
result := true;
end;
end else
begin
if (PointY > EndY) and (PointY < StartY) then
begin
result := true;
end;
end;
{
// calculate and see if the vertical component lies between 0 and 1
vDistanceYcomponent := (PointY - StartY) / vDeltaY;
if (vDistanceYComponent>0) and (vDistanceYComponent<1) then
begin
result := true;
end;
}
end;
// not necessary to do ith with divisions etc.
end;
// else the line is not a line but a point so exit
end;
function point_on_line_segment_in_2d_v5( StartX, StartY : double;
EndX, EndY : double;
PointX, PointY : double ) : boolean;
var
vDeltaX : double;
vDeltaY : double;
vDistanceX : double;
vDistanceY : double;
vBoundingBoxX1 : double;
vBoundingBoxY1 : double;
vBoundingBoxX2 : double;
vBoundingBoxY2 : double;
begin
result := false;
vDeltaX := EndX - StartX;
vDeltaY := EndY - StartY;
vDistanceX := PointX - StartX;
vDistanceY := PointY - StartY;
if (vDeltaX * vDistanceY) = (vDeltaY * vDistanceX) then
begin
if (StartX<EndX) then
begin
vBoundingBoxX1 := StartX;
vBoundingBoxX2 := EndX;
end else
begin
vBoundingBoxX1 := EndX;
vBoundingBoxX2 := StartX;
end;
if (StartY<EndY) then
begin
vBoundingBoxY1 := StartY;
vBoundingBoxY2 := EndY;
end else
begin
vBoundingBoxY1 := EndY;
vBoundingBoxY2 := StartY;
end;
// I'll check if the point is outside to determine the overlap etc.
// which is consistent which how it's done in the line segment
intersection/overlap
// test where two bounding boxes need to be checked for overlap.
// determining if they are outside is much easier and shorter then
checking if each
// point is inside
// so let's not be stupid and stick with it
// determining overlap should be done this way:
// checking for outside or on it if not then assume it's inside.
if (PointX <= vBoundingBoxX1) or (PointX >= vBoundingBoxX2) or
(PointY <= vBoundingBoxY1) or (PointY >= vBoundingBoxY2) then exit;
result := true;
// which is the same as:
// assembler generated is exactly the same if I am not mistaken
// this code looks cleaner tough...
// if (PointX > vBoundingBoxX1) and (PointX < vBoundingBoxX2) and
// (PointY > vBoundingBoxY1) and (PointY < vBoundingBoxY2) then
// begin
// result := true;
// end;
end;
end;
And here is some simple test code:
var
a1x, a1y,
a2x, a2y,
px, py : double;
begin
a1x := 10;
a1y := 10;
a2x := 100;
a2y := 100;
px := 50;
py := 50;
if point_on_line_segment_in_2d_v3( a1x, a1y,
a2x, a2y,
px, py ) then
begin
ShowMessage('yes');
end else
begin
ShowMessage('no');
end;
if point_on_line_segment_in_2d_v5( a1x, a1y,
a2x, a2y,
px, py ) then
begin
ShowMessage('yes');
end else
begin
ShowMessage('no');
end;
Here is the assembler code. (since delphi's cpu window does not allow easy
copy/paste this code has been taking from w32dasm disassembler, which
shouldn't matter. Actually the disassembly is better since it shows exactly
where jumps jump too
(Taken from the "release" version without debugging information... I think
the assembler is the same for the debugging version so it doesnt matter
(?)):
// assembler for v3:
// the call:
004523DE FF742404 push [esp+04]
:004523E2 FF742404 push [esp+04]
:004523E6 FF742414 push [esp+14]
:004523EA FF742414 push [esp+14]
:004523EE FF742424 push [esp+24]
:004523F2 FF742424 push [esp+24]
:004523F6 FF742434 push [esp+34]
:004523FA FF742434 push [esp+34]
:004523FE FF742444 push [esp+44]
:00452402 FF742444 push [esp+44]
:00452406 FF742454 push [esp+54]
:0045240A FF742454 push [esp+54]
:0045240E E839FDFFFF call 0045214C
:00452413 84C0 test al, al
:00452415 740C je 00452423
// the routine:
:0045214C 55 push ebp
:0045214D 8BEC mov ebp, esp
:0045214F 83C4E0 add esp, FFFFFFE0
:00452152 33D2 xor edx, edx
:00452154 DD4520 fld qword ptr [ebp+20]
:00452157 DC6530 fsub qword ptr [ebp+30]
:0045215A DD5DF8 fstp qword ptr [ebp-08]
:0045215D 9B wait
:0045215E DD4518 fld qword ptr [ebp+18]
:00452161 DC6528 fsub qword ptr [ebp+28]
:00452164 DD5DF0 fstp qword ptr [ebp-10]
:00452167 9B wait
:00452168 DD45F8 fld qword ptr [ebp-08]
:0045216B D81D88224500 fcomp dword ptr [00452288]
:00452171 DFE0 fstsw ax
:00452173 9E sahf
:00452174 0F84B0000000 je 0045222A
:0045217A DD45F0 fld qword ptr [ebp-10]
:0045217D D81D88224500 fcomp dword ptr [00452288]
:00452183 DFE0 fstsw ax
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00452114(C)
|
:00452185 9E sahf
:00452186 7454 je 004521DC
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00452112(C)
|
:00452188 DD4510 fld qword ptr [ebp+10]
:0045218B DC6530 fsub qword ptr [ebp+30]
:0045218E DC75F8 fdiv qword ptr [ebp-08]
:00452191 DD5DE8 fstp qword ptr [ebp-18]
:00452194 9B wait
:00452195 DD4508 fld qword ptr [ebp+08]
:00452198 DC6528 fsub qword ptr [ebp+28]
:0045219B DC75F0 fdiv qword ptr [ebp-10]
:0045219E DD5DE0 fstp qword ptr [ebp-20]
:004521A1 9B wait
:004521A2 DD45E8 fld qword ptr [ebp-18]
:004521A5 DC5DE0 fcomp qword ptr [ebp-20]
:004521A8 DFE0 fstsw ax
:004521AA 9E sahf
:004521AB 0F85CF000000 jne 00452280
:004521B1 DD45E8 fld qword ptr [ebp-18]
:004521B4 D81D88224500 fcomp dword ptr [00452288]
:004521BA DFE0 fstsw ax
:004521BC 9E sahf
:004521BD 0F86BD000000 jbe 00452280
:004521C3 DD45E8 fld qword ptr [ebp-18]
:004521C6 D81D8C224500 fcomp dword ptr [0045228C]
:004521CC DFE0 fstsw ax
:004521CE 9E sahf
:004521CF 0F83AB000000 jnb 00452280
:004521D5 B201 mov dl, 01
:004521D7 E9A4000000 jmp 00452280
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00452186(C)
|
:004521DC DD4508 fld qword ptr [ebp+08]
:004521DF DC5D28 fcomp qword ptr [ebp+28]
:004521E2 DFE0 fstsw ax
:004521E4 9E sahf
:004521E5 0F8595000000 jne 00452280
:004521EB DD4530 fld qword ptr [ebp+30]
:004521EE DC5D20 fcomp qword ptr [ebp+20]
:004521F1 DFE0 fstsw ax
:004521F3 9E sahf
:004521F4 731A jnb 00452210
:004521F6 DD4510 fld qword ptr [ebp+10]
:004521F9 DC5D30 fcomp qword ptr [ebp+30]
:004521FC DFE0 fstsw ax
:004521FE 9E sahf
:004521FF 767F jbe 00452280
:00452201 DD4510 fld qword ptr [ebp+10]
:00452204 DC5D20 fcomp qword ptr [ebp+20]
:00452207 DFE0 fstsw ax
:00452209 9E sahf
:0045220A 7374 jnb 00452280
:0045220C B201 mov dl, 01
:0045220E EB70 jmp 00452280
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004521F4(C)
|
:00452210 DD4510 fld qword ptr [ebp+10]
:00452213 DC5D20 fcomp qword ptr [ebp+20]
:00452216 DFE0 fstsw ax
:00452218 9E sahf
:00452219 7665 jbe 00452280
:0045221B DD4510 fld qword ptr [ebp+10]
:0045221E DC5D30 fcomp qword ptr [ebp+30]
:00452221 DFE0 fstsw ax
:00452223 9E sahf
:00452224 735A jnb 00452280
:00452226 B201 mov dl, 01
:00452228 EB56 jmp 00452280
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00452174(C)
|
:0045222A DD45F0 fld qword ptr [ebp-10]
:0045222D D81D88224500 fcomp dword ptr [00452288]
:00452233 DFE0 fstsw ax
:00452235 9E sahf
:00452236 7448 je 00452280
:00452238 DD4510 fld qword ptr [ebp+10]
:0045223B DC5D30 fcomp qword ptr [ebp+30]
:0045223E DFE0 fstsw ax
:00452240 9E sahf
:00452241 753D jne 00452280
:00452243 DD4528 fld qword ptr [ebp+28]
:00452246 DC5D18 fcomp qword ptr [ebp+18]
:00452249 DFE0 fstsw ax
:0045224B 9E sahf
:0045224C 731A jnb 00452268
:0045224E DD4508 fld qword ptr [ebp+08]
:00452251 DC5D28 fcomp qword ptr [ebp+28]
:00452254 DFE0 fstsw ax
:00452256 9E sahf
:00452257 7627 jbe 00452280
:00452259 DD4508 fld qword ptr [ebp+08]
:0045225C DC5D18 fcomp qword ptr [ebp+18]
:0045225F DFE0 fstsw ax
:00452261 9E sahf
:00452262 731C jnb 00452280
:00452264 B201 mov dl, 01
:00452266 EB18 jmp 00452280
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:0045224C(C)
|
:00452268 DD4508 fld qword ptr [ebp+08]
:0045226B DC5D18 fcomp qword ptr [ebp+18]
:0045226E DFE0 fstsw ax
:00452270 9E sahf
:00452271 760D jbe 00452280
:00452273 DD4508 fld qword ptr [ebp+08]
:00452276 DC5D28 fcomp qword ptr [ebp+28]
:00452279 DFE0 fstsw ax
:0045227B 9E sahf
:0045227C 7302 jnb 00452280
:0045227E B201 mov dl, 01
* Referenced by a (U)nconditional or (C)onditional Jump at Addresses:
|:004521AB(C), :004521BD(C), :004521CF(C), :004521D7(U), :004521E5(C)
|:004521FF(C), :0045220A(C), :0045220E(U), :00452219(C), :00452224(C)
|:00452228(U), :00452236(C), :00452241(C), :00452257(C), :00452262(C)
|:00452266(U), :00452271(C), :0045227C(C)
|
:00452280 8BC2 mov eax, edx
:00452282 8BE5 mov esp, ebp
:00452284 5D pop ebp
:00452285 C23000 ret 0030
// assembler for v5:
// the call:
:0045242D FF742404 push [esp+04]
:00452431 FF742404 push [esp+04]
:00452435 FF742414 push [esp+14]
:00452439 FF742414 push [esp+14]
:0045243D FF742424 push [esp+24]
:00452441 FF742424 push [esp+24]
:00452445 FF742434 push [esp+34]
:00452449 FF742434 push [esp+34]
:0045244D FF742444 push [esp+44]
:00452451 FF742444 push [esp+44]
:00452455 FF742454 push [esp+54]
:00452459 FF742454 push [esp+54]
:0045245D E82EFEFFFF call 00452290
:00452462 84C0 test al, al
:00452464 740C je 00452472
// the routine:
* Referenced by a CALL at Address:
|:0045245D
|
:00452290 55 push ebp
:00452291 8BEC mov ebp, esp
:00452293 83C4C0 add esp, FFFFFFC0
:00452296 33D2 xor edx, edx
:00452298 DD4520 fld qword ptr [ebp+20]
:0045229B DC6530 fsub qword ptr [ebp+30]
:0045229E DD5DF8 fstp qword ptr [ebp-08]
:004522A1 9B wait
:004522A2 DD4518 fld qword ptr [ebp+18]
:004522A5 DC6528 fsub qword ptr [ebp+28]
:004522A8 DD5DF0 fstp qword ptr [ebp-10]
:004522AB 9B wait
:004522AC DD4510 fld qword ptr [ebp+10]
:004522AF DC6530 fsub qword ptr [ebp+30]
:004522B2 DD5DE8 fstp qword ptr [ebp-18]
:004522B5 9B wait
:004522B6 DD4508 fld qword ptr [ebp+08]
:004522B9 DC6528 fsub qword ptr [ebp+28]
:004522BC DD5DE0 fstp qword ptr [ebp-20]
:004522BF 9B wait
:004522C0 DD45F8 fld qword ptr [ebp-08]
:004522C3 DC4DE0 fmul qword ptr [ebp-20]
:004522C6 DD45F0 fld qword ptr [ebp-10]
:004522C9 DC4DE8 fmul qword ptr [ebp-18]
:004522CC DED9 fcompp
:004522CE DFE0 fstsw ax
:004522D0 9E sahf
:004522D1 0F85A8000000 jne 0045237F
:004522D7 DD4530 fld qword ptr [ebp+30]
:004522DA DC5D20 fcomp qword ptr [ebp+20]
:004522DD DFE0 fstsw ax
:004522DF 9E sahf
:004522E0 731A jnb 004522FC
:004522E2 8B4530 mov eax, dword ptr [ebp+30]
:004522E5 8945D8 mov dword ptr [ebp-28], eax
:004522E8 8B4534 mov eax, dword ptr [ebp+34]
:004522EB 8945DC mov dword ptr [ebp-24], eax
:004522EE 8B4520 mov eax, dword ptr [ebp+20]
:004522F1 8945C8 mov dword ptr [ebp-38], eax
:004522F4 8B4524 mov eax, dword ptr [ebp+24]
:004522F7 8945CC mov dword ptr [ebp-34], eax
:004522FA EB18 jmp 00452314
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004522E0(C)
|
:004522FC 8B4520 mov eax, dword ptr [ebp+20]
:004522FF 8945D8 mov dword ptr [ebp-28], eax
:00452302 8B4524 mov eax, dword ptr [ebp+24]
:00452305 8945DC mov dword ptr [ebp-24], eax
:00452308 8B4530 mov eax, dword ptr [ebp+30]
:0045230B 8945C8 mov dword ptr [ebp-38], eax
:0045230E 8B4534 mov eax, dword ptr [ebp+34]
:00452311 8945CC mov dword ptr [ebp-34], eax
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004522FA(U)
|
:00452314 DD4528 fld qword ptr [ebp+28]
:00452317 DC5D18 fcomp qword ptr [ebp+18]
:0045231A DFE0 fstsw ax
:0045231C 9E sahf
:0045231D 731A jnb 00452339
:0045231F 8B4528 mov eax, dword ptr [ebp+28]
:00452322 8945D0 mov dword ptr [ebp-30], eax
:00452325 8B452C mov eax, dword ptr [ebp+2C]
:00452328 8945D4 mov dword ptr [ebp-2C], eax
:0045232B 8B4518 mov eax, dword ptr [ebp+18]
:0045232E 8945C0 mov dword ptr [ebp-40], eax
:00452331 8B451C mov eax, dword ptr [ebp+1C]
:00452334 8945C4 mov dword ptr [ebp-3C], eax
:00452337 EB18 jmp 00452351
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:0045231D(C)
|
:00452339 8B4518 mov eax, dword ptr [ebp+18]
:0045233C 8945D0 mov dword ptr [ebp-30], eax
:0045233F 8B451C mov eax, dword ptr [ebp+1C]
:00452342 8945D4 mov dword ptr [ebp-2C], eax
:00452345 8B4528 mov eax, dword ptr [ebp+28]
:00452348 8945C0 mov dword ptr [ebp-40], eax
:0045234B 8B452C mov eax, dword ptr [ebp+2C]
:0045234E 8945C4 mov dword ptr [ebp-3C], eax
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00452337(U)
|
:00452351 DD4510 fld qword ptr [ebp+10]
:00452354 DC5DD8 fcomp qword ptr [ebp-28]
:00452357 DFE0 fstsw ax
:00452359 9E sahf
:0045235A 7623 jbe 0045237F
:0045235C DD4510 fld qword ptr [ebp+10]
:0045235F DC5DC8 fcomp qword ptr [ebp-38]
:00452362 DFE0 fstsw ax
:00452364 9E sahf
:00452365 7318 jnb 0045237F
:00452367 DD4508 fld qword ptr [ebp+08]
:0045236A DC5DD0 fcomp qword ptr [ebp-30]
:0045236D DFE0 fstsw ax
:0045236F 9E sahf
:00452370 760D jbe 0045237F
:00452372 DD4508 fld qword ptr [ebp+08]
:00452375 DC5DC0 fcomp qword ptr [ebp-40]
:00452378 DFE0 fstsw ax
:0045237A 9E sahf
:0045237B 7302 jnb 0045237F
:0045237D B201 mov dl, 01
* Referenced by a (U)nconditional or (C)onditional Jump at Addresses:
|:004522D1(C), :0045235A(C), :00452365(C), :00452370(C), :0045237B(C)
|
:0045237F 8BC2 mov eax, edx
:00452381 8BE5 mov esp, ebp
:00452383 5D pop ebp
:00452384 C23000 ret 0030
The assembler is provided in case you dont have any pascal/delphi compilers
at hand
I think I have provided enough assembler listing by providing the call and
the routine. There was some extra stuff at the beginning and the end like
dup(0) stuff and nop etc... but I dont know why it's there etc.. it's
probably not needed to understand what is going on with these routines, how
they work and how fast they will work
I am curious if anybody is able to tell anything from these listings
If not then that's ok I can still fall back on performance
testing/measurements =D though looking at it from a theoretical point of
view is very interesting
P.S.: Ok I am obsessed with this code lol I admit =D
Bye,
Skybuck.