flat assembler
Message board for the users of flat assembler.

[SSE code] Why this code works ... and this not?

macgub



Joined: 11 Jan 2006
Posts: 348
Location: Poland
macgub 20 Feb 2014, 12:37
Why does this code work
Code:
;===alternative2
    pushad
    mov      ecx,XRES
    movaps  xmm0,[py1]
    cvtss2si eax,xmm0
    movaps   xmm1,[px1]
    cvtss2si ebx,xmm1
    mul      ecx
    add      eax,ebx
    shl      eax,2
    add      eax,buffer
    mov      dword [eax],0x000000ff
    psrldq   xmm0,4
    cvtss2si eax,xmm0
    psrldq   xmm1,4
    cvtss2si ebx,xmm1
    mul      ecx
    add      eax,ebx
    shl      eax,2
    add      eax,buffer
    mov      dword [eax],0x000000ff
    psrldq   xmm0,4
    cvtss2si eax,xmm0
    psrldq   xmm1,4
    cvtss2si ebx,xmm1
    mul      ecx
    add      eax,ebx
    shl      eax,2
    add      eax,buffer
    mov      dword [eax],0x000000ff
    psrldq   xmm0,4
    cvtss2si eax,xmm0
    psrldq   xmm1,4
    cvtss2si ebx,xmm1
    mul      ecx
    add      eax,ebx
    shl      eax,2
    add      eax,buffer
    mov      dword [eax],0x000000ff
    popad              
    


...and this one doesn't?
Code:
; == alternative: does not work
    pushad
    mov      edx,XRES
    cvtsi2ss xmm0,edx
    shufps   xmm0,xmm0,0
    mulps    xmm0,[py1]
    addps    xmm0,[px1]
    cvtps2dq xmm0,xmm0
    pslld    xmm0,2
    mov      edx,buffer
    movd     xmm1,edx
    shufps   xmm1,xmm1,0
    paddd    xmm0,xmm1
    movd     eax,xmm0
    mov      [eax], dword 0x00ffffff
    psrldq   xmm0,4
    movd     eax,xmm0
    mov      [eax], dword 0x00ffffff
    psrldq   xmm0,4
    movd     eax,xmm0
    mov      [eax], dword 0x00ffffff
    psrldq   xmm0,4
    movd     eax,xmm0
    mov      [eax], dword 0x00ffffff
    popad                  
    


revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20356
Location: In your JS exploiting you and your system
revolution 20 Feb 2014, 12:42
What do you mean by "work"?

What is it that the code does anyway?
macgub 20 Feb 2014, 12:49
I mean that it works correctly. This code calculates the addresses of four pixels whose coordinates are given in px1 and py1.
px1 and py1 each hold four float dword pixel coordinates.
...and I forgot the data:
Code:
 px1:     dd ?
    px2   dd ?
    px3   dd ?
    px4   dd ?
 py1:     dd ?
    py2   dd ?
    py3   dd ?
    py4   dd ?   
    
macgub 20 Feb 2014, 12:54
Both code fragments do the same thing (or rather, should do the same thing).
revolution 20 Feb 2014, 13:03
"mulps xmm0,[py1]" expects to find float data. Your first code converts from integer data first, but your second code doesn't convert.
DOS386



Joined: 08 Dec 2006
Posts: 1903
DOS386 20 Feb 2014, 13:12
You also use two constants, "XRES" and "buffer", and write strange stuff into memory (different in the two examples). I still don't understand what your code is supposed to do.
macgub 20 Feb 2014, 13:22
@DOS386
The first code isn't optimized at all, but it works. In the second code I tried to optimize it a bit.
@rev
"cvtsi2ss xmm0,edx" doesn't convert xmm0; IMHO it converts edx and puts the result into xmm0 as a float.
revolution 20 Feb 2014, 13:28
Okay, I took a second look and I can't see any difference. AFAICT they both do the same.

Show your test code. There is probably something wrong elsewhere in your code.
macgub 21 Feb 2014, 09:15
OK, I am attaching the complete sources.


Attachment: bezier9.zip (114.41 KB)

revolution 21 Feb 2014, 12:54
Seems like the rounding affects the output:
Code:
;...
    pushad
    mov      edx,XRES
    cvtsi2ss xmm0,edx
    shufps   xmm0,xmm0,0
if 0
    mulps    xmm0,[py1]
    addps    xmm0,[px1]
else
    movaps  xmm1,[py1]
    movaps  xmm2,[px1]
    cvtps2dq xmm1,xmm1
    cvtps2dq xmm2,xmm2
    cvtdq2ps xmm1,xmm1
    cvtdq2ps xmm2,xmm2
    mulps xmm0,xmm1
    addps xmm0,xmm2
end if
    cvtps2dq xmm0,xmm0
    pslld    xmm0,2
    mov      edx,buffer
;...    
I am not suggesting that this is the best method of rounding, it is just to show the difference.
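
The effect of the rounding order can be sketched outside asm. The following is a minimal Python illustration, not the original program: it assumes the default SSE rounding mode (round to nearest, ties to even, which Python's round() also uses for floats), a hypothetical XRES of 640, and made-up function names. It only computes the pixel index; the real code then multiplies by 4 and adds buffer.

```python
XRES = 640  # hypothetical screen width; the thread never states the real value


def index_round_first(x, y, xres=XRES):
    """Like the working code: round y and x separately (cvtss2si), then combine as integers."""
    return round(y) * xres + round(x)


def index_round_last(x, y, xres=XRES):
    """Like the failing code: combine as floats (mulps/addps), then round once (cvtps2dq)."""
    return round(y * xres + x)


if __name__ == "__main__":
    x, y = 100.0, 10.25  # example coordinates with a fractional y
    print(index_round_first(x, y))  # 6500: row 10, column 100
    print(index_round_last(x, y))   # 6660: the extra 0.25 row became 160 pixels
```

With whole-number coordinates both functions agree, so the two asm fragments only diverge when py1 holds values with a fractional part: the fraction of a row gets multiplied by XRES before it can be rounded away.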
macgub 21 Feb 2014, 13:28
I will probably abandon this problem. Thanks anyway.
pabloreda



Joined: 24 Jan 2007
Posts: 116
Location: Argentina
pabloreda 21 Feb 2014, 19:33
I draw Bezier curves, quadratic and cubic, using only integers and a recursive approach. The code is very short in my language and easy to translate to asm; the main trick is to keep the drawing order.
macgub 24 Feb 2014, 07:47
Are you going to release the sources?
pabloreda 24 Feb 2014, 12:35
Yes, it is open source:
https://code.google.com/p/reda4/source/browse/trunk/r4/System/asmbase/asmbase.txt

These are the graphics routines for the compiler.
See lines 210 to 240, the words curve and curve3.


In my poor man's compiler (it could be better), the first word compiles to:
Code:

w27: ; ::: CURVE ::: uso:-4 dD:-4 
mov ebx,dword [esi+8]
add ebx,dword [wB]
mov ecx,dword [esi]
sal ecx,1
sub ebx,ecx
mov edx,ebx
sar edx,31
add ebx,edx
xor ebx,edx
mov ecx,dword [esi+4]
add ecx,dword [wC]
mov edx,eax
sal edx,1
sub ecx,edx
mov edx,ecx
sar edx,31
add ecx,edx
xor ecx,edx
add ebx,ecx
cmp ebx,$4
jge _4F
lea esi,[esi+8]
mov eax,dword [esi-4]
jmp w25                      ;LINE 
_4F: mov ebx,dword [esi+8]
add ebx,dword [esi]
sar ebx,1
mov ecx,dword [esi+4]
add ecx,eax
sar ecx,1
add eax,dword [wA]
sar eax,1
mov edx,dword [esi]
add edx,dword [w9]
sar edx,1
mov edi,ebx
add edi,edx
sar edi,1
mov [esi-4],ebx
mov ebx,ecx
add ebx,eax
sar ebx,1
lea esi,[esi-16]
xchg dword [esi+12],ecx
mov dword [esi+8],edi
mov dword [esi+4],ebx
mov dword [esi],edx
mov dword [esi+16],ecx
call w27         ; curve
jmp w27        ; curve
    


As in Forth, esi is a second stack; the parameters are in [esi] and eax (top of stack).
wB and wC are the previous X and Y with scaling, because the line routine draws with antialiasing.
w9 and wA are X and Y without scaling.

To draw a curve from point 10,10 with control point 50,50 and final point 100,100,
the words are
10 10 op
50 50 100 100 curve

In asm, set xp, yp (and the scaled versions) to 10, 10; the stack grows toward negative addresses:
mov dword [esi+8], 50
mov dword [esi+4], 50
mov dword [esi], 100
mov eax, 100
call w27

Or translate it to use the other stack.
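
For readers who do not want to trace the stack juggling, the recursive integer-only idea can be sketched in Python. This is a standard midpoint (de Casteljau) subdivision written for illustration, not a line-by-line translation of the listing above: the function names, the argument order, and the exact form of the flatness test are assumptions. Only the threshold 4 and the use of halving by arithmetic shift are taken from the listing.

```python
FLAT_TOL = 4  # same constant as `cmp ebx,$4` in the listing


def curve(x0, y0, x1, y1, x2, y2, out):
    """Flatten a quadratic Bezier (start, control, end) using integers only."""
    # Assumed flatness test: how far the control point is from the chord midpoint.
    if abs(x0 + x2 - 2 * x1) + abs(y0 + y2 - 2 * y1) < FLAT_TOL:
        out.append((x2, y2))  # flat enough: one line segment to the end point
        return
    # Midpoint split; `>> 1` is an arithmetic shift, like `sar reg,1`.
    ax, ay = (x0 + x1) >> 1, (y0 + y1) >> 1
    bx, by = (x1 + x2) >> 1, (y1 + y2) >> 1
    mx, my = (ax + bx) >> 1, (ay + by) >> 1
    curve(x0, y0, ax, ay, mx, my, out)  # first half first, to keep the drawing order
    curve(mx, my, bx, by, x2, y2, out)


def flatten(start, control, end):
    """Return the polyline vertices, beginning with the start point."""
    out = [start]
    curve(*start, *control, *end, out)
    return out


if __name__ == "__main__":
    # Same points as the example above: 10,10 / 50,50 / 100,100
    for point in flatten((10, 10), (50, 50), (100, 100)):
        print(point)
```

Recursing into the first half before the second is what keeps the vertices in drawing order, so each segment can be drawn from the previously emitted point without storing the whole curve.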
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
