flat assembler
Message board for the users of flat assembler.
Windows > Memory Block calculations
comrade 01 Aug 2005, 03:40
SSE
sq4² 01 Aug 2005, 09:51
Can you point me in the right direction?
(SSE related) Pseudo code:
Code:
ptrSrc = start of source memory block
ptrDst = start of destination memory block
Constant = (2^15)
For i = 0 to blocksize-1
    Float = PeekFloat(ptrSrc)
    PokeLong(ptrDst, Float*Constant)
    ptrSrc+(i*4) : ptrDst+(i*4)
Next
Thanks.
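For comparison, here is a minimal scalar sketch in C of what the pseudo-code appears to compute. It rests on assumptions not stated in the post: `blocksize` is taken as an element count, each pass handles element `i` at byte offset `i*4`, and the float-to-long conversion truncates toward zero. The name `scale_to_long` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCALE 32768.0f  /* 2^15, the Constant from the pseudo-code */

/* Hypothetical reference routine: multiply each float by 2^15 and store
   the result as a 32-bit integer. The cast truncates toward zero, which is
   an assumption about what PokeLong should do with a fractional value.
   Inputs whose scaled value does not fit in int32_t are not handled. */
static void scale_to_long(const float *src, int32_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[i] = (int32_t)(src[i] * SCALE);
    }
}
```

A scalar version like this is mainly useful as a check: any SSE loop should produce the same values for the same input, apart from the rounding mode chosen for the conversion.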
MCD 01 Aug 2005, 13:22
It's not completely clear to me whether these should be constants or variables in your pseudo-code: ptrSrc, ptrDst, Constant, blocksize.
If they are variables, then this is perhaps what comrade meant:
Code:
    mov      esi,[ptrSrc]
    mov      edi,[ptrDst]
    mov      ecx,[blocksize]    ; block size in bytes, must be nonzero
    add      esi,ecx            ; esi = end of source block
    add      edi,ecx            ; edi = end of destination block
    neg      ecx                ; negative offset that counts up to 0
    movss    xmm1,[constant]
    unpcklps xmm1,xmm1          ; broadcast the constant
    movlhps  xmm1,xmm1          ; into all four lanes
.MulLoop:
    movups   xmm0,[esi+ecx]
    mulps    xmm0,xmm1
    movups   [edi+ecx],xmm0
    add      ecx,16
    jnc      .MulLoop           ; carry is set when ecx wraps to 0
Note: Both Src and Dst must be a multiple of 16 bytes long. If they are also aligned on a 16-byte boundary, you can speed it up by using this code:
Code:
    mov      esi,[ptrSrc]
    mov      edi,[ptrDst]
    mov      ecx,[blocksize]    ; block size in bytes, must be nonzero
    add      esi,ecx
    add      edi,ecx
    neg      ecx
    movss    xmm1,[constant]
    unpcklps xmm1,xmm1
    movlhps  xmm1,xmm1
.MulLoop:
    movaps   xmm0,[esi+ecx]
    mulps    xmm0,xmm1
    movaps   [edi+ecx],xmm0     ; 1.)
    add      ecx,16
    jnc      .MulLoop
If you know that your floats aren't going to be used anytime soon, you can also replace the line marked 1.) with "movntps [edi+ecx],xmm0".
If that isn't fast enough, you can unroll the loop further and process multiple multiplications at once, in different registers, like this. The block length must then be a multiple of 64 bytes (16-byte alignment is still enough for movaps):
Code:
    mov      esi,[ptrSrc]
    mov      edi,[ptrDst]
    mov      ecx,[blocksize]    ; block size in bytes, must be nonzero
    add      esi,ecx
    add      edi,ecx
    neg      ecx
    movss    xmm7,[constant]
    unpcklps xmm7,xmm7
    movlhps  xmm7,xmm7
.MulLoop:
    movaps   xmm0,[esi+ecx]
    mulps    xmm0,xmm7
    movaps   xmm1,[esi+ecx+16]
    mulps    xmm1,xmm7
    movaps   xmm2,[esi+ecx+32]
    mulps    xmm2,xmm7
    movaps   xmm3,[esi+ecx+48]
    mulps    xmm3,xmm7
    movaps   [edi+ecx],xmm0     ; 1.)
    movaps   [edi+ecx+16],xmm1  ; 1.)
    movaps   [edi+ecx+32],xmm2  ; 1.)
    movaps   [edi+ecx+48],xmm3  ; 1.)
    add      ecx,64
    jnc      .MulLoop
It all just depends on how much data you have to process.
_________________
MCD - the inevitable return of the Mad Computer Doggy
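The loops in this post address both blocks through a single offset register. One common form of that idiom, sketched below in portable C rather than assembly, points at the end of each block and runs a negative byte offset up to zero, so a single add both advances the position and decides when to stop. The names `scale_block` and `block_is_aligned16` are hypothetical, and the sketch assumes 4-byte floats and a byte length that is a nonzero multiple of 16. Like the assembly, it stores the products as floats; the conversion to long from the original pseudo-code is not part of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): true when a block meets the
   length and alignment constraints that aligned 16-byte access needs. */
static int block_is_aligned16(const void *p, size_t bytes)
{
    return bytes != 0 && bytes % 16 == 0 && (uintptr_t)p % 16 == 0;
}

/* Hypothetical scalar model of the multiply loop: 16 bytes (four floats,
   assuming sizeof(float) == 4) per pass, addressed as end-of-block plus a
   negative offset. */
static void scale_block(const float *src, float *dst, size_t bytes, float k)
{
    assert(bytes != 0 && bytes % 16 == 0);

    const unsigned char *src_end = (const unsigned char *)src + bytes;
    unsigned char *dst_end = (unsigned char *)dst + bytes;
    ptrdiff_t off = -(ptrdiff_t)bytes;      /* plays the role of ecx */

    do {
        for (int lane = 0; lane < 4; lane++) {
            float v;
            memcpy(&v, src_end + off + lane * 4, sizeof v);
            v *= k;
            memcpy(dst_end + off + lane * 4, &v, sizeof v);
        }
        off += 16;                          /* like "add ecx,16" */
    } while (off != 0);                     /* stop once the offset hits 0 */
}
```

A caller could use `block_is_aligned16` on both pointers to choose between an aligned and an unaligned path, in the same way the post chooses between movaps and movups.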
sq4² 02 Aug 2005, 23:55
Thanks a lot.
I'll try it and let you know.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.