flat assembler
Message board for the users of flat assembler.

SSE alternative for FPU::FABS ?




sq4² 06 Aug 2005, 21:40
I need to remove the sign from all four values in an SSE register.
I searched Google but couldn't find any SSE alternative to FABS.

Does anyone have an idea?



Tomasz Grysztar 06 Aug 2005, 22:10
Perhaps not very elegant ;)
Code:
andps xmm0,dqword [mask]    

with:
Code:
align 16
mask dd 4 dup 7FFFFFFFh    
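
For readers coming from C (the asker mentions a C routine later in the thread), the same sign-bit masking can be sketched with compiler intrinsics. This is only an illustration, not part of the original answer: it assumes an x86 compiler that provides `emmintrin.h` (SSE2), and the function name `abs4` is made up for the example. `_mm_and_ps` corresponds to `andps`.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE intrinsics */

/* Hypothetical helper: clears the sign bit of four packed floats.
   Same idea as "andps xmm0, dqword [mask]" with 7FFFFFFFh in every lane. */
void abs4(const float in[4], float out[4])
{
    /* 0x7FFFFFFF has every bit set except the sign bit (bit 31). */
    const __m128 mask = _mm_castsi128_ps(_mm_set1_epi32(0x7FFFFFFF));

    __m128 v = _mm_loadu_ps(in);              /* unaligned load, like movups */
    _mm_storeu_ps(out, _mm_and_ps(v, mask));  /* andps, then unaligned store */
}
```

Because only bit 31 is cleared, -0.0 becomes +0.0 and all other bits of each value stay untouched.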



sq4² 06 Aug 2005, 22:51
Wow, that's fast.

Better ask this too:

I need to do: addps xmm0,xmm1
When a result exceeds 1.0f or falls below -1.0f, I need to clamp it to 1.0f or -1.0f.
I have a fast routine in C (using Abs()),
but again, asm is completely new to me...

Anyway, thanks a lot!



Tomasz Grysztar 06 Aug 2005, 23:05
Again for packed single (four 32-bit fp values):
Code:
        maxps xmm0,dqword [min]
        minps xmm0,dqword [max]    

with:
Code:
align 16                   ; maxps/minps memory operands must be 16-byte aligned
min dd -1.0,-1.0,-1.0,-1.0
max dd 1.0,1.0,1.0,1.0    
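
The add-then-clamp step can also be sketched in C with SSE intrinsics. This is a rough illustration under stated assumptions, not the poster's routine: it assumes an x86 compiler with `xmmintrin.h`, and the name `add_clamp4` is hypothetical. `_mm_max_ps` and `_mm_min_ps` correspond to `maxps` and `minps`.

```c
#include <xmmintrin.h>  /* SSE: _mm_add_ps, _mm_max_ps, _mm_min_ps */

/* Hypothetical helper: out[i] = clamp(a[i] + b[i], -1.0f, 1.0f) for four floats. */
void add_clamp4(const float a[4], const float b[4], float out[4])
{
    const __m128 lo = _mm_set1_ps(-1.0f);   /* plays the role of [min] */
    const __m128 hi = _mm_set1_ps(1.0f);    /* plays the role of [max] */

    __m128 v = _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));  /* addps */
    v = _mm_max_ps(v, lo);                  /* maxps: raise anything below -1.0 */
    v = _mm_min_ps(v, hi);                  /* minps: lower anything above  1.0 */
    _mm_storeu_ps(out, v);
}
```

No absolute value or branch is needed here: taking the maximum against the lower bound and then the minimum against the upper bound handles both directions.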



sq4² 07 Aug 2005, 00:46
And now for the last one:

The last step is to multiply the four floats and store them as 16-bit integers at another memory location.
I have this:

!MOV edi, [v_L0]
!MOV esi, [v_L1]
!MOV ebx, [v_LD]
!MOV ecx, 256
!.While1:
!movups xmm0,[edi]
!movups xmm1,[esi]
!addps xmm0,xmm1
!mulps xmm0, 65000 ---------- ??
!movups [ebx],xmm0 ---------- ??
!ADD esi,16
!ADD edi,16
!ADD ebx,8
!LOOP .While1



Madis731 08 Aug 2005, 11:03
Shouldn't you first put the constants in memory (or in an XMM register) and do the multiplication after that:
Code:
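MOV edi, [v_L0]             ; first source buffer
MOV esi, [v_L1]             ; second source buffer
MOV ebx, [v_LD]             ; destination buffer
MOV ecx, 256                ; 256 iterations, 4 floats each
.While1:
movups xmm0,[edi]
movups xmm1,[esi]
addps xmm0,xmm1
mulps xmm0, dqword[sixtyfiveT] ;alternatively load 65000.0 once and shuffle it into all four lanes
movups [ebx],xmm0           ; note: still writes four floats (16 bytes); the conversion to 16-bit is not done here
ADD esi,16
ADD edi,16
ADD ebx,8
LOOP .While1

v_L0       dd 0
v_L1       dd 1
v_LD       dd ?
align 16                    ; the mulps memory operand must be 16-byte aligned
sixtyfiveT dd 65000.0,65000.0,65000.0,65000.0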
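
The alternative mentioned in the comment (load the constant once and copy it to all four lanes) might look like this in C with SSE intrinsics. It is a sketch only: the name `scale4` is hypothetical and an x86 compiler with `xmmintrin.h` is assumed. `_mm_load_ss` corresponds to `movss` and `_mm_shuffle_ps` to `shufps`.

```c
#include <xmmintrin.h>  /* SSE: _mm_load_ss, _mm_shuffle_ps, _mm_mul_ps */

/* Hypothetical helper: multiplies four floats by a single gain value. */
void scale4(const float in[4], float gain, float out[4])
{
    __m128 g = _mm_load_ss(&gain);                      /* movss:  [gain, 0, 0, 0] */
    g = _mm_shuffle_ps(g, g, _MM_SHUFFLE(0, 0, 0, 0));  /* shufps: [gain, gain, gain, gain] */

    _mm_storeu_ps(out, _mm_mul_ps(_mm_loadu_ps(in), g));  /* mulps */
}
```

In a loop the broadcast would be done once before the first iteration, so only the multiply stays inside the loop body.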
    



sq4² 08 Aug 2005, 11:11
This routine works perfectly!
BUT: only sometimes.
v_LD points to an ASIO buffer (allocated by the ASIO driver).
It looks (at least it sounds) like the buffer should be aligned to 4.
The problem is that I do not have control over this buffer.
Could that be the cause?


Code:
  !movss xmm2,[v_Gain]
  !unpcklps xmm2,xmm2
  !movlhps xmm2,xmm2
  
  !MOV edi, [v_L0]
  !MOV esi, [v_L1]
  !MOV edx, [v_LD]
  !MOV ecx, 256
  !.While1:
    !movups xmm0,[edi]
    !movups xmm1,[esi]
    !addps xmm0,xmm1
    !mulps xmm0, xmm2
    !cvtps2pi mm0,xmm0
    !movhlps xmm0,xmm0
    !cvtps2pi mm1,xmm0  
    !packssdw mm0,mm1
    !movq [edx],mm0
    !ADD esi,16
    !ADD edi,16
    !ADD edx,8
  !LOOP .While1
  !EMMS
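
For comparison, here is a sketch of the same scale-and-convert step in C. Note the swap: instead of the MMX route above (`cvtps2pi` plus `emms`), it uses the SSE2 instructions `cvtps2dq` and `packssdw` on XMM registers, so no `emms` is needed. The name `floats_to_int16` is hypothetical and an x86 compiler with `emmintrin.h` is assumed.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_cvtps_epi32, _mm_packs_epi32, _mm_storel_epi64 */

/* Hypothetical helper: out[i] = saturate_int16(round(in[i] * gain)). */
void floats_to_int16(const float in[4], float gain, int16_t out[4])
{
    __m128  v   = _mm_mul_ps(_mm_loadu_ps(in), _mm_set1_ps(gain));
    __m128i i32 = _mm_cvtps_epi32(v);         /* cvtps2dq: four floats to four 32-bit ints */
    __m128i i16 = _mm_packs_epi32(i32, i32);  /* packssdw: saturate to signed 16-bit */
    _mm_storel_epi64((__m128i *)out, i16);    /* movq: store the low 8 bytes, any address */
}
```

The pack step saturates, so with a gain of 65000 a full-scale input of 1.0 is stored as 32767 and -1.0 as -32768.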
    



Madis731 08 Aug 2005, 12:49
I think it should be aligned to 16, but I'm not sure that it's possible if what you say is true - you have no control over it :(



MCD 08 Aug 2005, 14:06
Madis731 wrote:
I think it should be aligned to 16, but I'm not sure that it's possible if what you say is true - you have no control over it :(

Sounds like exactly the same problem I had with Delphi 6, which allows MMX/SSE in its inline assembler but only lets you align data on 1-, 2-, 4- and 8-byte boundaries, NOT on a 16-byte boundary. What a pain of a compiler :(




sq4² 08 Aug 2005, 16:15
MCD wrote:
Madis731 wrote:
I think it should be aligned to 16, but I'm not sure that it's possible if what you say is true - you have no control over it :(

Sounds like exactly the same problem I had with Delphi 6, which allows MMX/SSE in its inline assembler but only lets you align data on 1-, 2-, 4- and 8-byte boundaries, NOT on a 16-byte boundary. What a pain of a compiler :(


Ok, let's assume it's an alignment problem.
I could work with an extra buffer + (WinAPI) CopyMemory.
The question is: how do I allocate memory that is 16-byte aligned?



Tomasz Grysztar 08 Aug 2005, 16:19
If you cannot trust memory allocation, you can allocate 15 bytes more than you need and choose the starting address to be the first aligned one inside the block.
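
A minimal sketch of that over-allocation trick in C, assuming plain `malloc` as the allocator; the helper name `align16` is made up for the example. In assembly the same rounding is an `add` of 15 followed by an `and` with -16.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns the first 16-byte aligned address at or after p. */
void *align16(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    /* Adding 15 moves past the next boundary unless p is already aligned;
       clearing the low four bits then snaps back onto that boundary. */
    return (void *)((addr + 15) & ~(uintptr_t)15);
}
```

A possible use: call `malloc(size + 15)`, keep the raw pointer for the later `free`, and work only through the pointer returned by `align16`. The extra 15 bytes guarantee that `size` bytes are still available after the adjustment.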



sq4² 08 Aug 2005, 16:22
Tomasz Grysztar wrote:
If you cannot trust memory allocation, you can allocate 15 bytes more than you need and choose the starting address to be the first aligned one inside the block.


I know, but does 16-byte alignment mean that the starting address of a memory block has to be divisible by 16?

Btw, should I FXSAVE/FXRSTOR?



Tomasz Grysztar 08 Aug 2005, 16:24
Yes - and this also means that the lowest four bits of address have to be 0000.
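
The "lowest four bits are zero" test is easy to write down. The sketch below (hypothetical names `is_aligned16` and `load4`, x86 compiler with `xmmintrin.h` assumed) checks an address and then picks the aligned load (`movaps`) or the unaligned one (`movups`). For what it's worth, the loop posted earlier in the thread only uses `movups` and `movq` for its memory accesses, and neither of those requires any particular alignment.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps (movaps), _mm_loadu_ps (movups) */

/* Hypothetical helper: nonzero when the low four address bits are all 0. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 0xF) == 0;
}

/* Hypothetical helper: loads four floats from an address of any alignment. */
__m128 load4(const float *p)
{
    /* movaps faults on a misaligned address, so fall back to movups. */
    return is_aligned16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
}
```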



sq4² 08 Aug 2005, 16:45
One thing is sure: it's not an alignment problem.

Perhaps this is complete nonsense, but:

MMX/SSE is also used in another thread (this code resides in a DLL; in fact it's a VSTi, see Steinberg).

Could it be that this interferes with my code?

If so: will a critical section prevent this?



sq4² 08 Aug 2005, 20:00
Guess what: my sound card is acting up. I tried another computer with the same type of sound card, and everything works fine.

Big thanks to you all, and especially Tomasz for answering so fast.

Copyright © 1999-2024, Tomasz Grysztar.