flat assembler
Message board for the users of flat assembler.

SSE* and MMX examples NOT needed
0.1



Joined: 24 Jul 2007
Posts: 474
Location: India
Well... there are some very experienced, expert-level board members here,
so I'd like to ask whether they could post some SSE* and MMX examples that are
simple enough (and that also show some useful application) for me and anybody
at my level (or maybe at a higher level than me).

Thanks!

PS: Oops! I forgot to mention: please also explain the instructions in simple terms!
Thanks aGain!

_________________
Code:
 o__=-
 )
(\
 /\  
    


Last edited by 0.1 on 25 Jan 2008, 13:41; edited 1 time in total
Post 22 Dec 2007, 10:10
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17270
Location: In your JS exploiting you and your system
There are already a few topics discussing SSE code. What did the search function turn up for you?
Post 22 Dec 2007, 10:44
kohlrak

Joined: 21 Jul 2006
Posts: 1421
Location: Uncle Sam's Pad
More interesting would be the trig functions in SSE2 code, but I haven't done much searching on that yet.
Post 22 Dec 2007, 10:52
0.1
revolution wrote:
There are already a few topics discussing SSE code. What did the search function turn up for you?

They are mostly concerned with showing how good their authors are at it,
and not with showing a newbie like me a useful application of it, I am afraid :(

Post 22 Dec 2007, 10:56
kohlrak
I used to wonder as well. MMX and the later extensions (SSE, SSE2, etc.) are like the FPU, only they're faster when working through large numbers of calculations (like in a 3D game, where collisions, positions, and changes have to be calculated every draw). It's just a matter of implementing them. If you're not looping over calculations that modify 4 floats at a time, or doing one or more calculations on more than 4 floats, the outcome really isn't worth it.
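
To make the "4 floats at a time" point concrete, here is a minimal sketch in C (an illustration, not code from this thread). The function name advance_positions and the position/velocity arrays are hypothetical. It assumes a compiler that provides <xmmintrin.h> when __SSE__ is defined, and falls back to a plain scalar loop otherwise.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>          /* SSE intrinsics (assumed available with __SSE__) */
#endif

/* Hypothetical helper: pos[i] += vel[i] * dt for n floats. */
void advance_positions(float *pos, const float *vel, float dt, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    __m128 vdt = _mm_set1_ps(dt);               /* dt copied into all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 p = _mm_loadu_ps(pos + i);       /* load 4 floats (unaligned) */
        __m128 v = _mm_loadu_ps(vel + i);
        p = _mm_add_ps(p, _mm_mul_ps(v, vdt));  /* one mulps + one addps for 4 floats */
        _mm_storeu_ps(pos + i, p);              /* store 4 floats */
    }
#endif
    for (; i < n; i++)                          /* leftovers, or everything without SSE */
        pos[i] += vel[i] * dt;
}
```

With only a handful of values the loads and stores cost about as much as the arithmetic saves, which is roughly the trade-off described above.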
Post 22 Dec 2007, 10:59
revolution
Here's an MMX population count. I've posted this before somewhere:
Code:
align 8
c5555   dq      05555555555555555h
c3333   dq      03333333333333333h
c0f0f   dq      00f0f0f0f0f0f0f0fh

population_count_mm0:
        movq    mm1,mm0         ;v
        psrld   mm0,1           ;v >> 1
        pand    mm0,[c5555]     ;(v >> 1) & 0x55555555
        psubd   mm1,mm0         ;w = v - ((v >> 1) & 0x55555555)
        movq    mm0,mm1         ;w
        psrld   mm1,2           ;w >> 2
        pand    mm0,[c3333]     ;w & 0x33333333
        pand    mm1,[c3333]     ;(w >> 2) & 0x33333333
        paddd   mm0,mm1         ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333)
        movq    mm1,mm0         ;x
        psrld   mm0,4           ;x >> 4
        paddd   mm0,mm1         ;x + (x >> 4)
        pand    mm0,[c0f0f]     ;y = (x + (x >> 4)) & 0x0f0f0f0f
        pxor    mm1,mm1         ;0
        psadbw  mm0,mm1         ;sum across all 8 bytes
        movd    eax,mm0         ;result in EAX per calling convention
        ret

But one main problem with MMX/SSE is that people only want to use it to optimise their code, so of course they will be mostly focussed upon how well it performs.
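
For readers following the comments in the routine above, here is a scalar sketch of the same steps in C (an illustration, not part of the original post). The names popcount32 and popcount64 are made up for this example. The MMX code does the v/w/x/y steps on both dwords of mm0 at once and then uses psadbw to add the eight byte counts; the sketch does the byte sum by hand.

```c
#include <stdint.h>

/* The steps from the comments, applied to one 32-bit value. */
unsigned popcount32(uint32_t v)
{
    /* each 2-bit field now holds the number of set bits in that field */
    uint32_t w = v - ((v >> 1) & 0x55555555u);
    /* each 4-bit field holds a count from 0 to 4 */
    uint32_t x = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
    /* each byte holds a count from 0 to 8 */
    uint32_t y = (x + (x >> 4)) & 0x0f0f0f0fu;
    /* add the four byte counts (psadbw against zero does this in the MMX code) */
    return (y & 0xffu) + ((y >> 8) & 0xffu) + ((y >> 16) & 0xffu) + (y >> 24);
}

/* 64 bits, as held in mm0: count each dword and add the results. */
unsigned popcount64(uint64_t v)
{
    return popcount32((uint32_t)v) + popcount32((uint32_t)(v >> 32));
}
```

The subtraction in the first step works because, for a 2-bit field with bits ab, the value 2a + b minus a is a + b, the number of set bits.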
Post 22 Dec 2007, 11:11
revolution
Here's some more bit manipulation stuff I found on my HDD. I think I've also posted this before somewhere:
Code:
align 16
d5555   dq      05555555555555555h,05555555555555555h
d3333   dq      03333333333333333h,03333333333333333h
d0f0f   dq      00f0f0f0f0f0f0f0fh,00f0f0f0f0f0f0f0fh
d00ff   dq      000ff00ff00ff00ffh,000ff00ff00ff00ffh

bitswap_xmm0:
        movdqa  xmm1,xmm0
        psrlq   xmm0,1
        pand    xmm1,dqword[d5555]
        pand    xmm0,dqword[d5555]
        psllq   xmm1,1
        por     xmm0,xmm1                       ;adjacent bits swapped
        movdqa  xmm1,xmm0
        psrlq   xmm0,2
        pand    xmm1,dqword[d3333]
        pand    xmm0,dqword[d3333]
        psllq   xmm1,2
        por     xmm0,xmm1                       ;adjacent bit pairs swapped
        movdqa  xmm1,xmm0
        psrlq   xmm0,4
        pand    xmm1,dqword[d0f0f]
        pand    xmm0,dqword[d0f0f]
        psllq   xmm1,4
        por     xmm0,xmm1                       ;nibbles swapped within each byte
        movdqa  xmm1,xmm0
        psrlq   xmm0,8
        pand    xmm1,dqword[d00ff]
        pand    xmm0,dqword[d00ff]
        psllq   xmm1,8
        por     xmm0,xmm1                       ;bytes swapped within each word
        pshuflw xmm0,xmm0,0*64+1*16+2*4+3       ;reverse the 4 low words
        pshufhw xmm0,xmm0,0*64+1*16+2*4+3       ;reverse the 4 high words
        pshufd  xmm0,xmm0,1*64+0*16+3*4+2       ;swap the two qwords
        ret

align 8
c5555   dq      05555555555555555h
c3333   dq      03333333333333333h
c0f0f   dq      00f0f0f0f0f0f0f0fh
c00ff   dq      000ff00ff00ff00ffh

bitswap_mm0:
        movq    mm1,mm0
        psrlq   mm0,1
        pand    mm1,[c5555]
        pand    mm0,[c5555]
        psllq   mm1,1
        por     mm0,mm1                         ;adjacent bits swapped
        movq    mm1,mm0
        psrlq   mm0,2
        pand    mm1,[c3333]
        pand    mm0,[c3333]
        psllq   mm1,2
        por     mm0,mm1                         ;adjacent bit pairs swapped
        movq    mm1,mm0
        psrlq   mm0,4
        pand    mm1,[c0f0f]
        pand    mm0,[c0f0f]
        psllq   mm1,4
        por     mm0,mm1                         ;nibbles swapped within each byte
        movq    mm1,mm0
        psrlq   mm0,8
        pand    mm1,[c00ff]
        pand    mm0,[c00ff]
        psllq   mm1,8
        por     mm0,mm1                         ;bytes swapped within each word
        pshufw  mm0,mm0,0*64+1*16+2*4+3         ;reverse the 4 words
        ret
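
As a reading aid for bitswap_mm0 (an illustration, not part of the original post), the same bit reversal can be sketched in scalar C. The name bitswap64 is made up for this example. The first four lines match the four mask-and-shift stages; the last two stand in for the pshufw word reversal, which scalar code has to do with two more swaps.

```c
#include <stdint.h>

/* Reverse the order of all 64 bits of v. */
uint64_t bitswap64(uint64_t v)
{
    /* swap adjacent bits */
    v = ((v >> 1) & 0x5555555555555555ull) | ((v & 0x5555555555555555ull) << 1);
    /* swap adjacent bit pairs */
    v = ((v >> 2) & 0x3333333333333333ull) | ((v & 0x3333333333333333ull) << 2);
    /* swap the nibbles within each byte */
    v = ((v >> 4) & 0x0f0f0f0f0f0f0f0full) | ((v & 0x0f0f0f0f0f0f0f0full) << 4);
    /* swap the bytes within each 16-bit word */
    v = ((v >> 8) & 0x00ff00ff00ff00ffull) | ((v & 0x00ff00ff00ff00ffull) << 8);
    /* reverse the four words: swap words within each dword, then swap the dwords */
    v = ((v >> 16) & 0x0000ffff0000ffffull) | ((v & 0x0000ffff0000ffffull) << 16);
    v = (v >> 32) | (v << 32);
    return v;
}
```

Each stage swaps neighbouring groups twice as wide as the stage before, so six stages are enough to reverse 64 bits. The SSE2 routine does the same on two qwords at once and then swaps the qwords with pshufd.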
Post 22 Dec 2007, 11:48
0.1
Thanks a lot, revolution!
But they would be more useful to me if you wrapped them in a nice tutorial!
A gift for Christmas, maybe!

As it is, I can't understand much (only very little, I am afraid) :(

Thanks aGain!

Post 22 Dec 2007, 12:19
revolution
I don't have any such explanation or tutorial available.

But you can try this link. ;)

Hehe, couldn't resist. Sorry if I did something bad.
Post 22 Dec 2007, 12:49
kohlrak
Hosted by the Lojban group... They host many interesting things, including the KLI (Klingon Language Institute). XD Though I will say that my Google skills don't turn up much trig in SSE2 directly, only through HLL code.

EDIT: I found a well-optimised version, but I haven't tested it yet... It doesn't include tangent/arctangent, though...
Post 22 Dec 2007, 13:01
0.1
Ho! Ho! Ho!
:lol: :lol: :lol:

Post 22 Dec 2007, 13:22
kohlrak
Interesting topic

Though I know a simple change to the operations that would increase accuracy but decrease speed: use an equation for pi that uses the distributive property. Note, though, that I haven't tested the code at the other end of that link.
Post 22 Dec 2007, 13:24
mattst88

Joined: 12 May 2006
Posts: 260
Location: South Carolina
If you want to do trigonometry with SIMD instructions, learn about the Taylor series. Learning the math behind these little code snippets is necessary if you want to write and understand them.
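
As a rough illustration of the Taylor series idea (a sketch, not code from this thread), the first four terms of the sine series, x - x^3/3! + x^5/5! - x^7/7!, can be evaluated with nothing but multiplies and adds. The names sin_taylor7 and sin_taylor7_x4 are made up here. The approximation is only good for small |x|, so real code would first reduce the angle to a small range. The four-lane loop is written in plain C; it does the same operations on every lane, which is the shape that packed SSE multiplies and adds work on.

```c
/* sin(x) ~= x - x^3/6 + x^5/120 - x^7/5040, in Horner form (small |x| only). */
float sin_taylor7(float x)
{
    float x2 = x * x;
    return x * (1.0f + x2 * (-1.0f / 6.0f
                 + x2 * (1.0f / 120.0f
                 + x2 * (-1.0f / 5040.0f))));
}

/* The same polynomial applied to 4 values, one per "lane". */
void sin_taylor7_x4(const float in[4], float out[4])
{
    for (int lane = 0; lane < 4; lane++)
        out[lane] = sin_taylor7(in[lane]);
}
```

Adding more terms improves accuracy at the cost of more multiplies per value.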
Post 24 Dec 2007, 00:30
kohlrak
Right, I tried looking them up a few times, but I didn't quite understand their methods (probably inaccurate sites, since my googling skills are horrible). According to this site (though I didn't check it), the Taylor series is supposedly slow, but I'll take a look at it and draw my own conclusions. But yes, that is my objective.
Post 24 Dec 2007, 01:19
tom tobias

Joined: 09 Sep 2003
Posts: 1320
Location: usa
0.1 wrote:
...But they would be more useful to me if you wrap them in a nice tutorial ! Gift for the Christmas - may be ! ...

http://www.neilkemp.us/v3/tutorials/SSE_Tutorial_1.html
http://arstechnica.com/articles/paedia/cpu/simd.ars/
http://www.intel.com/products/processor/manuals/index.htm
somewhat less useful:
http://coding.derkeiler.com/Archive/Assembler/comp.lang.asm.x86/2006-01/msg00278.html
Merry Christmas :)
Post 24 Dec 2007, 12:10
0.1
Thanks, tom!
I would be more grateful still if I could understand something from these links :)
Post 26 Dec 2007, 09:05
Borsuc

Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
See my last post in this thread, I hope it explains it well enough for a beginner level (it talks mostly about the idea, not the instructions themselves.. examples are in MMX).
Post 26 Dec 2007, 12:34
0.1
[quote=The_Grey_Beast]hope this helps[/quote]
quite a lot :)
Post 26 Dec 2007, 12:39
0.1
Yuck! Why is it looking ugly? :shock:
Post 26 Dec 2007, 12:40
rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
Post 25 Jan 2008, 07:30
Copyright © 1999-2020, Tomasz Grysztar.
