flat assembler
Message board for the users of flat assembler.

SSE Random Numbers

pal 11 Jul 2009, 23:18
I converted some C++ code into assembly language and decided to use SSE for it, since it works with large (64-bit) integers. The C++ code is:

struct Ran {
    Ullong u, v, w;
    Ran(Ullong j) : v(4101842887655102017LL), w(1) {
        // Constructor. Call with any integer seed (except value of v above).
        u = j ^ v; int64();
        v = u; int64();
        w = v; int64();
    }
    inline Ullong int64() {
        // Return 64-bit random integer. See text for explanation of method.
        u = u * 2862933555777941757LL + 7046029254386353087LL;
        v ^= v >> 17; v ^= v << 31; v ^= v >> 8;
        w = 4294957665U*(w & 0xffffffff) + (w >> 32);
        Ullong x = u ^ (u << 21); x ^= x >> 35; x ^= x << 4;
        return (x + v) ^ w;
    }
    inline Doub doub() { return 5.42101086242752217E-20 * int64(); }
    // Return random double-precision floating value in the range 0. to 1.
    inline Uint int32() { return (Uint)int64(); }
    // Return 32-bit random integer.
};
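For anyone who wants to compile the generator and compare its output against an assembly port, here is a minimal self-contained sketch. It assumes the Numerical Recipes typedefs Ullong, Uint and Doub map to the standard types shown; that mapping is my assumption, not something stated in this thread.

```cpp
#include <cstdint>

// Assumed stand-ins for the Numerical Recipes typedefs.
typedef std::uint64_t Ullong;
typedef std::uint32_t Uint;
typedef double Doub;

struct Ran {
    Ullong u, v, w;

    // Seed with any integer except the initial value of v.
    explicit Ran(Ullong j) : v(4101842887655102017ULL), w(1) {
        u = j ^ v; int64();
        v = u; int64();
        w = v; int64();
    }

    // 64-bit output: an LCG (u), a xorshift (v) and a multiply-with-carry (w) combined.
    Ullong int64() {
        u = u * 2862933555777941757ULL + 7046029254386353087ULL;
        v ^= v >> 17; v ^= v << 31; v ^= v >> 8;
        w = 4294957665U * (w & 0xffffffff) + (w >> 32);
        Ullong x = u ^ (u << 21); x ^= x >> 35; x ^= x << 4;
        return (x + v) ^ w;
    }

    // Scales the 64-bit output by 2^-64 to get a double between 0 and 1.
    Doub doub() { return 5.42101086242752217E-20 * int64(); }

    // Low 32 bits of the 64-bit output.
    Uint int32() { return (Uint)int64(); }
};
```

Constructing two generators with the same seed, for example Ran a(12345) and Ran b(12345), gives identical sequences, which makes it easy to dump the first few int64() values and diff them against another implementation.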

You have to modify it slightly to run it (Ullong, Uint and Doub are Numerical Recipes typedefs). This code is taken from Numerical Recipes, 3rd Edition. The assembly I came up with was:

        push    ebp
        mov     ebp,esp
        mov     eax,[ebp+8]
        movd    xmm0,eax
        mov     eax,4101842887
        movd    xmm7,eax
        pslldq  xmm7,4
        mov     eax,655102017
        movd    xmm2,eax
        paddd   xmm7,xmm2
        pxor    xmm0,xmm7       ; u = j ^ v
        call    produceRandom
        movq    xmm7,xmm0       ; v = u
        call    produceRandom
        movq    xmm5,xmm7       ; w = v
        call    produceRandom
        mov     esp,ebp
        pop     ebp
        retn    4
produceRandom:
        ; u = u * 2862933555777941757LL + 7046029254386353087LL;
        ; u = xmm0
        mov     eax,2862933555
        movd    xmm2,eax
        pslldq  xmm2,4
        mov     eax,777941757
        movd    xmm3,eax
        paddd   xmm2,xmm3
        pmuludq xmm0,xmm2       ; note: multiplies only the low dwords (32x32 -> 64)
        mov     eax,704602925
        movd    xmm2,eax
        pslldq  xmm2,4
        mov     eax,4000000000
        movd    xmm3,eax
        paddd   xmm2,xmm3
        mov     eax,386353087
        movd    xmm3,eax
        paddd   xmm2,xmm3
        paddq   xmm0,xmm2
        ; v ^= v >> 17; v ^= v << 31; v ^= v >> 8
        ; v = xmm7
        movq    xmm6,xmm7
        psrlq   xmm6,17         ; v >> 17 (psrlq/psllq shift by bits, psrldq/pslldq by bytes)
        pxor    xmm7,xmm6       ; v ^= v >> 17
        movq    xmm6,xmm7
        psllq   xmm6,31         ; v << 31
        pxor    xmm7,xmm6       ; v ^= v << 31
        movq    xmm6,xmm7
        psrlq   xmm6,8          ; v >> 8
        pxor    xmm7,xmm6       ; v ^= v >> 8
        pxor    xmm6,xmm6
        ; w = 4294957665 * (w & 0xFFFFFFFF) + (w >> 32)
        ; w = xmm5
        movq    xmm6,xmm5
        mov     eax,4294957665
        movd    xmm4,eax
        mov     eax,0xFFFFFFFF
        movd    xmm3,eax
        pand    xmm6,xmm3       ; w & 0xFFFFFFFF
        pmuludq xmm6,xmm4       ; 32x32 -> 64-bit product
        psrlq   xmm5,32         ; w >> 32
        paddq   xmm5,xmm6
        pxor    xmm3,xmm3       ; Cleanup
        pxor    xmm4,xmm4
        pxor    xmm6,xmm6
        ; x = u ^ (u << 21); x ^= x >> 35; x ^= x << 4;
        ; x = xmm6
        movq    xmm6,xmm0
        movq    xmm4,xmm0
        psllq   xmm4,21         ; u << 21
        pxor    xmm6,xmm4       ; x = u ^ (u << 21)
        movq    xmm4,xmm6
        psrlq   xmm4,35         ; x >> 35
        pxor    xmm6,xmm4       ; x ^= x >> 35
        movq    xmm4,xmm6
        psllq   xmm4,4          ; x << 4
        pxor    xmm6,xmm4       ; x ^= x << 4
        ; return (x + v) ^ w;
        movq    xmm4,xmm6
        paddq   xmm4,xmm7       ; x + v
        pxor    xmm4,xmm5       ; (x + v) ^ w
        movd    eax,xmm4        ; low 32 bits of the result
        ret
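One thing worth checking in the listing is how the 64-bit constants are split. Shifting a dword left by 4 bytes moves it into bits 32..63, which multiplies it by 2^32, so the two pieces have to be the high and low dwords of the constant. The groups of decimal digits 2862933555 and 777941757 are not those dwords. The sketch below shows the difference in C++; Dwords, split_u64 and join_u64 are hypothetical helper names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper type: the two 32-bit halves of a 64-bit value.
struct Dwords {
    std::uint32_t hi;
    std::uint32_t lo;
};

// Split x into its high and low dwords (what the upper and lower
// 32 bits of the XMM register's low qword need to hold).
inline Dwords split_u64(std::uint64_t x) {
    Dwords d;
    d.hi = static_cast<std::uint32_t>(x >> 32);
    d.lo = static_cast<std::uint32_t>(x);
    return d;
}

// Recombine: hi goes to bits 32..63, lo to bits 0..31.
inline std::uint64_t join_u64(std::uint32_t hi, std::uint32_t lo) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

With k = 2862933555777941757, join_u64(split_u64(k).hi, split_u64(k).lo) gives k back, whereas join_u64(2862933555u, 777941757u) gives a different number.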

The output may well not match the C++ version exactly, but it does produce pseudo-random numbers. I know some of the pxor clean-ups are redundant, since I overwrite those registers with movd straight afterwards. Apparently the C++ method is really good. I have a few questions that I could not find answers to in the manuals.

1. When loading a value like 2862933555777941757 into an XMM register, am I doing it the correct way? That is: load 2862933555 into a GPR, move it into an XMM register, shift it left by a dword (pslldq), load 777941757 into a different XMM register, then combine the two (paddd, por, pxor, etc.).

2. When wanting to subtract 1 from an XMM register (I don't do this here, but I did need it the other day), is there a quicker way to do it than:

mov eax,1
movd xmm1,eax
psubb xmm0,xmm1

psubb doesn't allow an immediate value, and I couldn't find a pdec instruction mnemonic.

3. How do I tell what version of SSE the computer has? The other day I was doing some coding with SSE and needed to check whether xmm0 was 0. I was going to use ptest xmm0,xmm0, but apparently my CPU doesn't have SSE4.1 or later, so that didn't work, and in the end I didn't find a way to check what value was in xmm0.
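On the second half of question 3 (testing for zero without ptest), one SSE2-only approach is to compare every byte against zero with pcmpeqb and collect the results with pmovmskb; a mask of 0FFFFh then means all 16 bytes were zero. Below is a minimal sketch using C++ intrinsics. The names is_all_zero and HAVE_SSE2_SKETCH are made up for illustration, and the plain loop is only a fallback for compilers that are not targeting SSE2.

```cpp
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#endif

// Hypothetical helper: returns true when all 16 bytes are zero.
inline bool is_all_zero(const std::uint8_t bytes[16]) {
#ifdef HAVE_SSE2_SKETCH
    // Unaligned load of the 16 bytes into an XMM register.
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    // pcmpeqb: each byte becomes 0xFF where the input byte was zero.
    __m128i eq = _mm_cmpeq_epi8(x, _mm_setzero_si128());
    // pmovmskb: gather the top bit of each byte into a 16-bit mask.
    return _mm_movemask_epi8(eq) == 0xFFFF;
#else
    // Portable fallback with the same result.
    for (int i = 0; i < 16; ++i) {
        if (bytes[i] != 0) return false;
    }
    return true;
#endif
}
```

In assembly the same idea would be roughly pxor xmm1,xmm1 / pcmpeqb xmm1,xmm0 / pmovmskb eax,xmm1 / cmp eax,0FFFFh.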

Cheers for any answers, pal.

P.S. My bad for all the questions.

windwakr 11 Jul 2009, 23:41
3. Use CPUID; it's in the CPUID manual.

Call CPUID with EAX set to 1.

EDX reports support for SSE and SSE2:
bit 25 is SSE, bit 26 is SSE2.

ECX reports support for SSE3, SSSE3, SSE4.1 and SSE4.2:
bit 0 is SSE3, bit 9 is SSSE3, bit 19 is SSE4.1, bit 20 is SSE4.2.
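The bit positions above can be sketched as a small decoder. This is only an illustration in C++: sse_levels is a hypothetical helper, and its edx and ecx parameters are assumed to hold the values CPUID left in EDX and ECX after being called with EAX = 1.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: turn the CPUID(EAX=1) feature words into a list of names.
inline std::vector<std::string> sse_levels(std::uint32_t edx, std::uint32_t ecx) {
    std::vector<std::string> out;
    if (edx & (1u << 25)) out.push_back("SSE");     // EDX bit 25
    if (edx & (1u << 26)) out.push_back("SSE2");    // EDX bit 26
    if (ecx & (1u << 0))  out.push_back("SSE3");    // ECX bit 0
    if (ecx & (1u << 9))  out.push_back("SSSE3");   // ECX bit 9
    if (ecx & (1u << 19)) out.push_back("SSE4.1");  // ECX bit 19
    if (ecx & (1u << 20)) out.push_back("SSE4.2");  // ECX bit 20
    return out;
}
```

Each bit is tested independently here, so a missing level does not hide the ones after it.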

EDIT: *FACEPALM*, I stupidly forgot that the Windows API modifies the registers, boy do I feel dumb...Works good now...

;SSE detection, by ---- aka windwakr
;Saturday, July 11, 2009  7/11/09
;Seriously, why does the rest of the world do dates the "logical" way? mm/dd/yy is so much better...
include "win32ax.inc"

.data
supported1 dd ?
supported2 dd ?
sse db ' SSE',0
sse2 db ', SSE2',0
sse3 db ', SSE3',0
sse33 db ', SSSE3',0
sse41 db ', SSE4.1',0
sse42 db ', SSE4.2',0

title db 'SSE support',0
support db 'Your machine has support for these SSE versions:',13,10,0
buffer rb 256                   ; room after 'support' for the strings lstrcat appends

.code
start:
mov eax,1
cpuid                           ; feature flags come back in EDX and ECX
mov [supported1],edx            ; saved because the Windows API calls modify the registers
mov [supported2],ecx
test [supported1],00000010000000000000000000000000b     ; EDX bit 25: SSE
jz @f
invoke lstrcat,support,sse
test [supported1],00000100000000000000000000000000b     ; EDX bit 26: SSE2
jz @f
invoke lstrcat,support,sse2
test [supported2],00000000000000000000000000000001b     ; ECX bit 0: SSE3
jz @f
invoke lstrcat,support,sse3
test [supported2],00000000000000000000001000000000b     ; ECX bit 9: SSSE3
jz @f
invoke lstrcat,support,sse33
test [supported2],00000000000010000000000000000000b     ; ECX bit 19: SSE4.1
jz @f
invoke lstrcat,support,sse41
test [supported2],00000000000100000000000000000000b     ; ECX bit 20: SSE4.2
jz @f
invoke lstrcat,support,sse42
@@:                             ; first unsupported level jumps here
invoke MessageBox,0,support,title,MB_OK
invoke ExitProcess,0
.end start


Last edited by windwakr on 15 Jul 2012, 21:02; edited 7 times in total

r22 12 Jul 2009, 00:10

Regarding your questions:
1. Loading data into an XMM register is best done all at once, from a 16-byte-aligned data source, with the MOVDQA instruction:
align 16
bignum dq 1234567812345678h, 11111111ffffffffh
movdqa xmm0, dqword[bignum]

2. Subtracting one can be performed similar to the above, using an aligned data source.
align 16
sub1fromallbytes dq 0101010101010101h, 0101010101010101h
sub1fromallqwords dq 0000000000000001h, 0000000000000001h
psubb XMM0,dqword[sub1fromallbytes]
psubq XMM0,dqword[sub1fromallqwords]
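The two constants in point 2 do different things: psubb subtracts within each byte with no borrow between bytes, while psubq subtracts across the whole 64-bit lane. A rough C++ emulation of one 64-bit lane shows the difference; sub_bytes and sub_qword are hypothetical names for illustration, not real intrinsics.

```cpp
#include <cstdint>

// Emulates psubb on one 64-bit lane: every byte is subtracted on its own,
// wrapping around modulo 256, with no borrow passed to the next byte.
inline std::uint64_t sub_bytes(std::uint64_t a, std::uint64_t b) {
    std::uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint8_t x = static_cast<std::uint8_t>(a >> (8 * i));
        std::uint8_t y = static_cast<std::uint8_t>(b >> (8 * i));
        std::uint8_t d = static_cast<std::uint8_t>(x - y);
        r |= static_cast<std::uint64_t>(d) << (8 * i);
    }
    return r;
}

// Emulates psubq on one 64-bit lane: an ordinary 64-bit subtraction.
inline std::uint64_t sub_qword(std::uint64_t a, std::uint64_t b) {
    return a - b;
}
```

For example, sub_qword(0x0100, 1) is 0x00FF, but sub_bytes(0x0100, 1) is 0x01FF because the low byte wraps without borrowing from the byte above it. To decrement a 64-bit integer held in an XMM register, psubq with the qword constant is the one to use.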

I find encryption and PRNGs very interesting topics. In fact, I wrote my college honors thesis on them and documented my work on this board: http://board.flatassembler.net/topic.php?t=6518&postdays=0&postorder=asc&start=0
You may want to read it if you get a chance.
The PROE algorithm is used for encryption, but the PRNG portion can easily be stripped out and analyzed.

Using the NIST (National Institute of Standards and Technology) test suite as well as ENT (a program used for testing randomness), I was able to verify that my PRNG is cryptographically secure (extremely random).

pal 12 Jul 2009, 12:02
windwakr: I thought it might have something to do with cpuid; stupidly, I didn't check the manual even though I was reading it the other day. Personally I would have used bt to test the bits, but on reading up I found that it takes 4 clocks for the reg,imm8 form.

r22: Again, stupidly, I completely forgot about just putting the data in a memory location. I haven't come across movdqa yet, so I'll have a play with that. Your PRNG looks very interesting and I'll be sure to have a decent look at it. Cheers for linking me to that, man.

Cheers to both of you for your help, by the way.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
