Good / fast PRNG, for floats bonus

Index > Main > Good / fast PRNG, for floats bonus

Goto page 1, 2 Next

Author

Thread

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 22 Jan 2017, 19:08

I'm looking for a PRNG that's optimized for speed but it provides very good results for a PRNG, not cheap stuff. It's not for cryptography by any means, but it's just my perfectionism.

Now, obviously, anything that exploits specialized assembly hacks or instructions is more than welcome, and that's why I'm asking. In fact, that's exactly what I want: assembly hacks for it. I'm kinda aware of most of the "C" PRNGs on the internet already. I want to see if someone here found anything better in asm so I can compare.

I don't have much knowledge of the exotic/weird "new" instructions in CPUs these days (IIRC there's a CRC instruction or something?), but if they can be used for a PRNG then I'm all ears. (but available in 32-bit with 64-bit optional, doesn't matter, but I mean 32-bit mode in a 64-bit capable CPU is perfectly fine, not necessarily for "old CPUs").

If the quality of the PRNG isn't too good, but it is super fast, that's okish too, I guess cause, I can probably reseed it once every X uses of it from things like /dev/urandom (Linux). Smile

Oh and if it can be "tailored" for floats within 0...1 range (inclusive), that's even better. Of course I know I can do the usual conversion, but I'm asking if there's a specific hack for it.

Please note that I don't want the ultimate speed PRNG which gives poor results. Anything that's comparable in speed (but not slower!) than the standard rand() algorithms (like in C stdlib) is fine... if it gives considerable better results, of course. The rand() in Window's CRT is horrible in quality, for example, also quite slow. If something has same speed but much better quality, it's good for me.

Thanks!

(any examples are welcome, including AVX/AVX2 instructions if there's any suited for the job btw! even a hack mixing the x87 FPU with SSE or using just the FPU, thought it's highly unlikely it will even be good unless there's a trick to utilize more pipelines or something like that)

22 Jan 2017, 19:08

neville

Joined: 13 Jul 2008
Posts: 507
Location: New Zealand

neville 22 Jan 2017, 23:07

There are, afaik, an infinite number of irrational numbers out there. The fractional expansions of such numbers never repeat themselves (in any number base) so by definition these digit sequences should be regarded as pseudo random.

There are also a large number of algorithms out there for calculating a small number of irrational numbers, notably pi, e, ln2, etc.

Recently I've been playing around with some of them and thought perhaps they could be of use to you? The "spigot" algorithms might be the best starting point for your purpose because they generate the output sequences immediately, although I don't know if they'd be fast enough for you overall. You could use and interpret the byte sequences in any way you like of course. Might be worth a try.

Somebody is bound to suggest that the digits of pi, for example, could not be considered "random", because the sequence has been published to trillions of (decimal) digits so is now "predictable". However I would firstly say that, unless you can remember the sequence, it doesn't matter. Of course your computer could store the sequence, but that could also be an advantage during the testing phase of some software e.g. a Monte Carlo simulation - being able to compare results with the same set of "random" inputs. Obviously that wouldn't be appropriate when the program goes live though.

Secondly I would suggest that your initial seed could be used to generate an unknown number, not just pi or e etc. Again the spigot algorithms are useful because they typically rely on an initialised buffer to generate the required value. The question would then arise, can irrationality be guaranteed?

Lastly, of course you could convert the output to use any number base, not just base 10 or "decimal". I doubt whether there are too many published lists for the digits of pi in base 23, for example.

_________________
FAMOS - the first memory operating system

22 Jan 2017, 23:07

redsock

Joined: 09 Oct 2009
Posts: 452
Location: Australia

redsock 22 Jan 2017, 23:57

IMO, Mr. Agner Fog has the PRNG space fully sorted: http://www.agner.org/random/ ... my own library's random generator is based on two of his, it is quite fast: https://2ton.com.au/library_as_html/rnc.inc.html

His asmlib contains assembly language versions of his RNG resources as well: www.agner.org/optimize/ (See the "Subroutine library" section).

_________________
2 Ton Digital - https://2ton.com.au/

22 Jan 2017, 23:57

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 23 Jan 2017, 02:30

neville wrote:

There are, afaik, an infinite number of irrational numbers out there. The fractional expansions of such numbers never repeat themselves (in any number base) so by definition these digit sequences should be regarded as pseudo random.

There are also a large number of algorithms out there for calculating a small number of irrational numbers, notably pi, e, ln2, etc.

Recently I've been playing around with some of them and thought perhaps they could be of use to you? The "spigot" algorithms might be the best starting point for your purpose because they generate the output sequences immediately, although I don't know if they'd be fast enough for you overall. You could use and interpret the byte sequences in any way you like of course. Might be worth a try.

Sounds like an interesting idea, but most of those sequences use mostly just "normal" mathematical operations, so I doubt they would be much better than things like "xorshift" which of course I already considered. I mean, maybe I'm asking for the impossible, but that's why I asked, just wanted to see if I missed a hidden hack or asm gem Smile

redsock wrote:

IMO, Mr. Agner Fog has the PRNG space fully sorted: http://www.agner.org/random/ ... my own library's random generator is based on two of his, it is quite fast: https://2ton.com.au/library_as_html/rnc.inc.html

Your link doesn't work it says "404 Gone Walkabout" Sad

I checked his vectorized rand "vectorclass" and... of course it's in C++. I dread the complexity and bloat of that code, look at the hideous monstrosity and how much code (and how many files!!! wtf?) for a stupid random algorithm. Sigh.

I'm sorry but I just don't even know where to start from his code to learn it, I hate OOP abstractions and stuff like that, I prefer everything explicit. Like in this case, not a million functions, just 1 to show the random algorithm... I guess it's too much to ask from him Confused

I don't want to "use" his vector class, I just want to see the implementation in one function not 1 million of them. Maybe your site is much clearer for an asm/C person like me but it's down at the moment for me as I said Sad

Thanks though... I don't want to give impression I'm ranting but I'm just lost in that code Smile

23 Jan 2017, 02:30

redsock

Joined: 09 Oct 2009
Posts: 452
Location: Australia

redsock 23 Jan 2017, 04:28

Furs wrote:

redsock wrote:
IMO, Mr. Agner Fog has the PRNG space fully sorted: http://www.agner.org/random/ ... my own library's random generator is based on two of his, it is quite fast: https://2ton.com.au/library_as_html/rnc.inc.html
Your link doesn't work it says "404 Gone Walkabout"

Lol, helps if I spell it right (hand-typed links, go figure, haha)... try this one instead:

https://2ton.com.au/library_as_html/rng.inc.html

Smile

_________________
2 Ton Digital - https://2ton.com.au/

23 Jan 2017, 04:28

neville

Joined: 13 Jul 2008
Posts: 507
Location: New Zealand

neville 23 Jan 2017, 09:08

Furs wrote:

but most of those sequences use mostly just "normal" mathematical operations, so I doubt they would be much better than things like "xorshift" which of course I already considered.)

Except maybe that in theory the sequence would never repeat? Although in reality I guess that would require an infinite amount of RAM (and take an infinitely long time to calculate...)

_________________
FAMOS - the first memory operating system

23 Jan 2017, 09:08

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 21011
Location: In your JS exploiting you and your system

revolution 23 Jan 2017, 12:25

But computing digits of pi (or e, or sqrt(2), etc.), while guaranteed never to repeat, is also nowhere near as fast as "normal" PRNGs that do repeat over some finite period.

Later CPUs have a dedicated RNG instruction. It does have a limited rate for new bits to be generated but still quite fast and easy to code.

23 Jan 2017, 12:25

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 23 Jan 2017, 13:22

redsock wrote:

Lol, helps if I spell it right (hand-typed links, go figure, haha)... try this one instead:

Thanks, looks interesting. I'll analyze it and do some tests, but ultimately will test it with my actual code (I know revolution will say that's the proper way to test Razz

)

neville wrote:

Except maybe that in theory the sequence would never repeat? Although in reality I guess that would require an infinite amount of RAM (and take an infinitely long time to calculate...)

That's true, but, my problem isn't exactly if it repeats, but whether it's predictable. Digits of pi, while they never repeat, are predictable. I know you weren't talking about that, but I mean any common irrational is likely to be predictable though.

The reason I'm not worried about repeat is because I will constantly reseed the RNG from outside source. To make it multi-thread safe I was thinking to have a normal RNG that simply XORs with the seed every once in a while. If the seed was updated, it XORs with a new value (global seed that gets updated by a timer thread), otherwise with old value so it doesn't help here until it gets changed.

Reseeding would be done with e.g. /dev/urandom on Linux or RDRAND or something like that (just changing the global variable, no atomic operation needed as it shouldn't be a problem if it xor with old seed, right?).

What do you think of it? Or is this approach bad?

EDIT: Note when I speak of global seed I don't mean the RNG's state. The RNG's state is local (unless there's a better way to have it thread safe). The global seed is something else, a random number generated from outside source by a timer thread and updated every X units of time etc. Local RNGs will never modify it, that's what I mean.

revolution wrote:

Later CPUs have a dedicated RNG instruction. It does have a limited rate for new bits to be generated but still quite fast and easy to code.

Sorry I forgot to mention I was aware of this instruction.

The problem to me seems to be that it can "fail" and return zeros (carry flag check) -- at which point, I need to do polling over it, not really feasible is it?

I can understand failing due to lack of entropy, kinda how /dev/random works, but it's obviously not practical for non-cryptographic purposes.

Last edited by Furs on 23 Jan 2017, 13:31; edited 1 time in total

23 Jan 2017, 13:22

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 21011
Location: In your JS exploiting you and your system

revolution 23 Jan 2017, 13:29

Furs wrote:

Sorry I forgot to mention I was aware of this instruction.

The problem to me seems to be that it can "fail" and return zeros (carry flag check) -- at which point, I need to do polling over it, not really feasible is it?

I would have thought the opposite. It is very easy to code, just a simple loop.

Furs wrote:

I can understand failing due to lack of entropy, kinda how /dev/random works, but it's obviously not practical for non-cryptographic purposes.

Once again I would have thought the opposite. It is great for non-crypto purposes. And potentially bad for crypto uses because of the unknown underlying algorithms and seeding arrangements. But it can be used to add to an already existing entropy pool as part of an overall system for full-crypto usage.

23 Jan 2017, 13:29

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 23 Jan 2017, 13:33

What do you mean the opposite? I don't get it. Smile

I need a RNG that always returns in a similar amount of time and not "blocks", because it's for real time processing. I mean it's ok to be "slowish" but not very slow and it should always have similar speed not an unpredictable "block", but I only want it slow if it means very good RNG quality though.

For example it's ok to use RDRAND in the timer thread that updates the "global seed" I mentioned, because even if it fails, processing will still continue -- only that the quality will be lower. That's ok though.

23 Jan 2017, 13:33

edfed

Joined: 20 Feb 2006
Posts: 4366
Location: Now

edfed 23 Jan 2017, 16:20

http://www.cacert.at/cgi-bin/rngresults

https://board.flatassembler.net/topic.php?t=8068

23 Jan 2017, 16:20

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 23 Jan 2017, 17:27

@edfed: I will look into that thanks. I'm open to any algorithm/idea since I can just combine them together with XOR to make them way less predictable (see below what I mean). Goal isn't security but obscurity so that "others" don't have same sequences ever.

@redsock: Ok I'm afraid I'll have to say that your code (well, since it is based on Agner's) seems way too complex, I can't really understand some of it.

There are some functions in there which I've no idea what they even are, like hmac_drbg$generate, and the absolutely huge state (or what's the globals for?), I'll obviously have to understand it to make it thread-safe. Ideally I want as small a state as possible.

I guess I wasn't very detailed at the beginning, I apologize for that. I need it to be unpredictable but not necessarily security-grade since it won't be used for security. It will be used in DSP (Digital Signal Processing) as a plugin I'm writing, in realtime, and I just want to avoid pitfalls of common things which can result in same "sequence" for some transformations/filters/random effects etc, because everybody knows them. (for example the standard rand() function in C runtime on Windows msvcrt.dll is horrible, poor quality, low amount of numbers, and every cheap programmer out there is using it so it is very "predictable").

That's what I mean by unpredictable. I suppose "obscurity" would fit it also.

I looked more into rdrand, since my CPU supports it (Intel Haswell Xeon E3-1241 v3). It seems Intel said "if it fails after 10 tries, the chances are astronomically small, so it is likely a problem in the CPU" or something along those lines. I overestimated the amount of times it would need to be retried it seems -- I literally thought you'd have to poll it for long period of times, like /dev/random blocks when not enough entropy. (like stalling a CPU core at 100% use for seconds).

Still, retrying for 10 times isn't acceptable due to performance implications when I need it called very often many thousands of times per second.

So I was thinking to use fast/simple PRNGs as main "rand" units, but re-seeded often from rdrand by a separate timer thread or after a given number of times using "rand" (obviously, even 5 times per second timer is FAR less than thousands of times per second of RDRAND). A secondary timer using /dev/urandom could be employed, too, with less calls obviously.

Like, use XOR of xorshift (or xoroshiro, why not use rotates since asm?!) and the simple "multiply by prime" rand... with their seeds periodically changed by RDRAND / dev/urandom. That should be oscure/unpredictable enough to have true random as far as signal analysis is concerned, maybe I'm way too obsessed. Smile

Again, I'm open to any use of extra randomness from "exotic" instructions, idk if I can make use of the AES instructions in the CPU for example and so on.

And lastly, any tricks to make it random into "floats"? (I prefer floats over doubles, due to performance, since I'll probably be using AVX/AVX2 and not scalars)

I read some simple tricks, like just randomize the bits of mantissa and get proper exponent (1 subtraction), which is fine, until you realize that a number in 0...1 range (or -1...1) can have negative exponents.

Randomizing the mantissa/significand only means that you only get random distribution when exponent is 0 or whatever. Is that ok or am I missing something?

Is it even good for uniform distribution to have negative exponents? (wouldn't that make more numbers in the lower end and ruin uniformity?). Maybe I'm overthinking this, but I'm really not a mathematician in the slightest so I don't understand "properties" of how RNGs or even hash functions work. I understand the operations, but not how they *work*, so I've no idea how one is more uniform etc than another. That's why I'm asking such dumb questions. Smile

I'll probably end up going with the hybrid 3-way method I've mentioned to make it extremely unpredictable and still keep its performance high.

Thanks for all info so far, keep them coming if you find new ways, I'm sure they'd be educational for anyone stumbling on this thread, even if not suitable for my needs here (like the irrational number one, which is cool idea, but unfortunately not for what I need now). Wink

(btw edfed, huuge list of RNGs lol! shame you can't sort it by criteria?)

23 Jan 2017, 17:27

redsock

Joined: 09 Oct 2009
Posts: 452
Location: Australia

redsock 23 Jan 2017, 21:16

I hacked together a quick and dirty example re: floating point conversion for you to contemplate:

Code:

include '../../ht_defaults.inc'
include '../../ht.inc'

use_rdrand = 0

; function to turn rdrand or /dev/urandom 64 bits into a double
; return in xmm0
falign
rdrand_double:
        prolog  rdrand_double
        movq    xmm1, [.one]
if use_rdrand
@@:
        rdrand  rax
        jnc     @b
else
        ; example only, load from /dev/urandom instead
        mov     rdi, .fname
        xor     esi, esi        ; O_RDONLY
        mov     eax, syscall_open
        syscall
        ; cmp eax, 0
        ; jl .kakked
        push    rax
        sub     rsp, 8
        mov     rdi, rax        ; fd
        mov     rsi, rsp        ; buf
        mov     edx, 8          ; len
        mov     eax, syscall_read
        syscall
        mov     rdi, [rsp+8]
        mov     eax, syscall_close
        syscall
        pop     rax rcx         ; rcx junk fd
end if
        movq    xmm0, rax
        psrlq   xmm0, 12
        por     xmm0, xmm1
        subsd   xmm0, xmm1
        epilog
dalign
.one:
        dd      0x0, 0x3ff00000
if use_rdrand = 0
.fname: db      '/dev/urandom',0
end if
        


public _start
falign
_start:
        call    ht$init

        call    rdrand_double
        mov     edi, double_string_fixed
        mov     esi, 3
        call    string$from_double
        push    rax
        mov     rdi, rax
        call    string$to_stdout
        pop     rdi
        call    heap$free

        mov     eax, syscall_exit
        xor     edi, edi
        syscall

include '../../ht_data.inc'

That is linked against my library of course, but is a good example.

My older Sandy Bridge processor doesn't support RDRAND, so my example uses the Linux /dev/urandom, but any random source for the 8 bytes the function gets will suffice.

Also, since your requirements dont' involve security, have you considered using the much simpler LFSR? https://en.wikipedia.org/wiki/Linear-feedback_shift_register (Specifically, see the "Uses in cryptography" section there)

Re: float vs. double, the above code is for doubles, but a similar technique can be used for 32bit instead without much difficulty.

Cheers!

_________________
2 Ton Digital - https://2ton.com.au/

23 Jan 2017, 21:16

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 23 Jan 2017, 22:16

Thanks, that's basically the randomized mantissa trick I mentioned earlier, but at least you confirmed it's a "good" way to get a random float (I was afraid it wouldn't be uniform or something like that, since I'm clueless). Smile

LFSR is interesting, but it seems impractical on software level (only 1 bit per iteration?), but I'll look into it a bit more and see if I find anything.

23 Jan 2017, 22:16

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 21011
Location: In your JS exploiting you and your system

revolution 23 Jan 2017, 23:47

Furs wrote:

I looked more into rdrand, since my CPU supports it (Intel Haswell Xeon E3-1241 v3). It seems Intel said "if it fails after 10 tries, the chances are astronomically small, so it is likely a problem in the CPU" or something along those lines. I overestimated the amount of times it would need to be retried it seems -- I literally thought you'd have to poll it for long period of times, like /dev/random blocks when not enough entropy. (like stalling a CPU core at 100% use for seconds).

This rdrand thing can generate ~3Gb/s, so if you only want thousands per second the it won't be a problem and quite likely won't ever fail or need looping ever. But rather than just making assumptions, test it.

23 Jan 2017, 23:47

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 24 Jan 2017, 01:02

Well I did test but how can I be sure it won't act up sometimes? Since I don't even know why exactly it fails at all since it's a PRNG itself (not like rdseed). When it does fail, currently it returns zero (according to Intel), which is kinda a problem so I have to loop.

I was only worried that it would suddenly fail nonstop (Intel don't mention why it would fail at all, so I assume a bug of sorts) and then "freeze" the program if I retried it in an infinite loop (because what am I supposed to do if it's the only RNG source I have? if it fails, use zero?). Not often but well, like a random crash/freeze. Confused

24 Jan 2017, 01:02

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 21011
Location: In your JS exploiting you and your system

revolution 24 Jan 2017, 01:52

Apparently it "fails" when the entropy has been exhausted. Then you have to loop and try again until the entropy has been refilled. It won't ever get stuck in an infinite loop, it is not so dumb as to do that. If you are using bits at a high rate (i.e. more than ~3Gb/s) then you'll have wait a short time to let it catch up.

Last edited by revolution on 24 Jan 2017, 16:04; edited 1 time in total

24 Jan 2017, 01:52

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 24 Jan 2017, 15:58

Still, if the entropy is exhausted, how many times can it fail in a row? For a realtime system, getting a "spike" in CPU use can be pretty bad. If I use it to process such signals (say audio, for simplicity), I'll get drop-outs, crackles, pops etc, you know the typical audio glitches with high CPU use when it can't fill the buffer fast enough. Mind you this won't be that expensive itself but it won't be the only effect/transformation running, so keeping its CPU as low as "possible" (within reason) is quite important, even if by itself it only uses ~20% or less etc. I'm not afraid of constant fails, but big "spikes" in fails which happen rarely.

So, to get rid of that problem, I was thinking to calculate a simple "normal" PRNG alongside RDRAND, and simply XOR them together. Then either 1) not retry at all or 2) retry a specific number of times max.

Even if it returns zero, I'll at least get the simple PRNG result, which shouldn't be predictable since I doubt it will return zero after failing so many times in a row. And once RDRAND returns once, the "predictable sequence" is gone.

24 Jan 2017, 15:58

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 21011
Location: In your JS exploiting you and your system

revolution 24 Jan 2017, 16:07

Furs wrote:

Still, if the entropy is exhausted, how many times can it fail in a row?

Test it in your app and see what results you get. Different apps will get different results obviously so we can't give you a specific answer.

24 Jan 2017, 16:07

Furs

Joined: 04 Mar 2016
Posts: 2738

Furs 24 Jan 2017, 18:03

Well I'll have to do a really long test and "count" the times it fails in a row (max), since it's for a continuous system, so gotta keep it up for like a day or so. Still with random fails, nothing is "certain", so if I don't get one whole day more than say, 5 fails, it doesn't mean I won't get 10 fails on a bad day in a row. I doubt it depends on the app that much, since the entropy is hardware-based. I guess depends how often it is called though.

Hmm thinking of something unrelated, I wonder if it's possible to do a local denial of service in a multi-user environment (to RDRAND's quality I mean) since it's unprivileged instruction, same as /dev/urandom can be abused by users to decrease quality entropy. (nothing to do with me tho Razz

)

Anyway I'll have to implement some other things about it first which I'm working on at the moment before I can start testing, at least I got the random thing sorted Smile

24 Jan 2017, 18:03

Goto page 1, 2 Next

< Last Thread | Next Thread >

Forum Rules:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum