flat assembler
Message board for the users of flat assembler.
Index
> Main > Good / fast PRNG, for floats bonus Goto page 1, 2 Next |
Author |
|
neville 22 Jan 2017, 23:07
There are, afaik, an infinite number of irrational numbers out there. The fractional expansions of such numbers never repeat themselves (in any number base) so by definition these digit sequences should be regarded as pseudo random.
There are also a large number of algorithms out there for calculating a small number of irrational numbers, notably pi, e, ln2, etc. Recently I've been playing around with some of them and thought perhaps they could be of use to you? The "spigot" algorithms might be the best starting point for your purpose because they generate the output sequences immediately, although I don't know if they'd be fast enough for you overall. You could use and interpret the byte sequences in any way you like of course. Might be worth a try. Somebody is bound to suggest that the digits of pi, for example, could not be considered "random", because the sequence has been published to trillions of (decimal) digits so is now "predictable". However I would firstly say that, unless you can remember the sequence, it doesn't matter. Of course your computer could store the sequence, but that could also be an advantage during the testing phase of some software e.g. a Monte Carlo simulation - being able to compare results with the same set of "random" inputs. Obviously that wouldn't be appropriate when the program goes live though. Secondly I would suggest that your initial seed could be used to generate an unknown number, not just pi or e etc. Again the spigot algorithms are useful because they typically rely on an initialised buffer to generate the required value. The question would then arise, can irrationality be guaranteed? Lastly, of course you could convert the output to use any number base, not just base 10 or "decimal". I doubt whether there are too many published lists for the digits of pi in base 23, for example. _________________ FAMOS - the first memory operating system |
|||
22 Jan 2017, 23:07 |
|
redsock 22 Jan 2017, 23:57
IMO, Mr. Agner Fog has the PRNG space fully sorted: http://www.agner.org/random/ ... my own library's random generator is based on two of his, it is quite fast: https://2ton.com.au/library_as_html/rnc.inc.html
His asmlib contains assembly language versions of his RNG resources as well: www.agner.org/optimize/ (See the "Subroutine library" section). |
|||
22 Jan 2017, 23:57 |
|
Furs 23 Jan 2017, 02:30
neville wrote: There are, afaik, an infinite number of irrational numbers out there. The fractional expansions of such numbers never repeat themselves (in any number base) so by definition these digit sequences should be regarded as pseudo random. redsock wrote: IMO, Mr. Agner Fog has the PRNG space fully sorted: http://www.agner.org/random/ ... my own library's random generator is based on two of his, it is quite fast: https://2ton.com.au/library_as_html/rnc.inc.html I checked his vectorized rand "vectorclass" and... of course it's in C++. I dread the complexity and bloat of that code, look at the hideous monstrosity and how much code (and how many files!!! wtf?) for a stupid random algorithm. Sigh. I'm sorry but I just don't even know where to start from his code to learn it, I hate OOP abstractions and stuff like that, I prefer everything explicit. Like in this case, not a million functions, just 1 to show the random algorithm... I guess it's too much to ask from him I don't want to "use" his vector class, I just want to see the implementation in one function not 1 million of them. Maybe your site is much clearer for an asm/C person like me but it's down at the moment for me as I said Thanks though... I don't want to give impression I'm ranting but I'm just lost in that code |
|||
23 Jan 2017, 02:30 |
|
redsock 23 Jan 2017, 04:28
Furs wrote:
https://2ton.com.au/library_as_html/rng.inc.html |
|||
23 Jan 2017, 04:28 |
|
neville 23 Jan 2017, 09:08
Furs wrote: but most of those sequences use mostly just "normal" mathematical operations, so I doubt they would be much better than things like "xorshift" which of course I already considered.) _________________ FAMOS - the first memory operating system |
|||
23 Jan 2017, 09:08 |
|
revolution 23 Jan 2017, 12:25
But computing digits of pi (or e, or sqrt(2), etc.), while guaranteed never to repeat, is also nowhere near as fast as "normal" PRNGs that do repeat over some finite period.
Later CPUs have a dedicated RNG instruction. It does have a limited rate for new bits to be generated but still quite fast and easy to code. |
|||
23 Jan 2017, 12:25 |
|
Furs 23 Jan 2017, 13:22
redsock wrote: Lol, helps if I spell it right (hand-typed links, go figure, haha)... try this one instead: neville wrote: Except maybe that in theory the sequence would never repeat? Although in reality I guess that would require an infinite amount of RAM (and take an infinitely long time to calculate...) The reason I'm not worried about repeat is because I will constantly reseed the RNG from outside source. To make it multi-thread safe I was thinking to have a normal RNG that simply XORs with the seed every once in a while. If the seed was updated, it XORs with a new value (global seed that gets updated by a timer thread), otherwise with old value so it doesn't help here until it gets changed. Reseeding would be done with e.g. /dev/urandom on Linux or RDRAND or something like that (just changing the global variable, no atomic operation needed as it shouldn't be a problem if it xor with old seed, right?). What do you think of it? Or is this approach bad? EDIT: Note when I speak of global seed I don't mean the RNG's state. The RNG's state is local (unless there's a better way to have it thread safe). The global seed is something else, a random number generated from outside source by a timer thread and updated every X units of time etc. Local RNGs will never modify it, that's what I mean. revolution wrote: Later CPUs have a dedicated RNG instruction. It does have a limited rate for new bits to be generated but still quite fast and easy to code. The problem to me seems to be that it can "fail" and return zeros (carry flag check) -- at which point, I need to do polling over it, not really feasible is it? I can understand failing due to lack of entropy, kinda how /dev/random works, but it's obviously not practical for non-cryptographic purposes. Last edited by Furs on 23 Jan 2017, 13:31; edited 1 time in total |
|||
23 Jan 2017, 13:22 |
|
revolution 23 Jan 2017, 13:29
Furs wrote: Sorry I forgot to mention I was aware of this instruction. Furs wrote: I can understand failing due to lack of entropy, kinda how /dev/random works, but it's obviously not practical for non-cryptographic purposes. |
|||
23 Jan 2017, 13:29 |
|
Furs 23 Jan 2017, 13:33
What do you mean the opposite? I don't get it.
I need a RNG that always returns in a similar amount of time and not "blocks", because it's for real time processing. I mean it's ok to be "slowish" but not very slow and it should always have similar speed not an unpredictable "block", but I only want it slow if it means very good RNG quality though. For example it's ok to use RDRAND in the timer thread that updates the "global seed" I mentioned, because even if it fails, processing will still continue -- only that the quality will be lower. That's ok though. |
|||
23 Jan 2017, 13:33 |
|
edfed 23 Jan 2017, 16:20
|
|||
23 Jan 2017, 16:20 |
|
Furs 23 Jan 2017, 17:27
@edfed: I will look into that thanks. I'm open to any algorithm/idea since I can just combine them together with XOR to make them way less predictable (see below what I mean). Goal isn't security but obscurity so that "others" don't have same sequences ever.
@redsock: Ok I'm afraid I'll have to say that your code (well, since it is based on Agner's) seems way too complex, I can't really understand some of it. There are some functions in there which I've no idea what they even are, like hmac_drbg$generate, and the absolutely huge state (or what's the globals for?), I'll obviously have to understand it to make it thread-safe. Ideally I want as small a state as possible. I guess I wasn't very detailed at the beginning, I apologize for that. I need it to be unpredictable but not necessarily security-grade since it won't be used for security. It will be used in DSP (Digital Signal Processing) as a plugin I'm writing, in realtime, and I just want to avoid pitfalls of common things which can result in same "sequence" for some transformations/filters/random effects etc, because everybody knows them. (for example the standard rand() function in C runtime on Windows msvcrt.dll is horrible, poor quality, low amount of numbers, and every cheap programmer out there is using it so it is very "predictable"). That's what I mean by unpredictable. I suppose "obscurity" would fit it also. I looked more into rdrand, since my CPU supports it (Intel Haswell Xeon E3-1241 v3). It seems Intel said "if it fails after 10 tries, the chances are astronomically small, so it is likely a problem in the CPU" or something along those lines. I overestimated the amount of times it would need to be retried it seems -- I literally thought you'd have to poll it for long period of times, like /dev/random blocks when not enough entropy. (like stalling a CPU core at 100% use for seconds). Still, retrying for 10 times isn't acceptable due to performance implications when I need it called very often many thousands of times per second. So I was thinking to use fast/simple PRNGs as main "rand" units, but re-seeded often from rdrand by a separate timer thread or after a given number of times using "rand" (obviously, even 5 times per second timer is FAR less than thousands of times per second of RDRAND). A secondary timer using /dev/urandom could be employed, too, with less calls obviously. Like, use XOR of xorshift (or xoroshiro, why not use rotates since asm?!) and the simple "multiply by prime" rand... with their seeds periodically changed by RDRAND / dev/urandom. That should be oscure/unpredictable enough to have true random as far as signal analysis is concerned, maybe I'm way too obsessed. Again, I'm open to any use of extra randomness from "exotic" instructions, idk if I can make use of the AES instructions in the CPU for example and so on. And lastly, any tricks to make it random into "floats"? (I prefer floats over doubles, due to performance, since I'll probably be using AVX/AVX2 and not scalars) I read some simple tricks, like just randomize the bits of mantissa and get proper exponent (1 subtraction), which is fine, until you realize that a number in 0...1 range (or -1...1) can have negative exponents. Randomizing the mantissa/significand only means that you only get random distribution when exponent is 0 or whatever. Is that ok or am I missing something? Is it even good for uniform distribution to have negative exponents? (wouldn't that make more numbers in the lower end and ruin uniformity?). Maybe I'm overthinking this, but I'm really not a mathematician in the slightest so I don't understand "properties" of how RNGs or even hash functions work. I understand the operations, but not how they *work*, so I've no idea how one is more uniform etc than another. That's why I'm asking such dumb questions. I'll probably end up going with the hybrid 3-way method I've mentioned to make it extremely unpredictable and still keep its performance high. Thanks for all info so far, keep them coming if you find new ways, I'm sure they'd be educational for anyone stumbling on this thread, even if not suitable for my needs here (like the irrational number one, which is cool idea, but unfortunately not for what I need now). (btw edfed, huuge list of RNGs lol! shame you can't sort it by criteria?) |
|||
23 Jan 2017, 17:27 |
|
redsock 23 Jan 2017, 21:16
I hacked together a quick and dirty example re: floating point conversion for you to contemplate:
Code: include '../../ht_defaults.inc' include '../../ht.inc' use_rdrand = 0 ; function to turn rdrand or /dev/urandom 64 bits into a double ; return in xmm0 falign rdrand_double: prolog rdrand_double movq xmm1, [.one] if use_rdrand @@: rdrand rax jnc @b else ; example only, load from /dev/urandom instead mov rdi, .fname xor esi, esi ; O_RDONLY mov eax, syscall_open syscall ; cmp eax, 0 ; jl .kakked push rax sub rsp, 8 mov rdi, rax ; fd mov rsi, rsp ; buf mov edx, 8 ; len mov eax, syscall_read syscall mov rdi, [rsp+8] mov eax, syscall_close syscall pop rax rcx ; rcx junk fd end if movq xmm0, rax psrlq xmm0, 12 por xmm0, xmm1 subsd xmm0, xmm1 epilog dalign .one: dd 0x0, 0x3ff00000 if use_rdrand = 0 .fname: db '/dev/urandom',0 end if public _start falign _start: call ht$init call rdrand_double mov edi, double_string_fixed mov esi, 3 call string$from_double push rax mov rdi, rax call string$to_stdout pop rdi call heap$free mov eax, syscall_exit xor edi, edi syscall include '../../ht_data.inc' My older Sandy Bridge processor doesn't support RDRAND, so my example uses the Linux /dev/urandom, but any random source for the 8 bytes the function gets will suffice. Also, since your requirements dont' involve security, have you considered using the much simpler LFSR? https://en.wikipedia.org/wiki/Linear-feedback_shift_register (Specifically, see the "Uses in cryptography" section there) Re: float vs. double, the above code is for doubles, but a similar technique can be used for 32bit instead without much difficulty. Cheers! |
|||
23 Jan 2017, 21:16 |
|
Furs 23 Jan 2017, 22:16
Thanks, that's basically the randomized mantissa trick I mentioned earlier, but at least you confirmed it's a "good" way to get a random float (I was afraid it wouldn't be uniform or something like that, since I'm clueless).
LFSR is interesting, but it seems impractical on software level (only 1 bit per iteration?), but I'll look into it a bit more and see if I find anything. |
|||
23 Jan 2017, 22:16 |
|
revolution 23 Jan 2017, 23:47
Furs wrote: I looked more into rdrand, since my CPU supports it (Intel Haswell Xeon E3-1241 v3). It seems Intel said "if it fails after 10 tries, the chances are astronomically small, so it is likely a problem in the CPU" or something along those lines. I overestimated the amount of times it would need to be retried it seems -- I literally thought you'd have to poll it for long period of times, like /dev/random blocks when not enough entropy. (like stalling a CPU core at 100% use for seconds). |
|||
23 Jan 2017, 23:47 |
|
Furs 24 Jan 2017, 01:02
Well I did test but how can I be sure it won't act up sometimes? Since I don't even know why exactly it fails at all since it's a PRNG itself (not like rdseed). When it does fail, currently it returns zero (according to Intel), which is kinda a problem so I have to loop.
I was only worried that it would suddenly fail nonstop (Intel don't mention why it would fail at all, so I assume a bug of sorts) and then "freeze" the program if I retried it in an infinite loop (because what am I supposed to do if it's the only RNG source I have? if it fails, use zero?). Not often but well, like a random crash/freeze. |
|||
24 Jan 2017, 01:02 |
|
revolution 24 Jan 2017, 01:52
Apparently it "fails" when the entropy has been exhausted. Then you have to loop and try again until the entropy has been refilled. It won't ever get stuck in an infinite loop, it is not so dumb as to do that. If you are using bits at a high rate (i.e. more than ~3Gb/s) then you'll have wait a short time to let it catch up.
Last edited by revolution on 24 Jan 2017, 16:04; edited 1 time in total |
|||
24 Jan 2017, 01:52 |
|
Furs 24 Jan 2017, 15:58
Still, if the entropy is exhausted, how many times can it fail in a row? For a realtime system, getting a "spike" in CPU use can be pretty bad. If I use it to process such signals (say audio, for simplicity), I'll get drop-outs, crackles, pops etc, you know the typical audio glitches with high CPU use when it can't fill the buffer fast enough. Mind you this won't be that expensive itself but it won't be the only effect/transformation running, so keeping its CPU as low as "possible" (within reason) is quite important, even if by itself it only uses ~20% or less etc. I'm not afraid of constant fails, but big "spikes" in fails which happen rarely.
So, to get rid of that problem, I was thinking to calculate a simple "normal" PRNG alongside RDRAND, and simply XOR them together. Then either 1) not retry at all or 2) retry a specific number of times max. Even if it returns zero, I'll at least get the simple PRNG result, which shouldn't be predictable since I doubt it will return zero after failing so many times in a row. And once RDRAND returns once, the "predictable sequence" is gone. |
|||
24 Jan 2017, 15:58 |
|
revolution 24 Jan 2017, 16:07
Furs wrote: Still, if the entropy is exhausted, how many times can it fail in a row? |
|||
24 Jan 2017, 16:07 |
|
Furs 24 Jan 2017, 18:03
Well I'll have to do a really long test and "count" the times it fails in a row (max), since it's for a continuous system, so gotta keep it up for like a day or so. Still with random fails, nothing is "certain", so if I don't get one whole day more than say, 5 fails, it doesn't mean I won't get 10 fails on a bad day in a row. I doubt it depends on the app that much, since the entropy is hardware-based. I guess depends how often it is called though.
Hmm thinking of something unrelated, I wonder if it's possible to do a local denial of service in a multi-user environment (to RDRAND's quality I mean) since it's unprivileged instruction, same as /dev/urandom can be abused by users to decrease quality entropy. (nothing to do with me tho ) Anyway I'll have to implement some other things about it first which I'm working on at the moment before I can start testing, at least I got the random thing sorted |
|||
24 Jan 2017, 18:03 |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.