flat assembler
Message board for the users of flat assembler.
flat assembler
> Main > signed modulus (to match C's % operator) 
Author 

Hi, I've writen my own language, originally using FASM but now with an inline assembler, very
much indebted to its FASM roots. It has a builtin remainder(x,y) function which is implemented as mov eax,[x]; mov ecx,[y]; cdq; idiv ecx; save edx, and a hll mod(x,y) which is implemented as return x  y * floor(x / y). The latter has served me well for several years but is perhaps not quite so ideally implemented, however it gets me the same results as C's % operator: Code: x,y: 9,4 9,+4 +9,4 +9,+4 remainder(x,y): 1 1 +1 +1 mod(x,y): 1 3 3 +1  (matches C's % operator) Note that mod() yields the same result as remainder() when the signs of both arguments are the same, but the result always has the same sign as the divisor whereas the result of remainder() always has the same sign as the dividend. My floor(x/y) function is implemented by checking the signs, and if one but not both is signed then floor(a/b) is performed as floor((a(b1))/b), which I somehow doubt is the right/best thing, especially so for mod(). I cannot imagine that C jumps through such hoops to implement the % operator. I already have a working divrem function (and ~20 others), but got stuck on divmod: Code: x,y: 9,4 9,+4 +9,4 +9,+4 divrem(x,y): +2,1 2,1 2,+1 +2,+1  ie x==y*q+r divmod(x,y): +2,1 3,+3 3,3 +2,+1  ie x==y*q+r[??] I know what the mod part ought to be for certain, but the quotient is a bit of a guess. In truth I don't actually need divmod(), just mod(), but I'll take it if I can get it. Perhaps an explanation of divmod (vs divrem) might help me out too. Hopefully this is a long post with a much shorter answer. Regards, Pete PS: remainder() also uses the fprem instruction when dealing with floating point numbers. PPS: my language is phix, if interested see http://phix.x10.mx 

11 Dec 2018, 11:25 

I'm not sure what your question is.
Looks like you have it all worked out. 

11 Dec 2018, 13:41 

I guess I just wondered how other people calculate mod in assembly, rather than at least two divides, one multiply, and five subtract operations... and that's only counting the bits that differ from the "unsigned remainder" calculation.
Edit: Actually, reading the righthand table on https://en.wikipedia.org/wiki/Modulo_operation (Integer modulo operators in various programming languages) makes me realise what a nonquestion this is, I naively figured mod was a standard thing. 

12 Dec 2018, 13:12 

DimonSoft wrote: div instruction makes it all. OH REALLY? Ok then sunshine, show me how to divide 9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (3,+3). Last edited by petelomax on 13 Dec 2018, 12:45; edited 4 times in total 

13 Dec 2018, 11:37 

petelomax wrote: Edit: Actually, reading the righthand table on https://en.wikipedia.org/wiki/Modulo_operation (Integer modulo operators in various programming languages) makes me realise what a nonquestion this is, I naively figured mod was a standard thing. 

13 Dec 2018, 11:45 

petelomax wrote: OH REALLY? Ok then sunshine, show me how to divide 9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (3,+3). Code: ; inputs mov eax, 9 mov ecx, 4 cdq mov ebx, eax idiv ecx xor ebx, ecx ; sign bit set if signs differ jns @f test edx, edx jz @f mov ebx, eax sar ebx, 31 ; all 1s if result was negative or ebx, 1 ; 1 if negative, +1 if positive add edx, ecx ; adjust modulus add eax, ebx ; adjust result @@: I'm sure you can get rid of the branches with some creativity (cmovs) Last edited by Furs on 13 Dec 2018, 15:58; edited 1 time in total 

13 Dec 2018, 15:48 

petelomax wrote:
Sorry, that’s what happens as a result of reading the forum while simultaneously talking to other people. Got the impression that you use idiv and then proceed by multiplying back instead of just using the remainder. Which, as a closer look revealed, isn’t the case. No need to become furious though. To put my 5 cents on the topic and give my answer to the question I quoted, I find it quite a bad idea to rely on proper calculation of FP remainders anyway knowing the quirks of IEEE FPs, so whenever I have such a case (can’t even remember any such task except when I was a student) I just implement it the most maintainable way (differs depending on the surrounding FPU code) and make sure program logic doesn’t depend on the result. As for this petelomax wrote: I cannot imagine that C jumps through such hoops to implement the % operator. it doesn’t ’cause ISO/IEC 9899:201x, chapter 6.5.5, statement 2 wrote: Each of the operands shall have arithmetic type. The operands of the % operator shall have integer type. so for x86 it just conveniently maps to idiv and the FP version is left for the library which is a good thing since C doesn’t require implementations to use particular representations for types, especially for FP types. 

13 Dec 2018, 15:57 

Actually here's a better version as long as you don't overflow:
Code: ; inputs mov eax, 9 mov ecx, 4 ; edx = abs(divisor)  1 mov edx, ecx mov ebx, ecx sar edx, 31 xor ebx, edx sub ebx, edx dec ebx ; change sign to match eax mov edx, eax sar edx, 31 xor ebx, edx sub ebx, edx ; adjust and divide add eax, ebx cdq idiv ecx sub edx, ebx EDIT: nevermind this doesn't work as you want it to 

13 Dec 2018, 16:17 

petelomax wrote: OH REALLY? Ok then sunshine, show me how to divide 9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (3,+3). If you take a look at how fasmg does long division with remainder, it really has just an unsigned division routine at its core, and inserts a couple of negations for signed cases. On a side note, in the context of this board there is no such thing as exception #C0000095. The exception that DIV/IDIV can cause is the exception 0. 

13 Dec 2018, 17:24 

DimonSoft wrote: No need to become furious though. Sorry, I did actually tone down my first response Furs wrote: Result in eax, modulus in edx. One division only, no multiplications. Excellent, exactly what I have been looking for, for ages, you are a star. 

17 Dec 2018, 01:31 

petelomax wrote:
I think it's worth it, because idiv has long latency so if there's other code mixed in here it would also be rolled back on a branch misprediction, so avoiding branches here is good for performance. Finishing some bits right now. 

24 Dec 2018, 11:34 

Furs wrote: ... avoiding branches here is good for performance. 

24 Dec 2018, 11:41 

Ok here's branchless:
Code: ; inputs mov eax, 9 mov ecx, 4 cdq mov ebx, eax idiv ecx xor ebx, ecx ; sign bit set if signs differ sar ebx, 31 ; mask of all 1s if sign bit differ, 0 otherwise test edx, edx cmovz ebx, edx ; if remainder is zero, force mask to 0 and ecx, ebx ; mask out divisor if needed add edx, ecx ; adjust modulus mov ecx, eax sar ecx, 31 ; all 1s if result was negative or ecx, 1 ; 1 if negative, +1 if positive and ecx, ebx ; mask it out if needed add eax, ecx ; adjust result ; eax = result, edx = modulus The mask basically acts as though we branched. It is either all 1s if we did not branch, or 0 if we did branch before. We use it to mask the adjustments so that the adds are effectively nops if we were to branch. Other than that it's the same method. If anyone finds a better way (faster, or using less registers, or whatever, without branches!), please share. I'm adding it to my cool & useful tricks collection. EDIT: Some optimizations can be made if you only need one or the other. If you only need the modulus: Code: ; inputs mov eax, 9 mov ecx, 4 cdq mov ebx, eax idiv ecx xor eax, eax xor ebx, ecx cmovns ecx, eax test edx, edx cmovz ecx, eax add edx, ecx ; edx = modulus Code: ; inputs mov eax, 9 mov ecx, 4 cdq mov ebx, eax idiv ecx test edx, edx cmovz ebx, ecx ; make sure the xor below has no sign set in this case cdq ; edx = 1 if result is negative, 0 otherwise or edx, 1 xor ebx, ecx mov ecx, 0 ; we can't touch the flags cmovns edx, ecx ; set to zero if signs were the same add eax, edx ; eax = result revolution wrote:
Anyway, I generally prefer branchless code unless it adds too much latency, because it's much more stable (i.e. performance always the same). There's always a potential "worst case" with branch code (maybe even malware for DoS). 

24 Dec 2018, 11:41 

< Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992019, Tomasz Grysztar.
Powered by rwasa.