signed modulus (to match C's % operator)

Index > Main > signed modulus (to match C's % operator)

Goto page 1, 2 Next

Author

Thread

petelomax

Joined: 19 Jan 2013
Posts: 11
Location: London

petelomax 11 Dec 2018, 11:25

Hi, I've writen my own language, originally using FASM but now with an inline assembler, very
much indebted to its FASM roots. It has a builtin remainder(x,y) function which is implemented
as mov eax,[x]; mov ecx,[y]; cdq; idiv ecx; save edx, and a hll mod(x,y) which is implemented
as return x - y * floor(x / y). The latter has served me well for several years but is perhaps
not quite so ideally implemented, however it gets me the same results as C's % operator:

Code:

               x,y: -9,-4  -9,+4  +9,-4  +9,+4
    remainder(x,y):    -1     -1     +1     +1
          mod(x,y):    -1      3     -3     +1      -- (matches C's % operator)

Note that mod() yields the same result as remainder() when the signs of both arguments are the
same, but the result always has the same sign as the divisor whereas the result of remainder()
always has the same sign as the dividend. My floor(x/y) function is implemented by checking the
signs, and if one but not both is signed then floor(-a/b) is performed as -floor((a-(b-1))/b),
which I somehow doubt is the right/best thing, especially so for mod(). I cannot imagine that
C jumps through such hoops to implement the % operator.

I already have a working divrem function (and ~20 others), but got stuck on divmod:

Code:

            x,y: -9,-4  -9,+4  +9,-4  +9,+4
    divrem(x,y): +2,-1  -2,-1  -2,+1  +2,+1     -- ie x==y*q+r
    divmod(x,y): +2,-1  -3,+3  -3,-3  +2,+1     -- ie x==y*q+r[??]

I know what the mod part ought to be for certain, but the quotient is a bit of a guess.
In truth I don't actually need divmod(), just mod(), but I'll take it if I can get it.
Perhaps an explanation of divmod (vs divrem) might help me out too. Hopefully this is a long post with
a much shorter answer.

Regards, Pete
PS: remainder() also uses the fprem instruction when dealing with floating point numbers.
PPS: my language is phix, if interested see http://phix.x10.mx

11 Dec 2018, 11:25

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20689
Location: In your JS exploiting you and your system

revolution 11 Dec 2018, 13:41

I'm not sure what your question is.

Looks like you have it all worked out.

11 Dec 2018, 13:41

petelomax

Joined: 19 Jan 2013
Posts: 11
Location: London

petelomax 12 Dec 2018, 13:12

I guess I just wondered how other people calculate mod in assembly, rather than at least two divides, one multiply, and five subtract operations... and that's only counting the bits that differ from the "unsigned remainder" calculation.

Edit: Actually, reading the right-hand table on https://en.wikipedia.org/wiki/Modulo_operation (Integer modulo operators in various programming languages) makes me realise what a non-question this is, I naively figured mod was a standard thing.

12 Dec 2018, 13:12

DimonSoft

Joined: 03 Mar 2010
Posts: 1228
Location: Belarus

DimonSoft 12 Dec 2018, 16:57

petelomax wrote:

I guess I just wondered how other people calculate mod in assembly, rather than at least two divides, one multiply, and five subtract operations... and that's only counting the bits that differ from the "unsigned remainder" calculation.

div instruction makes it all.

12 Dec 2018, 16:57

petelomax

Joined: 19 Jan 2013
Posts: 11
Location: London

petelomax 13 Dec 2018, 11:37

DimonSoft wrote:

div instruction makes it all.

OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3).

Last edited by petelomax on 13 Dec 2018, 12:45; edited 4 times in total

13 Dec 2018, 11:37

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20689
Location: In your JS exploiting you and your system

revolution 13 Dec 2018, 11:45

petelomax wrote:

Edit: Actually, reading the right-hand table on https://en.wikipedia.org/wiki/Modulo_operation (Integer modulo operators in various programming languages) makes me realise what a non-question this is, I naively figured mod was a standard thing.

That table makes C look so bad with the "implementation defined" qualifier. Yuck. I see that Forth has that distinction also. Confused

13 Dec 2018, 11:45

Furs

Joined: 04 Mar 2016
Posts: 2652

Furs 13 Dec 2018, 15:48

petelomax wrote:

OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3).

Code:

; inputs
mov eax, -9
mov ecx, 4

cdq
mov ebx, eax
idiv ecx
xor ebx, ecx  ; sign bit set if signs differ
jns @f
test edx, edx
jz @f
mov ebx, eax
sar ebx, 31   ; all 1s if result was negative
or ebx, 1     ; -1 if negative, +1 if positive
add edx, ecx  ; adjust modulus
add eax, ebx  ; adjust result
@@:

Result in eax, modulus in edx. One division only, no multiplications.

I'm sure you can get rid of the branches with some creativity (cmovs)

Last edited by Furs on 13 Dec 2018, 15:58; edited 1 time in total

13 Dec 2018, 15:48

DimonSoft

Joined: 03 Mar 2010
Posts: 1228
Location: Belarus

DimonSoft 13 Dec 2018, 15:57

petelomax wrote:

DimonSoft wrote:
div instruction makes it all.

OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3).

Sorry, that’s what happens as a result of reading the forum while simultaneously talking to other people. Got the impression that you use idiv and then proceed by multiplying back instead of just using the remainder. Which, as a closer look revealed, isn’t the case. No need to become furious though.

To put my 5 cents on the topic and give my answer to the question I quoted, I find it quite a bad idea to rely on proper calculation of FP remainders anyway knowing the quirks of IEEE FPs, so whenever I have such a case (can’t even remember any such task except when I was a student) I just implement it the most maintainable way (differs depending on the surrounding FPU code) and make sure program logic doesn’t depend on the result.

As for this

petelomax wrote:

I cannot imagine that C jumps through such hoops to implement the % operator.

it doesn’t ’cause

ISO/IEC 9899:201x, chapter 6.5.5, statement 2 wrote:

Each of the operands shall have arithmetic type. The operands of the % operator shall have integer type.

so for x86 it just conveniently maps to idiv and the FP version is left for the library which is a good thing since C doesn’t require implementations to use particular representations for types, especially for FP types.

13 Dec 2018, 15:57

Furs

Joined: 04 Mar 2016
Posts: 2652

Furs 13 Dec 2018, 16:17

Actually here's a better version as long as you don't overflow:

Code:

; inputs
mov eax, -9
mov ecx, 4

; edx = abs(divisor) - 1
mov edx, ecx
mov ebx, ecx
sar edx, 31
xor ebx, edx
sub ebx, edx
dec ebx

; change sign to match eax
mov edx, eax
sar edx, 31
xor ebx, edx
sub ebx, edx

; adjust and divide
add eax, ebx
cdq
idiv ecx
sub edx, ebx

No branches and untested.

EDIT: nevermind this doesn't work as you want it to Wink

13 Dec 2018, 16:17

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8434
Location: Kraków, Poland

Tomasz Grysztar 13 Dec 2018, 17:24

petelomax wrote:

OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3).

Yes, in general a single DIV (yes, the unsigned one) can be enough to cover all cases. Just detect the signs of input values and have conditional NEGs here and there. If in the end the remainder has a wrong sign, you just add or subtract the divisor and adjust the quotient appropriately (either decrease or increase by one).

If you take a look at how fasmg does long division with remainder, it really has just an unsigned division routine at its core, and inserts a couple of negations for signed cases.

On a side note, in the context of this board there is no such thing as exception #C0000095. The exception that DIV/IDIV can cause is the exception 0.

13 Dec 2018, 17:24

petelomax

Joined: 19 Jan 2013
Posts: 11
Location: London

petelomax 17 Dec 2018, 01:31

DimonSoft wrote:

No need to become furious though.

Sorry, I did actually tone down my first response Wink

Furs wrote:

Result in eax, modulus in edx. One division only, no multiplications.

Excellent, exactly what I have been looking for, for ages, you are a star.

17 Dec 2018, 01:31

Furs

Joined: 04 Mar 2016
Posts: 2652

Furs 24 Dec 2018, 11:34

petelomax wrote:

Furs wrote:
Result in eax, modulus in edx. One division only, no multiplications.

Excellent, exactly what I have been looking for, for ages, you are a star.

No problem Smile

I actually thought about this some more and managed to find a way without branches. It uses the same logic as the original one, it just creates a mask instead of branching for the adds.

I think it's worth it, because idiv has long latency so if there's other code mixed in here it would also be rolled back on a branch misprediction, so avoiding branches here is good for performance.

Finishing some bits right now.

24 Dec 2018, 11:34

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20689
Location: In your JS exploiting you and your system

revolution 24 Dec 2018, 11:41

Furs wrote:

... avoiding branches here is good for performance.

Well I have to say that that might be true. But modern CPUs have BTBs and other optimisations that might make code with branches perform better in some cases. As always, test it in each variation and see which gives better performance with the data sets encountered in actual operation.

24 Dec 2018, 11:41

Furs

Joined: 04 Mar 2016
Posts: 2652

Furs 24 Dec 2018, 11:41

Ok here's branchless:

Code:

; inputs
mov eax, -9
mov ecx, 4

cdq
mov ebx, eax
idiv ecx
xor ebx, ecx     ; sign bit set if signs differ
sar ebx, 31      ; mask of all 1s if sign bit differ, 0 otherwise
test edx, edx
cmovz ebx, edx   ; if remainder is zero, force mask to 0

and ecx, ebx     ; mask out divisor if needed
add edx, ecx     ; adjust modulus
mov ecx, eax
sar ecx, 31      ; all 1s if result was negative
or ecx, 1        ; -1 if negative, +1 if positive
and ecx, ebx     ; mask it out if needed
add eax, ecx     ; adjust result

; eax = result, edx = modulus

I haven't tested it thoroughly, so let me know if there's a mistake or it doesn't work for some inputs.

The mask basically acts as though we branched. It is either all 1s if we did not branch, or 0 if we did branch before. We use it to mask the adjustments so that the adds are effectively nops if we were to branch. Other than that it's the same method.

If anyone finds a better way (faster, or using less registers, or whatever, without branches!), please share. I'm adding it to my cool & useful tricks collection. Wink

EDIT: Some optimizations can be made if you only need one or the other.

If you only need the modulus:

Code:

; inputs
mov eax, -9
mov ecx, 4
cdq

mov ebx, eax
idiv ecx
xor eax, eax
xor ebx, ecx
cmovns ecx, eax
test edx, edx
cmovz ecx, eax
add edx, ecx

; edx = modulus

and if you only need the result:

Code:

; inputs
mov eax, -9
mov ecx, 4
cdq

mov ebx, eax
idiv ecx
test edx, edx
cmovz ebx, ecx   ; make sure the xor below has no sign set in this case
cdq              ; edx = -1 if result is negative, 0 otherwise
or edx, 1
xor ebx, ecx
mov ecx, 0       ; we can't touch the flags
cmovns edx, ecx  ; set to zero if signs were the same
add eax, edx

; eax = result

revolution wrote:

Furs wrote:
... avoiding branches here is good for performance.
Well I have to say that that might be true. But modern CPUs have BTBs and other optimisations that might make code with branches perform better in some cases. As always, test it in each variation and see which gives better performance with the data sets encountered in actual operation.

Testing isn't very reliable since the predictor varies. The 'test edx, edx' (test for remainder) is a big problem tho, since it will rollback all of the code if it guesses wrong. And idiv has very long latency.

Anyway, I generally prefer branchless code unless it adds too much latency, because it's much more stable (i.e. performance always the same). There's always a potential "worst case" with branch code (maybe even malware for DoS).

24 Dec 2018, 11:41

rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)

rugxulo 05 Apr 2019, 19:56

revolution wrote:

petelomax wrote:
Edit: Actually, reading the right-hand table on https://en.wikipedia.org/wiki/Modulo_operation (Integer modulo operators in various programming languages) makes me realise what a non-question this is, I naively figured mod was a standard thing.
That table makes C look so bad with the "implementation defined" qualifier. Yuck. I see that Forth has that distinction also.

I'm no expert, but it's not as bad as it sounds.

Some of this is just the ways computers have (almost) always behaved, even if not 100% identical to typical math preferences. (So it sometimes varies by compilers, but so does almost everything.) So Turbo Pascal and Oberon-M both don't really give the "proper" answers but moreso the common answer that normal C compilers do. This is probably due to less stringent testing or standardese. But things like Python and Oxford Oberon will prefer the proper math answer by default. Forth (ANS?) has floored FM/MOD and symmetric SM/REM . Modula-2 has REM (at least in PIM4 or ISO, I think).

If I had to guess motive, C (ISO 1990) presumably allowed whatever was fastest and default for specific machines to stay available on '%' operator with optional full compatibility only in library "div" or "ldiv" (etc), if you direly needed it. But that was rare (in my very limited experience, I only know one compiler that did it differently), so they made the saner way always default in C99.

So yeah, it's hard to support both when you know that some rare things rely on only one specific answer. Standards ("de jure" or even "de facto") don't solve everything. It's a complicated world, but most people find ways to survive anyways.

05 Apr 2019, 19:56

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20689
Location: In your JS exploiting you and your system

revolution 06 Apr 2019, 03:23

Maybe it is just my perception of HLLs. I tend to expect them the decouple the code from the underlying machine, since that is how they are advertised. But along comes C and says you must also know the underlying machine. So to me that dilutes the whole premise of being an HLL. It is no longer portable or predictable, the programmer has to make special cases for all the potential target machines, just like we do for assembly.

Last edited by revolution on 06 Apr 2019, 14:01; edited 1 time in total

06 Apr 2019, 03:23

DimonSoft

Joined: 03 Mar 2010
Posts: 1228
Location: Belarus

DimonSoft 06 Apr 2019, 13:46

revolution wrote:

Maybe it is just my perception of HLLs. I tend to expect them the decouple the code from the underlying machine, since that is how they are advertised. But along comes C and says you must also know the underlying machine. So to me that dilutes the whole premise of being an HLL. It is no longer portable or predictable, the programmer has to makes special cases for all the potential target machines, just like we do for assembly.

AFAIK, the “level” of language is determined by something like “average number of machine instructions per HLL instruction”. If we take this definition, then abstraction is not necessary. But the abstraction is there, so C approach is at least the case of leaky abstraction.

Last edited by DimonSoft on 07 Apr 2019, 07:48; edited 1 time in total

06 Apr 2019, 13:46

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20689
Location: In your JS exploiting you and your system

revolution 07 Apr 2019, 02:46

DimonSoft wrote:

But the abstraction is there, so C approach is at least the case of leaky abstraction.

I think it is more than just a leaky abstraction. Rather than just taking more time or failing to compile, the basic operation has changed and the output result will be different.

07 Apr 2019, 02:46

rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)

rugxulo 07 Apr 2019, 20:03

revolution wrote:

Maybe it is just my perception of HLLs. I tend to expect them the decouple the code from the underlying machine, since that is how they are advertised. But along comes C and says you must also know the underlying machine. So to me that dilutes the whole premise of being an HLL. It is no longer portable or predictable, the programmer has to make special cases for all the potential target machines, just like we do for assembly.

It can be semi-portable, with effort, but even the language designers/implementors would probably agree that it's not as automatic nor as easy as it should be. Flaws accrued over time and can't all easily be removed. Nothing's perfect. (Though with proper warnings, lint, reference manuals, and an experienced guru, it can be greatly minimized.)

Yes, I too am a bit cynical about HLLs, but they're not "only" for (extreme, maximum) portability. They are also for easier maintenance of code. But often they also are intentionally used for non-portable code (that needs extreme hardware-specific bits for functionality or needs extra performance in a non-standard way).

Yes, you still have to workaround non-standard things, bugs, and design flaws, but overall it's better than it could be. Half the work is done, you just have to do the other 90%. Razz

07 Apr 2019, 20:03

DimonSoft

Joined: 03 Mar 2010
Posts: 1228
Location: Belarus

DimonSoft 08 Apr 2019, 11:42

rugxulo wrote:

Yes, I too am a bit cynical about HLLs, but they're not "only" for (extreme, maximum) portability. They are also for easier maintenance of code.

Well, in ideal world maintenance cost is not the main goal. The whole idea of keeping your code maintainable is based on the need to increase the reliability of the software product. Maintainability is just one of many ways to achieve reliability. And HLLs used to be as close to natural languages as possible to decrease the possibility of introducing bugs in the first place, maintainability has the same goal but over time. So, a HLL that gives maintainable code while introducing means to introduce bugs (caused by differences in lower-level implementations) that are difficult to spot looks like a human being who tries to put his pants on with one hand while taking them off with another.

In real world maintenance cost is the main goal and reliability is the worst property of a product: the more reliable a program is the less updates and technical support services one can sell. So, everybody hates reliability.

08 Apr 2019, 11:42

Goto page 1, 2 Next

< Last Thread | Next Thread >

Forum Rules:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum