flat assembler
Message board for the users of flat assembler.

flat assembler > Main > signed modulus (to match C's % operator)

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
petelomax



Joined: 19 Jan 2013
Posts: 8
Location: London
Hi, I've writen my own language, originally using FASM but now with an inline assembler, very
much indebted to its FASM roots. It has a builtin remainder(x,y) function which is implemented
as mov eax,[x]; mov ecx,[y]; cdq; idiv ecx; save edx, and a hll mod(x,y) which is implemented
as return x - y * floor(x / y). The latter has served me well for several years but is perhaps
not quite so ideally implemented, however it gets me the same results as C's % operator:

Code:
               x,y: -9,-4  -9,+4  +9,-4  +9,+4
    remainder(x,y):    -1     -1     +1     +1
          mod(x,y):    -1      3     -3     +1      -- (matches C's % operator)    


Note that mod() yields the same result as remainder() when the signs of both arguments are the
same, but the result always has the same sign as the divisor whereas the result of remainder()
always has the same sign as the dividend. My floor(x/y) function is implemented by checking the
signs, and if one but not both is signed then floor(-a/b) is performed as -floor((a-(b-1))/b),
which I somehow doubt is the right/best thing, especially so for mod(). I cannot imagine that
C jumps through such hoops to implement the % operator.

I already have a working divrem function (and ~20 others), but got stuck on divmod:

Code:
            x,y: -9,-4  -9,+4  +9,-4  +9,+4
    divrem(x,y): +2,-1  -2,-1  -2,+1  +2,+1     -- ie x==y*q+r
    divmod(x,y): +2,-1  -3,+3  -3,-3  +2,+1     -- ie x==y*q+r[??]    


I know what the mod part ought to be for certain, but the quotient is a bit of a guess.
In truth I don't actually need divmod(), just mod(), but I'll take it if I can get it.
Perhaps an explanation of divmod (vs divrem) might help me out too. Hopefully this is a long post with
a much shorter answer.

Regards, Pete
PS: remainder() also uses the fprem instruction when dealing with floating point numbers.
PPS: my language is phix, if interested see http://phix.x10.mx
Post 11 Dec 2018, 11:25
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16628
Location: In your JS exploiting you and your system
I'm not sure what your question is. Question

Looks like you have it all worked out.
Post 11 Dec 2018, 13:41
View user's profile Send private message Visit poster's website Reply with quote
petelomax



Joined: 19 Jan 2013
Posts: 8
Location: London
I guess I just wondered how other people calculate mod in assembly, rather than at least two divides, one multiply, and five subtract operations... and that's only counting the bits that differ from the "unsigned remainder" calculation.

Edit: Actually, reading the right-hand table on https://en.wikipedia.org/wiki/Modulo_operation (Integer modulo operators in various programming languages) makes me realise what a non-question this is, I naively figured mod was a standard thing.
Post 12 Dec 2018, 13:12
View user's profile Send private message Visit poster's website Reply with quote
DimonSoft



Joined: 03 Mar 2010
Posts: 532
Location: Belarus
petelomax wrote:
I guess I just wondered how other people calculate mod in assembly, rather than at least two divides, one multiply, and five subtract operations... and that's only counting the bits that differ from the "unsigned remainder" calculation.

div instruction makes it all.
Post 12 Dec 2018, 16:57
View user's profile Send private message Visit poster's website Reply with quote
petelomax



Joined: 19 Jan 2013
Posts: 8
Location: London
DimonSoft wrote:
div instruction makes it all.


OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3).


Last edited by petelomax on 13 Dec 2018, 12:45; edited 4 times in total
Post 13 Dec 2018, 11:37
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16628
Location: In your JS exploiting you and your system
petelomax wrote:
Edit: Actually, reading the right-hand table on https://en.wikipedia.org/wiki/Modulo_operation (Integer modulo operators in various programming languages) makes me realise what a non-question this is, I naively figured mod was a standard thing.
That table makes C look so bad with the "implementation defined" qualifier. Yuck. I see that Forth has that distinction also. Confused
Post 13 Dec 2018, 11:45
View user's profile Send private message Visit poster's website Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 1421
petelomax wrote:
OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3).
Code:
; inputs
mov eax, -9
mov ecx, 4

cdq
mov ebx, eax
idiv ecx
xor ebx, ecx  ; sign bit set if signs differ
jns @f
test edx, edx
jz @f
mov ebx, eax
sar ebx, 31   ; all 1s if result was negative
or ebx, 1     ; -1 if negative, +1 if positive
add edx, ecx  ; adjust modulus
add eax, ebx  ; adjust result
@@:    
Result in eax, modulus in edx. One division only, no multiplications.

I'm sure you can get rid of the branches with some creativity (cmovs)


Last edited by Furs on 13 Dec 2018, 15:58; edited 1 time in total
Post 13 Dec 2018, 15:48
View user's profile Send private message Reply with quote
DimonSoft



Joined: 03 Mar 2010
Posts: 532
Location: Belarus
petelomax wrote:
DimonSoft wrote:
div instruction makes it all.


OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3).

Sorry, that’s what happens as a result of reading the forum while simultaneously talking to other people. Got the impression that you use idiv and then proceed by multiplying back instead of just using the remainder. Which, as a closer look revealed, isn’t the case. No need to become furious though.

To put my 5 cents on the topic and give my answer to the question I quoted, I find it quite a bad idea to rely on proper calculation of FP remainders anyway knowing the quirks of IEEE FPs, so whenever I have such a case (can’t even remember any such task except when I was a student) I just implement it the most maintainable way (differs depending on the surrounding FPU code) and make sure program logic doesn’t depend on the result.

As for this
petelomax wrote:
I cannot imagine that C jumps through such hoops to implement the % operator.

it doesn’t ’cause
ISO/IEC 9899:201x, chapter 6.5.5, statement 2 wrote:
Each of the operands shall have arithmetic type. The operands of the % operator shall have integer type.

so for x86 it just conveniently maps to idiv and the FP version is left for the library which is a good thing since C doesn’t require implementations to use particular representations for types, especially for FP types.
Post 13 Dec 2018, 15:57
View user's profile Send private message Visit poster's website Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 1421
Actually here's a better version as long as you don't overflow:
Code:
; inputs
mov eax, -9
mov ecx, 4

; edx = abs(divisor) - 1
mov edx, ecx
mov ebx, ecx
sar edx, 31
xor ebx, edx
sub ebx, edx
dec ebx

; change sign to match eax
mov edx, eax
sar edx, 31
xor ebx, edx
sub ebx, edx

; adjust and divide
add eax, ebx
cdq
idiv ecx
sub edx, ebx    
No branches and untested.

EDIT: nevermind this doesn't work as you want it to Wink
Post 13 Dec 2018, 16:17
View user's profile Send private message Reply with quote
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 7289
Location: Kraków, Poland
petelomax wrote:
OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3).
Yes, in general a single DIV (yes, the unsigned one) can be enough to cover all cases. Just detect the signs of input values and have conditional NEGs here and there. If in the end the remainder has a wrong sign, you just add or subtract the divisor and adjust the quotient appropriately (either decrease or increase by one).

If you take a look at how fasmg does long division with remainder, it really has just an unsigned division routine at its core, and inserts a couple of negations for signed cases.

On a side note, in the context of this board there is no such thing as exception #C0000095. The exception that DIV/IDIV can cause is the exception 0.
Post 13 Dec 2018, 17:24
View user's profile Send private message Visit poster's website Reply with quote
petelomax



Joined: 19 Jan 2013
Posts: 8
Location: London
DimonSoft wrote:
No need to become furious though.

Sorry, I did actually tone down my first response Wink

Furs wrote:
Result in eax, modulus in edx. One division only, no multiplications.

Excellent, exactly what I have been looking for, for ages, you are a star.
Post 17 Dec 2018, 01:31
View user's profile Send private message Visit poster's website Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 1421
petelomax wrote:
Furs wrote:
Result in eax, modulus in edx. One division only, no multiplications.

Excellent, exactly what I have been looking for, for ages, you are a star.
No problem Smile I actually thought about this some more and managed to find a way without branches. It uses the same logic as the original one, it just creates a mask instead of branching for the adds.

I think it's worth it, because idiv has long latency so if there's other code mixed in here it would also be rolled back on a branch misprediction, so avoiding branches here is good for performance.

Finishing some bits right now.
Post 24 Dec 2018, 11:34
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16628
Location: In your JS exploiting you and your system
Furs wrote:
... avoiding branches here is good for performance.
Well I have to say that that might be true. But modern CPUs have BTBs and other optimisations that might make code with branches perform better in some cases. As always, test it in each variation and see which gives better performance with the data sets encountered in actual operation.
Post 24 Dec 2018, 11:41
View user's profile Send private message Visit poster's website Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 1421
Ok here's branchless:
Code:
; inputs
mov eax, -9
mov ecx, 4

cdq
mov ebx, eax
idiv ecx
xor ebx, ecx     ; sign bit set if signs differ
sar ebx, 31      ; mask of all 1s if sign bit differ, 0 otherwise
test edx, edx
cmovz ebx, edx   ; if remainder is zero, force mask to 0

and ecx, ebx     ; mask out divisor if needed
add edx, ecx     ; adjust modulus
mov ecx, eax
sar ecx, 31      ; all 1s if result was negative
or ecx, 1        ; -1 if negative, +1 if positive
and ecx, ebx     ; mask it out if needed
add eax, ecx     ; adjust result

; eax = result, edx = modulus    
I haven't tested it thoroughly, so let me know if there's a mistake or it doesn't work for some inputs.

The mask basically acts as though we branched. It is either all 1s if we did not branch, or 0 if we did branch before. We use it to mask the adjustments so that the adds are effectively nops if we were to branch. Other than that it's the same method.

If anyone finds a better way (faster, or using less registers, or whatever, without branches!), please share. I'm adding it to my cool & useful tricks collection. Wink



EDIT: Some optimizations can be made if you only need one or the other.

If you only need the modulus:
Code:
; inputs
mov eax, -9
mov ecx, 4
cdq

mov ebx, eax
idiv ecx
xor eax, eax
xor ebx, ecx
cmovns ecx, eax
test edx, edx
cmovz ecx, eax
add edx, ecx

; edx = modulus    
and if you only need the result:
Code:
; inputs
mov eax, -9
mov ecx, 4
cdq

mov ebx, eax
idiv ecx
test edx, edx
cmovz ebx, ecx   ; make sure the xor below has no sign set in this case
cdq              ; edx = -1 if result is negative, 0 otherwise
or edx, 1
xor ebx, ecx
mov ecx, 0       ; we can't touch the flags
cmovns edx, ecx  ; set to zero if signs were the same
add eax, edx

; eax = result    


revolution wrote:
Furs wrote:
... avoiding branches here is good for performance.
Well I have to say that that might be true. But modern CPUs have BTBs and other optimisations that might make code with branches perform better in some cases. As always, test it in each variation and see which gives better performance with the data sets encountered in actual operation.
Testing isn't very reliable since the predictor varies. The 'test edx, edx' (test for remainder) is a big problem tho, since it will rollback all of the code if it guesses wrong. And idiv has very long latency.

Anyway, I generally prefer branchless code unless it adds too much latency, because it's much more stable (i.e. performance always the same). There's always a potential "worst case" with branch code (maybe even malware for DoS).
Post 24 Dec 2018, 11:41
View user's profile Send private message Reply with quote
rugxulo



Joined: 09 Aug 2005
Posts: 2335
Location: Usono (aka, USA)
revolution wrote:
petelomax wrote:
Edit: Actually, reading the right-hand table on https://en.wikipedia.org/wiki/Modulo_operation (Integer modulo operators in various programming languages) makes me realise what a non-question this is, I naively figured mod was a standard thing.
That table makes C look so bad with the "implementation defined" qualifier. Yuck. I see that Forth has that distinction also. Confused


I'm no expert, but it's not as bad as it sounds.

Some of this is just the ways computers have (almost) always behaved, even if not 100% identical to typical math preferences. (So it sometimes varies by compilers, but so does almost everything.) So Turbo Pascal and Oberon-M both don't really give the "proper" answers but moreso the common answer that normal C compilers do. This is probably due to less stringent testing or standardese. But things like Python and Oxford Oberon will prefer the proper math answer by default. Forth (ANS?) has floored FM/MOD and symmetric SM/REM . Modula-2 has REM (at least in PIM4 or ISO, I think).

If I had to guess motive, C (ISO 1990) presumably allowed whatever was fastest and default for specific machines to stay available on '%' operator with optional full compatibility only in library "div" or "ldiv" (etc), if you direly needed it. But that was rare (in my very limited experience, I only know one compiler that did it differently), so they made the saner way always default in C99.

So yeah, it's hard to support both when you know that some rare things rely on only one specific answer. Standards ("de jure" or even "de facto") don't solve everything. It's a complicated world, but most people find ways to survive anyways.
Post 05 Apr 2019, 19:56
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16628
Location: In your JS exploiting you and your system
Maybe it is just my perception of HLLs. I tend to expect them the decouple the code from the underlying machine, since that is how they are advertised. But along comes C and says you must also know the underlying machine. So to me that dilutes the whole premise of being an HLL. It is no longer portable or predictable, the programmer has to make special cases for all the potential target machines, just like we do for assembly.


Last edited by revolution on 06 Apr 2019, 14:01; edited 1 time in total
Post 06 Apr 2019, 03:23
View user's profile Send private message Visit poster's website Reply with quote
DimonSoft



Joined: 03 Mar 2010
Posts: 532
Location: Belarus
revolution wrote:
Maybe it is just my perception of HLLs. I tend to expect them the decouple the code from the underlying machine, since that is how they are advertised. But along comes C and says you must also know the underlying machine. So to me that dilutes the whole premise of being an HLL. It is no longer portable or predictable, the programmer has to makes special cases for all the potential target machines, just like we do for assembly.

AFAIK, the “level” of language is determined by something like “average number of machine instructions per HLL instruction”. If we take this definition, then abstraction is not necessary. But the abstraction is there, so C approach is at least the case of leaky abstraction.


Last edited by DimonSoft on 07 Apr 2019, 07:48; edited 1 time in total
Post 06 Apr 2019, 13:46
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16628
Location: In your JS exploiting you and your system
DimonSoft wrote:
But the abstraction is there, so C approach is at least the case of leaky abstraction.
I think it is more than just a leaky abstraction. Rather than just taking more time or failing to compile, the basic operation has changed and the output result will be different.
Post 07 Apr 2019, 02:46
View user's profile Send private message Visit poster's website Reply with quote
rugxulo



Joined: 09 Aug 2005
Posts: 2335
Location: Usono (aka, USA)
revolution wrote:
Maybe it is just my perception of HLLs. I tend to expect them the decouple the code from the underlying machine, since that is how they are advertised. But along comes C and says you must also know the underlying machine. So to me that dilutes the whole premise of being an HLL. It is no longer portable or predictable, the programmer has to make special cases for all the potential target machines, just like we do for assembly.


It can be semi-portable, with effort, but even the language designers/implementors would probably agree that it's not as automatic nor as easy as it should be. Flaws accrued over time and can't all easily be removed. Nothing's perfect. (Though with proper warnings, lint, reference manuals, and an experienced guru, it can be greatly minimized.)

Yes, I too am a bit cynical about HLLs, but they're not "only" for (extreme, maximum) portability. They are also for easier maintenance of code. But often they also are intentionally used for non-portable code (that needs extreme hardware-specific bits for functionality or needs extra performance in a non-standard way).

Yes, you still have to workaround non-standard things, bugs, and design flaws, but overall it's better than it could be. Half the work is done, you just have to do the other 90%. Razz
Post 07 Apr 2019, 20:03
View user's profile Send private message Visit poster's website Reply with quote
DimonSoft



Joined: 03 Mar 2010
Posts: 532
Location: Belarus
rugxulo wrote:
Yes, I too am a bit cynical about HLLs, but they're not "only" for (extreme, maximum) portability. They are also for easier maintenance of code.

Well, in ideal world maintenance cost is not the main goal. The whole idea of keeping your code maintainable is based on the need to increase the reliability of the software product. Maintainability is just one of many ways to achieve reliability. And HLLs used to be as close to natural languages as possible to decrease the possibility of introducing bugs in the first place, maintainability has the same goal but over time. So, a HLL that gives maintainable code while introducing means to introduce bugs (caused by differences in lower-level implementations) that are difficult to spot looks like a human being who tries to put his pants on with one hand while taking them off with another.

In real world maintenance cost is the main goal and reliability is the worst property of a product: the more reliable a program is the less updates and technical support services one can sell. So, everybody hates reliability.
Post 08 Apr 2019, 11:42
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2019, Tomasz Grysztar.

Powered by rwasa.