flat assembler
Message board for the users of flat assembler.
Index
> Main > signed modulus (to match C's % operator) Goto page 1, 2 Next |
Author |
|
petelomax 11 Dec 2018, 11:25
Hi, I've writen my own language, originally using FASM but now with an inline assembler, very
much indebted to its FASM roots. It has a builtin remainder(x,y) function which is implemented as mov eax,[x]; mov ecx,[y]; cdq; idiv ecx; save edx, and a hll mod(x,y) which is implemented as return x - y * floor(x / y). The latter has served me well for several years but is perhaps not quite so ideally implemented, however it gets me the same results as C's % operator: Code: x,y: -9,-4 -9,+4 +9,-4 +9,+4 remainder(x,y): -1 -1 +1 +1 mod(x,y): -1 3 -3 +1 -- (matches C's % operator) Note that mod() yields the same result as remainder() when the signs of both arguments are the same, but the result always has the same sign as the divisor whereas the result of remainder() always has the same sign as the dividend. My floor(x/y) function is implemented by checking the signs, and if one but not both is signed then floor(-a/b) is performed as -floor((a-(b-1))/b), which I somehow doubt is the right/best thing, especially so for mod(). I cannot imagine that C jumps through such hoops to implement the % operator. I already have a working divrem function (and ~20 others), but got stuck on divmod: Code: x,y: -9,-4 -9,+4 +9,-4 +9,+4 divrem(x,y): +2,-1 -2,-1 -2,+1 +2,+1 -- ie x==y*q+r divmod(x,y): +2,-1 -3,+3 -3,-3 +2,+1 -- ie x==y*q+r[??] I know what the mod part ought to be for certain, but the quotient is a bit of a guess. In truth I don't actually need divmod(), just mod(), but I'll take it if I can get it. Perhaps an explanation of divmod (vs divrem) might help me out too. Hopefully this is a long post with a much shorter answer. Regards, Pete PS: remainder() also uses the fprem instruction when dealing with floating point numbers. PPS: my language is phix, if interested see http://phix.x10.mx |
|||
11 Dec 2018, 11:25 |
|
revolution 11 Dec 2018, 13:41
I'm not sure what your question is.
Looks like you have it all worked out. |
|||
11 Dec 2018, 13:41 |
|
petelomax 12 Dec 2018, 13:12
I guess I just wondered how other people calculate mod in assembly, rather than at least two divides, one multiply, and five subtract operations... and that's only counting the bits that differ from the "unsigned remainder" calculation.
Edit: Actually, reading the right-hand table on https://en.wikipedia.org/wiki/Modulo_operation (Integer modulo operators in various programming languages) makes me realise what a non-question this is, I naively figured mod was a standard thing. |
|||
12 Dec 2018, 13:12 |
|
petelomax 13 Dec 2018, 11:37
DimonSoft wrote: div instruction makes it all. OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3). Last edited by petelomax on 13 Dec 2018, 12:45; edited 4 times in total |
|||
13 Dec 2018, 11:37 |
|
revolution 13 Dec 2018, 11:45
petelomax wrote: Edit: Actually, reading the right-hand table on https://en.wikipedia.org/wiki/Modulo_operation (Integer modulo operators in various programming languages) makes me realise what a non-question this is, I naively figured mod was a standard thing. |
|||
13 Dec 2018, 11:45 |
|
Furs 13 Dec 2018, 15:48
petelomax wrote: OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3). Code: ; inputs mov eax, -9 mov ecx, 4 cdq mov ebx, eax idiv ecx xor ebx, ecx ; sign bit set if signs differ jns @f test edx, edx jz @f mov ebx, eax sar ebx, 31 ; all 1s if result was negative or ebx, 1 ; -1 if negative, +1 if positive add edx, ecx ; adjust modulus add eax, ebx ; adjust result @@: I'm sure you can get rid of the branches with some creativity (cmovs) Last edited by Furs on 13 Dec 2018, 15:58; edited 1 time in total |
|||
13 Dec 2018, 15:48 |
|
DimonSoft 13 Dec 2018, 15:57
petelomax wrote:
Sorry, that’s what happens as a result of reading the forum while simultaneously talking to other people. Got the impression that you use idiv and then proceed by multiplying back instead of just using the remainder. Which, as a closer look revealed, isn’t the case. No need to become furious though. To put my 5 cents on the topic and give my answer to the question I quoted, I find it quite a bad idea to rely on proper calculation of FP remainders anyway knowing the quirks of IEEE FPs, so whenever I have such a case (can’t even remember any such task except when I was a student) I just implement it the most maintainable way (differs depending on the surrounding FPU code) and make sure program logic doesn’t depend on the result. As for this petelomax wrote: I cannot imagine that C jumps through such hoops to implement the % operator. it doesn’t ’cause ISO/IEC 9899:201x, chapter 6.5.5, statement 2 wrote: Each of the operands shall have arithmetic type. The operands of the % operator shall have integer type. so for x86 it just conveniently maps to idiv and the FP version is left for the library which is a good thing since C doesn’t require implementations to use particular representations for types, especially for FP types. |
|||
13 Dec 2018, 15:57 |
|
Furs 13 Dec 2018, 16:17
Actually here's a better version as long as you don't overflow:
Code: ; inputs mov eax, -9 mov ecx, 4 ; edx = abs(divisor) - 1 mov edx, ecx mov ebx, ecx sar edx, 31 xor ebx, edx sub ebx, edx dec ebx ; change sign to match eax mov edx, eax sar edx, 31 xor ebx, edx sub ebx, edx ; adjust and divide add eax, ebx cdq idiv ecx sub edx, ebx EDIT: nevermind this doesn't work as you want it to |
|||
13 Dec 2018, 16:17 |
|
Tomasz Grysztar 13 Dec 2018, 17:24
petelomax wrote: OH REALLY? Ok then sunshine, show me how to divide -9 by +4 using the div instruction, not get an exception #C0000095, and get the results asked for (-3,+3). If you take a look at how fasmg does long division with remainder, it really has just an unsigned division routine at its core, and inserts a couple of negations for signed cases. On a side note, in the context of this board there is no such thing as exception #C0000095. The exception that DIV/IDIV can cause is the exception 0. |
|||
13 Dec 2018, 17:24 |
|
petelomax 17 Dec 2018, 01:31
DimonSoft wrote: No need to become furious though. Sorry, I did actually tone down my first response Furs wrote: Result in eax, modulus in edx. One division only, no multiplications. Excellent, exactly what I have been looking for, for ages, you are a star. |
|||
17 Dec 2018, 01:31 |
|
Furs 24 Dec 2018, 11:34
petelomax wrote:
I think it's worth it, because idiv has long latency so if there's other code mixed in here it would also be rolled back on a branch misprediction, so avoiding branches here is good for performance. Finishing some bits right now. |
|||
24 Dec 2018, 11:34 |
|
revolution 24 Dec 2018, 11:41
Furs wrote: ... avoiding branches here is good for performance. |
|||
24 Dec 2018, 11:41 |
|
Furs 24 Dec 2018, 11:41
Ok here's branchless:
Code: ; inputs mov eax, -9 mov ecx, 4 cdq mov ebx, eax idiv ecx xor ebx, ecx ; sign bit set if signs differ sar ebx, 31 ; mask of all 1s if sign bit differ, 0 otherwise test edx, edx cmovz ebx, edx ; if remainder is zero, force mask to 0 and ecx, ebx ; mask out divisor if needed add edx, ecx ; adjust modulus mov ecx, eax sar ecx, 31 ; all 1s if result was negative or ecx, 1 ; -1 if negative, +1 if positive and ecx, ebx ; mask it out if needed add eax, ecx ; adjust result ; eax = result, edx = modulus The mask basically acts as though we branched. It is either all 1s if we did not branch, or 0 if we did branch before. We use it to mask the adjustments so that the adds are effectively nops if we were to branch. Other than that it's the same method. If anyone finds a better way (faster, or using less registers, or whatever, without branches!), please share. I'm adding it to my cool & useful tricks collection. EDIT: Some optimizations can be made if you only need one or the other. If you only need the modulus: Code: ; inputs mov eax, -9 mov ecx, 4 cdq mov ebx, eax idiv ecx xor eax, eax xor ebx, ecx cmovns ecx, eax test edx, edx cmovz ecx, eax add edx, ecx ; edx = modulus Code: ; inputs mov eax, -9 mov ecx, 4 cdq mov ebx, eax idiv ecx test edx, edx cmovz ebx, ecx ; make sure the xor below has no sign set in this case cdq ; edx = -1 if result is negative, 0 otherwise or edx, 1 xor ebx, ecx mov ecx, 0 ; we can't touch the flags cmovns edx, ecx ; set to zero if signs were the same add eax, edx ; eax = result revolution wrote:
Anyway, I generally prefer branchless code unless it adds too much latency, because it's much more stable (i.e. performance always the same). There's always a potential "worst case" with branch code (maybe even malware for DoS). |
|||
24 Dec 2018, 11:41 |
|
rugxulo 05 Apr 2019, 19:56
revolution wrote:
I'm no expert, but it's not as bad as it sounds. Some of this is just the ways computers have (almost) always behaved, even if not 100% identical to typical math preferences. (So it sometimes varies by compilers, but so does almost everything.) So Turbo Pascal and Oberon-M both don't really give the "proper" answers but moreso the common answer that normal C compilers do. This is probably due to less stringent testing or standardese. But things like Python and Oxford Oberon will prefer the proper math answer by default. Forth (ANS?) has floored FM/MOD and symmetric SM/REM . Modula-2 has REM (at least in PIM4 or ISO, I think). If I had to guess motive, C (ISO 1990) presumably allowed whatever was fastest and default for specific machines to stay available on '%' operator with optional full compatibility only in library "div" or "ldiv" (etc), if you direly needed it. But that was rare (in my very limited experience, I only know one compiler that did it differently), so they made the saner way always default in C99. So yeah, it's hard to support both when you know that some rare things rely on only one specific answer. Standards ("de jure" or even "de facto") don't solve everything. It's a complicated world, but most people find ways to survive anyways. |
|||
05 Apr 2019, 19:56 |
|
revolution 06 Apr 2019, 03:23
Maybe it is just my perception of HLLs. I tend to expect them the decouple the code from the underlying machine, since that is how they are advertised. But along comes C and says you must also know the underlying machine. So to me that dilutes the whole premise of being an HLL. It is no longer portable or predictable, the programmer has to make special cases for all the potential target machines, just like we do for assembly.
Last edited by revolution on 06 Apr 2019, 14:01; edited 1 time in total |
|||
06 Apr 2019, 03:23 |
|
DimonSoft 06 Apr 2019, 13:46
revolution wrote: Maybe it is just my perception of HLLs. I tend to expect them the decouple the code from the underlying machine, since that is how they are advertised. But along comes C and says you must also know the underlying machine. So to me that dilutes the whole premise of being an HLL. It is no longer portable or predictable, the programmer has to makes special cases for all the potential target machines, just like we do for assembly. AFAIK, the “level” of language is determined by something like “average number of machine instructions per HLL instruction”. If we take this definition, then abstraction is not necessary. But the abstraction is there, so C approach is at least the case of leaky abstraction. Last edited by DimonSoft on 07 Apr 2019, 07:48; edited 1 time in total |
|||
06 Apr 2019, 13:46 |
|
revolution 07 Apr 2019, 02:46
DimonSoft wrote: But the abstraction is there, so C approach is at least the case of leaky abstraction. |
|||
07 Apr 2019, 02:46 |
|
rugxulo 07 Apr 2019, 20:03
revolution wrote: Maybe it is just my perception of HLLs. I tend to expect them the decouple the code from the underlying machine, since that is how they are advertised. But along comes C and says you must also know the underlying machine. So to me that dilutes the whole premise of being an HLL. It is no longer portable or predictable, the programmer has to make special cases for all the potential target machines, just like we do for assembly. It can be semi-portable, with effort, but even the language designers/implementors would probably agree that it's not as automatic nor as easy as it should be. Flaws accrued over time and can't all easily be removed. Nothing's perfect. (Though with proper warnings, lint, reference manuals, and an experienced guru, it can be greatly minimized.) Yes, I too am a bit cynical about HLLs, but they're not "only" for (extreme, maximum) portability. They are also for easier maintenance of code. But often they also are intentionally used for non-portable code (that needs extreme hardware-specific bits for functionality or needs extra performance in a non-standard way). Yes, you still have to workaround non-standard things, bugs, and design flaws, but overall it's better than it could be. Half the work is done, you just have to do the other 90%. |
|||
07 Apr 2019, 20:03 |
|
DimonSoft 08 Apr 2019, 11:42
rugxulo wrote: Yes, I too am a bit cynical about HLLs, but they're not "only" for (extreme, maximum) portability. They are also for easier maintenance of code. Well, in ideal world maintenance cost is not the main goal. The whole idea of keeping your code maintainable is based on the need to increase the reliability of the software product. Maintainability is just one of many ways to achieve reliability. And HLLs used to be as close to natural languages as possible to decrease the possibility of introducing bugs in the first place, maintainability has the same goal but over time. So, a HLL that gives maintainable code while introducing means to introduce bugs (caused by differences in lower-level implementations) that are difficult to spot looks like a human being who tries to put his pants on with one hand while taking them off with another. In real world maintenance cost is the main goal and reliability is the worst property of a product: the more reliable a program is the less updates and technical support services one can sell. So, everybody hates reliability. |
|||
08 Apr 2019, 11:42 |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.