flat assembler
Message board for the users of flat assembler.

 flat assembler > Main > Division by multiplication. How to get the remainder?
Author
zhak

Joined: 12 Apr 2005
Posts: 489
Location: Belarus
When substituting unsigned integer division by multiplication (aka magic division)

For example,
Code:
```mov ecx, 10
div rcx    ```

can be substituted with
Code:
```mov rdx, 0xcccccccccccccccd
mul rdx
shr rdx, 3    ```

how to get the remainder?
I'm trying to adapt the method for base conversion, but cannot figure out how to retrieve the remainder of the division in an elegant way.

I first tried to reuse algorithm from this post:
Code:
```mov rsi,10
mov r8,0x199999999999999a

mul r8
mov rdi,rdx
mul rsi
mov rax,rdi
```

DL holds the remainder

But this algorithm produces incorrect results for very large numbers.
For example, 9223372036854775807 is converted to 9223372036854775808, etc.

If anyone is interested in current code, here's a compilable example:
Code:
```  mov rdx, 9223372036854775807
mov rcx, strFmt
call __dec2str

jmp \$

struct STRFMT
flags     rw 1
width     rw 1
size      rw 1
length    rb 1
precision rb 1
ends

STRFMT_WIDTH          = 0x0001
STRFMT_SIZE           = 0x0002
STRFMT_LENGTH         = 0x0004
STRFMT_PRECISION      = 0x0008
STRFMT_ALIGN_LEFT     = 0x0010
STRFMT_ALIGN_CENTER   = 0x0020
STRFMT_SIGNED_NUMBER  = 0x0100
STRFMT_SIGNED_PLUS    = 0x0200
STRFMT_SIGNED_BLANK   = 0x0400

__dec2str:
push rbp
push rdi
push rbx
push r8
push r9
push r10
mov rbp, rcx
test rdx, rdx
mov rax, rdx
mov r10, rdx
sub rsp, 56
xor ecx, ecx
lea rdi, [rsp + 48]
mov r8, 0x199999999999999a
mov ebx, 10
mov word[rdi], cx
test word[rbp + STRFMT.flags], STRFMT_SIGNED_NUMBER
jz short @f
test rax, rax
jns short @f
neg rax
@@:
mul r8
mov r9, rdx
mul rbx
mov rax, r9
or r9, rdx
or dl, 0x30
sub rdi, 2
mov [rdi], dx
inc ecx
jmp short @b

test word[rbp + STRFMT.flags], STRFMT_LENGTH
jz short .check_sign
mov al, byte[rbp + STRFMT.length]
cmp al, 20
jbe short @f
mov al, 20
@@:
sub eax, ecx
@@:
jle short .check_sign
sub rdi, 2
mov word[rdi], '0'
inc ecx
dec eax
jmp short @b

.check_sign:
test word[rbp + STRFMT.flags], STRFMT_SIGNED_NUMBER
jz short .ret
test r10, r10
mov ax, '-'
js short .apply_sign
test word[rbp + STRFMT.flags], STRFMT_SIGNED_PLUS
mov ax, '+'
jnz short .apply_sign
test word[rbp + STRFMT.flags], STRFMT_SIGNED_BLANK
jz short .ret
mov ax, ' '

.apply_sign:
sub rdi, 2
mov word[rdi], ax

.ret:
mov rdx, rdi
xor eax, eax
pop r10
pop r9
pop r8
pop rbx
pop rdi
pop rbp
ret

strFmt STRFMT STRFMT_LENGTH,0,0,0,0
```
12 Jun 2016, 20:22
system error

Joined: 01 Sep 2013
Posts: 671
zhak, that Agner Fog's version doesn't return the remainder. It returns the most significant digits in EDX after getting rid of the remainder in one pass. 456/10...... returns 45 for the first pass.
12 Jun 2016, 22:12
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 16135
Location: Hyperborea
You have to multiply the quotient by the divisor and subtract to get the remainder.

remainder = dividend - quotient * divisor

where: quotient = floor(dividend / divisor)
12 Jun 2016, 23:24
zhak

Joined: 12 Apr 2005
Posts: 489
Location: Belarus
Yeah, I thought about such approach in the first place as most obvious. But hoped there's some other magic trick for this
12 Jun 2016, 23:38
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 16135
Location: Hyperborea
I think that when you need the remainder it is probably just as efficient to use DIV. In the current batch of CPUs it is not too slow. So many people just assume it runs like a snail in molasses but in reality it is okay.
12 Jun 2016, 23:58
Mike Gonta

Joined: 26 Dec 2010
Posts: 213
revolution wrote:
You have to multiply the quotient by the divisor and subtract to get the remainder.
remainder = dividend - quotient * divisor
where: quotient = floor(dividend / divisor)
zhak wrote:
Yeah, I thought about such approach in the first place as most obvious. But hoped there's some other magic trick for this
After the initial multiplication rdx contains the quotient*8, simply save this as two temporary's and right shift one
by 2 (divide by 4) to get the quotient*2 and add together to get the quotient*10.
revolution wrote:
I think that when you need the remainder it is probably just as efficient to use DIV. In the current batch of CPUs
it is not too slow. So many people just assume it runs like a snail in molasses but in reality it is okay.
Exactly, and since internally division is done with multiplication and multiplication is done with shifts this technique
is more elegant.
fizzbuzzter

_________________
Mike Gonta
look and see - many look but few see

http://mikegonta.com
13 Jun 2016, 09:04
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 16135
Location: Hyperborea
Mike Gonta wrote:
After the initial multiplication rdx contains the quotient*8, simply save this as two temporary's and right shift one
by 2 (divide by 4) to get the quotient*2 and add together to get the quotient*10.
Before you use the value in RDX the lower three bits must be zeroed before you use it to get the final tens multiple. So you still either do the SHR by 3 and then SHL by 3, or you do AND with 0xfff...ff8. Usually it is easier to SHR by 3 to get the quotient and then two LEAs to do the multiply by 10.
13 Jun 2016, 09:11
zhak

Joined: 12 Apr 2005
Posts: 489
Location: Belarus
Thanks for the hints, guys. So I tried both div and mul variants:

First, DIV
Code:
```@@:
mov ebx, 10
xor edx, edx
div rbx
mov rbx, rax
or rbx, rdx
or dl, 0x30
sub rdi, 2
mov [rdi], dx
inc ecx
jmp short @b
```

and then MUL
Code:
```@@:
mov rbx, rax
mul r8
shr rdx, 3
mov rax, rdx
lea rdx, [rdx * 8]
lea rdx, [rdx + rax * 2]
sub rbx, rdx
mov rdx, rax
or rdx, rbx
or bl, 0x30
sub rdi, 2
mov [rdi], bx
inc ecx
jmp short @b
```

converting number 9223372036854775807 (19 iterations). Measured performance with simple RDTSC:
Code:
```cpuid
rdtsc
push rax
mov rdx, 9223372036854775807
mov rcx, strFmt
call __dec2str
cpuid
rdtsc
pop rbx
sub rax, rbx
```

On my Core i7-4790K, DIV gives an average of 1356 ticks, while MUL gives average of 1012 ticks.
Yes, it's not that much, like 18 ticks per iteration. But with huge number of divisions gain would become significant.
13 Jun 2016, 20:58
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 16135
Location: Hyperborea
The timing depends upon what the code is actually doing. Each use case will be different.

But I did notice that you load RBX with the value 10 for the DIV test, whereas you don't load R8 with its constant in the MUL test.

You can also eliminate one instruction in the MUL test:
Code:
```  mov rbx, rax
mul r8
shr rdx, 3
lea rax, [rdx * 8]
lea rax, [rax + rdx * 2]
sub rbx, rax ;RBX=remainder, RDX=quotient    ```
14 Jun 2016, 01:07
zhak

Joined: 12 Apr 2005
Posts: 489
Location: Belarus
you're right. thanks. fixed
14 Jun 2016, 06:44
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------Blog General----------------MainDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsCompiler InternalsIDE DevelopmentOS ConstructionNon-x86 architecturesHigh Level LanguagesProgramming Language DesignProjects and IdeasExamples and Tutorials Other----------------FeedbackHeapTest Area

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum