Division by multiplication. How to get the remainder?

Index > Main > Division by multiplication. How to get the remainder?

Author

Thread

zhak

Joined: 12 Apr 2005
Posts: 501
Location: Belarus

zhak 12 Jun 2016, 20:22

When substituting unsigned integer division by multiplication (aka magic division)

For example,

Code:

mov ecx, 10
div rcx

can be substituted with

Code:

mov rdx, 0xcccccccccccccccd
mul rdx
shr rdx, 3

how to get the remainder?
I'm trying to adapt the method for base conversion, but cannot figure out how to retrieve the remainder of the division in an elegant way.

Thanks in advance!

I first tried to reuse algorithm from this post:

Code:

mov rsi,10
mov r8,0x199999999999999a

mul r8
mov rdi,rdx
mul rsi
mov rax,rdi

DL holds the remainder

But this algorithm produces incorrect results for very large numbers.
For example, 9223372036854775807 is converted to 9223372036854775808, etc.

If anyone is interested in current code, here's a compilable example:

Code:

  mov rdx, 9223372036854775807
  mov rcx, strFmt
  call __dec2str
  
  jmp $

struct STRFMT
  flags     rw 1
  width     rw 1
  size      rw 1
  length    rb 1
  precision rb 1
ends

STRFMT_WIDTH          = 0x0001
STRFMT_SIZE           = 0x0002
STRFMT_LENGTH         = 0x0004
STRFMT_PRECISION      = 0x0008
STRFMT_ALIGN_LEFT     = 0x0010
STRFMT_ALIGN_CENTER   = 0x0020
STRFMT_SIGNED_NUMBER  = 0x0100
STRFMT_SIGNED_PLUS    = 0x0200
STRFMT_SIGNED_BLANK   = 0x0400
  
__dec2str:
  push rbp
  push rdi
  push rbx
  push r8
  push r9
  push r10
  mov rbp, rcx
  test rdx, rdx
  jz short .check_padding
  mov rax, rdx
  mov r10, rdx
  sub rsp, 56
  xor ecx, ecx
  lea rdi, [rsp + 48]
  mov r8, 0x199999999999999a
  mov ebx, 10
  mov word[rdi], cx
  test word[rbp + STRFMT.flags], STRFMT_SIGNED_NUMBER
  jz short @f
  test rax, rax
  jns short @f
  neg rax
  @@:
  mul r8
  mov r9, rdx
  mul rbx
  mov rax, r9
  or r9, rdx
  jz short .check_padding
  or dl, 0x30
  sub rdi, 2
  mov [rdi], dx
  inc ecx
  jmp short @b
 
 .check_padding:
  test word[rbp + STRFMT.flags], STRFMT_LENGTH
  jz short .check_sign
  mov al, byte[rbp + STRFMT.length]
  cmp al, 20
  jbe short @f
  mov al, 20
  @@:
  sub eax, ecx
  @@:
  jle short .check_sign
  sub rdi, 2
  mov word[rdi], '0'
  inc ecx
  dec eax
  jmp short @b
  
 .check_sign:
  test word[rbp + STRFMT.flags], STRFMT_SIGNED_NUMBER
  jz short .ret
  test r10, r10
  mov ax, '-'
  js short .apply_sign
  test word[rbp + STRFMT.flags], STRFMT_SIGNED_PLUS
  mov ax, '+'
  jnz short .apply_sign
  test word[rbp + STRFMT.flags], STRFMT_SIGNED_BLANK
  jz short .ret
  mov ax, ' '
  
 .apply_sign:
  sub rdi, 2
  mov word[rdi], ax
  
 .ret:
  mov rdx, rdi
  xor eax, eax
  add rsp, 56
  pop r10
  pop r9
  pop r8
  pop rbx
  pop rdi
  pop rbp
  ret

strFmt STRFMT STRFMT_LENGTH,0,0,0,0

12 Jun 2016, 20:22

system error

Joined: 01 Sep 2013
Posts: 670

system error 12 Jun 2016, 22:12

zhak, that Agner Fog's version doesn't return the remainder. It returns the most significant digits in EDX after getting rid of the remainder in one pass. 456/10...... returns 45 for the first pass.

12 Jun 2016, 22:12

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20804
Location: In your JS exploiting you and your system

revolution 12 Jun 2016, 23:24

You have to multiply the quotient by the divisor and subtract to get the remainder.

remainder = dividend - quotient * divisor

where: quotient = floor(dividend / divisor)

12 Jun 2016, 23:24

zhak

Joined: 12 Apr 2005
Posts: 501
Location: Belarus

zhak 12 Jun 2016, 23:38

Yeah, I thought about such approach in the first place as most obvious. But hoped there's some other magic trick for this Smile

12 Jun 2016, 23:38

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20804
Location: In your JS exploiting you and your system

revolution 12 Jun 2016, 23:58

I think that when you need the remainder it is probably just as efficient to use DIV. In the current batch of CPUs it is not too slow. So many people just assume it runs like a snail in molasses but in reality it is okay.

12 Jun 2016, 23:58

Mike Gonta

Joined: 26 Dec 2010
Posts: 245

Mike Gonta 13 Jun 2016, 09:04

revolution wrote:

You have to multiply the quotient by the divisor and subtract to get the remainder.
remainder = dividend - quotient * divisor
where: quotient = floor(dividend / divisor)

zhak wrote:

Yeah, I thought about such approach in the first place as most obvious. But hoped there's some other magic trick for this

After the initial multiplication rdx contains the quotient*8, simply save this as two temporary's and right shift one
by 2 (divide by 4) to get the quotient*2 and add together to get the quotient*10.

revolution wrote:

I think that when you need the remainder it is probably just as efficient to use DIV. In the current batch of CPUs
it is not too slow. So many people just assume it runs like a snail in molasses but in reality it is okay.

Exactly, and since internally division is done with multiplication and multiplication is done with shifts this technique
is more elegant.
fizzbuzzter

_________________
Mike Gonta
look and see - many look but few see

https://mikegonta.com

13 Jun 2016, 09:04

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20804
Location: In your JS exploiting you and your system

revolution 13 Jun 2016, 09:11

Mike Gonta wrote:

After the initial multiplication rdx contains the quotient*8, simply save this as two temporary's and right shift one
by 2 (divide by 4) to get the quotient*2 and add together to get the quotient*10.

Before you use the value in RDX the lower three bits must be zeroed before you use it to get the final tens multiple. So you still either do the SHR by 3 and then SHL by 3, or you do AND with 0xfff...ff8. Usually it is easier to SHR by 3 to get the quotient and then two LEAs to do the multiply by 10.

13 Jun 2016, 09:11

zhak

Joined: 12 Apr 2005
Posts: 501
Location: Belarus

zhak 13 Jun 2016, 20:58

Thanks for the hints, guys. So I tried both div and mul variants:

First, DIV

Code:

@@:
  mov ebx, 10
  xor edx, edx
  div rbx
  mov rbx, rax
  or rbx, rdx
  jz short .check_padding
  or dl, 0x30
  sub rdi, 2
  mov [rdi], dx
  inc ecx
  jmp short @b

and then MUL

Code:

@@:
  mov rbx, rax
  mul r8
  shr rdx, 3
  mov rax, rdx
  lea rdx, [rdx * 8]
  lea rdx, [rdx + rax * 2]
  sub rbx, rdx
  mov rdx, rax
  or rdx, rbx
  jz short .check_padding
  or bl, 0x30
  sub rdi, 2
  mov [rdi], bx
  inc ecx
  jmp short @b

converting number 9223372036854775807 (19 iterations). Measured performance with simple RDTSC:

Code:

cpuid
  rdtsc
  push rax
  mov rdx, 9223372036854775807
  mov rcx, strFmt
  call __dec2str
  cpuid
  rdtsc
  pop rbx
  sub rax, rbx

On my Core i7-4790K, DIV gives an average of 1356 ticks, while MUL gives average of 1012 ticks.
Yes, it's not that much, like 18 ticks per iteration. But with huge number of divisions gain would become significant.

13 Jun 2016, 20:58

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20804
Location: In your JS exploiting you and your system

revolution 14 Jun 2016, 01:07

The timing depends upon what the code is actually doing. Each use case will be different.

But I did notice that you load RBX with the value 10 for the DIV test, whereas you don't load R8 with its constant in the MUL test.

You can also eliminate one instruction in the MUL test:

Code:

  mov rbx, rax
  mul r8
  shr rdx, 3
  lea rax, [rdx * 8]
  lea rax, [rax + rdx * 2]
  sub rbx, rax ;RBX=remainder, RDX=quotient

14 Jun 2016, 01:07

zhak

Joined: 12 Apr 2005
Posts: 501
Location: Belarus

zhak 14 Jun 2016, 06:44

you're right. thanks. fixed

14 Jun 2016, 06:44

< Last Thread | Next Thread >

Forum Rules:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum