flat assembler
Message board for the users of flat assembler.
![]() |
Author |
|
zhak 12 Jun 2016, 20:22
When substituting unsigned integer division by multiplication (aka magic division)
For example, Code: mov ecx, 10 div rcx can be substituted with Code: mov rdx, 0xcccccccccccccccd mul rdx shr rdx, 3 how to get the remainder? I'm trying to adapt the method for base conversion, but cannot figure out how to retrieve the remainder of the division in an elegant way. Thanks in advance! I first tried to reuse algorithm from this post: Code: mov rsi,10 mov r8,0x199999999999999a mul r8 mov rdi,rdx mul rsi mov rax,rdi DL holds the remainder But this algorithm produces incorrect results for very large numbers. For example, 9223372036854775807 is converted to 9223372036854775808, etc. If anyone is interested in current code, here's a compilable example: Code: mov rdx, 9223372036854775807 mov rcx, strFmt call __dec2str jmp $ struct STRFMT flags rw 1 width rw 1 size rw 1 length rb 1 precision rb 1 ends STRFMT_WIDTH = 0x0001 STRFMT_SIZE = 0x0002 STRFMT_LENGTH = 0x0004 STRFMT_PRECISION = 0x0008 STRFMT_ALIGN_LEFT = 0x0010 STRFMT_ALIGN_CENTER = 0x0020 STRFMT_SIGNED_NUMBER = 0x0100 STRFMT_SIGNED_PLUS = 0x0200 STRFMT_SIGNED_BLANK = 0x0400 __dec2str: push rbp push rdi push rbx push r8 push r9 push r10 mov rbp, rcx test rdx, rdx jz short .check_padding mov rax, rdx mov r10, rdx sub rsp, 56 xor ecx, ecx lea rdi, [rsp + 48] mov r8, 0x199999999999999a mov ebx, 10 mov word[rdi], cx test word[rbp + STRFMT.flags], STRFMT_SIGNED_NUMBER jz short @f test rax, rax jns short @f neg rax @@: mul r8 mov r9, rdx mul rbx mov rax, r9 or r9, rdx jz short .check_padding or dl, 0x30 sub rdi, 2 mov [rdi], dx inc ecx jmp short @b .check_padding: test word[rbp + STRFMT.flags], STRFMT_LENGTH jz short .check_sign mov al, byte[rbp + STRFMT.length] cmp al, 20 jbe short @f mov al, 20 @@: sub eax, ecx @@: jle short .check_sign sub rdi, 2 mov word[rdi], '0' inc ecx dec eax jmp short @b .check_sign: test word[rbp + STRFMT.flags], STRFMT_SIGNED_NUMBER jz short .ret test r10, r10 mov ax, '-' js short .apply_sign test word[rbp + STRFMT.flags], STRFMT_SIGNED_PLUS mov ax, '+' jnz short .apply_sign test word[rbp + STRFMT.flags], STRFMT_SIGNED_BLANK jz short .ret mov ax, ' ' .apply_sign: sub rdi, 2 mov word[rdi], ax .ret: mov rdx, rdi xor eax, eax add rsp, 56 pop r10 pop r9 pop r8 pop rbx pop rdi pop rbp ret strFmt STRFMT STRFMT_LENGTH,0,0,0,0 |
|||
![]() |
|
revolution 12 Jun 2016, 23:24
You have to multiply the quotient by the divisor and subtract to get the remainder.
remainder = dividend - quotient * divisor where: quotient = floor(dividend / divisor) |
|||
![]() |
|
zhak 12 Jun 2016, 23:38
Yeah, I thought about such approach in the first place as most obvious. But hoped there's some other magic trick for this
![]() |
|||
![]() |
|
revolution 12 Jun 2016, 23:58
I think that when you need the remainder it is probably just as efficient to use DIV. In the current batch of CPUs it is not too slow. So many people just assume it runs like a snail in molasses but in reality it is okay.
|
|||
![]() |
|
Mike Gonta 13 Jun 2016, 09:04
revolution wrote: You have to multiply the quotient by the divisor and subtract to get the remainder. zhak wrote: Yeah, I thought about such approach in the first place as most obvious. But hoped there's some other magic trick for this by 2 (divide by 4) to get the quotient*2 and add together to get the quotient*10. revolution wrote: I think that when you need the remainder it is probably just as efficient to use DIV. In the current batch of CPUs is more elegant. fizzbuzzter |
|||
![]() |
|
revolution 13 Jun 2016, 09:11
Mike Gonta wrote: After the initial multiplication rdx contains the quotient*8, simply save this as two temporary's and right shift one |
|||
![]() |
|
zhak 13 Jun 2016, 20:58
Thanks for the hints, guys. So I tried both div and mul variants:
First, DIV Code: @@: mov ebx, 10 xor edx, edx div rbx mov rbx, rax or rbx, rdx jz short .check_padding or dl, 0x30 sub rdi, 2 mov [rdi], dx inc ecx jmp short @b and then MUL Code: @@: mov rbx, rax mul r8 shr rdx, 3 mov rax, rdx lea rdx, [rdx * 8] lea rdx, [rdx + rax * 2] sub rbx, rdx mov rdx, rax or rdx, rbx jz short .check_padding or bl, 0x30 sub rdi, 2 mov [rdi], bx inc ecx jmp short @b converting number 9223372036854775807 (19 iterations). Measured performance with simple RDTSC: Code: cpuid rdtsc push rax mov rdx, 9223372036854775807 mov rcx, strFmt call __dec2str cpuid rdtsc pop rbx sub rax, rbx On my Core i7-4790K, DIV gives an average of 1356 ticks, while MUL gives average of 1012 ticks. Yes, it's not that much, like 18 ticks per iteration. But with huge number of divisions gain would become significant. |
|||
![]() |
|
revolution 14 Jun 2016, 01:07
The timing depends upon what the code is actually doing. Each use case will be different.
But I did notice that you load RBX with the value 10 for the DIV test, whereas you don't load R8 with its constant in the MUL test. You can also eliminate one instruction in the MUL test: Code: mov rbx, rax mul r8 shr rdx, 3 lea rax, [rdx * 8] lea rax, [rax + rdx * 2] sub rbx, rax ;RBX=remainder, RDX=quotient |
|||
![]() |
|
zhak 14 Jun 2016, 06:44
you're right. thanks. fixed
|
|||
![]() |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.