flat assembler
Message board for the users of flat assembler.
Index
> Main > Calculate if a number is equal to 1 mod 24 without using DIV Goto page 1, 2 Next 
Author 

typedef
Hint: LEA


24 May 2014, 21:36 

magicSqr
I know how to use LEA for addition and multiplication. I can't see how it helps with my problem though.


24 May 2014, 21:50 

revolution
You can try using reciprocal multiply. Or a precomputed bitmap.
But why is div not allowed? 

24 May 2014, 22:27 

magicSqr
It's not that it's not allowed, I just wondered if there was a quicker way since the quotient is unimportant, I just need the check the remainder.
The reason for the need for speed is that this div will be used billions of times on billions of numbers. 

24 May 2014, 22:50 

revolution
Have you tried using div? Or are you just assuming it will give poor results?


24 May 2014, 22:56 

revolution
BTW: CPUs run at billions of instruction per second, so if you are only needing a billion results then it is probably not worth the effort of writing more complex code.


24 May 2014, 22:58 

magicSqr
I just assumed that div would be slower as it calculates both quotient and remainder.
My thinking was along the lines of the similar way of checking if a number is even or odd by either testing bit 0 or SHR 1 and checking the CY flag. The whole routine will be run billions of times (and probably a good few orders of magnitudes more) so any time saving would be nice 

24 May 2014, 23:14 

revolution
magicSqr wrote: I just assumed that div would be slower ... You will have to give more details about what you are doing. And preferably show us the current code you are using that has the poor performing div. Optimising for speed is very domain specific. It may be that you are chasing the wrong part of the program entirely based on a flawed assumption. Have you profiled the program yet? Have you examined the cache hit/miss rate? Is it memory bound or CPU bound or a mixture? 

24 May 2014, 23:31 

magicSqr
I think you are making more of this than I intended
As an example, is RAX = 1 mod 2 (i.e. is it odd) Code: mov rbx, 2 div rbx cmp rdx, 0 je .even .odd ' do odd stuff here ' jmp wherever .even ' do even stuff here ' jmp wherever is a lot slower than... Code: shr rax, 1 jnc .even .odd ' do odd stuff here ' jmp wherever .even ' do even stuff here ' jmp wherever So I was simply asking if anyone knew of a quicker way of finding out if RAX = 1 mod 24. I hope that's a bit clearer 

25 May 2014, 00:07 

revolution
magicSqr wrote: So I was simply asking if anyone knew of a quicker way of finding out if RAX = 1 mod 24. 

25 May 2014, 00:21 

magicSqr
ok, thanks revolution. Just wondering if, in your coding, whether you use mul and div when working with 2^x or do you use shl and shr? I started programming back in the late '80s when div, especially, *was* an awful instruction, and so got used to bitshifting instead


25 May 2014, 00:44 

revolution
I still use shifts when working with powers of two. But this is just a hangover from a time when CPUs didn't have MUL/DIV instructions. Many of my colleagues do not understand the shifts in my code and often question me about what it is doing. When I do change to MUL/DIV I usually find no significant difference in run speeds. But not all programs are the same. If you have DIV in a highly critical code path then there can be advantages to using something else. But one has to consider the associated costs of how much extra coding time and testing is required compared to how much overall saving can actually be realised. Also the extra factor of disparate systems will give different runtime results. I find that in most cases a multiGHz CPU will easily do a DIV in less time than it took to read the value from DRAM. So a small amount of prefetching can give a large increase in performance even with DIV in the critical path.
BTW: DIV is less efficient than SHR, of course. But that is an entirely different thing. 

25 May 2014, 01:25 

revolution
Here is a magic value: 12,297,829,382,473,034,411 =ceiling(2^65/3)
v = input value remainder = v  24 * ((v * magic) shr 68) 

25 May 2014, 01:51 

magicSqr
revolution wrote: ...Many of my colleagues .... Under a certain age me thinks revolution wrote:
Thanks for this 

25 May 2014, 02:34 

revolution
Note that I give no warranty, guarantee or assurance that it works, or is fit for purpose. Use at you own risk. If while using this your dog bites you and runs away then that is not my fault. Also note that when using this method not all reciprocal divisors will give correct results for all possible input values. It is fine for 24 (you can prove this mathematically) but other divisors may need extra work for large input values. Like I mentioned above, test it to make sure it does what you expect over the entire range of input values.


25 May 2014, 04:36 

edfed
use a substraction loop?
a number is divisible by 3 if the sum of it's decimal digits is divisible by 3. first shift by 3, and then, try to divide by 3. 

26 May 2014, 13:56 

r22
2 MULs, 1 SHR and 1 SUB to optimize 1 DIV may be cutting it close on modern processors, unless you can interleave those instructions with other parts of your algorithm to mitigate partial stalls.


27 May 2014, 13:54 

edfed
with a 4G dwrods lut, you can obtain a relativelly fast division. it will just take a lot of time to fill the ram with such useless values.


27 May 2014, 14:59 

revolution
r22 wrote: 2 MULs, 1 SHR and 1 SUB ... Code: shr rdx,4 lea rdx,[rdx*2+edx] ;*3 lea rdx,[rdx*8] ;*24 Code: shr rdx,4 imul rdx,24 

27 May 2014, 15:10 

Goto page 1, 2 Next < Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.