flat assembler
Message board for the users of flat assembler.
![]() Goto page 1, 2 Next |
Author |
|
typedef 24 May 2014, 21:36
Hint: LEA
|
|||
![]() |
|
magicSqr 24 May 2014, 21:50
I know how to use LEA for addition and multiplication. I can't see how it helps with my problem though.
|
|||
![]() |
|
revolution 24 May 2014, 22:27
You can try using reciprocal multiply. Or a precomputed bitmap.
But why is div not allowed? |
|||
![]() |
|
magicSqr 24 May 2014, 22:50
It's not that it's not allowed, I just wondered if there was a quicker way since the quotient is unimportant, I just need the check the remainder.
The reason for the need for speed is that this div will be used billions of times on billions of numbers. |
|||
![]() |
|
revolution 24 May 2014, 22:56
Have you tried using div? Or are you just assuming it will give poor results?
|
|||
![]() |
|
revolution 24 May 2014, 22:58
BTW: CPUs run at billions of instruction per second, so if you are only needing a billion results then it is probably not worth the effort of writing more complex code.
|
|||
![]() |
|
magicSqr 24 May 2014, 23:14
I just assumed that div would be slower as it calculates both quotient and remainder.
My thinking was along the lines of the similar way of checking if a number is even or odd by either testing bit 0 or SHR 1 and checking the CY flag. The whole routine will be run billions of times (and probably a good few orders of magnitudes more) so any time saving would be nice ![]() |
|||
![]() |
|
revolution 24 May 2014, 23:31
magicSqr wrote: I just assumed that div would be slower ... You will have to give more details about what you are doing. And preferably show us the current code you are using that has the poor performing div. Optimising for speed is very domain specific. It may be that you are chasing the wrong part of the program entirely based on a flawed assumption. Have you profiled the program yet? Have you examined the cache hit/miss rate? Is it memory bound or CPU bound or a mixture? |
|||
![]() |
|
magicSqr 25 May 2014, 00:07
I think you are making more of this than I intended
![]() As an example, is RAX = 1 mod 2 (i.e. is it odd) Code: mov rbx, 2 div rbx cmp rdx, 0 je .even .odd ' do odd stuff here ' jmp wherever .even ' do even stuff here ' jmp wherever is a lot slower than... Code: shr rax, 1 jnc .even .odd ' do odd stuff here ' jmp wherever .even ' do even stuff here ' jmp wherever So I was simply asking if anyone knew of a quicker way of finding out if RAX = 1 mod 24. I hope that's a bit clearer ![]() |
|||
![]() |
|
revolution 25 May 2014, 00:21
magicSqr wrote: So I was simply asking if anyone knew of a quicker way of finding out if RAX = 1 mod 24. |
|||
![]() |
|
magicSqr 25 May 2014, 00:44
ok, thanks revolution. Just wondering if, in your coding, whether you use mul and div when working with 2^x or do you use shl and shr? I started programming back in the late '80s when div, especially, *was* an awful instruction, and so got used to bit-shifting instead
![]() |
|||
![]() |
|
revolution 25 May 2014, 01:25
I still use shifts when working with powers of two. But this is just a hangover from a time when CPUs didn't have MUL/DIV instructions. Many of my colleagues do not understand the shifts in my code and often question me about what it is doing. When I do change to MUL/DIV I usually find no significant difference in run speeds. But not all programs are the same. If you have DIV in a highly critical code path then there can be advantages to using something else. But one has to consider the associated costs of how much extra coding time and testing is required compared to how much overall saving can actually be realised. Also the extra factor of disparate systems will give different runtime results. I find that in most cases a multi-GHz CPU will easily do a DIV in less time than it took to read the value from DRAM. So a small amount of prefetching can give a large increase in performance even with DIV in the critical path.
BTW: DIV is less efficient than SHR, of course. But that is an entirely different thing. |
|||
![]() |
|
revolution 25 May 2014, 01:51
Here is a magic value: 12,297,829,382,473,034,411 =ceiling(2^65/3)
v = input value remainder = v - 24 * ((v * magic) shr 68) |
|||
![]() |
|
magicSqr 25 May 2014, 02:34
revolution wrote: ...Many of my colleagues .... Under a certain age me thinks ![]() revolution wrote:
Thanks for this ![]() |
|||
![]() |
|
revolution 25 May 2014, 04:36
Note that I give no warranty, guarantee or assurance that it works, or is fit for purpose. Use at you own risk. If while using this your dog bites you and runs away then that is not my fault. Also note that when using this method not all reciprocal divisors will give correct results for all possible input values. It is fine for 24 (you can prove this mathematically) but other divisors may need extra work for large input values. Like I mentioned above, test it to make sure it does what you expect over the entire range of input values.
|
|||
![]() |
|
edfed 26 May 2014, 13:56
use a substraction loop?
a number is divisible by 3 if the sum of it's decimal digits is divisible by 3. first shift by 3, and then, try to divide by 3. |
|||
![]() |
|
r22 27 May 2014, 13:54
2 MULs, 1 SHR and 1 SUB to optimize 1 DIV may be cutting it close on modern processors, unless you can interleave those instructions with other parts of your algorithm to mitigate partial stalls.
|
|||
![]() |
|
edfed 27 May 2014, 14:59
with a 4G dwrods lut, you can obtain a relativelly fast division. it will just take a lot of time to fill the ram with such useless values.
|
|||
![]() |
|
revolution 27 May 2014, 15:10
r22 wrote: 2 MULs, 1 SHR and 1 SUB ... Code: shr rdx,4 lea rdx,[rdx*2+edx] ;*3 lea rdx,[rdx*8] ;*24 Code: shr rdx,4 imul rdx,24 |
|||
![]() |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.