flat assembler
Message board for the users of flat assembler.

 Index > Main > Calculate if a number is equal to 1 mod 24 without using DIV Goto page 1, 2  Next
Author
magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr
Hi,

Just a quickie (I hope)...

I need to calculate if a number is equal to 1 mod 24.

Is there any optimized code for this rather than using DIV and checking RDX?

I do not require the quotient.

many thanks,

magicĀ²
24 May 2014, 17:09
typedef

Joined: 25 Jul 2010
Posts: 2914
Location: 0x77760000
typedef
Hint: LEA
24 May 2014, 21:36
magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr
I know how to use LEA for addition and multiplication. I can't see how it helps with my problem though.
24 May 2014, 21:50
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17893
revolution
You can try using reciprocal multiply. Or a precomputed bitmap.

But why is div not allowed?
24 May 2014, 22:27
magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr
It's not that it's not allowed, I just wondered if there was a quicker way since the quotient is unimportant, I just need the check the remainder.

The reason for the need for speed is that this div will be used billions of times on billions of numbers.
24 May 2014, 22:50
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17893
revolution
Have you tried using div? Or are you just assuming it will give poor results?
24 May 2014, 22:56
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17893
revolution
BTW: CPUs run at billions of instruction per second, so if you are only needing a billion results then it is probably not worth the effort of writing more complex code.
24 May 2014, 22:58
magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr
I just assumed that div would be slower as it calculates both quotient and remainder.

My thinking was along the lines of the similar way of checking if a number is even or odd by either testing bit 0 or SHR 1 and checking the CY flag.

The whole routine will be run billions of times (and probably a good few orders of magnitudes more) so any time saving would be nice
24 May 2014, 23:14
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17893
revolution
magicSqr wrote:
I just assumed that div would be slower ...
Slower than what?

You will have to give more details about what you are doing. And preferably show us the current code you are using that has the poor performing div. Optimising for speed is very domain specific. It may be that you are chasing the wrong part of the program entirely based on a flawed assumption. Have you profiled the program yet? Have you examined the cache hit/miss rate? Is it memory bound or CPU bound or a mixture?
24 May 2014, 23:31
magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr
I think you are making more of this than I intended

As an example, is RAX = 1 mod 2 (i.e. is it odd)

Code:
```mov    rbx, 2
div    rbx
cmp    rdx, 0
je     .even

.odd

' do odd stuff here
' jmp wherever

.even

' do even stuff here
' jmp wherever
```

is a lot slower than...

Code:
```shr    rax, 1
jnc    .even

.odd

' do odd stuff here
' jmp wherever

.even

' do even stuff here
' jmp wherever
```

So I was simply asking if anyone knew of a quicker way of finding out if RAX = 1 mod 24.

I hope that's a bit clearer
25 May 2014, 00:07
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17893
revolution
magicSqr wrote:
So I was simply asking if anyone knew of a quicker way of finding out if RAX = 1 mod 24.
But therein lies the difficulty. Runtime speed is variable depending upon a multitude of factors. Even your example above of testing even/odd might not be "quicker" at all if the latency is completely hidden by memory accesses. I think you have just been assuming that DIV is some sort of awful instruction and anything used to avoid it will be faster. But that is a false assumption because there are many overriding factors that need to be considered. If you just go ahead and spend a few days coding and testing a different method and find that the overall program run speed is only 0.05% faster then you will be disappointed when you learn that a much simpler method of just doing a one instruction fix with a memory prefetch could have given 10% improvement.
25 May 2014, 00:21
magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr
ok, thanks revolution. Just wondering if, in your coding, whether you use mul and div when working with 2^x or do you use shl and shr? I started programming back in the late '80s when div, especially, *was* an awful instruction, and so got used to bit-shifting instead
25 May 2014, 00:44
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17893
revolution
I still use shifts when working with powers of two. But this is just a hangover from a time when CPUs didn't have MUL/DIV instructions. Many of my colleagues do not understand the shifts in my code and often question me about what it is doing. When I do change to MUL/DIV I usually find no significant difference in run speeds. But not all programs are the same. If you have DIV in a highly critical code path then there can be advantages to using something else. But one has to consider the associated costs of how much extra coding time and testing is required compared to how much overall saving can actually be realised. Also the extra factor of disparate systems will give different runtime results. I find that in most cases a multi-GHz CPU will easily do a DIV in less time than it took to read the value from DRAM. So a small amount of prefetching can give a large increase in performance even with DIV in the critical path.

BTW: DIV is less efficient than SHR, of course. But that is an entirely different thing.
25 May 2014, 01:25
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17893
revolution
Here is a magic value: 12,297,829,382,473,034,411 =ceiling(2^65/3)

v = input value
remainder = v - 24 * ((v * magic) shr 68)
25 May 2014, 01:51
magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr
revolution wrote:
...Many of my colleagues ....

Under a certain age me thinks

revolution wrote:

Here is a magic value: 12,297,829,382,473,034,411 =ceiling(2^65/3)

v = input value
remainder = v - 24 * ((v * magic) shr 6

Thanks for this
25 May 2014, 02:34
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17893
revolution
Note that I give no warranty, guarantee or assurance that it works, or is fit for purpose. Use at you own risk. If while using this your dog bites you and runs away then that is not my fault. Also note that when using this method not all reciprocal divisors will give correct results for all possible input values. It is fine for 24 (you can prove this mathematically) but other divisors may need extra work for large input values. Like I mentioned above, test it to make sure it does what you expect over the entire range of input values.
25 May 2014, 04:36
edfed

Joined: 20 Feb 2006
Posts: 4246
Location: 2018
edfed
use a substraction loop?

a number is divisible by 3 if the sum of it's decimal digits is divisible by 3.

first shift by 3, and then, try to divide by 3.
26 May 2014, 13:56
r22

Joined: 27 Dec 2004
Posts: 805
r22
2 MULs, 1 SHR and 1 SUB to optimize 1 DIV may be cutting it close on modern processors, unless you can interleave those instructions with other parts of your algorithm to mitigate partial stalls.
27 May 2014, 13:54
edfed

Joined: 20 Feb 2006
Posts: 4246
Location: 2018
edfed
with a 4G dwrods lut, you can obtain a relativelly fast division. it will just take a lot of time to fill the ram with such useless values.
27 May 2014, 14:59
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17893
revolution
r22 wrote:
2 MULs, 1 SHR and 1 SUB ...
That last MUL can be done thus:
Code:
```shr rdx,4
lea rdx,[rdx*2+edx] ;*3
lea rdx,[rdx*8] ;*24    ```
But I don't know if it is "faster" than:
Code:
```shr rdx,4
imul rdx,24    ```
27 May 2014, 15:10
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area
Goto page 1, 2  Next

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum