flat assembler
Message board for the users of flat assembler.

Index > Main > Calculate if a number is equal to 1 mod 24 without using DIV

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
magicSqr



Joined: 27 Aug 2011
Posts: 105
magicSqr 24 May 2014, 17:09
Hi,

Just a quickie (I hope)...

I need to calculate if a number is equal to 1 mod 24.

Is there any optimized code for this rather than using DIV and checking RDX?

I do not require the quotient.

many thanks,

magicĀ²
Post 24 May 2014, 17:09
View user's profile Send private message Reply with quote
typedef



Joined: 25 Jul 2010
Posts: 2909
Location: 0x77760000
typedef 24 May 2014, 21:36
Hint: LEA
Post 24 May 2014, 21:36
View user's profile Send private message Reply with quote
magicSqr



Joined: 27 Aug 2011
Posts: 105
magicSqr 24 May 2014, 21:50
I know how to use LEA for addition and multiplication. I can't see how it helps with my problem though.
Post 24 May 2014, 21:50
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20489
Location: In your JS exploiting you and your system
revolution 24 May 2014, 22:27
You can try using reciprocal multiply. Or a precomputed bitmap.

But why is div not allowed?
Post 24 May 2014, 22:27
View user's profile Send private message Visit poster's website Reply with quote
magicSqr



Joined: 27 Aug 2011
Posts: 105
magicSqr 24 May 2014, 22:50
It's not that it's not allowed, I just wondered if there was a quicker way since the quotient is unimportant, I just need the check the remainder.

The reason for the need for speed is that this div will be used billions of times on billions of numbers.
Post 24 May 2014, 22:50
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20489
Location: In your JS exploiting you and your system
revolution 24 May 2014, 22:56
Have you tried using div? Or are you just assuming it will give poor results?
Post 24 May 2014, 22:56
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20489
Location: In your JS exploiting you and your system
revolution 24 May 2014, 22:58
BTW: CPUs run at billions of instruction per second, so if you are only needing a billion results then it is probably not worth the effort of writing more complex code.
Post 24 May 2014, 22:58
View user's profile Send private message Visit poster's website Reply with quote
magicSqr



Joined: 27 Aug 2011
Posts: 105
magicSqr 24 May 2014, 23:14
I just assumed that div would be slower as it calculates both quotient and remainder.

My thinking was along the lines of the similar way of checking if a number is even or odd by either testing bit 0 or SHR 1 and checking the CY flag.

The whole routine will be run billions of times (and probably a good few orders of magnitudes more) so any time saving would be nice Wink
Post 24 May 2014, 23:14
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20489
Location: In your JS exploiting you and your system
revolution 24 May 2014, 23:31
magicSqr wrote:
I just assumed that div would be slower ...
Slower than what?

You will have to give more details about what you are doing. And preferably show us the current code you are using that has the poor performing div. Optimising for speed is very domain specific. It may be that you are chasing the wrong part of the program entirely based on a flawed assumption. Have you profiled the program yet? Have you examined the cache hit/miss rate? Is it memory bound or CPU bound or a mixture?
Post 24 May 2014, 23:31
View user's profile Send private message Visit poster's website Reply with quote
magicSqr



Joined: 27 Aug 2011
Posts: 105
magicSqr 25 May 2014, 00:07
I think you are making more of this than I intended Sad

As an example, is RAX = 1 mod 2 (i.e. is it odd)

Code:
mov    rbx, 2
div    rbx
cmp    rdx, 0
je     .even

.odd

' do odd stuff here 
' jmp wherever

.even

' do even stuff here 
' jmp wherever
    


is a lot slower than...

Code:
shr    rax, 1
jnc    .even

.odd

' do odd stuff here 
' jmp wherever

.even

' do even stuff here 
' jmp wherever
    


So I was simply asking if anyone knew of a quicker way of finding out if RAX = 1 mod 24.

I hope that's a bit clearer Wink
Post 25 May 2014, 00:07
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20489
Location: In your JS exploiting you and your system
revolution 25 May 2014, 00:21
magicSqr wrote:
So I was simply asking if anyone knew of a quicker way of finding out if RAX = 1 mod 24.
But therein lies the difficulty. Runtime speed is variable depending upon a multitude of factors. Even your example above of testing even/odd might not be "quicker" at all if the latency is completely hidden by memory accesses. I think you have just been assuming that DIV is some sort of awful instruction and anything used to avoid it will be faster. But that is a false assumption because there are many overriding factors that need to be considered. If you just go ahead and spend a few days coding and testing a different method and find that the overall program run speed is only 0.05% faster then you will be disappointed when you learn that a much simpler method of just doing a one instruction fix with a memory prefetch could have given 10% improvement.
Post 25 May 2014, 00:21
View user's profile Send private message Visit poster's website Reply with quote
magicSqr



Joined: 27 Aug 2011
Posts: 105
magicSqr 25 May 2014, 00:44
ok, thanks revolution. Just wondering if, in your coding, whether you use mul and div when working with 2^x or do you use shl and shr? I started programming back in the late '80s when div, especially, *was* an awful instruction, and so got used to bit-shifting instead Wink
Post 25 May 2014, 00:44
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20489
Location: In your JS exploiting you and your system
revolution 25 May 2014, 01:25
I still use shifts when working with powers of two. But this is just a hangover from a time when CPUs didn't have MUL/DIV instructions. Many of my colleagues do not understand the shifts in my code and often question me about what it is doing. When I do change to MUL/DIV I usually find no significant difference in run speeds. But not all programs are the same. If you have DIV in a highly critical code path then there can be advantages to using something else. But one has to consider the associated costs of how much extra coding time and testing is required compared to how much overall saving can actually be realised. Also the extra factor of disparate systems will give different runtime results. I find that in most cases a multi-GHz CPU will easily do a DIV in less time than it took to read the value from DRAM. So a small amount of prefetching can give a large increase in performance even with DIV in the critical path.

BTW: DIV is less efficient than SHR, of course. But that is an entirely different thing.
Post 25 May 2014, 01:25
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20489
Location: In your JS exploiting you and your system
revolution 25 May 2014, 01:51
Here is a magic value: 12,297,829,382,473,034,411 =ceiling(2^65/3)

v = input value
remainder = v - 24 * ((v * magic) shr 68)
Post 25 May 2014, 01:51
View user's profile Send private message Visit poster's website Reply with quote
magicSqr



Joined: 27 Aug 2011
Posts: 105
magicSqr 25 May 2014, 02:34
revolution wrote:
...Many of my colleagues ....

Under a certain age me thinks Very Happy

revolution wrote:

Here is a magic value: 12,297,829,382,473,034,411 =ceiling(2^65/3)

v = input value
remainder = v - 24 * ((v * magic) shr 6Cool

Thanks for this Wink
Post 25 May 2014, 02:34
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20489
Location: In your JS exploiting you and your system
revolution 25 May 2014, 04:36
Note that I give no warranty, guarantee or assurance that it works, or is fit for purpose. Use at you own risk. If while using this your dog bites you and runs away then that is not my fault. Also note that when using this method not all reciprocal divisors will give correct results for all possible input values. It is fine for 24 (you can prove this mathematically) but other divisors may need extra work for large input values. Like I mentioned above, test it to make sure it does what you expect over the entire range of input values.
Post 25 May 2014, 04:36
View user's profile Send private message Visit poster's website Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4354
Location: Now
edfed 26 May 2014, 13:56
use a substraction loop?

a number is divisible by 3 if the sum of it's decimal digits is divisible by 3.

first shift by 3, and then, try to divide by 3.
Post 26 May 2014, 13:56
View user's profile Send private message Visit poster's website Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 27 May 2014, 13:54
2 MULs, 1 SHR and 1 SUB to optimize 1 DIV may be cutting it close on modern processors, unless you can interleave those instructions with other parts of your algorithm to mitigate partial stalls.
Post 27 May 2014, 13:54
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4354
Location: Now
edfed 27 May 2014, 14:59
with a 4G dwrods lut, you can obtain a relativelly fast division. it will just take a lot of time to fill the ram with such useless values.
Post 27 May 2014, 14:59
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20489
Location: In your JS exploiting you and your system
revolution 27 May 2014, 15:10
r22 wrote:
2 MULs, 1 SHR and 1 SUB ...
That last MUL can be done thus:
Code:
shr rdx,4
lea rdx,[rdx*2+edx] ;*3
lea rdx,[rdx*8] ;*24    
But I don't know if it is "faster" than:
Code:
shr rdx,4
imul rdx,24    
Post 27 May 2014, 15:10
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.