flat assembler
Message board for the users of flat assembler.

Index > Main > Compute 10^n/10 using only 1 register with 2 instructions

Goto page Previous  1, 2, 3  Next
Author
Thread Post new topic Reply to topic
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 28 Jan 2019, 09:46
Oh, we need to do division in the ring.

So for 7 we do (6/4) mod 7 = 5
Becomes 2^32 * (6/(2^(32 mod period)) mod 7) / 7 = 0xB6DB6DB7

Smile
Post 28 Jan 2019, 09:46
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2019, 10:06
I do not really understand what you are trying to do here. The construction I've shown does not use division anywhere.

For 7 it is all straightforward, because 7=2^3-1, so we already have a "nice" denominator and representation has period of 3. When numerator is -a and a<7, the block of 3 digits that is repeated is the same as binary representation of a:

-\frac{1}{7}=\ldots{}001,001,001,001,001_{(2)}

-\frac{2}{7}=\ldots{}010,010,010,010,010_{(2)}

-\frac{3}{7}=\ldots{}011,011,011,011,011_{(2)}

-\frac{4}{7}=\ldots{}100,100,100,100,100_{(2)}

-\frac{5}{7}=\ldots{}101,101,101,101,101_{(2)}

-\frac{6}{7}=\ldots{}110,110,110,110,110_{(2)}

If you only need low 32 digits, you just cut it off there (it does not matter that you cut in a middle of a block). For -\frac{6}{7} you get 0B6DB6DB6h. And you can get the low 32 digits of \frac{1}{7} by adding 1 to 0B6DB6DB6h. The low 32 digits are the same no matter if you add 1 to a whole 2-adic number or if just to a truncated version.
Post 28 Jan 2019, 10:06
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2019, 10:26
In other words:

\frac{1}{7} = 1 - \frac{6}{7} = 1 + 6 $ shl $ 0 + 6 $ shl $ 3 + 6 $ shl $ 6 + 6 $ shl $ 9 + \ldots

\frac{1}{9} = 1 - \frac{56}{63} = 1 + 56 $ shl $ 0 + 56 $ shl $ 6 + 56 $ shl $ 12 + 56 $ shl $ 18 + \ldots

\frac{1}{11} = 1 - \frac{930}{1023} =  1 + 930 $ shl $ 0 + 930 $ shl $ 10 + 930 $ shl $ 20 + 930 $ shl $ 30 + \ldots

You could compute their truncated forms this way directly on a 32-bit CPU:
Code:
; compute 1/11:

        mov     eax,1

        xor     cl,cl
    periodic:

        mov     edx,930
        shl     edx,cl
        add     eax,edx

        add     cl,10

        cmp     cl,32
        jb      periodic    

The only hard part, if you wanted to compute 2-adic remainders at run-time, would be to find a period (preferably the minimum one). This means finding such a multiply of denominator that it is 2^k-1 for some k. You could apply methods discussed with Chinese remainder theorem. However, this might easily overflow on 32-bit architecture.

You may try to design a different algorithm computing 2-adic representation, but please note that 2-adic division is really something different from division of integers or standard base 2 floating-point numbers, even if the patterns of digits appear to be very similar. Numbers in base 2 may have infinite expansions to the right (negative powers of 2), while 2-adic numbers can expand infinitely to the left and the number of digits to the right of the point is always finite. With p-adic arithmetic you have an exact precision on the right side.

For example, consider a base 2 representation of \frac{1}{3}, which is 0.01010101010101...
If you try to multiply this by 3, the approximations are going to look like 0.1111... - their initial digit is wrong and there is a loss of precision.
On the other hand, the 2-adic representation of \frac{1}{3} is ...0101011. Now if you multiply it by 3, you get ...0001. You can stop at any point, dropping the higher digits, but the ones you have on the right end are already precise and final. This is because the low n digits of the result depend only on low n digits of the factors, so as long as you know the low digits of the factors precisely, you also know the low digits of the product precisely.
Post 28 Jan 2019, 10:26
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 28 Jan 2019, 11:00
I wanted to use the native division instruction.
Code:
mov ebx,7 ;the divisor
lea edx,[ebx-1] ;numerator high
xor eax,eax ;numerator low
div ebx ;6/7 bad result
inc eax ;+1    
But of course that is not the correct result. We need to adjust the numerator.
Code:
mov ebx,7 ;the divisor
lea edx,[ebx-1] ;numerator high
xor eax,eax ;numerator low
;code to compute (6 shr 2) mod 7 giving edx = 5
div ebx ;5/7 the correct result
inc eax ;+1    
The modulo division is easy, and I can understand it. The hard part, like you say, is to find the period. I still need to figure out that.

Edit: Something like this can do the modulo part
Code:
PERIOD = 3
mov ecx,32 mod PERIOD
.modulo_division:
bt edx,0
sbb eax,eax
and eax,ebx
add edx,eax
shr edx,1
loop .modulo_division    
It just remains to compute the PERIOD value.
Post 28 Jan 2019, 11:00
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2019, 11:25
OK, I think I understand now. Basically, to find the low digits of \frac{1}{7} you want to compute 2^N divided by 7 (with a regular integer division with remainder), where N is at least 32 and a multiple of the period (3 in this case).

Then, because N is a multiple of period:
2^N \equiv 1\mod{7}
and your division gives:
2^N = 7M + 1.
Dividing both sides by 7, we get:
\frac{2^N}{7} = M + \frac{1}{7}
And thus:
M = - \frac{1}{7}+\frac{1}{7}2^N
This number only differs from actual -\frac{1}{7} on digits starting from position N. Since N is at least 32, it means that low 32 digits of this number are identical to -\frac{1}{7}. Now to get low digits of \frac{1}{7} you need to multiply this by 6 and then add 1, or alternatively just negate the number (subtract it from 0).

Yes, this seems like a valid method of obtaining the low digits of 2-adic fraction, but when N becomes too large it becomes problematic. And, of course, you need to know the period here.

I believe there should exist an actual algorithm for p-adic division digit after digit, starting from the lowest one - with such one we would not have to know the period in advance.
Post 28 Jan 2019, 11:25
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 28 Jan 2019, 11:28
The period for 1/37 is 36. I think that can't be done in 32-bit code.
Post 28 Jan 2019, 11:28
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2019, 11:40
revolution wrote:
The period for 1/37 is 36. I think that can't be done in 32-bit code.
Well, you can do it like this:
Code:
; compute low 32 digits of -1/37:
        mov     edx,1 shl (36-32)
        xor     eax,eax
        sub     eax,1
        sbb     edx,0
        mov     ecx,37
        div     ecx
        test    edx,edx
        jnz     assertion_failed
; negate to get low 32 digits of 1/37:
        neg     eax    
For very large periods you would need to chain DIV instructions.
Post 28 Jan 2019, 11:40
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2019, 11:59
I patched up a code that uses chained DIVs and should work for any odd divisor:
Code:
        mov     ecx,2   ; number of dwords, except the highest one all are 0FFFFFFFFh
        mov     ebx,1   ; start with 2^33-1
    search:
        push    ecx
        xor     edx,edx
        mov     eax,ebx
        jmp     first_division
    divide:
        or      eax,0FFFFFFFFh
      first_division:
        div     [divisor]
        loop    divide
        pop     ecx
        test    edx,edx
        jz      found
        stc
        rcl     ebx,1
        jnc     search
        mov     ebx,1
        inc     ecx
        jmp     search
    found:              ; eax is now low 32 digits of -1/[divisor]
        neg     eax     ; negate to get low 32 digits of 1/[divisor]    


Last edited by Tomasz Grysztar on 28 Jan 2019, 12:43; edited 1 time in total
Post 28 Jan 2019, 11:59
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 28 Jan 2019, 12:01
Yes, for very large periods I think using a normal native divider is the way. So the code needs to split in two paths, one for short periods where we can use a single imul, and another for other values where we use the div.
Post 28 Jan 2019, 12:01
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2019, 12:05
revolution wrote:
Yes, for very large periods I think using a normal native divider is the way. So the code needs to split in two paths, one for short periods where we can use a single imul, and another for other values where we use the div.
When divisor is an odd number, once you have low 32 bits of its reciprocal, you always use just a single IMUL:
Code:
; assuming EAX is divisible by 137, compute the quotient:
        imul    eax,077975B9h   ; multiply by low 32 digits of 2-adic 1/137    
(for 137 the period is 68).
Post 28 Jan 2019, 12:05
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 28 Jan 2019, 12:13
Tomasz Grysztar wrote:
When divisor is an odd number, once you have low 32 bits of its reciprocal, you always use just a single IMUL.
Okay. I haven't tested it yet, I had just assumed it wouldn't be valid. Good, I'm glad my assumption was wrong. Smile
Post 28 Jan 2019, 12:13
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2019, 12:22
revolution wrote:
Tomasz Grysztar wrote:
When divisor is an odd number, once you have low 32 bits of its reciprocal, you always use just a single IMUL.
Okay. I haven't tested it yet, I had just assumed it wouldn't be valid. Good, I'm glad my assumption was wrong. Smile
My explanation might have been concise, but if you follow all the details, it shows that all is predicable and precise. As long as the divisor is odd and the number is exactly divisible by it, if you multiply by the low bits of reciprocal, the same number of bits of result are going to be exact, with no exceptions.

The only hard part is to find the reciprocal. The algorithms we have here are not efficient. I may think about finding something better.

On the other hand, when you know the divisor in advance (which is usual case for replacing division with multiplication), you can take some time to find its 2-adic reciprocal and then just use it.
Post 28 Jan 2019, 12:22
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2019, 13:20
All right, I designed much more efficient algorithm for computing 2-adic division. Just put A=1 to compute a reciprocal:
Code:
divide_2adic:
; in:
;  edx = low 32 bits of A
;  ebx = low 32 bits of B (must be odd)
; on success:
;  cf = 0
;  eax = low 32 bits of 2-adic A/B
        test    ebx,1
        jz      .invalid
        xor     eax,eax
    .iteration:
        mov     ecx,eax
        imul    ecx,ebx
        xor     ecx,edx
        bsf     ecx,ecx
        jz      .done
        bts     eax,ecx
        jmp     .iteration
    .done:
        clc
        ret
    .invalid:
        stc
        ret    
Code:
; a few tests:

        mov     edx,1
        mov     ebx,3
        call    divide_2adic    ; result: 0xAAAAAAAB

        mov     edx,-1
        mov     ebx,3
        call    divide_2adic    ; result: 0x55555555

        mov     edx,1
        mov     ebx,5
        call    divide_2adic    ; result: 0xCCCCCCCD

        mov     edx,1
        mov     ebx,7
        call    divide_2adic    ; result: 0xB6DB6DB7

        mov     edx,1
        mov     ebx,37
        call    divide_2adic    ; result: 0x914C1BAD

        mov     edx,1
        mov     ebx,137
        call    divide_2adic    ; result: 0x077975B9
    
Post 28 Jan 2019, 13:20
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 28 Jan 2019, 14:13
I did some more tests. In fact I brute forced the entire 32-bit space. Didn't take too long either, just a few minutes. Using a normal div instead of imul would have taken much longer.
Code:
format ELF executable 0
entry main

SYS_EXIT                = 1
SYS_WRITE               = 4
STD_OUTPUT              = 1

segment executable

main:
        mov     esi,1
    .loop_divisor:
        ;limit the print output to avoid spamming the console
        mov     edx,esi
        and     edx,not 1
        bsf     eax,edx
        bsr     edx,edx
        sub     edx,eax
        cmp     edx,4 - 1       ;if we have no more than the top four bits set then print something
        ja      .done_print
        mov     edx,esi
        mov     ecx,trial.at
        call    write_hex
        mov     ecx,trial
        mov     edx,trial.len
        call    out_message
    .done_print:
        ;divide_2adic:
        ;see post from Tomasz Grysztar at https://board.flatassembler.net/topic.php?p=208280#208280
        xor     edi,edi
    .divide_2adic_test:
        mov     ecx,edi
        imul    ecx,esi
        xor     ecx,1           ;fixed numerator of 1
        bsf     ecx,ecx
        jz      .divide_2adic_done
        bts     edi,ecx
        jmp     .divide_2adic_test
    .divide_2adic_done:         ;edi = the 2-adic divisor
        xor     eax,eax
    .loop_trials:
        mov     ecx,eax
        imul    ecx,edi
        imul    ecx,esi
        cmp     eax,ecx
        jne     .failed
        add     eax,esi         ;only trial proper multiples
        jnc     .loop_trials
        add     esi,2           ;only trial odd numbers
        jnc     .loop_divisor
        mov     ecx,success
        mov     edx,success.len
        call    out_message
    .close:
        mov     eax,SYS_EXIT
        xor     ebx,ebx
        int     0x80
    .failed:
        mov     edx,esi
        mov     ecx,fail.at
        call    write_hex
        mov     ecx,fail
        mov     edx,fail.len
        call    out_message
        jmp     .close

out_message:
        mov     eax,SYS_WRITE
        mov     ebx,STD_OUTPUT
        int     0x80
        retn

write_hex:
        ;ecx = address
        ;edx = value
    .next_nibble:
        mov     eax,edx
        shr     eax,28
        cmp     al,10
        sbb     al,0x69
        das
        mov     [ecx],al
        inc     ecx
        shl     edx,4
        jnz     .next_nibble
        retn

segment readable writeable

trial:          db      'Trialing 0x'
trial.at        db      '00000000',10
trial.len       =       $ - trial

fail:           db      'Failed at 0x'
fail.at         db      '00000000',10
fail.len        =       $ - fail

success:        db      'All values passed okay',10
success.len     =       $ - success    
Quote:
All values passed okay
Very good Smile
Post 28 Jan 2019, 14:13
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2019, 15:22
This also seems like a good place to try out a BMI instruction:
Code:
divide_2adic:
; in:
;  edx = low 32 bits of A
;  ebx = low 32 bits of B (must be odd)
; on success:
;  cf = 0
;  eax = low 32 bits of 2-adic A/B
        test    ebx,1
        jz      .invalid
        xor     eax,eax
    .iteration:
        mov     ecx,eax
        imul    ecx,ebx
        xor     ecx,edx
        jz      .done
        blsi    ecx,ecx
        or      eax,ecx
        jmp     .iteration
    .done:
        clc
        ret
    .invalid:
        stc
        ret    
Post 28 Jan 2019, 15:22
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 28 Jan 2019, 15:38
Tomasz Grysztar wrote:
This also seems like a good place to try out a BMI instruction:
Hmm, I'll try maybe another time.
Quote:
Illegal instruction (core dumped)
Post 28 Jan 2019, 15:38
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 28 Jan 2019, 16:47
We can also get other ratios with this.

For example for division by 5:
To get a ratio of 0.2 (1/5) for numbers of the form 5*n use the value 0xcccccccd
To get a ratio of 0.4 (2/5) for numbers of the form 5*n use the value 0x9999999a
To get a ratio of 0.6 (3/5) for numbers of the form 5*n use the value 0x66666667
To get a ratio of 0.8 (4/5) for numbers of the form 5*n use the value 0x33333334
Post 28 Jan 2019, 16:47
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2019, 22:30
Wait, even IMUL is not really needed there. The product can be accumulated instead of being recomputed every iteration:
Code:
divide_2adic:
; in:
;  edx = low 32 bits of A
;  ebx = low 32 bits of B (must be odd)
; on success:
;  cf = 0
;  eax = low 32 bits of 2-adic A/B
        test    ebx,1
        jz      .invalid
        xor     eax,eax
        xor     esi,esi
    .iteration:
        mov     ecx,esi
        xor     ecx,edx
        jz      .done
        bsf     ecx,ecx
        bts     eax,ecx
        mov     edi,ebx
        shl     edi,cl
        add     esi,edi
        jmp     .iteration
    .done:
        clc
        ret
    .invalid:
        stc
        ret    
Post 28 Jan 2019, 22:30
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 29 Jan 2019, 06:06
Tomasz Grysztar wrote:
Wait, even IMUL is not really needed there. The product can be accumulated instead of being recomputed every iteration:
Still passes all test values, and it's slightly quicker on my machine in the full 32-bit range tester.

Surprisingly the runtimes are about evenly split between the 2-adic computation and the range trialling checker. But in the final application the 2-adic computation time will be insignificant, which is why I didn't really care about using div. Just that using div was tricky with the period determination.
Post 29 Jan 2019, 06:06
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 29 Jan 2019, 09:30
You could also use this routine directly, to divide any two numbers. Well, except that B needs to be odd, but if it's even you can start with shifting both A and B right in parallel until B becomes odd:
Code:
divide_2adic:
; in:
;  edx = low 32 bits of A
;  ebx = low 32 bits of B
; on success:
;  cf = 0
;  eax = low 32 bits of 2-adic A/B
; on failure:
;  cf = 1
        test    ebx,1
        jz      .shift
        xor     eax,eax
        xor     esi,esi
    .iteration:
        mov     ecx,esi
        xor     ecx,edx
        jz      .done
        bsf     ecx,ecx
        bts     eax,ecx
        mov     edi,ebx
        shl     edi,cl
        add     esi,edi
        jmp     .iteration
    .done:
        clc
        ret
    .shift:
        shr     ebx,1
        jz      .invalid        ; divide by 0
        shr     edx,1
        jnc     divide_2adic
        ; underflow
    .invalid:
        stc
        ret    


In fact, with this function you get access to all basic arithmetic operations on 2-adic numbers (addition, subtraction, multiplication and division), as long as you can go by having just low 32 digits of the result. ADD, SUB and IMUL work just as well for truncated 2-adic numbers as for integers, so the division is the only operation that needs a specialized routine.

What is especially nice about truncated 2-adic arithmetics is that you can use it to perform several operations on rational numbers, and as long as you know that the final result is going to be an integer in 32-bit range, you know that result is going to be precise. You need such additional knowledge because just by looking at the lowest 32 bits of 2-adic result you are not able to tell an integer from a fraction.

Within this truncated 32-bit section, arithmetic operations do not lose precision at any point, except when you divide by 2, which is a shift right and can lose a set bit by shifting it out of register (it is a failure condition in the above function). But the number of bits that are shifted into negative range is always finite, so you can get around the problem by shifting your numbers left far enough (and then shift the final result back right).

Other than that, you can accumulate as many arithmetic operations on rational numbers as you wish and the result is still going to be exact, no matter how many additions/multiplications/divisions you do:
Code:
; compute (1/13+8/17)*(210/11+1) = 11

        mov     edx,1
        mov     ebx,13
        call    divide_2adic
        jc      failed
        push    eax
        mov     edx,8
        mov     ebx,17
        call    divide_2adic
        jc      failed
        pop     ebx
        add     eax,ebx
        push    eax
        mov     edx,210
        mov     ebx,11
        call    divide_2adic
        jc      failed
        inc     eax
        pop     ebx
        imul    eax,ebx

        cmp     eax,11
        jne     failed    
Post 29 Jan 2019, 09:30
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page Previous  1, 2, 3  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.