flat assembler
Message board for the users of flat assembler.

Working with 64-bit numbers **Need help**!
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP 05 Jan 2008, 03:48
Hey, I've been trying to shift and rotate 64-bit numbers. So far I've been using the GPRs with shld and such, which is very slow (with some memory dwords for temporary storage of GPRs...), and I know there has to be a better way. I've heard about the FPU registers and other weird stuff, but I've never used them before. Does anyone know how to use those so I can find an easier way?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20356
Location: In your JS exploiting you and your system
revolution 05 Jan 2008, 04:18
If you use the 32-bit instructions there is very little you can do to improve it. If you can use the 64-bit instructions then your job is easier. Also try SSE2: it can deal with 64-bit integers, but it doesn't support rotates directly.

Using the FPU for integer tasks is risky and requires strict control over the ranges of numbers you are dealing with. Not recommended for the average app.
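
As a rough illustration of the SSE2 route (this is not code from the thread): SSE2 has 64-bit lane shifts (psrlq, psllq) but no rotate, so a rotate can be pieced together from two shifts and an OR. The macro name ror64_sse2 and the variable w are made up for this sketch, and the count has to be a constant from 1 to 63.

```asm
; Sketch only: ror64_sse2 and w are hypothetical names.
; Rotates each 64-bit lane of xr right by a constant count (1..63).
; xtmp is a second XMM register that gets clobbered.
macro ror64_sse2 xr,xtmp,bits {
        movdqa  xtmp,xr                 ; copy the value
        psrlq   xr,(bits)               ; value >> bits
        psllq   xtmp,64-(bits)          ; value << (64-bits)
        por     xr,xtmp                 ; combine both parts
}

        movq    xmm0,[w]                ; w is assumed to be defined with dq
        ror64_sse2 xmm0,xmm1,19
        movq    [w],xmm0                ; store the rotated qword back
```

With the 64-bit instructions the same rotate is a single ror on a 64-bit register, for example ror rax,19.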



AlexP 05 Jan 2008, 04:30
Thanks, I'll check out SSE2. The main things I need to do are rotations and shifts. I was using combinations of sh*d and sh* to do them with the GPRs. That was not a good idea for me: in some cases I had to rotate by over 60 bits, so I reversed a lot of it (which didn't work either) and eventually decided to ask you guys for help. Thanks


revolution 05 Jan 2008, 04:36
Here are some macros from my hash library
Code:
; In all macros rl is the low dword, rh the high dword and bits a constant.
; scratch is a free 32-bit register that gets clobbered.

; Rotate rh:rl left by bits
macro rol64 rl,rh,bits,scratch {
    if ((bits) mod 64) > 32
        ror64   rl,rh,(64-((bits) mod 64)),scratch
    else if ((bits) mod 64) = 32
        xchg    rl,rh
    else if ((bits) mod 64) > 0
        mov     scratch,rh
        shld    rh,rl,((bits) mod 64)
        shld    rl,scratch,((bits) mod 64)
    end if
}
; Rotate rh:rl right by bits
macro ror64 rl,rh,bits,scratch {
    if ((bits) mod 64) > 32
        rol64   rl,rh,(64-((bits) mod 64)),scratch
    else if ((bits) mod 64) = 32
        xchg    rl,rh
    else if ((bits) mod 64) > 0
        mov     scratch,rl
        shrd    rl,rh,((bits) mod 64)
        shrd    rh,scratch,((bits) mod 64)
    end if
}
; Shift rh:rl left by bits (the count is taken mod 64)
macro shl64 rl,rh,bits {
    if ((bits) mod 64) > 32
        mov     rh,rl
        shl     rh,(((bits) mod 64)-32)
        xor     rl,rl
    else if ((bits) mod 64) = 32
        mov     rh,rl
        xor     rl,rl
    else if ((bits) mod 64) > 0
        shld    rh,rl,((bits) mod 64)
        shl     rl,((bits) mod 64)
    end if
}
; Shift rh:rl right by bits (the count is taken mod 64)
macro shr64 rl,rh,bits {
    if ((bits) mod 64) > 32
        mov     rl,rh
        shr     rl,(((bits) mod 64)-32)
        xor     rh,rh
    else if ((bits) mod 64) = 32
        mov     rl,rh
        xor     rh,rh
    else if ((bits) mod 64) > 0
        shrd    rl,rh,((bits) mod 64)
        shr     rh,((bits) mod 64)
    end if
}
It uses 32bit registers and gives medium performance.
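
A short usage sketch for the macros above, in case the argument order is not obvious. The variable x and the choice of eax, edx and ecx are assumptions for illustration only. The bit count has to be a constant known at assembly time, because the macros pick their instructions with fasm's if directive.

```asm
; Usage sketch; x is a hypothetical variable defined as
;   x dq 0x0123456789ABCDEF
        mov     eax,dword [x]           ; rl = low  dword (0x89ABCDEF)
        mov     edx,dword [x+4]         ; rh = high dword (0x01234567)
        ror64   eax,edx,8,ecx           ; ecx is clobbered as scratch
        ; now edx:eax = 0xEF0123456789ABCD
        mov     dword [x],eax
        mov     dword [x+4],edx
```

The shl64 and shr64 macros take no scratch register, and a rotate by exactly 32 turns into a plain xchg of the two halves.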


Last edited by revolution on 05 Jan 2008, 05:38; edited 1 time in total



AlexP 05 Jan 2008, 05:00
"The hash library"?? Well, he is using pretty much the exact same instructions (and order?!) that I used. I tried to make a macro, but in the end I just inlined two instructions per rotate and two or three per shift on the GPRs. Yeah, that's exactly what I had in mine, lol: save the low value, sh*d, then another sh*d with the saved value. I just ran a processor identification tool on my Intel CPU and it says I've got SSE3 but not SSE4. I'll check the instruction manuals again tomorrow morning to see how that stuff works with numbers. Thanks for tracking down that code; if I can't get SSE working for this, I'll have to debug my crazy shifting tomorrow. I think the entire schedule is off, though. I'm attempting SHA-512. I've got all the 32-bit versions of the SHA family done except SHA-0, so all I need to do is figure out how to shift and rotate 64-bit values (and at the same time keep it easy to debug). If you want me to post the scrap of code I used for shifting/rotating, I will when I get up in about 8 or 9 hours. Good night!
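
For reference, the same kind of check a processor identification tool performs can be done with CPUID leaf 1. This is only a sketch: the routine name cpu_features and the layout of the returned flags are made up here, and it assumes a CPU that supports the CPUID instruction.

```asm
; Sketch only: cpu_features is a hypothetical routine.
; Returns in eax: bit 0 = SSE2, bit 1 = SSE3, bit 2 = SSE4.1
cpu_features:
        push    ebx                     ; cpuid overwrites ebx
        mov     eax,1                   ; leaf 1: feature flags
        cpuid
        xor     eax,eax
        bt      edx,26                  ; CPUID.1:EDX bit 26 = SSE2
        setc    al
        bt      ecx,0                   ; CPUID.1:ECX bit 0 = SSE3
        jnc     @f
        or      eax,2
    @@: bt      ecx,19                  ; CPUID.1:ECX bit 19 = SSE4.1
        jnc     @f
        or      eax,4
    @@: pop     ebx
        ret
```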


revolution 05 Jan 2008, 05:36
AlexP wrote:
"The hash library"??
Yeah, the hash library. It is on my HDD, written by me. I thought everyone had one. I've edited it, sorry for the confusion.

Strange coincidence, because the macros up there were also used for my SHA512 code which I did ~3 years ago.



AlexP 05 Jan 2008, 16:38
lol, I had macros running for me too, but I've never used IF structures in FASM, so I just inlined everything when FASM said "cannot cmp 32,6". Well, a few weeks ago I made a nice schedule for myself, and I've got some projects planned. The first is a hash library, the second a symmetric crypto library, and the third an asymmetric one (probably just RSA and maybe something from RC). I've got the SHA family almost done except for this small chunk of code; the only other problem is that my functions don't accept zero-length hashes. Could you post the code you used for SHA? When I get my 512 done I'll post it here too. Here's the revised code with your macros.
Revision 3 of code, with comments for people
Code:
;****************************************
;     SHA-384/512 Exported Functions
;****************************************
SHA384: mov [RequestedSize],384
        jmp @f
SHA512: mov [RequestedSize],512
    @@: push ebp
        mov ebp,esp
        pusha
        call PreProcess
        cmp eax,0
        jnz SHAerror
        xor edi,edi
    .A: ;Begin first message schedule
        push edi
        mov esi,[Base_Address]
        add esi,edi
        mov edi,MessageSch
        mov ecx,0x20
        rep movsd
        lea esi,[MessageSch+128] ;After first 16 words
        mov [Temp3],esi
        add [Temp3],512 ;End of message schedule
        xor ebp,ebp
    @@: ;Begin second message schedule
        ;Ror 19
        mov eax,[esi-16]
        mov ebx,[esi-12]
        ror64 eax,ebx,19,edi
        ;Ror 61
        mov ecx,[esi-16]
        mov edx,[esi-12]
        ror64 ecx,edx,61,edi
        ;Xor them
        xor eax,ecx
        xor ebx,edx
        ;Shr 6
        mov ecx,[esi-16]
        mov edx,[esi-12]
        shr64 ecx,edx,6
        ;Xor them
        xor eax,ecx
        xor ebx,edx
        ;Now add w[t-7]
        add eax,[esi-56]
        adc ebx,[esi-52]
        ;Store in temp vars
        mov [Temp1],eax
        mov [Temp2],ebx
        ;Now compute 0^0 of w[t-15]
        mov eax,[esi-120]
        mov ebx,[esi-116]
        ror64 eax,ebx,1,edi
        mov ecx,[esi-120]
        mov edx,[esi-116]
        ror64 ecx,edx,8,edi
        xor eax,ecx
        xor ebx,edx
        mov ecx,[esi-120]
        mov edx,[esi-116]
        shr64 ecx,edx,7
        xor eax,ecx
        xor ebx,edx
        add eax,[Temp1] ;0^0 is in regs,previous is in temps
        adc ebx,[Temp2]
        ;Done with 0^0, now add w[t-16]
        add eax,[esi-128]
        adc ebx,[esi-124]
        ;Done with main schedule loop
        mov [esi],eax
        mov [esi+4],ebx
        add esi,8
        cmp esi,[Temp3]
        jnz @b
        xor esi,esi
        pop edi
        call SHAhighInitializeVars
        ;End Schedule
    @@: ;Begin main loop

        ;Regs:
        ;Eax-Edx are scratch
        ;Esi is the message schedule offset, unused here in main loop
        ;Edi is scratch for rotations/shifting
        ;Ebp is the round number 0>79
        ;Esp is stack pointer

        ;Calculate T1 variable
        ;Equation: h+E^1(e) + Ch(e,f,g) + K[round] + W[round]

        ;Calculate E^1 summation of e
        ;Equation: Ror14(e) xor Ror18(e) xor Ror41(e)
        ;Ror by 14
        mov eax,[BE]
        mov ebx,[BE+4]
        ror64 eax,ebx,14,edi
        ;Ror by 18
        mov ecx,[BE]
        mov edx,[BE+4]
        ror64 ecx,edx,18,edi
        ;Xor the first two values
        xor eax,ecx
        xor ebx,edx
        ;Ror by 41
        mov ecx,[BE]
        mov edx,[BE+4]
        ror64 ecx,edx,41,edi
        ;Xor by the last value
        xor eax,ecx
        xor ebx,edx
        ;Add h to this
        add eax,[BHH]
        adc ebx,[BHH+4]

        ;Now calculate CH(e,f,g)
        ;Equation: (e and f) xor (not e and g)
        mov ecx,[BE]
        mov edx,[BE+4]
        and ecx,[BF]
        and edx,[BF+4]
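        ;Not enough regs, so push these
        ;fasm pushes left to right: push the high dword first
        ;so that the low dword ends up at [esp]
        push edx ecx
        mov ecx,[BE]
        mov edx,[BE+4]
        not ecx
        not edx
        and ecx,[BG]
        and edx,[BG+4]
        ;xor by the old regs on stack (low at [esp], high at [esp+4])
        xor ecx,[esp]
        xor edx,[esp+4]
        ;Get the old regs off the stack
        add esp,8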
        add eax,ecx
        adc ebx,edx

        ;Add K offset from K80 constant table
        add eax,[ebp*8+K80]
        adc ebx,[ebp*8+K80+4]

        ;Add Message Schedule offset from MessageSch table
        add eax,[MessageSch+ebp*8]
        adc ebx,[MessageSch+ebp*8+4]

        ;And store them in the final T1 variable
        mov [BT1],eax
        mov [BT1+4],ebx

        ;Now find the T2 variable
        ;Equation: E^0(a) + Maj(a,b,c)

        ;First, E^0 of a
        ;Equation: Ror28(a) xor Ror34(a) xor Ror39(a)
        ;Ror by 28
        mov eax,[BA]
        mov ebx,[BA+4]
        ror64 eax,ebx,28,edi
        ;Ror by 34
        mov ecx,[BA]
        mov edx,[BA+4]
        ror64 ecx,edx,34,edi
        ;Xor the first two values
        xor eax,ecx
        xor ebx,edx
        ;Now ror by 39
        mov ecx,[BA]
        mov edx,[BA+4]
        ror64 ecx,edx,39,edi
        ;Xor by the last value
        xor eax,ecx
        xor ebx,edx

        ;Now find Maj(a,b,c)
        ;Equation: (a and b) xor (a and c) xor (b and c)
        mov ecx,[BA]
        mov edx,[BA+4]
        and ecx,[BB]
        and edx,[BB+4]
        ;Again, need more registers
        ;Push the high dword first so the low dword ends up at [esp]
        push edx ecx
        mov ecx,[BA]
        mov edx,[BA+4]
        and ecx,[BC]
        and edx,[BC+4]
        ;Xor by the old regs (low at [esp], high at [esp+4])
        xor ecx,[esp]
        xor edx,[esp+4]
        ;Get old regs off stack
        add esp,8
        ;Need these regs, again high dword first
        push edx ecx
        mov ecx,[BB]
        mov edx,[BB+4]
        and ecx,[BC]
        and edx,[BC+4]
        ;Xor by old regs (low at [esp], high at [esp+4])
        xor ecx,[esp]
        xor edx,[esp+4]
        ;Get rid of old regs
        add esp,8
        ;Done with Maj(a,b,c), now add it to the current val
        add eax,ecx
        adc ebx,edx
        ;And store them in the T2 variable
        mov [BT2],eax
        mov [BT2+4],ebx

        ;Here are the final moves in the main loop
        ;H=G
        mov eax,[BG]
        mov ebx,[BG+4]
        mov [BHH],eax
        mov [BHH+4],ebx
        ;G=F
        mov eax,[BF]
        mov ebx,[BF+4]
        mov [BG],eax
        mov [BG+4],ebx
        ;F=E
        mov eax,[BE]
        mov ebx,[BE+4]
        mov [BF],eax
        mov [BF+4],ebx
        ;E=D+T1    ;NOTE: This is not working correctly
        mov eax,[BD]
        mov ebx,[BD+4]
        add eax,[BT1]
        adc ebx,[BT1+4]
        mov [BE],eax
        mov [BE+4],ebx
        ;D=C
        mov eax,[BC]
        mov ebx,[BC+4]
        mov [BD],eax
        mov [BD+4],ebx
        ;C=B
        mov eax,[BB]
        mov ebx,[BB+4]
        mov [BC],eax
        mov [BC+4],ebx
        ;B=A
        mov eax,[BA]
        mov ebx,[BA+4]
        mov [BB],eax
        mov [BB+4],ebx
        ;A=T1+T2   ;NOTE: This is not working correctly
        mov eax,[BT1]
        mov ebx,[BT1+4]
        add eax,[BT2]
        adc ebx,[BT2+4]
        mov [BA],eax
        mov [BA+4],ebx
        ;End of main loop
        inc ebp
        cmp ebp,80
        jnz @b
        ;Add the current hash value to the previous
        call SHAhighAddResults
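        ;NOTE: edi was the rotate scratch register in the main loop, so
        ;unless it is restored elsewhere the block offset popped before
        ;the loop is lost here for messages longer than one block
        add edi,128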
        dec [NumBlocks]
        jnz .A
        mov edi,[OutData]
        mov esi,BH0
        mov ecx,16
        cmp [RequestedSize],384
        jne @f
        sub ecx,4
        ;And then move the final hash value to the out location
    @@: rep movsd
        call EndProcess
        cmp eax,0
        jnz SHAerror
        popa
        pop ebp
        ret 0xC
    

The template works and the variable switching at the end works, but I failed at getting the T1 and T2 equations working. They take up the bulk of my code, and because there aren't enough GPRs, I sometimes use memory locations. I'll get around to pushing a few registers before these main equations so I don't have to do this. I'm working on simplifying the code right now; it should be done in an hour or so.

Everything but the two equations for the T1 and T2 variables is working. This may be because of the shifting logic (that would explain it, but it would also mean more problems with the message schedule). My register bookkeeping is pretty messy; once I get this all figured out, I hope to keep the A-H working variables on the stack to slim down memory costs. I commented most of the equations to make the code easier to read. I already found and fixed a pair of very bad bugs, but that didn't solve it. I keep looking through it and debugging, but I can't find the problem. If you would like to copy, paste, and try it, just ask for the full code.
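
On the register pressure, one possible way out (a sketch, not something proposed in the thread) is to use equivalent forms of the two functions that need no stack temporaries: Ch(e,f,g) = g xor (e and (f xor g)) and Maj(a,b,c) = (a and b) or (c and (a or b)). The variable names are the ones from the code above; using esi and edi as a second register pair is an assumption based on the register notes in that code.

```asm
; Sketch only. Ch(e,f,g) = g xor (e and (f xor g)), result in edx:ecx.
        mov     ecx,[BF]
        mov     edx,[BF+4]
        xor     ecx,[BG]                ; f xor g
        xor     edx,[BG+4]
        and     ecx,[BE]                ; e and (f xor g)
        and     edx,[BE+4]
        xor     ecx,[BG]                ; xor g again
        xor     edx,[BG+4]

; Maj(a,b,c) = (a and b) or (c and (a or b)), result in edx:ecx.
; Clobbers esi and edi.
        mov     ecx,[BA]
        mov     edx,[BA+4]
        mov     esi,ecx
        mov     edi,edx
        and     ecx,[BB]                ; a and b
        and     edx,[BB+4]
        or      esi,[BB]                ; a or b
        or      edi,[BB+4]
        and     esi,[BC]                ; c and (a or b)
        and     edi,[BC+4]
        or      ecx,esi
        or      edx,edi
```

Either result can then be added to the running sum with the same add/adc pair used in the posted code.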


revolution 14 Jan 2008, 09:22
AlexP: I can't post my existing SHA code. Although written by me, it is not owned by me; my company retains the copyright. The macros I posted above are as far as I can go with posting code.

But looking at your code, it is not all that clear. In my code I used macros named things like sigma0, sigma1, alpha0, alpha1, cho and maj, then a round macro, and finally combined them into a short piece of code, mostly macros, that does the hashing. I find the macro names provide a clear overview of the process that is taking place in the main loop.
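
To give an idea of what such a macro layer could look like, here is a sketch. Only the names sigma0 and sigma1 come from the post; the bodies, the register choices, the helper rrr64 and the mapping of names to functions are assumptions, not revolution's code. The rotate counts are the ones given in the comments of AlexP's listing, and the ror64 macro posted earlier is required.

```asm
; Sketch only: rrr64 is a hypothetical helper.
; edx:eax = ror(x,r1) xor ror(x,r2) xor ror(x,r3), x = qword at [src].
; Clobbers ebx, ecx and edi.
macro rrr64 src,r1,r2,r3 {
        mov     eax,dword [src]
        mov     edx,dword [src+4]
        ror64   eax,edx,r1,edi
        mov     ebx,dword [src]
        mov     ecx,dword [src+4]
        ror64   ebx,ecx,r2,edi
        xor     eax,ebx
        xor     edx,ecx
        mov     ebx,dword [src]
        mov     ecx,dword [src+4]
        ror64   ebx,ecx,r3,edi
        xor     eax,ebx
        xor     edx,ecx
}

macro sigma0 src { rrr64 src,28,34,39 }   ; applied to a in T2
macro sigma1 src { rrr64 src,14,18,41 }   ; applied to e in T1

        sigma1  BE                        ; edx:eax = sigma1(e)
```

Note that this sketch keeps the result in edx:eax, while the posted listing uses ebx as the high half, so the two cannot be mixed without adjusting the registers.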



AlexP 18 Jan 2008, 23:20
Okay, I just noticed this post. I thought I had found the problem with my 64-bit code, but I have yet to fully debug it. Thanks though!
Copyright © 1999-2024, Tomasz Grysztar.