flat assembler
Message board for the users of flat assembler.
Working with 64bit numbers **Need help**!

revolution 05 Jan 2008, 04:18
If you use the 32-bit instructions there is very little you can do to improve it. If you can use the 64-bit instructions then your job is easier. Also try SSE2: it can deal with 64-bit integers, but it doesn't support rotates directly.
Using the FPU for integer tasks is risky and requires strict control over the ranges of the numbers you are dealing with. Not recommended for the average app.
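As a rough sketch of the SSE2 route: SSE2 has 64-bit logical shifts (psllq, psrlq) but no 64-bit rotate, so a rotate can be composed from two shifts and an OR. The macro name ror64_sse2, the choice of registers and the label value are made up for this illustration.

```asm
; Rotate each 64-bit lane of xmm register x right by bits (1..63).
; tmp is a scratch xmm register. Hypothetical helper, SSE2 only.
macro ror64_sse2 x,tmp,bits {
    movdqa tmp,x            ; tmp = copy of x
    psrlq  x,bits           ; x   = x shr bits
    psllq  tmp,64-(bits)    ; tmp = x shl (64-bits)
    por    x,tmp            ; merge the two parts of the rotate
}

; usage: rotate the qword stored at the (assumed) label value right by 19
    movq   xmm0,qword [value]
    ror64_sse2 xmm0,xmm1,19
    movq   qword [value],xmm0
```

This costs four instructions per rotate, about the same as the shrd pair plus a register copy in 32-bit GPR code, but each 64-bit value stays in a single register.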

AlexP 05 Jan 2008, 04:30
Thanks, I'll check out SSE2. The main things I need to do are rotations and shifts. I was using combinations of sh*d and sh* to do it with the GPRs. That didn't go well: in some cases I had to rotate by over 60 bits, so I reversed a lot of it (which didn't work either) and eventually decided to ask you guys for help. Thanks


revolution 05 Jan 2008, 04:36
Here are some macros from my hash library
Code:
macro rol64 rl,rh,bits,scratch {
    if ((bits) mod 64) > 32
        ror64 rl,rh,(64-((bits) mod 64)),scratch
    else if ((bits) mod 64) = 32
        xchg rl,rh
    else if ((bits) mod 64) > 0
        mov scratch,rh
        shld rh,rl,((bits) mod 64)
        shld rl,scratch,((bits) mod 64)
    end if
}

macro ror64 rl,rh,bits,scratch {
    if ((bits) mod 64) > 32
        rol64 rl,rh,(64-((bits) mod 64)),scratch
    else if ((bits) mod 64) = 32
        xchg rl,rh
    else if ((bits) mod 64) > 0
        mov scratch,rl
        shrd rl,rh,((bits) mod 64)
        shrd rh,scratch,((bits) mod 64)
    end if
}

macro shl64 rl,rh,bits {
    if ((bits) mod 64) > 32
        mov rh,rl
        shl rh,(((bits) mod 64)-32)
        xor rl,rl
    else if ((bits) mod 64) = 32
        mov rh,rl
        xor rl,rl
    else if ((bits) mod 64) > 0
        shld rh,rl,((bits) mod 64)
        shl rl,((bits) mod 64)
    end if
}

macro shr64 rl,rh,bits {
    if ((bits) mod 64) > 32
        mov rl,rh
        shr rl,(((bits) mod 64)-32)
        xor rh,rh
    else if ((bits) mod 64) = 32
        mov rl,rh
        xor rh,rh
    else if ((bits) mod 64) > 0
        shrd rl,rh,((bits) mod 64)
        shr rh,((bits) mod 64)
    end if
}

Last edited by revolution on 05 Jan 2008, 05:38; edited 1 time in total
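A short usage sketch for these macros, assuming the 64-bit value lives in a register pair with eax as the low dword and edx as the high dword. The constant and the register choice are only for illustration.

```asm
    mov   eax,0x89ABCDEF    ; low dword of 0x0123456789ABCDEF
    mov   edx,0x01234567    ; high dword
    ror64 eax,edx,8,ecx     ; edx:eax = 0xEF0123456789ABCD, ecx is clobbered
    shr64 eax,edx,40        ; edx:eax = 0x0000000000EF0123
```

Because the `if` blocks are evaluated at assembly time, each invocation expands to only the two or three instructions needed for that particular bit count. The count therefore has to be a constant, not a register.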

AlexP 05 Jan 2008, 05:00
"The hash library"?? Well, he is using pretty much the same exact instructions (and order ?!) that I used. I tried to make a macro, but I just replaced it with two instructions per rotate and two or three for a shift in the GPR's.. Yeah, that's exactly what I had in mine lol.. Save the low value, sh*d, then another sh*d with the saved value... I just ran a processor identifier thing for my intel and it says I've got SSE3 but not SSE4. I'll check out the instruction manuals again tomorrow morning about how that stuff works with numbers. Thanks for tracking down that code, if I can't get the SSE workin on it then I'll have to debug my crazy shifting tomorrow ... I think the entire schedule is off though, I'm attempting SHA512. I got all the 32bit versions of the SHA family done, except SHA0, so all I need to do is figure out how to shift and rotate in 64bit mode (at the same time make it easy to debug )... If u want me to post the scrap of code here that I used for shifting/rot, I will when I get up in about 8 or 9 hours. Good night!


revolution 05 Jan 2008, 05:36
AlexP wrote:
"The hash library"??
Strange coincidence, because the macros up there were also used for my SHA512 code, which I did ~3 years ago.

AlexP 05 Jan 2008, 16:38
lol I had macros running for me too, but I've never used IF structures in FASM, so I just inlined it when FASM said "cannot cmp 32,6". Well, a few weeks ago I made a nice schedule for myself, and I've got some projects planned. The first is a hash library, the second a symmetrical crypt library, and the third is asymmetrical (probably just RSA and maybe something from RC). I've gotten the SHA family almost done except for this small chunk of code; the only problem is that my functions don't accept zero-length hashes ... Could u post the code here that you used for SHA??? When I get my 512 done I'll post it here too. Here's the revised code with your macros.
Revision 3 of code, with comments for people
Code:
;****************************************
; SHA384/512 Exported Functions
;****************************************
SHA384:
    mov [RequestedSize],384
    jmp @f
SHA512:
    mov [RequestedSize],512
@@:
    push ebp
    mov ebp,esp
    pusha
    call PreProcess
    cmp eax,0
    jnz SHAerror
    xor edi,edi
.A:
    ;Begin first message schedule
    push edi
    mov esi,[Base_Address]
    add esi,edi
    mov edi,MessageSch
    mov ecx,0x20
    rep movsd
    lea esi,[MessageSch+128]    ;After first 16 words
    mov [Temp3],esi
    add [Temp3],512             ;End of message schedule
    xor ebp,ebp
@@:
    ;Begin second message schedule
    ;Ror 19
    mov eax,[esi-16]
    mov ebx,[esi-12]
    ror64 eax,ebx,19,edi
    ;Ror 61
    mov ecx,[esi-16]
    mov edx,[esi-12]
    ror64 ecx,edx,61,edi
    ;Xor them
    xor eax,ecx
    xor ebx,edx
    ;Shr 6
    mov ecx,[esi-16]
    mov edx,[esi-12]
    shr64 ecx,edx,6
    ;Xor them
    xor eax,ecx
    xor ebx,edx
    ;Now add w[t-7]
    add eax,[esi-56]
    adc ebx,[esi-52]
    ;Store in temp vars
    mov [Temp1],eax
    mov [Temp2],ebx
    ;Now compute 0^0 of w[t-15]
    mov eax,[esi-120]
    mov ebx,[esi-116]
    ror64 eax,ebx,1,edi
    mov ecx,[esi-120]
    mov edx,[esi-116]
    ror64 ecx,edx,8,edi
    xor eax,ecx
    xor ebx,edx
    mov ecx,[esi-120]
    mov edx,[esi-116]
    shr64 ecx,edx,7
    xor eax,ecx
    xor ebx,edx
    add eax,[Temp1]             ;0^0 is in regs, previous is in temps
    adc ebx,[Temp2]
    ;Done with 0^0, now add w[t-16]
    add eax,[esi-128]
    adc ebx,[esi-124]
    ;Done with main schedule loop
    mov [esi],eax
    mov [esi+4],ebx
    add esi,8
    cmp esi,[Temp3]
    jnz @b
    xor esi,esi
    pop edi
    call SHAhighInitializeVars
    ;End Schedule
@@:
    ;Begin main loop
    ;Regs:
    ;Eax-Edx are scratch
    ;Esi is the message schedule offset, unused here in main loop
    ;Edi is scratch for rotations/shifting
    ;Ebp is the round number 0->79
    ;Esp is stack pointer

    ;Calculate T1 variable
    ;Equation: h+E^1(e) + Ch(e,f,g) + K[round] + W[round]
    ;Calculate E^1 summation of e
    ;Equation: Ror14(e) xor Ror18(e) xor Ror41(e)
    ;Ror by 14
    mov eax,[BE]
    mov ebx,[BE+4]
    ror64 eax,ebx,14,edi
    ;Ror by 18
    mov ecx,[BE]
    mov edx,[BE+4]
    ror64 ecx,edx,18,edi
    ;Xor the first two values
    xor eax,ecx
    xor ebx,edx
    ;Ror by 41
    mov ecx,[BE]
    mov edx,[BE+4]
    ror64 ecx,edx,41,edi
    ;Xor by the last value
    xor eax,ecx
    xor ebx,edx
    ;Add h to this
    add eax,[BHH]
    adc ebx,[BHH+4]
    ;Now calculate CH(e,f,g)
    ;Equation: (e and f) xor (not e and g)
    mov ecx,[BE]
    mov edx,[BE+4]
    and ecx,[BF]
    and edx,[BF+4]
    ;Not enough regs, so push these
    push ecx edx
    mov ecx,[BE]
    mov edx,[BE+4]
    not ecx
    not edx
    and ecx,[BG]
    and edx,[BG+4]
    ;xor by the old regs on stack
    xor ecx,[esp]
    xor edx,[esp+4]
    ;Get the old regs off the stack
    add esp,8
    add eax,ecx
    adc ebx,edx
    ;Add K offset from K80 constant table
    add eax,[ebp*8+K80]
    adc ebx,[ebp*8+K80+4]
    ;Add Message Schedule offset from MessageSch table
    add eax,[MessageSch+ebp*8]
    adc ebx,[MessageSch+ebp*8+4]
    ;And store them in the final T1 variable
    mov [BT1],eax
    mov [BT1+4],ebx

    ;Now find the T2 variable
    ;Equation: E^0(a) + Maj(a,b,c)
    ;First, E^0 of a
    ;Equation: Ror28(a) xor Ror34(a) xor Ror39(a)
    ;Ror by 28
    mov eax,[BA]
    mov ebx,[BA+4]
    ror64 eax,ebx,28,edi
    ;Ror by 34
    mov ecx,[BA]
    mov edx,[BA+4]
    ror64 ecx,edx,34,edi
    ;Xor the first two values
    xor eax,ecx
    xor ebx,edx
    ;Now ror by 39
    mov ecx,[BA]
    mov edx,[BA+4]
    ror64 ecx,edx,39,edi
    ;Xor by the last value
    xor eax,ecx
    xor ebx,edx
    ;Now find Maj(a,b,c)
    ;Equation: (a and b) xor (a and c) xor (b and c)
    mov ecx,[BA]
    mov edx,[BA+4]
    and ecx,[BB]
    and edx,[BB+4]
    ;Again, need more registers
    push ecx edx
    mov ecx,[BA]
    mov edx,[BA+4]
    and ecx,[BC]
    and edx,[BC+4]
    ;Xor by the old regs
    xor ecx,[esp]
    xor edx,[esp+4]
    ;Get old regs off stack
    add esp,8
    ;Need these regs
    push ecx edx
    mov ecx,[BB]
    mov edx,[BB+4]
    and ecx,[BC]
    and edx,[BC+4]
    ;Xor by old regs
    xor ecx,[esp]
    xor edx,[esp+4]
    ;Get rid of old regs
    add esp,8
    ;Done with Maj(a,b,c), now add it to the current val
    add eax,ecx
    adc ebx,edx
    ;And store them in the T2 variable
    mov [BT2],eax
    mov [BT2+4],ebx

    ;Here are the final moves in the main loop
    ;H=G
    mov eax,[BG]
    mov ebx,[BG+4]
    mov [BHH],eax
    mov [BHH+4],ebx
    ;G=F
    mov eax,[BF]
    mov ebx,[BF+4]
    mov [BG],eax
    mov [BG+4],ebx
    ;F=E
    mov eax,[BE]
    mov ebx,[BE+4]
    mov [BF],eax
    mov [BF+4],ebx
    ;E=D+T1
    ;NOTE: This is not working correctly
    mov eax,[BD]
    mov ebx,[BD+4]
    add eax,[BT1]
    adc ebx,[BT1+4]
    mov [BE],eax
    mov [BE+4],ebx
    ;D=C
    mov eax,[BC]
    mov ebx,[BC+4]
    mov [BD],eax
    mov [BD+4],ebx
    ;C=B
    mov eax,[BB]
    mov ebx,[BB+4]
    mov [BC],eax
    mov [BC+4],ebx
    ;B=A
    mov eax,[BA]
    mov ebx,[BA+4]
    mov [BB],eax
    mov [BB+4],ebx
    ;A=T1+T2
    ;NOTE: This is not working correctly
    mov eax,[BT1]
    mov ebx,[BT1+4]
    add eax,[BT2]
    adc ebx,[BT2+4]
    mov [BA],eax
    mov [BA+4],ebx
    ;End of main loop
    inc ebp
    cmp ebp,80
    jnz @b

    ;Add the current hash value to the previous
    call SHAhighAddResults
    add edi,128
    dec [NumBlocks]
    jnz .A
    mov edi,[OutData]
    mov esi,BH0
    mov ecx,16
    cmp [RequestedSize],384
    jne @f
    sub ecx,4
    ;And then move the final hash value to the out location
@@:
    rep movsd
    call EndProcess
    cmp eax,0
    jnz SHAerror
    popa
    pop ebp
    ret 0xC

The template works and the variable switching at the end works, but I failed at getting the T1 and T2 equations working. They take up the bulk of my code, and due to a lack of GPRs I use memory locations sometimes. I'll get around to just pushing a few registers before these main equations happen so I don't have to do this. ... I'm working on simplifying the code right now, should be done in an hour or so. Everything but the two equations for the T1 and T2 variables is working. This may be because of the shifting logic (that would explain it, but then it means more problems with the message schedule). I'm using some pretty messy logic to keep track of the registers; I am hoping that once I get this all figured out I'll use the stack for the A-H working variables to slim down memory costs. I commented most of the equations and such to make it easier to read. I already found and fixed a pair of very bad bugs, but that didn't solve it. I keep on looking through it and debugging, but I can't find the problem... If you would like to copy, paste and try it, just ask for the full code.
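One detail in the listing above that may be worth checking: fasm expands `push ecx edx` into `push ecx` followed by `push edx`, so afterwards `[esp]` holds the saved edx (the high dword) and `[esp+4]` holds the saved ecx (the low dword). The `xor ecx,[esp]` / `xor edx,[esp+4]` pairs in Ch and Maj would then combine a low half with a high half. Whether this is the whole problem is not certain, but here is a sketch of the Ch step with the offsets matched to the push order, using the same labels BE, BF and BG as the listing:

```asm
    mov  ecx,[BE]
    mov  edx,[BE+4]
    and  ecx,[BF]           ; low  dword of (e and f)
    and  edx,[BF+4]         ; high dword of (e and f)
    push ecx edx            ; [esp+4] = low dword, [esp] = high dword
    mov  ecx,[BE]
    mov  edx,[BE+4]
    not  ecx
    not  edx
    and  ecx,[BG]           ; low  dword of ((not e) and g)
    and  edx,[BG+4]         ; high dword of ((not e) and g)
    xor  ecx,[esp+4]        ; low  half with the saved low half
    xor  edx,[esp]          ; high half with the saved high half
    add  esp,8              ; drop the saved pair
```

The same offset swap would apply to both push/xor pairs in the Maj computation.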

revolution 14 Jan 2008, 09:22
AlexP: I can't post my existing SHA code. Although written by me, it is not owned by me; my company retains copyright. The macros I posted above are as far as I can go with posting code.
But looking at your code, it is not all that clear. In my code I used macros named things like: sigma0, sigma1, alpha0, alpha1, cho, maj. Then a round macro, and finally I combined them into a short piece of code, mostly macros, that does the hashing. I find the macro names provide a clear overview of the process that is taking place in the main loop.
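To illustrate the idea, here is a minimal sketch of what two such macros could look like when built on the ror64 macro posted earlier. These are not the macros from the original library: the parameter lists are invented, and it is only an assumption that sigma1 stands for the round function Ror14 xor Ror18 xor Ror41 and cho for Ch.

```asm
; hi:lo = Ror14(s) xor Ror18(s) xor Ror41(s), s being the qword at [src].
; thi:tlo is a temporary register pair, scratch a spare register.
macro sigma1 lo,hi,src,tlo,thi,scratch {
    mov   lo,dword [src]
    mov   hi,dword [src+4]
    mov   tlo,lo
    mov   thi,hi
    ror64 lo,hi,14,scratch
    ror64 tlo,thi,18,scratch
    xor   lo,tlo
    xor   hi,thi
    ror64 tlo,thi,23,scratch    ; 18 + 23 = 41 bits in total
    xor   lo,tlo
    xor   hi,thi
}

; hi:lo = Ch(e,f,g), computed as g xor (e and (f xor g)).
; This equals (e and f) xor ((not e) and g) and needs no stack.
macro cho lo,hi,e,f,g {
    mov lo,dword [f]
    mov hi,dword [f+4]
    xor lo,dword [g]
    xor hi,dword [g+4]
    and lo,dword [e]
    and hi,dword [e+4]
    xor lo,dword [g]
    xor hi,dword [g+4]
}

; usage with the labels from AlexP's listing:
    sigma1 eax,ebx,BE,ecx,edx,edi   ; ebx:eax = Ror14(e) xor Ror18(e) xor Ror41(e)
    cho    ecx,edx,BE,BF,BG         ; edx:ecx = Ch(e,f,g)
    add    eax,ecx
    adc    ebx,edx                  ; ebx:eax = sum of the two terms
```

With helpers like these, the T1 computation shrinks to a handful of lines and each step can be checked in a debugger on its own.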

AlexP 18 Jan 2008, 23:20
Okay, I just noticed this post. I thought I had found the problem with my 64bit code, but I have yet to fully debug it. Thanks though!


Copyright © 1999-2023, Tomasz Grysztar.