flat assembler
Message board for the users of flat assembler.
Windows > Need higher analysis of function
AlexP 24 Dec 2007, 02:19
lol It must have worked for you, but with my Intel I thought it was LE. The least significant byte is supposed to be first in memory for LE, am I right? I don't need to use that much; in the case of your algo it must have preferred something else. I believe I need to change it so the byte loads into the upper bits of the regs. Thanks for the help and code, the initialization routine works fine, I found out. Now I get to debug the entire thing! At least I get until next Tuesday off from school; if I get Rijndael up and running by then I'll have fun with it. By the way, do you think I should use CFB or OFB mode? Neither requires the Rijndael decryption routine to do the actual decryption, and the best advice I have found elsewhere is to use CFB for some advantage or another. You know more about this than I do, what do you think? -Anyone out there too, please
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
http://www.progressive-coding.com/tutorial.php?id=3
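To make the little-endian question concrete, here is a minimal sketch in Python (chosen only because it is easy to run; it is not the thread's fasm). It shows where each byte of a 32-bit value lands in memory, and what a plain x86 dword load does to four consecutive key bytes. The sample values are arbitrary illustrations.

```python
import struct

value = 0x0A0B0C0D

# Little-endian: the least significant byte (0x0D) sits at the lowest address.
little = struct.pack("<I", value)
# Big-endian: the most significant byte (0x0A) comes first.
big = struct.pack(">I", value)

print(little.hex())  # 0d0c0b0a
print(big.hex())     # 0a0b0c0d

# A plain 32-bit load on x86 reads memory little-endian, so the FIRST byte
# in memory ends up in the LOW 8 bits of the register.
key_bytes = bytes([0x2B, 0x7E, 0x15, 0x16])
as_x86_dword = int.from_bytes(key_bytes, "little")
as_fips_word = int.from_bytes(key_bytes, "big")  # how FIPS-197 prints words

print(hex(as_x86_dword))  # 0x16157e2b
print(hex(as_fips_word))  # 0x2b7e1516
```

So a word printed in the AES documentation and the same four bytes loaded into a register on an Intel CPU are byte-swapped versions of each other, which matters whenever rotates or table lookups pick bytes out of a register.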
revolution 24 Dec 2007, 02:33
CFB, OFB? Depends on what you want to achieve. Resistance to certain types of attack, ease of error correction, ease of error detection, is it needed for streaming, is it needed for files, etc.
Personally I think CFB is nicer than OFB. But then I also think CBC is nicer than CFB. In my opinion having separate encrypt and decrypt functions leads to fewer problems later.
AlexP 24 Dec 2007, 02:40
Well, apparently the IV in both CFB and OFB needs to be the same for encryption and decryption, am I right? I also read that the IV should be randomly generated and attached to the end of the encrypted data. Does this sound like a good standard for me to adopt for AES specifically? The decryption algorithm for my code is barely longer, just a couple of lines at the end of the key expansion algo. My plan is to use a quick+fast CFB mode for AES, which is seeded with SHA-256 or 512, depending on what I feel like accomplishing that day
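The two points raised here (the same IV on both sides, and no block decryption routine needed) can be sketched as follows. This is an illustration in Python, not the thread's code: `toy_block_encrypt` is a hypothetical stand-in for the AES encryption routine and is NOT a real cipher, and storing the IV in front of the ciphertext is just one arbitrary choice of where to keep it.

```python
import hashlib
import os

BLOCK = 16  # AES block size in bytes

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for AES block encryption (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    out = bytearray()
    feedback = iv
    for i in range(0, len(plaintext), BLOCK):
        keystream = toy_block_encrypt(key, feedback)
        chunk = bytes(p ^ k for p, k in zip(plaintext[i:i + BLOCK], keystream))
        out += chunk
        feedback = chunk  # the ciphertext block feeds the next step
    return bytes(out)

def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    out = bytearray()
    feedback = iv
    for i in range(0, len(ciphertext), BLOCK):
        # Same direction as encryption: the block cipher is never inverted.
        keystream = toy_block_encrypt(key, feedback)
        chunk = ciphertext[i:i + BLOCK]
        out += bytes(c ^ k for c, k in zip(chunk, keystream))
        feedback = chunk
    return bytes(out)

key = b"0123456789abcdef"
iv = os.urandom(BLOCK)                       # random, but not secret
message = b"attack at dawn, bring fasm"
stored = iv + cfb_encrypt(key, iv, message)  # IV kept next to the ciphertext
recovered = cfb_decrypt(key, stored[:BLOCK], stored[BLOCK:])
print(recovered == message)  # True
```

Because the stand-in function cannot be inverted at all, the round trip working is a small demonstration that CFB only ever runs the cipher in the encrypt direction. Decrypting with a different IV garbles the first block, which is why the IV has to travel with the data.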
EDIT: Got the key schedule debugged, might spend some time looking for any possible way to optimize. I noticed that my implementation is smaller than the one you posted earlier, so I'll quickly post mine here. If anyone can see any places where I can slim down, please point them out.
Code:
;**************************************
; Key Expansion
;On Stack: *Input Key,*OutputKey,Cipher,bool Decrypt
;**************************************
RD_Expand_Key:
    push ebp
    mov ebp,esp
    push eax ebx ecx edx edi esi
    mov ebx, [ebp+0x8]  ;Pointer to input key
    mov edx, [ebp+0xC]  ;Pointer to output key
    mov ecx, [ebp+0x10] ;Cipher identifier
    ;Set variables for cipher
    cmp ecx,0x80        ;128 bit key
    jnz Test192
    mov [Nk],4
    mov [Nr],10
    jmp AfterTest
Test192:
    cmp ecx,0xC0        ;192 bits
    jnz Test256
    mov [Nk],6
    mov [Nr],12
    jmp AfterTest
Test256:
    cmp ecx,0x100       ;256 bits
    jnz Enda
    mov [Nk],8
    mov [Nr],14
AfterTest:
    ;Place input into first Nk words of expanded key
    mov ecx,[Nk]
    mov edi,edx
    mov esi,ebx
    rep movs DWORD [edi],DWORD [esi]
    mov eax,[Nk]
    shl eax,2
    mov [NkT4],eax
    mov ebx,[Nr]
    inc ebx             ;
    lea ebx,[ebx*4]     ;Nb*(Nr+1)*4
    lea ebx,[ebx*4]     ;
BeginWhile:
    ;Last key entry -> ecx
    sub eax,4
    mov ecx,[edx+eax]
    add eax,4
    push edx
    push eax
    xor edx,edx
    div [NkT4]
    mov edi,edx
    mov esi,eax
    pop eax
    pop edx
    cmp edi,0
    jne ElseIf
    rol ecx,8
    call SubWord
    dec esi
    xor ecx,[RCon+esi*4]
ElseIf:
    cmp edi,16
    jne EndIf
    cmp [Nk],8
    jne EndIf
    call SubWord
EndIf:
    mov edi,eax
    sub eax,[NkT4]
    mov eax,[edx+eax]
    xor eax,ecx
    mov [edx+edi],eax
    mov eax,edi
    add eax,4
    cmp eax,ebx
    jne BeginWhile
    cmp dword [ebp+0x14],0
    je Enda
    ;Make decryption round keys instead
    ;Apply u-table xor'ing to each key except first and last
    ;esi is the counter (by ones)
    ;edi is the byte offset for current word of key schedule
    ;Skip the first and last round keys
    ;A round key is static 16 bytes!!! (4 words)
    ;That's because there's 4 columns in state, Nb for AES
    mov esi,1
    mov edi,edx
    add edi,16          ;Skip first round key
StartDecryptKeys:
    movzx eax,byte[edi]
    movzx ebx,byte[edi+1]
    movzx ecx,byte[edi+2]
    movzx edx,byte[edi+3]
    mov eax,[eax*4+u1]
    xor eax,[ebx*4+u2]
    xor eax,[ecx*4+u3]
    xor eax,[edx*4+u4]
    mov [edi],eax
    add edi,16
    inc esi
    cmp esi,[Nr]        ;Skip last round key
    jne StartDecryptKeys
Enda:
    pop esi edi edx ecx ebx eax ebp
    ret 0x10
The SubWord routine works, haven't debugged the decryption keys though. I do not know what the outcome is supposed to be, but I followed your implementation so I believe it should work -?? A little messy coding here and there, but I narrowed it down quite well I think. Here's the SubWord, I think it's the best possible. I decided against the mov and mask method I used earlier, and decided to push the dword into memory so the bytes can be accessed there one by one. -ECX contains the dword to be substituted-
Code:
SubWord:
    push eax ebx edx ecx
    movzx eax,byte[esp]
    movzx ebx,byte[esp+1]
    movzx ecx,byte[esp+2]
    movzx edx,byte[esp+3]
    mov al,byte[SBox+eax]
    mov ah,byte[SBox+ebx]
    mov bl,byte[SBox+ecx]
    mov bh,byte[SBox+edx]
    movzx ecx,bx
    shl ecx,16
    or ecx,eax
    add esp,0x4         ;Discard the saved ecx, the result stays in ecx
    pop edx ebx eax
    ret
EDIT: Hey, I've got some pretty bad corruption in my code. I try to debug it, and about halfway through the encryption routine there are messed up instructions, and suddenly 0000 machine code that keeps runnin' on. Here are the files, please tell me how this dll is messing up!
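For checking the output of a key schedule like the one above, a small reference implementation can help. The following is a sketch in Python, not a translation of the posted routine: it follows the FIPS-197 description with words written big-endian (as the standard prints them), builds the S-box from its mathematical definition instead of a table, and all function names are made up for illustration. The expected words come from the FIPS-197 Appendix A.1 example key.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high = a & 0x80
        a = (a << 1) & 0xFF
        if high:
            a ^= 0x1B
        b >>= 1
    return result

def build_sbox():
    """S-box = multiplicative inverse followed by the AES affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def sub_word(w):
    return ((SBOX[(w >> 24) & 0xFF] << 24) | (SBOX[(w >> 16) & 0xFF] << 16)
            | (SBOX[(w >> 8) & 0xFF] << 8) | SBOX[w & 0xFF])

def rot_word(w):
    # On big-endian words RotWord is a rotate left by 8 bits.
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF

def expand_key(key):
    """Return the expanded key as a list of 4*(Nr+1) big-endian words."""
    nk = len(key) // 4          # 4, 6 or 8
    nr = nk + 6                 # 10, 12 or 14
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gmul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)
        w.append(w[i - nk] ^ temp)
    return w

key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
w = expand_key(key)
print(f"{w[4]:08x} {w[43]:08x}")  # a0fafe17 b6630ca6
```

When comparing against an x86 implementation, it is probably easiest to compare the expanded key byte for byte in memory: a routine that loads words with dword moves holds each word byte-swapped relative to the values printed here, and the direction of the 8-bit rotate that corresponds to RotWord depends on that byte order.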
AlexP 24 Dec 2007, 15:55
-double post, I know- Someone please look at this! I have no idea why the dll's code is messing up like that, it just drops off randomly in the middle of a function. Please help!
For Revolution: I got the cipher running now, and the table offsets are the same as the documentation says, but every time I run it, it does not pick out the right dword from the tables, or something, because it's not working. I have all the stuff in front of me on a giant whiteboard, and the program took the bytes out of the state as I predicted it would according to the documentation. Whenever I multiply those by four and add them to the table offset, I get a dword that does not match up with the example vectors. In your example, you did three of the xor's, then xor'd with the round key, then finished off with the last xor. I do not do it that way; in fact I noticed that I can just xor the entire state every round, not every loop of that macro. I'll try to find someone who has example vectors for the T tables, but please help me if you can! I am constantly glancing over to fasm to see if anyone can reply to me, so I should respond within minutes.
handyman 25 Dec 2007, 01:20
Check your pushes and pops for balance. The SubWord: and SubWordB: pushes and pops are not balanced. You have one more push than pop, so you are doing a return to some oddball point in memory (to whatever is in EAX), plus the values in the popped registers are probably not what you want since the balance is not there: i.e. the ecx push is kind of gumming up the works since it is the last thing pushed on the stack and not the first popped off.
HINT: if your program goes weird, make sure you're not goofing up the stack. Just noticed the esp adjust in the code, so this wasn't it. I hate direct stack adjustment manipulations.
AlexP 25 Dec 2007, 03:13
If you look above in SubWord, four registers are pushed. ECX is pushed last because that is the register I am working with, then before the pops I add 4 to esp to get it off the stack while preserving the value. Kind of complicated, but it does work. I think the same goes for SubWordB, thanks a lot for looking at it! I'm copying the same code from Revolution's table lookup routine, down to the bit. It's not posted here yet, there have been many changes, but I decided to keep it simple and use a macro for now, to be made into a little function later if possible. The diagonal-linearity of the state's table lookups in Revolution's code was off by one, but I took his word that the code works so I am trying it both ways. Nothing has worked so far: the initial AddRoundKey is working, the key schedule has been perfected for all modes, no memory leakage so far, and the only thing left is about ten lines of code that add an offset to the table, which are copied almost directly from Revolution's code. Sorry for the late posting, had to go to church. Merry Christmas!!
Got it HandyMan- Would you mind looking at something else? Okay, here it is anyway.. These are code scraps from Revolution's version
Code:
macro encrypt_block blks,keys,rounds {
    align 16
  encrypt_block_#blks#_#keys#:
    if rounds>0
        define_shifts blks
        keyaddition blks
        mov edx,(rounds-1)
        add edi,blks*4
    .a:
        tablelookup_key_add blks,esi,ebp,0
        xchg esi,ebp
        add edi,blks*4
        sub edx,1
        jnz .a
        tablelookupf_key_add blks,esi,ebp,0
        if (rounds and 1)
            copy_data blks,ebp,esi
        end if
        ret
    end if
}
The only thing of importance is the tablelookup_key_add macro, here:
Code:
macro tablelookup_key_add blks,source,dest,key_off {
    repeat blks
        movzx eax,b[source+((0000+%-1) mod blks)*4+0]
        movzx ebx,b[source+((sft1+%-1) mod blks)*4+1]
        movzx ecx,b[source+((sft2+%-1) mod blks)*4+2]
        mov eax,[eax*4+RD_tables.t1]
        xor eax,[ebx*4+RD_tables.t2]
        movzx ebx,b[source+((sft3+%-1) mod blks)*4+3]
        xor eax,[ecx*4+RD_tables.t3]
        xor eax,[edi+(%-1)*4+key_off]
        xor eax,[ebx*4+RD_tables.t4]
        mov [dest+(%-1)*4],eax
    end repeat
}
He says that this works, and here is my version:
Code:
macro encrypt_round {
    repeat 4
        movzx eax,byte[State+((0000+%-1) mod 4)*4+0]
        movzx ebx,byte[State+((0001+%-1) mod 4)*4+1]
        movzx ecx,byte[State+((0002+%-1) mod 4)*4+2]
        mov eax,[eax*4+t1]
        xor eax,[ebx*4+t2]
        movzx ebx,byte[State+((0003+%-1) mod 4)*4+3]
        xor eax,[ecx*4+t3]
        xor eax,[edi+(%-1)*4+edx]
        xor eax,[ebx*4+t4]
        mov [State+(%-1)*4],eax
    end repeat
}
The source has been replaced with State, blks is replaced with 4, and key_off is replaced with edx. Everything else is the same; the constants sft1 etc. are the numbers they stand for in my case, so it should work. The tables are the same and everything, except when I run the macro code it produces results other than the documentation. If you would like the entire program that runs the code so you can mess with it, please feel free to say so. Thanks again
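One way to get a known-good value for a single table-lookup round is to model it outside the assembler. The sketch below is Python, not the thread's fasm, and it makes an assumption about table layout: byte 0 of each entry of the first table is 2*S[x], followed by S[x], S[x], 3*S[x] in memory order, with each further table rotated by one byte. The function names are hypothetical. The state and round key are the round-1 values from the FIPS-197 Appendix B example. The round takes separate `src` and `dest` buffers, mirroring the `source` and `dest` parameters of the original macro.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high = a & 0x80
        a = (a << 1) & 0xFF
        if high:
            a ^= 0x1B
        b >>= 1
    return result

def build_sbox():
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

def build_tables(sbox):
    """tables[r][x] is the 4-byte contribution (in memory order) of row r."""
    tables = [[], [], [], []]
    for x in range(256):
        s = sbox[x]
        col = [gmul(s, 2), s, s, gmul(s, 3)]
        for r in range(4):
            tables[r].append(bytes(col[(i - r) % 4] for i in range(4)))
    return tables

def encrypt_round(src, dest, round_key, tables):
    """One full round: 4 lookups per column, xor'ed with the round key."""
    for c in range(4):
        column = round_key[4 * c:4 * c + 4]
        for r in range(4):
            # Same indexing as the macro: byte r of column (c + r) mod 4.
            index = src[((c + r) % 4) * 4 + r]
            column = bytes(a ^ b for a, b in zip(column, tables[r][index]))
        dest[4 * c:4 * c + 4] = column

T = build_tables(build_sbox())
state = bytes.fromhex("193de3bea0f4e22b9ac68d2ae9f84808")
round_key = bytes.fromhex("a0fafe1788542cb123a339392a6c7605")
expected = bytes.fromhex("a49c7ff2689f352b6b5bea43026a5049")

out = bytearray(16)
encrypt_round(state, out, round_key, T)
print(out.hex())  # a49c7ff2689f352b6b5bea43026a5049

# Same round, but reading and writing the SAME buffer.
in_place = bytearray(state)
encrypt_round(in_place, in_place, round_key, T)
print(bytes(in_place) == expected)  # False
```

The second call is worth a look when comparing the two macros: the original reads from `source` and writes to `dest` and then swaps the two registers, while a version that uses `State` for both overwrites column 0 before the later columns have read their bytes from it. In this model that alone is enough to make the result differ from the documented vector, so it may be one thing to check.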
handyman 25 Dec 2007, 03:43
I ran the original code submitted prior to my last post in Olly and it blew up at
Code:
CipherLoop:
    movzx eax,BYTE[State+edi+0]
At this point edi was 415010 and State was 415000, so the combined pointer ended up with a value like 830010, which is way outside of the defined data section. I am running Windows 98, so it may be operating a bit differently than XP or Vista as far as when things seem to blow up. I just get a memory access violation and everything stops in Olly.
On the macro question, are you sure that all values (whether in a register or a variable) used are identical during the test of the code, between what you are trying to do and what was supplied as the example? The only way to be sure is to get a value list of the original and then compare that against what is being supplied by the changed code.
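The arithmetic behind that crash can be worked through directly. This is a small Python sketch using the two values reported from the debugger (read as hexadecimal); the variable names are made up for illustration.

```python
STATE_BASE = 0x415000    # address of State, as reported
edi_pointer = 0x415010   # edi already held an absolute address, not an offset

# [State+edi] adds the base a second time when edi is already a pointer.
bad_address = STATE_BASE + edi_pointer
print(hex(bad_address))  # 0x82a010

# Treating edi as an offset from State keeps the access inside the buffer.
offset = edi_pointer - STATE_BASE
good_address = STATE_BASE + offset
print(hex(offset), hex(good_address))  # 0x10 0x415010
```

The exact sum of the two reported values is 0x82A010, in the same neighbourhood as the approximate figure quoted in the post and well outside the data section either way. The usual fix is to pick one convention: either the register holds an offset and the instruction adds the base, or the register holds the full pointer and the base is left out of the address expression.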
AlexP 25 Dec 2007, 05:21
Okay: Why does this little macro not work?
Code:
con = 0x100000000
macro SHAadd 1,2 {
    mov eax,((1+2)mod con)
}
I have to pass it two memory addresses every time it's called, but it never works... Lol, I just noticed that I don't need to mod every single addition by 2^32. Took me a while to realize the 32-bit regs do it for me ... I had made an entire macro with like 10 div instructions that I thought I was forced to use.
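The realization at the end of the post can be checked with a few lines. This is a Python sketch, where integers are unbounded, so the wrap-around that a 32-bit register performs for free has to be written as an explicit mask; `add32` is a hypothetical helper name.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    """Add like a 32-bit `add`: the carry out of bit 31 is simply dropped."""
    return (a + b) & MASK32

a, b = 0xFFFFFFF0, 0x00000020
print(hex(add32(a, b)))          # 0x10
print(hex((a + b) % 2**32))      # 0x10, the same thing as the mask
```

On x86 a plain load followed by an `add` on a 32-bit register gives this result with no division at all. As for the macro itself, an expression such as `(x+y) mod con` inside a `mov` operand is evaluated by the assembler at assembly time, so when it is handed two addresses it combines the addresses rather than the values stored there at run time; the digits used as parameter names are a further likely source of trouble.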
Copyright © 1999-2024, Tomasz Grysztar.