flat assembler
Message board for the users of flat assembler.

Index > Windows > Need higher analysis of function

revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20298
Location: In your JS exploiting you and your system
revolution 24 Dec 2007, 02:01
Your code must be working in BE fashion. My code works in LE fashion. I don't necessarily follow the books exactly. I usually make sure I understand the underlying algorithm and then code it up in the way that makes the most sense to me based upon the processor I am using. Besides, the documentation does not dictate any internal representation requirements. Implementors are free to do whatever they like as long as it produces the right output.

If you find my code is wrong for your purposes, then go ahead and change it to suit whatever later functions of yours need the data in BE format.

You can use it in any way you like, I posted it here for everyone to use.
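
To make the LE/BE point above concrete, here is a minimal Python sketch (Python is used only so the arithmetic is easy to check; the four byte values are an arbitrary example, taken from the first key word of FIPS-197 Appendix A). The same four bytes in memory give two different 32-bit values depending on which end is treated as most significant, and either convention works as long as the rest of the code agrees with it.

```python
import struct

key_bytes = bytes([0x2B, 0x7E, 0x15, 0x16])  # example bytes, in memory order

# Little-endian load (what `mov eax,[mem]` does on x86): first byte -> low bits
le_word = struct.unpack("<I", key_bytes)[0]
# Big-endian load (the word layout printed in FIPS-197): first byte -> high bits
be_word = struct.unpack(">I", key_bytes)[0]

def bswap32(x):
    """Reverse the byte order of a 32-bit value (like the x86 BSWAP instruction)."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")

print(hex(le_word))           # 0x16157e2b
print(hex(be_word))           # 0x2b7e1516
print(hex(bswap32(le_word)))  # 0x2b7e1516
```

Converting between the two views is a single byte swap per word, so LE-style code can still feed functions that expect BE data.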
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP 24 Dec 2007, 02:19
lol It must have worked for you, but with my Intel I thought it was LE. The least significant byte is supposed to be first in memory for LE, am I right? I don't need to use that much; in the case of your algo it must have preferred otherwise. I believe I need to change it so the byte loads into the upper bits of the regs. Thanks for the help and code; the initialization routine works fine, I found out :) Now I get to debug the entire thing! At least I get until next Tuesday off from school; if I get Rijndael up and running by then I'll have fun with it. By the way, do you think I should use CFB or OFB mode? Neither requires the Rijndael decryption routine to do the actual decryption, and the best advice I have found elsewhere is to use CFB for some advantage or another. You know more about this than I do, what do you think? Anyone else out there too, please.

http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
http://www.progressive-coding.com/tutorial.php?id=3
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20298
Location: In your JS exploiting you and your system
revolution 24 Dec 2007, 02:33
CFB, OFB? Depends on what you want to achieve. Resistance to certain types of attack, ease of error correction, ease of error detection, is it needed for streaming, is it needed for files, etc.

Personally I think CFB is nicer than OFB. But then I also think CBC is nicer than CFB. In my opinion, having separate encrypt and decrypt functions leads to fewer problems later.
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP 24 Dec 2007, 02:40
Well, apparently the IV in both CFB and OFB needs to be the same for encryption and decryption, am I right? I also read that the IV should be randomly generated and attached to the end of the encrypted data. Does this sound like a good standard for me to adopt for my AES-specific code? The decryption algorithm for my code is barely longer, just a couple of lines at the end of the key expansion algo. My plan is to use a quick CFB mode for AES, which is seeded with SHA-256 or 512, depending on what I feel like accomplishing that day :)
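
As a rough illustration of the CFB idea discussed here, the following Python sketch shows full-block CFB with a random IV stored alongside the ciphertext. It rests on loud assumptions: `toy_block_encrypt` is a made-up stand-in built from SHA-256, NOT Rijndael and not a real cipher, used only so the example is self-contained; the function names are hypothetical. The IV is stored in front of the ciphertext here; appending it, as mentioned in the post, works just as well provided both sides agree.

```python
import hashlib
import os

BLOCK = 16  # Rijndael/AES block size in bytes

def toy_block_encrypt(key, block):
    """Stand-in for encrypting one 16-byte block (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cfb_encrypt(key, plaintext, iv=None):
    """CFB: encrypt the previous ciphertext block, XOR the result with the plaintext."""
    iv = os.urandom(BLOCK) if iv is None else iv
    out, prev = bytearray(iv), iv          # IV travels with the ciphertext
    for i in range(0, len(plaintext), BLOCK):
        chunk = plaintext[i:i + BLOCK]
        stream = toy_block_encrypt(key, prev)
        prev = bytes(p ^ s for p, s in zip(chunk, stream))
        out += prev
    return bytes(out)

def cfb_decrypt(key, data):
    """Decryption also runs the block function forward; no inverse cipher is needed."""
    iv, body = data[:BLOCK], data[BLOCK:]
    out, prev = bytearray(), iv
    for i in range(0, len(body), BLOCK):
        chunk = body[i:i + BLOCK]
        stream = toy_block_encrypt(key, prev)
        out += bytes(c ^ s for c, s in zip(chunk, stream))
        prev = chunk
    return bytes(out)

key = b"0123456789abcdef"
message = b"Rijndael in CFB mode needs only the forward cipher."
assert cfb_decrypt(key, cfb_encrypt(key, message)) == message
```

The IV does not need to be secret, only the same on both sides and fresh for every message, which is why it can simply be stored with the data.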

EDIT: Got the key schedule debugged, might spend some time looking for any possible way to optimize. I noticed that my implementation is smaller than the one you posted earlier, so I'll quickly post mine here. If anyone can see any places where I can slim it down, please point them out.

Code:
;**************************************
;           Key Expansion
;On Stack: *Input Key,*OutputKey,Cipher,bool Decrypt
;**************************************
RD_Expand_Key:
        push ebp
        mov ebp,esp
        push eax ebx ecx edx edi esi
        mov ebx, [ebp+0x8] ;Pointer to input key
        mov edx, [ebp+0xC] ;Pointer to output key
        mov ecx, [ebp+0x10] ;Cipher identifier
        ;Set variables for cipher
        cmp ecx,0x80  ;128 bit key
        jnz Test192
        mov [Nk],4
        mov [Nr],10
        jmp AfterTest
        Test192:
        cmp ecx,0xC0  ;192 bits
        jnz Test256
        mov [Nk],6
        mov [Nr],12
        jmp AfterTest
        Test256:
        cmp ecx,0x100 ;256 bits
        jnz Enda
        mov [Nk],8
        mov [Nr],14
        AfterTest:
        ;Place input into first Nk words of expanded key
        mov ecx,[Nk]
        mov edi,edx
        mov esi,ebx
        rep movs DWORD [edi],DWORD [esi]

        mov eax,[Nk]
        shl eax,2
        mov [NkT4],eax
        mov ebx,[Nr]
        inc ebx         ;
        lea ebx,[ebx*4] ;Nb*(Nr+1)*4
        lea ebx,[ebx*4] ;
        BeginWhile:
        ;Last key entry -> ecx
        sub eax,4
        mov ecx,[edx+eax]
        add eax,4
        push edx
        push eax
        xor edx,edx
        div [NkT4]
        mov edi,edx
        mov esi,eax
        pop eax
        pop edx
        cmp edi,0
        jne ElseIf
        rol ecx,8
        call SubWord
        dec esi
        xor ecx,[RCon+esi*4]
        ElseIf:
        cmp edi,16
        jne EndIf
        cmp [Nk],8
        jne EndIf
        call SubWord
        EndIf:
        mov edi,eax
        sub eax,[NkT4]
        mov eax,[edx+eax]
        xor eax,ecx
        mov [edx+edi],eax
        mov eax,edi
        add eax,4
        cmp eax,ebx
        jne BeginWhile
        cmp dword [ebp+0x14],0
        je Enda
        ;Make decryption round keys instead
        ;Apply u-table xor'ing to each key except first and last
        ;esi counts the words left to convert: 4*(Nr-1)
        ;edi points at the current word of the key schedule
        ;Skip the first and last round keys
        ;A round key is static 16 bytes!!! (4 words)
        ;That's because there's 4 columns in state, Nb for AES
        mov esi,[Nr]
        dec esi
        shl esi,2  ;4 words in each of the Nr-1 middle round keys
        mov edi,edx
        add edi,16 ;Skip first round key
        StartDecryptKeys:
        movzx eax,byte[edi]
        movzx ebx,byte[edi+1]
        movzx ecx,byte[edi+2]
        movzx edx,byte[edi+3]
        mov eax,[eax*4+u1]
        xor eax,[ebx*4+u2]
        xor eax,[ecx*4+u3]
        xor eax,[edx*4+u4]
        mov [edi],eax
        add edi,4  ;Next word, so all 4 words of each round key get converted
        dec esi
        jnz StartDecryptKeys
        Enda:
        pop esi edi edx ecx ebx eax ebp
        ret 0x10
    

The SubWord routine works; I haven't debugged the decryption keys though. I do not know what the outcome is supposed to be, but I followed your implementation so I believe it should work. A little messy coding here and there, but I narrowed it down quite well, I think.
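
For checking a key schedule like the one above against known values, a small reference model can help. The Python sketch below follows the FIPS-197 key expansion with big-endian words and is checked against the Appendix A.1 example key. It also derives decryption round keys by applying InvMixColumns to the middle round keys, which is an assumption about what the `u1`..`u4` tables are meant to compute. All function names are hypothetical. Words loaded with a plain `mov` on x86 are little-endian, so they would need a byte swap before comparing.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def build_sbox():
    """S-box = multiplicative inverse followed by the affine transform."""
    box = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for i in range(1, 5):
            s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF
        box.append(s ^ 0x63)
    return box

SBOX = build_sbox()

def sub_word(w):
    """Apply the S-box to each byte of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")

def expand_key(key):
    """Return the key schedule as big-endian words w[0 .. 4*(Nr+1)-1]."""
    nk = len(key) // 4
    nr = nk + 6
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        t = w[i - 1]
        if i % nk == 0:
            t = ((t << 8) | (t >> 24)) & 0xFFFFFFFF  # RotWord
            t = sub_word(t) ^ (rcon << 24)
            rcon = gmul(rcon, 2)
        elif nk > 6 and i % nk == 4:                 # extra SubWord for 256-bit keys
            t = sub_word(t)
        w.append(w[i - nk] ^ t)
    return w

def inv_mix_word(w):
    """InvMixColumns applied to one big-endian column word."""
    a = w.to_bytes(4, "big")
    base = (14, 11, 13, 9)
    out = bytes(
        gmul(a[0], base[(0 - r) % 4]) ^ gmul(a[1], base[(1 - r) % 4])
        ^ gmul(a[2], base[(2 - r) % 4]) ^ gmul(a[3], base[(3 - r) % 4])
        for r in range(4)
    )
    return int.from_bytes(out, "big")

def decryption_keys(w):
    """First and last round keys unchanged, middle ones get InvMixColumns."""
    return w[:4] + [inv_mix_word(x) for x in w[4:-4]] + w[-4:]

w = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(f"{w[4]:08x}")   # a0fafe17
print(f"{w[43]:08x}")  # b6630ca6
```

Dumping the assembly routine's output buffer and comparing it word by word with `expand_key` (after the byte swap) should show quickly which step diverges, if any.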

Here's the SubWord, I think it's the best possible.
I decided against the mov and mask method I used earlier, and decided to push the dword into memory (onto the stack) so the bytes can be accessed there one by one. ECX contains the dword to be substituted.
Code:
SubWord:
        push eax ebx edx ecx
        movzx eax,byte[esp]
        movzx ebx,byte[esp+1]
        movzx ecx,byte[esp+2]
        movzx edx,byte[esp+3]
        mov al,byte[SBox+eax]
        mov ah,byte[SBox+ebx]
        mov bl,byte[SBox+ecx]
        mov bh,byte[SBox+edx]
        movzx ecx,bx
        shl ecx,16
        or ecx,eax
        add esp,0x4
        pop edx ebx eax
        ret
    


EDIT: Hey, I've got some pretty bad corruption in my code. I try to debug it, and about halfway through the encryption routine there are messed-up instructions, and suddenly 0000 machine code that keeps runnin' on. Here are the files, please tell me how this dll is messing up!
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP 24 Dec 2007, 15:55
-double post, I know- Someone please look at this! I have no idea why the dll's code is messing up like that; it just drops off randomly in the middle of a function. Please help!

For Revolution: I got the cipher running now, and the table offsets are the same as the documentation says, but every time I run it, it does not pick out the right dword from the tables, or something, because it's not working. I have all the stuff in front of me on a giant whiteboard, and the program took the bytes out of the state as I predicted, according to the documentation. Whenever I multiply those by four and add them to the table offset, I get a dword that does not match up with the example vectors. In your example, you did three of the xors, then xor'd with the round key, then finished off with the last xor. I do not do it that way; in fact I noticed that I can just xor the entire state every round, not every loop of that macro. I'll try to find someone who has example vectors for the T tables, but please help me if you can! I am constantly glancing over at the fasm board to see if anyone has replied, so I should respond within minutes.
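
One possible source of a table dword that "does not match" is byte order, which ties back to the LE/BE discussion earlier in the thread. The Python sketch below is an illustration under assumptions, not a description of Revolution's actual tables: it builds the first T-table entry from the S-box output 0x63 (the S-box value for input 0x00) using the usual column layout (2s, s, s, 3s), and shows that the same four bytes read as a different dword depending on the load convention.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8) with the AES polynomial 0x11B."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def t_entry_bytes(s):
    """Column that MixColumns produces for S-box output s in row 0: (2s, s, s, 3s)."""
    return bytes([xtime(s), s, s, xtime(s) ^ s])

s = 0x63                      # S-box output for input byte 0x00
column = t_entry_bytes(s)     # bytes c6 63 63 a5 in memory order

be_entry = int.from_bytes(column, "big")     # value as printed in BE-style tables
le_entry = int.from_bytes(column, "little")  # value seen after an x86 dword load

print(f"{be_entry:08x}")  # c66363a5
print(f"{le_entry:08x}")  # a56363c6
```

If published example values are in one convention and the tables in memory are in the other, every lookup will look wrong even though the bytes are identical, so it may be worth comparing byte by byte before concluding that the wrong entry is being picked.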
handyman



Joined: 04 Jun 2007
Posts: 40
Location: USA - KS
handyman 25 Dec 2007, 01:20
Check your pushes and pops for balance. The SubWord: and SubWordB: pushes and pops are not balanced. You have one more push than pop, so you are doing a return to some oddball point in memory (to whatever is in EAX), plus the values in the popped registers are probably not what you want since the balance is not there: i.e. the ecx push is kind of gumming up the works, since it is the last thing pushed on the stack and not the first popped off.

HINT: if your program goes weird, make sure you're not goofing up the stack.

Just noticed the esp adjust in code, so this wasn't it. I hate direct stack adjustment manipulations.
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP 25 Dec 2007, 03:13
If you look above in SubWord, four registers are pushed. ECX is pushed last because that is the register I am working with; then before the pops I add 4 to esp to get it off the stack without overwriting the result in ecx. Kind of complicated, but it does work. I think the same goes for SubWordB. Thanks a lot for looking at it! I'm copying the same code from Revolution's table lookup routine, down to the bit. It's not posted here yet; there have been many changes, but I decided to keep it simple and use a macro for now, to be made into a little function later if possible. The diagonal-linearity of the state's table lookups in Revolution's code was off by one, but I took his word that the code works, so I am trying it both ways. Nothing has worked so far. The initial AddRoundKey is working, the key schedule has been perfected for all modes, there is no memory leakage so far, and the only thing left is about ten lines of code that add an offset to the table, which are copied almost directly from Revolution's code. Sorry for the late post, had to go to church. Merry Christmas!!

Got it HandyMan- Would you mind looking at something else?

Okay, here it is anyway. These are code scraps from Revolution's version:

Code:
macro encrypt_block blks,keys,rounds 
{ 
align 16 
encrypt_block_#blks#_#keys#: 
if rounds>0 
        define_shifts blks 
        keyaddition blks 
        mov     edx,(rounds-1) 
        add     edi,blks*4 
.a:     tablelookup_key_add blks,esi,ebp,0 
        xchg    esi,ebp 
        add     edi,blks*4 
        sub     edx,1 
        jnz     .a 
        tablelookupf_key_add blks,esi,ebp,0 
        if (rounds and 1) 
          copy_data blks,ebp,esi 
        end if 
        ret 
end if 
}
    

The only thing of importance is the tablelookup_key_add macro, here:
Code:
macro tablelookup_key_add blks,source,dest,key_off 
{ 
        repeat blks 
        movzx   eax,b[source+((0000+%-1) mod blks)*4+0] 
        movzx   ebx,b[source+((sft1+%-1) mod blks)*4+1] 
        movzx   ecx,b[source+((sft2+%-1) mod blks)*4+2] 
        mov     eax,[eax*4+RD_tables.t1] 
        xor     eax,[ebx*4+RD_tables.t2] 
        movzx   ebx,b[source+((sft3+%-1) mod blks)*4+3] 
        xor     eax,[ecx*4+RD_tables.t3] 
        xor     eax,[edi+(%-1)*4+key_off] 
        xor     eax,[ebx*4+RD_tables.t4] 
        mov     [dest+(%-1)*4],eax 
        end repeat 
}        
    

He says that this works, and here is my version:
Code:
macro encrypt_round
{ 
        repeat 4
        movzx   eax,byte[State+((0000+%-1) mod 4)*4+0]
        movzx   ebx,byte[State+((0001+%-1) mod 4)*4+1]
        movzx   ecx,byte[State+((0002+%-1) mod 4)*4+2]
        mov     eax,[eax*4+t1]
        xor     eax,[ebx*4+t2]
        movzx   ebx,byte[State+((0003+%-1) mod 4)*4+3]
        xor     eax,[ecx*4+t3]
        xor     eax,[edi+(%-1)*4+edx]
        xor     eax,[ebx*4+t4]
        mov     [State+(%-1)*4],eax
        end repeat 
}
    

The source has been replaced with State, blks is replaced with 4, and key_off is replaced with edx. Everything else is the same; the constants sft1 etc. are the numbers they stand for in my case, so it should work. The tables are the same and everything, except that when I run the macro code it produces results other than what the documentation gives. If you would like the entire program that runs the code so you can mess with it, please feel free to say so. Thanks again.
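
One difference between the two macros that may be worth checking: Revolution's version reads from `source` and writes to a separate `dest` (swapping them with `xchg` between rounds), while the version above reads and writes `State` in place. Because each output column pulls bytes from all four input columns, writing column 0 back before columns 1 to 3 are computed would feed already-updated bytes into the later lookups. The toy Python model below illustrates that effect only; `mix` is a made-up stand-in for the T-table lookups, not AES, and all names are hypothetical.

```python
def mix(picked):
    """Made-up stand-in for the four table lookups XORed together (NOT AES)."""
    total = sum(picked) & 0xFF
    return bytes((total + 3 * b + r) & 0xFF for r, b in enumerate(picked))

def pick(state, c):
    """Byte r of output column c comes from column (c + r) mod 4, as in ShiftRows."""
    return [state[(c + r) % 4][r] for r in range(4)]

def round_buffered(state):
    """Read everything from the old state, write to a separate buffer."""
    return [mix(pick(state, c)) for c in range(4)]

def round_in_place(state):
    """Write each column back immediately, as a single-buffer macro would."""
    state = list(state)
    for c in range(4):
        state[c] = mix(pick(state, c))  # later columns now see updated bytes
    return state

start = [bytes(range(4 * c, 4 * c + 4)) for c in range(4)]
buffered = round_buffered(start)
in_place = round_in_place(start)

print(buffered[0] == in_place[0])  # True: column 0 is computed before any overwrite
print(buffered == in_place)        # False: later columns read clobbered bytes
```

If this is what is happening, writing the four results to a scratch buffer (or holding them in registers) and copying them into `State` after the last column should be enough to test the theory.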
handyman



Joined: 04 Jun 2007
Posts: 40
Location: USA - KS
handyman 25 Dec 2007, 03:43
I ran the original code submitted prior to my last post in Olly and it blew up at

Code:
CipherLoop:
      movzx eax,BYTE[State+edi+0]
    


At this point edi was 415010 and State was 415000, so the combined pointer ended up with a value like 82A010, which is way outside of the defined data section.
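
The arithmetic behind that crash can be spelled out in a few lines of Python, using the two values reported from the debugger. The sketch assumes that `edi` was meant to hold an offset into `State` rather than a full address; the variable names are hypothetical.

```python
STATE = 0x415000         # address of State, as seen in the debugger
edi_absolute = 0x415010  # edi already holds a complete address

bad_pointer = STATE + edi_absolute   # what [State+edi] evaluates to
offset = edi_absolute - STATE        # what edi would need to hold instead
good_pointer = STATE + offset

print(hex(bad_pointer))   # 0x82a010
print(hex(offset))        # 0x10
print(hex(good_pointer))  # 0x415010
```

In other words, either address `[edi]` directly when edi is a pointer, or keep edi as a small offset and address `[State+edi]`, but not both at once.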

I am running Windows 98, so it may be operating a bit differently than XP or Vista as far as when things seem to blow up. I just get a memory access violation and everything stops in Olly.

On the macro question, are you sure that all values (whether in a register or a variable) are identical during the test between what you are trying to do and what was supplied as the example? The only way to be sure is to get a value list from the original and then compare that against what is being supplied by the changed code.
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP 25 Dec 2007, 05:21
Okay: Why does this little macro not work?

Code:
con = 0x100000000
macro SHAadd 1,2
{
        mov eax,((1+2)mod con)
}
    

I have to pass it two memory addresses every time it's called, but it never works...

Lol, I just noticed that I don't need to mod every single addition by 2^32. Took me a while to realize the 32-bit regs do it for me :) ... I had made an entire macro with like 10 div instructions I thought I was forced to use.
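
A short Python check of the point about wraparound: adding two values in a 32-bit register drops the carry out of bit 31, which is exactly addition modulo 2^32, so no explicit `mod` or `div` is needed for the SHA additions. It also hints at why the macro above cannot do the job: an expression inside `mov eax,(...)` is computed by the assembler at assembly time from constants or addresses, not from the memory contents at run time. The helper name `add32` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    """32-bit addition as a register performs it: the carry out of bit 31 is dropped."""
    return (a + b) & MASK32

x, y = 0xFFFFFFFF, 0x00000002
print(hex(add32(x, y)))                 # 0x1
print(add32(x, y) == (x + y) % 2**32)   # True
```

In assembly this amounts to loading one operand into a register and using a plain `add` with the other.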

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.