flat assembler
Message board for the users of flat assembler.
![]() |
Author |
|
revolution 08 Mar 2008, 06:35
I suggest you just go ahead and code up all the solutions you have presented above and time it in the real situation that you want it to run. There is more to optimisation than just looking at code in the screen, so many interactions between sections inside, and outside, the CPU have a very significant effect on the runtime. You can't look at a piece of code in isolation and say it is fast. There are too many dependencies upon other things.
Also look at how much time you are spending on optimising and compare that to how time will be saved when the program is running. If you spend 1 month optimising and only save seconds in runtime then your time probably has not been utilised efficiently. The only caveat to that is if your code has a specific deadline where it must produce results within a certain time limit, then of course you have to optimise to meet the demand. But situations like this are rare so it is unlikely you will find this requirement. |
|||
![]() |
|
AlexP 08 Mar 2008, 16:20
Okay, from the start I've had many problems with endian-ness:
When moving from memory to a register, it goes from 1,2,3,4 to 2,1,4,3. When moving from register to memory, it goes from 1,2,3,4 to 4,3,2,1 When using an operation (like xor) from register to memory, the memory operand goes from 1,2,3,4 in memory to 4,3,2,1 before the operation. When using the bswap instruction (register), it goes from 1,2,3,4 to 4,3,2,1. Everytime I receive user input I almost always have to perform bswap to fix it out before I even think of working with it. I see two different types of byte-swapping in my computer, are my results normal? Can someone please tell me why I have both middle-endian AND little-endian-ness in my computer? It is very hard to work with, Last edited by AlexP on 08 Mar 2008, 17:14; edited 1 time in total |
|||
![]() |
|
revolution 08 Mar 2008, 17:09
It might be worth mentioning that what I mentioned above only applies to modern CPU super scalar architecture. If you are programming for some type of embedded system (which would be likely if you have a deadline to meet with the code timing) then that particular CPU may have more predicable software performance characteristics. eg. An 80386sx embedded system will be much easier to optimise for, even without running the code or actually doing timing tests. But still timing tests should be done to verify that your code is running as expected and required.
Last edited by revolution on 08 Mar 2008, 17:23; edited 1 time in total |
|||
![]() |
|
revolution 08 Mar 2008, 17:18
AlexP wrote: Okay, from the start I've had many problems with endian-ness: Many of the hashing and encryption algos specify the byte order as big-endian. So to solve the problem just after loading the data from memory do a bswap. When working with the data internally always use little-endian, this is the native CPU format. Then just before storing the finalised value do a bswap to put it back to big-endian. This should save you many headaches and wasted time wondering about what goes where. So do like this:
|
|||
![]() |
|
AlexP 08 Mar 2008, 17:21
Thanks, I thought I had a perfectly normal Pentium laptop with little-endianness, but apparently I've got, uhh, what you said. I have no deadlines, but what I do have is time.
What this applies to is my attempt at taking a byte[] from memory, using it to create another byte[] after all operations are done with. The only problem is that when I ever get user input, I almost always have to swap it and then end up with the wrong type of array, I don't see any way to get it right... |
|||
![]() |
|
revolution 08 Mar 2008, 17:28
AlexP wrote: Thanks, I thought I had a perfectly normal Pentium laptop with little-endianness, but apparently I've got, uhh, what you said. I have no deadlines, but what I do have is time. I expect your laptop, if it is under ~8-10 years old, will be super-scaler and thus trickier to optimise for. As for the byte[] thing you describe I am not sure what you are trying to say/ask. If you algo is primarily a big-endian algo then have have little choice but to do bswaps. But you mention byte arrays where the endian thing doesn't make sense. |
|||
![]() |
|
AlexP 08 Mar 2008, 17:32
Yeah, I have to receive a byte[] from the user, then perform my operations to create a new byte[]. This new byte[] will be used by another one of my algorithms, and it is just easier for this new byte[] to be (a byte[] ), so my operations in the second algorithm can just use memory operations instead of lodsd, then performing the operation.
So far I've not found any easy way to do this, so I'm sticking with bswap'ing the input before using it. Thanks for insights, I will definitely research those things when I have time. |
|||
![]() |
|
revolution 08 Mar 2008, 17:39
I think you will need to provide details of you algo, we are talking about in circles. You need to be more specific about why a byte array needs any endian-ness attached. Endian-ness would imply you are using, at a minimum, word arrays, and more probably I expect you are using dword arrays.
|
|||
![]() |
|
AlexP 08 Mar 2008, 18:17
Okay, here. You may recognize it, I have been studying many implementations of AES and am trying to combine the best of them.
Code: ; ----------------------------------------------------------------------------- ; ; aes encryption - user interface ; ----------------------------------------------------------------------------- aes_encuser: ; preserve registers sub esp, aes_stack_space mov [esp+12], esi mov [esp+ 8], edi mov [esp+ 4], ebp mov [esp+ 0], ebx ; obtain input parameter cld mov esi, [esp+aes_stack_space+aes_encrypt_in_blk] ; lodsd mov edi, aes_state.state ; stosd ; format user input to little-endian architectures, first key addition rept 4 column { lodsd xor eax, d[aes_state.keys+(column-1)*4] stosd } ; perform the 13 normal rounds if aes_encrypt_unroll rept 13 round { table_lookup_key_add (aes_state.state),(aes_state.state),((round)*16) } else mov edx, 13 mov edi, aes_state.keys+16 ; add edi,16? @@: table_lookup_key_add (aes_state.state),(aes_state.state),edi add edi, 4*4 dec edx jnz @b end if ; final round and key addition table_lookup_final (aes_state.state),(esp+aes_stack_space+aes_encrypt_out_blk),(224) ; restore registers and return mov ebx, [esp+ 0] mov ebp, [esp+ 4] mov edi, [esp+ 8] mov esi, [esp+12] add esp, aes_stack_space ret 8 Here is the table lookup macros, pretty much the ones you used. Code: ; ----------------------------------------------------------------------------- ; ; core aes routines ; ----------------------------------------------------------------------------- macro table_lookup_key_add source,dest,key { rept 4 column { movzx eax, b[source+((column-1) mod 4)*4+0] movzx ebx, b[source+((column+0) mod 4)*4+1] movzx ecx, b[source+((column+1) mod 4)*4+2] mov eax, [eax*4+aes_tables.tab_t1] xor eax, [ebx*4+aes_tables.tab_t2] movzx ebx, b[source+((column+2) mod 4)*4+3] xor eax, [ecx*4+aes_tables.tab_t3] xor eax, [(column-1)*4+key] xor eax, [ebx*4+aes_tables.tab_t4] mov d[dest+(column-1)*4], eax \} } ; stosd . . . / sub edi,16 macro table_lookup_final source,dest,key { rept 4 column { movzx eax, b[source+((column+1) mod 4)*4+2] movzx ebx, b[source+((column+2) mod 4)*4+3] movzx ecx, b[source+((column-1) mod 4)*4+0] movzx edx, b[source+((column+0) mod 4)*4+1] mov al, b[eax+aes_tables.tab_s] ; xlatb gives more instructions... mov ah, b[ebx+aes_tables.tab_s] shl eax, 16 mov al, b[ecx+aes_tables.tab_s] mov ah, b[edx+aes_tables.tab_s] xor eax, [(column-1)*4+key] mov [dest+(column-1)*4], eax \} } ; stosd . . . / sub edi,16 Hope it helps, the key schedule I am using is a dword[] (reversed in memory). I have chosen this path so when I use memory operands for xor operations it translates to normal. PS: In the user interface, when the first key addition happens I need to insert "bswap eax". Then it will work. (lol I've said that hundreds of times, rarely ever happens.) |
|||
![]() |
|
revolution 08 Mar 2008, 18:33
AlexP wrote: Hope it helps, the key schedule I am using is a dword[] (reversed in memory). I have chosen this path so when I use memory operands for xor operations it translates to normal. One golden rule that I perhaps should have mentioned earlier, and I assumed you had already applied, is "get it working, then get it fast". |
|||
![]() |
|
AlexP 08 Mar 2008, 19:25
Thanks, I'll do that. Replace the input to the main algorithm with whatever is supposed to go there, to see if it works before worrying about all the formatting and stuff.
|
|||
![]() |
|
AlexP 09 Mar 2008, 07:42
Thank you revolution, I took (both) your advice, and I finally hit that F7 and it worked. It's taken many dozens of hours (mostly spend reading and researching AES), and many more to finally get something working. This is definitely the best project I've worked on so far, and now all I have to do is have fun making it better.
Right now, it's a hybrid between good coding practice, specification, and table usage. I feel that the encryption algo is as good as I need it for now, though it took me awhile to figure out why, in your version, you switched the "source" and "destination" between every round, then I saw I was overwriting what I needed! Works great now. I'm also planning to buy the official AES book, made by the designers and containing information that I REALLY want to learn. At a whopping $50, I'll have to get a little more money until I can order it. Thanks again! |
|||
![]() |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.