flat assembler
Message board for the users of flat assembler.
Remy Vincent 10 Apr 2008, 00:03
This kind of optimization could very easily be done with just an Ada directive... Why don't you wait for the "high diploma people" to organize new Ada meetings and decide on new compiler directives...
bitRAKE 10 Apr 2008, 14:21
I'm assuming the prefix should precede the instruction...
66 00 01 02 03 04 05 06
...and not...
00 01 02 03 04 05 06 66
My AMD optimizations haven't ventured into the FPU arena - I wasn't even aware of this use of 66. At the time, AMD's FPU implementation was much better than Intel's, since they brought over the guys from Digital Equipment Corporation who worked on the Alpha.
Last edited by bitRAKE on 10 Apr 2008, 14:25; edited 1 time in total
AlexP 10 Apr 2008, 14:23
Hmm, I've seen 0x66 mentioned as padding, I've seen it literally in the middle of 0x90 bytes for padding, but I have nothing to contribute to this discussion.
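The byte pattern described here (0x66 prefixes placed in front of 0x90 opcodes) can be sketched in a few lines of Python. This is only an illustration, not something posted in the thread: `nop_padding` and its `max_prefixes` parameter are hypothetical names, and the limit of two prefixes per NOP is an arbitrary assumption rather than a recommendation from any optimization guide.

```python
OPERAND_SIZE_PREFIX = 0x66  # the prefix discussed in this thread
NOP_OPCODE = 0x90           # one-byte NOP


def nop_padding(length, max_prefixes=2):
    """Build `length` bytes of padding out of 0x66-prefixed NOPs.

    Each chunk is zero or more 0x66 prefixes followed by one 0x90,
    so the prefix always precedes the opcode it applies to.
    `max_prefixes` is an assumed, arbitrary cap per NOP.
    """
    out = bytearray()
    remaining = length
    while remaining > 0:
        # Leave at least one byte for the NOP opcode itself.
        prefixes = min(max_prefixes, remaining - 1)
        out += bytes([OPERAND_SIZE_PREFIX] * prefixes + [NOP_OPCODE])
        remaining -= prefixes + 1
    return bytes(out)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, " ".join(f"{b:02X}" for b in nop_padding(n)))
```

Every chunk ends in 0x90, so a disassembler walking the padding sees a sequence of complete (prefixed) NOP instructions rather than a dangling prefix.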
revolution 10 Apr 2008, 14:37
I've always thought that optimising nops (I'm referring to opcode 0x90, not the prefix 0x66) was a bit pointless. Back in the (good? bad?) old 8086 days nops were used for delays, and of course there was no sense in optimising delays. Now fast forward a few years to modern CPUs: we use nops for alignment before entering a loop, so the loop itself should be optimised to death, and the pre-nops are kind of like the .001% icing on the cake.
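To put a number on how many pre-loop nops are involved, here is a minimal Python sketch of the alignment arithmetic. The function name `padding_needed` and the 16-byte alignment are assumptions chosen for illustration; the thread does not state which boundary is being targeted.

```python
def padding_needed(address, alignment=16):
    """Bytes of padding required so that `address` reaches the next
    multiple of `alignment` (0 if it is already aligned).

    The default of 16 is an assumed, illustrative boundary.
    """
    return (-address) % alignment


if __name__ == "__main__":
    for addr in (0x1000, 0x1003, 0x100F):
        print(hex(addr), padding_needed(addr))
```

With a 16-byte boundary the padding is never more than 15 bytes, and it is executed once on the way into the loop, which is the point being made about it mattering far less than the loop body.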
bitRAKE 10 Apr 2008, 14:54
AMD also recommended using all kinds of "NOP" to reduce false dependencies with adjacent code.
Code:
;MOV REG, REG
;XCHG REG, REG
;CMOVcc REG, REG
;SHR REG, 0
;SAR REG, 0
;SHL REG, 0
;SHRD REG, REG, 0
;SHLD REG, REG, 0
;LEA REG, [REG]
;LEA REG, [REG+00]
;LEA REG, [REG*1+00]
;LEA REG, [REG+00000000]
;LEA REG, [REG*1+00000000]

NOP2_EAX EQU <DB 8Bh,0C0h> ;MOV EAX, EAX
NOP2_ECX EQU <DB 8Bh,0C9h> ;MOV ECX, ECX
NOP2_EDX EQU <DB 8Bh,0D2h> ;MOV EDX, EDX
NOP2_EBX EQU <DB 8Bh,0DBh> ;MOV EBX, EBX
NOP2_ESP EQU <DB 8Bh,0E4h> ;MOV ESP, ESP
NOP2_EBP EQU <DB 8Bh,0EDh> ;MOV EBP, EBP
NOP2_ESI EQU <DB 8Bh,0F6h> ;MOV ESI, ESI
NOP2_EDI EQU <DB 8Bh,0FFh> ;MOV EDI, EDI

; No SIB byte, source in ModR/M byte:
;NOP2_EAX EQU <DB 8Dh,00h> ; lea eax, [eax]
;NOP2_ECX EQU <DB 8Dh,09h> ; lea ecx, [ecx]
;NOP2_EDX EQU <DB 8Dh,12h> ; lea edx, [edx]
;NOP2_EBX EQU <DB 8Dh,1Bh> ; lea ebx, [ebx]
;NOP2_ESI EQU <DB 8Dh,36h> ; lea esi, [esi]
;NOP2_EDI EQU <DB 8Dh,3Fh> ; lea edi, [edi]

; SIB byte to select source:
NOP3_EAX EQU <DB 8Dh,04h,20h> ;LEA EAX, [EAX]
NOP3_ECX EQU <DB 8Dh,0Ch,21h> ;LEA ECX, [ECX]
NOP3_EDX EQU <DB 8Dh,14h,22h> ;LEA EDX, [EDX]
NOP3_EBX EQU <DB 8Dh,1Ch,23h> ;LEA EBX, [EBX]
NOP3_ESP EQU <DB 8Dh,24h,24h> ;LEA ESP, [ESP]
NOP3_ESI EQU <DB 8Dh,34h,26h> ;LEA ESI, [ESI]
NOP3_EDI EQU <DB 8Dh,3Ch,27h> ;LEA EDI, [EDI]

; No SIB byte, but add signed byte:
;NOP3_EAX EQU <DB 8Dh,40h,00h> ; lea eax, [eax+00]
;NOP3_ECX EQU <DB 8Dh,49h,00h> ; lea ecx, [ecx+00]
;NOP3_EDX EQU <DB 8Dh,52h,00h> ; lea edx, [edx+00]
;NOP3_EBX EQU <DB 8Dh,5Bh,00h> ; lea ebx, [ebx+00]
NOP3_EBP EQU <DB 8Dh,6Dh,00h> ; lea ebp, [ebp+00]
;NOP3_ESI EQU <DB 8Dh,76h,00h> ; lea esi, [esi+00]
;NOP3_EDI EQU <DB 8Dh,7Fh,00h> ; lea edi, [edi+00]

; SIB byte, and add signed byte:
NOP4_EAX EQU <DB 8Dh,44h,20h,0> ;lea eax, [00][eax]
NOP4_ECX EQU <DB 8Dh,4Ch,21h,0> ;lea ecx, [00][ecx]
NOP4_EDX EQU <DB 8Dh,54h,22h,0> ;lea edx, [00][edx]
NOP4_EBX EQU <DB 8Dh,5Ch,23h,0> ;lea ebx, [00][ebx]
NOP4_ESP EQU <DB 8Dh,64h,24h,0> ;lea esp, [00][esp]
NOP4_EBP EQU <DB 8Dh,6Ch,25h,0> ;lea ebp, [00][ebp]
NOP4_ESI EQU <DB 8Dh,74h,26h,0> ;lea esi, [00][esi]
NOP4_EDI EQU <DB 8Dh,7Ch,27h,0> ;lea edi, [00][edi]

;NOP5_EAX EQU <TEST EAX, 0FFFF0000h>
;NOP5_EAX EQU <CMP EAX, 0FFFF0000h>

; No SIB byte, but add signed dword:
NOP6_EAX EQU <DB 8Dh,080h,0,0,0,0> ;lea eax, [eax+00000000]
NOP6_EBX EQU <DB 8Dh,09Bh,0,0,0,0> ;lea ebx, [ebx+00000000]
NOP6_ECX EQU <DB 8Dh,089h,0,0,0,0> ;lea ecx, [ecx+00000000]
NOP6_EDX EQU <DB 8Dh,092h,0,0,0,0> ;lea edx, [edx+00000000]
NOP6_ESI EQU <DB 8Dh,0B6h,0,0,0,0> ;lea esi, [esi+00000000]
NOP6_EDI EQU <DB 8Dh,0BFh,0,0,0,0> ;lea edi, [edi+00000000]
NOP6_EBP EQU <DB 8Dh,0ADh,0,0,0,0> ;lea ebp, [ebp+00000000]

; SIB byte, and add signed dword
NOP7_EAX EQU <DB 8Dh,084h,20h,0,0,0,0> ;lea eax, [00000000][eax]
NOP7_ECX EQU <DB 8Dh,08Ch,21h,0,0,0,0> ;lea ecx, [00000000][ecx]
NOP7_EDX EQU <DB 8Dh,094h,22h,0,0,0,0> ;lea edx, [00000000][edx]
NOP7_EBX EQU <DB 8Dh,09Ch,23h,0,0,0,0> ;lea ebx, [00000000][ebx]
NOP7_ESP EQU <DB 8Dh,0A4h,24h,0,0,0,0> ;lea esp, [00000000][esp]
NOP7_EBP EQU <DB 8Dh,0ACh,25h,0,0,0,0> ;lea ebp, [00000000][ebp]
NOP7_ESI EQU <DB 8Dh,0B4h,26h,0,0,0,0> ;lea esi, [00000000][esi]
NOP7_EDI EQU <DB 8Dh,0BCh,27h,0,0,0,0> ;lea edi, [00000000][edi]

; SIB byte adding signed dword:
;NOP7_EAX EQU <DB 8Dh,04h,05h,0,0,0,0> ;LEA EAX, [][EAX+00000000]
;NOP7_ECX EQU <DB 8Dh,0Ch,0Dh,0,0,0,0> ;LEA ECX, [][ECX+00000000]
;NOP7_EDX EQU <DB 8Dh,14h,15h,0,0,0,0> ;LEA EDX, [][EDX+00000000]
;NOP7_EBX EQU <DB 8Dh,1Ch,1Dh,0,0,0,0> ;LEA EBX, [][EBX+00000000]
;NOP7_EBP EQU <DB 8Dh,2Ch,2Dh,0,0,0,0> ;LEA EBP, [][EBP+00000000]
;NOP7_ESI EQU <DB 8Dh,34h,35h,0,0,0,0> ;LEA ESI, [][ESI+00000000]
;NOP7_EDI EQU <DB 8Dh,3Ch,3Dh,0,0,0,0> ;LEA EDI, [][EDI+00000000]
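The second and third bytes in the table above follow directly from the ModR/M and SIB field layout, so they can be cross-checked mechanically. The Python sketch below is an illustration added for that purpose, not part of the original post; the helper names (`modrm`, `mov_reg_reg`, `lea_sib_disp8`) are hypothetical, and it covers only two of the forms listed (the 2-byte MOV and the 4-byte LEA with SIB and an 8-bit displacement).

```python
# Register numbers used in the reg, r/m and SIB base fields (32-bit mode).
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def modrm(mod, reg, rm):
    """Pack the mod (2 bits), reg (3 bits) and r/m (3 bits) fields."""
    return (mod << 6) | (reg << 3) | rm


def mov_reg_reg(name):
    """MOV r32, r32 with the same register twice: opcode 8B, mod=11."""
    n = REGS.index(name)
    return bytes([0x8B, modrm(0b11, n, n)])


def lea_sib_disp8(name):
    """LEA r32, [r32+00] forced through a SIB byte: opcode 8D, mod=01,
    r/m=100 (SIB follows), SIB = scale 1, no index (100), base = r32."""
    n = REGS.index(name)
    sib = (0b00 << 6) | (0b100 << 3) | n
    return bytes([0x8D, modrm(0b01, n, 0b100), sib, 0x00])


if __name__ == "__main__":
    for r in REGS:
        two = " ".join(f"{b:02X}" for b in mov_reg_reg(r))
        four = " ".join(f"{b:02X}" for b in lea_sib_disp8(r))
        print(f"{r}: {two} | {four}")
```

Running it prints one line per register, which can be compared against the NOP2_* and NOP4_* entries.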
revolution 10 Apr 2008, 15:05
bitRAKE wrote:
Code:
nop5 equ db 03eh,08dh,044h,020h,000h ;lea eax,ds:[eax+000h] - uses SIB
edfed 10 Apr 2008, 15:48
nops but ... change flags.
Code:
xor eax,0
and eax,not 0
or eax,0
add eax,0
imul eax,1 ;to wait a long time
idiv eax,1 ; to wait a very long time
sub eax,0
jmp $+1
jmp $+2
revolution 10 Apr 2008, 15:52
edfed wrote:
edfed 10 Apr 2008, 16:16
sorry, i wanted to say:
jmp $+3
i never use nops...
revolution 10 Apr 2008, 16:21
edfed wrote:
edfed 10 Apr 2008, 16:40
i read somewhere that $+3 will generate a jmp word and $+2 will generate a jmp byte.
vid 10 Apr 2008, 17:37
If you want "jmp word", write "jmp word"; don't expect the assembler to generate any particular form. FASM can choose any form for a given mnemonic; even a 32-bit offset would be a valid result for "jmp $+2".
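Why the chosen form matters for something like "jmp $+2" can be seen from the displacement arithmetic. The Python sketch below is an added illustration, assuming the standard x86 relative jump encodings (EB rel8, E9 rel16 in 16-bit code, E9 rel32 in 32-bit code), a displacement counted from the end of the instruction, and `$` meaning the address of the jmp itself. The function `encode_jmp` and its form names are hypothetical, and operand-size prefixes are not modeled.

```python
# form name -> (opcode, total instruction length, displacement width in bytes)
JMP_FORMS = {
    "short": (0xEB, 2, 1),   # jmp rel8
    "near16": (0xE9, 3, 2),  # jmp rel16, as encoded in 16-bit code
    "near32": (0xE9, 5, 4),  # jmp rel32, as encoded in 32-bit code
}


def encode_jmp(offset_from_start, form):
    """Encode `jmp $+offset_from_start` in the requested form.

    The stored displacement is relative to the end of the instruction,
    so it equals the requested offset minus the instruction length.
    """
    opcode, length, width = JMP_FORMS[form]
    rel = offset_from_start - length
    return bytes([opcode]) + rel.to_bytes(width, "little", signed=True)


if __name__ == "__main__":
    for offset, form in ((2, "short"), (3, "near16"), (2, "near32")):
        encoded = " ".join(f"{b:02X}" for b in encode_jmp(offset, form))
        print(f"jmp $+{offset} as {form}: {encoded}")
```

Under these assumptions, "jmp $+2" in short form and "jmp $+3" in 16-bit near form both carry a zero displacement and simply fall through to the next instruction, while a 32-bit near form of "jmp $+2" carries a displacement of -3 and lands inside its own encoding. That is why relying on the assembler to pick a particular form is fragile.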
bitRAKE 10 Apr 2008, 21:23
revolution wrote:
rugxulo 14 Apr 2008, 21:57
Kuemmel wrote:
Uh, even my "new" AMD64x2 laptop (bought in August) only has family "0x0F". So this optimization (unless it also works on older machines) would be very, very fringe.
Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.