flat assembler
Message board for the users of flat assembler.
revolution 24 Jun 2010, 03:37
edemko: Are you referring to some particular piece of code? Where do you see the frequently used aligns?
edemko 24 Jun 2010, 07:47
People may get bored; I won't, revo.
revolution, there is a man on wasm.ru who says: "When all else fails, use solder."
revolution 24 Jun 2010, 08:27
edemko wrote: people may bore, i won't revo
edemko wrote: revolution, there is a man on wasm.ru, he says: "When all else fails use solder".
Anyhow, I have no idea what this topic is about. Care to enlighten us? If not, I can move it to the Heap.
edemko 24 Jun 2010, 09:02
It's a personal programming style, using those aligns so often.
Delete it, there is nothing to talk about. Thanks.
revolution 24 Jun 2010, 09:17
Moving >>> deleting.
But explaining >>> moving. Can you show some code to illustrate what you are asking?
ManOfSteel 24 Jun 2010, 09:22
It's so elementary, I don't understand how people still have doubts about this one! Once and for all, you use the align directive when you need to cause an eclipse. See?
Now go read the fabulous manual to make sure you don't miss anything important.
edemko 24 Jun 2010, 09:47
Code:
```asm
; =1=
...
B400:   neg     ecx
        ; Same with AlignmentDispatchNT:
        PICREFERENCE AlignmentDispatchNT, RP, 03H, 9CH, 83H
        jmp     ebx
ENDIF

align 16
C100:   ; Code for aligned src. SSE2 or later instruction set
        ; The nice case, src and dest have same alignment.

        ; Loop. ecx has negative index from the end, counting up to zero
        movaps  xmm0, [esi+ecx]
        movaps  xmm1, [esi+ecx+10H]
...

; =2=
...
        ret
GL_Paint ENDP

ALIGN 4
MainWinProc PROC hWnd:HWND,uMsg:UINT,wParam:WPARAM,lParam:LPARAM
        LOCAL crect:RECT
        LOCAL ps:PAINTSTRUCT
...

; =3=
...
;standard dialog proc function
align 4
DlgProcAta proc hWin:HWND,uMsg:UINT,wParam:WPARAM,lParam:LPARAM
        mov     eax,uMsg
...

; =4=
...
;Create an object. And store pointer to it
align 16
BDO_CreateStandart proc BDO_Address:DWORD
        mov     ESI, [ESP+DWORD]        ; BDO_Address
;       mov     EBX, SizeOf(BDO_Class)/DWORD
...

; =5=
...
;===========================================================================================
; Data definitions: strings / initialized or uninitialized variables / structures
;-------------------------------------------------------------------------------------------
section '.data' data readable writeable

ClassName       db ' AsmGges Win32', 0
WinCaption      db ' Snake v.04 Fasm assembler', 0
AppName         db ' AsmGges Snake',0
LimitsMsg       db '< Snake outside > ',0
BiteMsg         db '< Bitten snake > ',0
MazeMsg         db '< Snake in wall > ',0
ScoreMsg        db 'Score: Time: mn s ',0
align 4
SnakeFile       db 'res\snake.bmp',0
align 4
SysFile         db 'res\sys.bmp',0
align 4
ScoreFile       db 'res\snake.sna',0
align 4
szSound3        db 'sound\pop.wav',0
align 4
szSound6        db 'sound\end.wav',0
align 4
szSound7        db 'sound\tankyou.wav',0
align 4
MidiFile        db 'sound\music.mid',0
align 4
...

; =6=
...
align 4
;max transfer rates for UDMA modes
szUdmaTransf0   db "16.7",0
szUdmaTransf1   db "25.0",0
szUdmaTransf2   db "33.3",0
szUdmaTransf3   db "44.4",0
szUdmaTransf4   db "66.7",0
szUdmaTransf5   db "100.0",0
szUdmaTransf6   db "133.0",0
szUdmaModeStr   db "%d: Timing %d ns; Max. Transf. Rate %s Mb/s",0 ; Standard ATA/ATAPI-%d
align 4
;max transfer rates for PIO modes
szPioTransf0    db "3.3",0
szPioTransf1    db "5.2",0
szPioTransf2    db "8.3",0
szPioTransf3    db "IORDY 11.1",0
szPioTransf4    db "IORDY 16.6",0
;use szUdmaModeStr instead of szPioModeStr
;szPioModeStr   db "PIO mode %d: Cycle Time %d ns; Max Transfer Rate %s Mb/s",0
align 4
;media types
szRemovable     db "Removable",0
szFixed         db "Fixed",0
szFloppy        db "Floppy Disk",0
szCleaner       db "Drive Cleaner",0
szUnknown       db "Unknown",0
szCDROM         db "CD-ROM",0
szCDR           db "CD-Recordable (Write Once)",0
szCDRW          db "CD-Rewriteable",0
szDVDROM        db "DVD-ROM",0
szDVDR          db "DVD-Recordable (Write Once)",0
szDVDRW         db "DVD-Rewriteable",0
szIomZip        db "Iomega Zip Drive",0
szIomJaz        db "Iomega Jaz Drive",0
szMagnetic      db "Magnetic Disk",0
szDVDRAM        db "DVD-RAM",0
;gen strings
align 4
szRamDrive      db "RAM Drive",0
szRemoteDrive   db "Remote Drive",0
szNotSupport    db "Not supported",0
szChsGeom       db "%lu X %lu X %lu",0
szNum           db "%lu",0
szScsiInqHead   db "%08X";"%08Xh",0
szHexNum        db "%08Xh",0
szLun           db "%03d",0
szPad9Int       db "%09lu",0
szBusVersion    db "%lu.%04lu",0
szAdvapi        db "advapi32.dll",0
szChkToken      db "CheckTokenMembership",0
szNoAdmin       db "You don't have Administrator rights. Some information will be unavailable.",0
;bus types strings
align 4
szScsi          db "SCSI",0     ;it's best to add " bus" to each of the following
szAtapi         db "ATAPI",0
szAta           db "ATA",0
sz1394          db "1394",0
szSsa           db "SSA",0
szFiber         db "FIBER",0
szUsb           db "USB",0
szRaid          db "RAID",0
align 4
;device types
szController    db "Controller Device",0
szDiskDev       db "Disk Device",0
szFileSys       db "File System Device",0
szScsiDev       db "SCSI Device",0
szVirtualDisk   db "Virtual Disk Device",0
szMassStor      db "Mass Storage Device",0
szDVD           db "DVD Device",0
;SCSI-2 device types
szDirAccess     db "Direct Access Device",0     ;00
szTape          db "Tape Device",0              ;01
szPrinter       db "Printer Device",0           ;02
szProcessor     db "Processor Device",0         ;03
szWorm          db "WORM Device",0              ;04
szCdrom         db "CDROM Device",0             ;05
szScanner       db "Scanner Device",0           ;06
szOptical       db "Optical Disk",0             ;07
szMediaChng     db "Media Changer",0            ;08
szComm          db "Comm. Device",0             ;09
szAscit8        db "ASCIT8",0                   ;0A
szAscit80       db "ASCIT8",0                   ;0B
szArray         db "Array Device",0             ;0C
szEnclosure     db "Enclosure Device",0         ;0D
szRbc           db "RBC Device",0               ;0E
;"Unknown Device" // 0x0F
szDeviceNum     db " (%08Xh)",0
align 4
szTab1          db "ATA ID",0;"ATA Information",0
szTab2          db "Storage Query ID",0;"Storage Query Property",0
szTab3          db "SCSI Inquiry",0;"SCSI Information",0
szTab4          db "SCSI Miniport",0;"SMART Identify",0
szTab5          db "Optical Devices",0
...

; =7=
...
align 16
WndProc:        ; RCX=hWnd, EDX=uMsg, R8=wParam, R9=lParam
        cmp     edx,WM_CREATE
        jz      WndProc_CREATE
        cmp     edx,WM_COMMAND
        jz      WndProc_COMMAND
        cmp     edx,WM_SIZE
        jz      WndProc_SIZE
        cmp     edx,WM_WINDOWPOSCHANGED
        jz      WndProc_WINDOWPOSCHANGED
        cmp     edx,WM_CLOSE
        jz      WndProc_CLOSE
        cmp     edx,WM_DESTROY
        jz      WndProc_DESTROY

align 16        ; next routine is called very frequently, we should increase its speed a bit by align 16
; routine common for both WndProc_DefFrameProcA and WndProc_Def
WndProc_DefFrameProcA:
WndProc_Def:
        push    rcx rdx r8 r9
        sub     rsp,8*(4+1)
        mov     qword [rsp+8*4],r9      ; lParam
        mov     r9,r8                   ; wParam
        mov     r8d,edx                 ; uMsg
        mov     rdx,qword [hWndClient]  ; hWndMDIClient
;       mov     rcx,qword [rsp+8*(4+1+3)] ; rcx=hWnd now
        call    qword [DefFrameProcA]
WndProc_COMMAND_end:
        add     rsp,8*(4+1)
        pop     r9 r8 rdx rcx
        ret

align 16
WndProc_CREATE:
        push    rcx rdx r8 r9
        sub     rsp,8*(4+11)
...

; =8=
...
align 16
CMD_RUN:
; the easiest to do, just order DBG_CONTINUE to debug loop and resume thread owning debug loop
        mov     eax,DBG_CONTINUE
CMD_RUN_common_begin:
        push    rcx rdx r8 r9
        sub     rsp,8*(4+1)
...

; =9=
...
align 16
ten_powers      dt 1.0
                dt 0.1
                dt 0.01
                dt 0.001
                dt 0.0001
                dt 0.00001
                dt 0.000001
                dt 0.0000001
                dt 0.00000001
                dt 0.000000001
                dt 0.0000000001
                dt 0.00000000001
                dt 0.000000000001
                dt 0.0000000000001
                dt 0.00000000000001
                dt 0.000000000000001
                dt 0.0000000000000001
                dt 0.00000000000000001
                dt 0.000000000000000001
                dw 0    ; for align only
...
```
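For reference, the number of filler bytes an `align n` directive inserts is just the distance from the current offset to the next multiple of n. A minimal sketch in C (not fasm's actual implementation; `align_padding` and `align_up` are made-up helper names), assuming n is a power of two, which fasm requires for align:

```c
#include <assert.h>
#include <stddef.h>

/* Number of filler bytes "align n" would emit at offset off.
   n must be a power of two; the result is 0 if off is already aligned. */
static size_t align_padding(size_t off, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return (n - (off & (n - 1))) & (n - 1);
}

/* Offset of the first byte after the padding. */
static size_t align_up(size_t off, size_t n)
{
    return off + align_padding(off, n);
}
```

For example, the ten_powers table in =9= holds 19 ten-byte `dt` entries (190 bytes). Since 190 mod 16 = 14, two bytes are missing to reach 192, which is presumably what the trailing `dw 0 ; for align only` supplies.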
revolution 24 Jun 2010, 09:55
There are two competing things to consider when using aligns like in your example above:
1) Align bloats the code and fills the precious caches faster, potentially harming performance.
2) Align makes it easier for the x86 decoder to generate uops when instructions don't straddle a boundary, potentially improving performance.
Which one wins is very situation-specific and depends on what you want to achieve. A good rule of thumb, though, would probably be: if you don't know for sure that align is beneficial in your code, then it probably isn't something you should waste time caring about.
edemko 24 Jun 2010, 10:09
Intel says that aligned dwords (and other Xwords) improve read performance, because the CPU needs half as many requests to fetch the data. That is fine, and the alignment-sensitive SSE instructions are fine too. But when copying memory we can always use mov ecx,esi (or edi) / and ecx,sizeof.Xword-1 / rep movsb / rep movsX / etc., so there is no reason to align strings: a clever algorithm will take care of that.
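The head/body/tail idea behind that rep movsb / rep movsX copy can be sketched in C as below. This is a minimal illustration that aligns the destination, not a tuned memcpy. `copy_head_body_tail` and `CHUNK` (standing in for sizeof.Xword) are hypothetical names. One detail worth noting: the number of head bytes needed to reach the next boundary is (-addr) & (size-1), while addr & (size-1) alone gives how far the address lies past the previous boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CHUNK plays the role of sizeof.Xword; it must be a power of two. */
#define CHUNK ((size_t)8)

/* Copy n bytes in three phases: single bytes until dst sits on a CHUNK
   boundary (the "rep movsb" head), whole CHUNK-sized blocks (the
   "rep movsX" body), then the remaining bytes (the tail). */
static void copy_head_body_tail(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert((CHUNK & (CHUNK - 1)) == 0);

    /* Bytes up to the next boundary: note the negation of the address. */
    size_t head = (size_t)(-(uintptr_t)dst & (CHUNK - 1));
    if (head > n)
        head = n;
    n -= head;
    while (head--)
        *dst++ = *src++;

    /* Body: dst is now aligned (or nothing is left). src may still be
       misaligned, so copy through memcpy instead of a uint64_t pointer. */
    while (n >= CHUNK) {
        memcpy(dst, src, CHUNK);
        dst += CHUNK;
        src += CHUNK;
        n -= CHUNK;
    }

    /* Tail: fewer than CHUNK bytes remain. */
    while (n--)
        *dst++ = *src++;
}
```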
edemko 24 Jun 2010, 10:13
Procs: why is align applied to them?
revolution 24 Jun 2010, 10:18
edemko wrote: procs: why does align apply them
edfed 24 Jun 2010, 11:08
I'm thinking about a feature related to align.
For speed and performance tests, it could be possible to align every instruction on a dword boundary (filling the spare bytes with nops) and then see whether a piece of code gets faster when every instruction is aligned. Intel says nops can take 0 clocks to execute in some circumstances, which makes the idea not so dumb. If streams of 3-byte instructions are padded with nops so that every instruction is aligned, maybe the code will be faster...
A pair of directives could be interesting: one to activate instruction alignment and one to deactivate it. For example:
Code:
```asm
instalign 4
mov eax,[234234]
in al,dx
uninstalign
```
I don't know the instruction encodings exactly, but I think there are a lot of IA-32 instructions whose length is not a byte, word, or dword.
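To get a feel for how much padding such an `instalign` directive (a proposal, not something fasm has) would add, here is a rough C model. It assumes every instruction start is padded with one-byte nops up to the next multiple of n. `instalign_size` is a made-up helper, and the instruction lengths are inputs you supply.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a hypothetical "instalign n" mode: before each instruction,
   insert one-byte nops until its start offset is a multiple of n.
   lens[] holds the encoded length of each instruction. Returns the size
   of the padded stream and stores the number of nop bytes in *nops. */
static size_t instalign_size(const size_t *lens, size_t count, size_t n, size_t *nops)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    size_t off = 0, padding = 0;
    for (size_t i = 0; i < count; i++) {
        size_t pad = (n - (off & (n - 1))) & (n - 1);
        padding += pad;
        off += pad + lens[i];
    }
    *nops = padding;
    return off;
}
```

Assuming 32-bit code, where `mov eax,[234234]` can be encoded in 5 bytes (the A1 moffs32 form) and `in al,dx` is the 1-byte EC, the example above grows from 6 to 9 bytes with n = 4, and 3 of those bytes are nops. Three back-to-back 3-byte instructions grow from 9 to 11 bytes.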
edemko 24 Jun 2010, 11:19
Code:
```asm
instro
instro
... etc
more instro
align Xword
```
?
edemko 24 Jun 2010, 11:23
Quote: Why use it so often?
Seeing code like that, I'm reminded of the "Humorous web-pages" topic: instructions land on odd and even addresses in a 50/50 balance anyway. This concerns many of you. I understand some places need it, but... Am I just sleepy? Eh.
revolution 24 Jun 2010, 11:36
Aligning to 4 bytes is pointless. 16 bytes would be more sensible, since the decoders work on 16-byte chunks. If you can get three instructions into each 16-byte chunk, the AMD chips can perform quite well. But extra nops still take time to decode even if their execution time is zero, so liberally throwing in nops is probably not a good idea unless you really know the internal structure of the decoder well. And the Intel chips behave differently: some of them can decode 4 instructions at a time, so putting 4 instructions into a 16-byte block can be beneficial there.
There is no one universal solution to the problem. You have to trade things off. IMO you will probably waste more time playing with it than you can ever hope to save by getting aligns and nops in perfect synchronism for all situations.
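The 16-byte chunk point can be checked mechanically: an instruction straddles a chunk boundary when its first and last bytes fall in different 16-byte blocks. Below is a simplified C model. Real fetch and decode rules differ between AMD and Intel generations, as noted above, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FETCH_BLOCK ((size_t)16)  /* simplified 16-byte decode window */

/* Nonzero if an instruction of len bytes at offset off has its first and
   last bytes in different 16-byte blocks. */
static int crosses_block(size_t off, size_t len)
{
    assert(len > 0);
    return off / FETCH_BLOCK != (off + len - 1) / FETCH_BLOCK;
}

/* Lay instructions of the given lengths out back to back from offset
   start and count how many of them straddle a block boundary. */
static size_t count_crossings(size_t start, const size_t *lens, size_t count)
{
    size_t off = start, crossings = 0;
    for (size_t i = 0; i < count; i++) {
        if (crosses_block(off, lens[i]))
            crossings++;
        off += lens[i];
    }
    return crossings;
}
```

For instance, four 5-byte instructions starting at offset 0 put the fourth one (offset 15) across the boundary, while the same sequence starting at offset 1 has no crossings.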
sinsi 24 Jun 2010, 11:50
All you have to do is go to the masm32 forum, look in the Laboratory, and see how timings fluctuate even when the code runs in a million-iteration loop.
Sometimes "align 4" is good, sometimes "align 8" is good, etc. The amount of task switching in today's OSes kills the idea of caches anyway. 64 bytes? Pfff! Some functions that were aligned to 16 and had inner loops aligned to 4 were a lot faster, but generally code alignment isn't a big deal. Data alignment is, though, as is stack alignment in Win64 (yuk). Some SSE instructions (movaps, for example) require aligned memory operands too.
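On Win64 stack alignment: the x64 calling convention requires RSP to be 16-byte aligned at each CALL, so on function entry (after the return address is pushed) RSP mod 16 is 8. The C sketch below only does that arithmetic for a given number of 8-byte pushes followed by `sub rsp,frame`. `rsp_mod16_at_call` is a hypothetical helper.

```c
#include <assert.h>

/* Win64: RSP is 16-byte aligned at every CALL, so on entry to a function
   (return address already pushed) RSP % 16 == 8. Return RSP % 16 after
   `pushes` 8-byte pushes followed by "sub rsp, frame". */
static unsigned rsp_mod16_at_call(unsigned pushes, unsigned frame)
{
    assert(frame % 8 == 0);
    /* Unsigned arithmetic wraps modulo a power of two (2^64 on common
       platforms), so the remainder stays correct when the value
       "goes negative". */
    unsigned long long rsp = 8ULL - 8ULL * pushes - (unsigned long long)frame;
    return (unsigned)(rsp % 16);
}
```

Applied to the WndProc_Def code in =7= (four pushes, then `sub rsp,8*(4+1)`, i.e. 32 bytes of shadow space plus one slot for DefFrameProcA's fifth argument, lParam), the result is 0, so the call is properly aligned. The same holds for WndProc_CREATE's `sub rsp,8*(4+11)`.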
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.