flat assembler
Message board for the users of flat assembler.

Thread: align
edemko



Joined: 18 Jul 2009
Posts: 549
edemko 24 Jun 2010, 02:50
Why use it so often? Seeing such code reminds me of the "Humorous web-pages" topic: instros come ODD & EVEN, making a 50/50 balance. This concerns many of you. I understand that some places need it, but... Or am I just sleepy?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20571
Location: In your JS exploiting you and your system
revolution 24 Jun 2010, 03:37
edemko: Are you referring to some particular piece of code? Where do you see align used so often?

edemko 24 Jun 2010, 07:47
people may bore, i won't revo
revolution, there is a man on wasm.ru, he says: "When all else fails use solder".

revolution 24 Jun 2010, 08:27
edemko wrote:
people may bore, i won't revo
Erm, I don't get you?
edemko wrote:
revolution, there is a man on wasm.ru, he says: "When all else fails use solder".
Sure, but he ain't me. And I only use solder when I want to burn my fingers (again).

Anyhow, I have no idea what this topic is about. Care to enlighten us? If not, I can move it to the Heap.

edemko 24 Jun 2010, 09:02
It's a personal programming style, using those aligns so often.
Delete it; there is nothing to talk about.
Thanks.

revolution 24 Jun 2010, 09:17
Moving >>> deleting.

But, explaining >>> moving.

Can you show some code to illustrate what you are asking?
ManOfSteel



Joined: 02 Feb 2005
Posts: 1154
ManOfSteel 24 Jun 2010, 09:22
It's so elementary, I don't understand how people still have doubts about this one! Once and for all, you use the align directive when you need to cause an eclipse. See?

Now go read the fabulous manual to make sure you don't miss anything important.

edemko 24 Jun 2010, 09:47
Code:
=1=
...
B400:   neg     ecx

        ; Same with AlignmentDispatchNT:        
        PICREFERENCE AlignmentDispatchNT, RP, 03H, 9CH, 83H
        jmp     ebx        
ENDIF

align   16
C100:   ; Code for aligned src. SSE2 or later instruction set
        ; The nice case, src and dest have same alignment.

        ; Loop. ecx has negative index from the end, counting up to zero
        movaps  xmm0, [esi+ecx]
        movaps  xmm1, [esi+ecx+10H]
...








=2=
...
 ret
GL_Paint ENDP


ALIGN 4

MainWinProc PROC hWnd:HWND,uMsg:UINT,wParam:WPARAM,lParam:LPARAM
LOCAL crect:RECT
LOCAL ps:PAINTSTRUCT
...








=3=
...
;standard dialog proc function
align 4
DlgProcAta proc hWin:HWND,uMsg:UINT,wParam:WPARAM,lParam:LPARAM

        mov             eax,uMsg
...







=4=
...
;Create an object. And store pointer to it
align 16
BDO_CreateStandart      proc    BDO_Address:DWORD
                mov             ESI, [ESP+DWORD]        ; BDO_Address
;               mov             EBX, SizeOf(BDO_Class)/DWORD
...







=5=
...
;===========================================================================================
; Définitions des datas chaines / variables initialisées ou non initialisées / structures
;-------------------------------------------------------------------------------------------
section '.data' data readable writeable

  ClassName    db ' AsmGges Win32', 0
  WinCaption   db ' Snake v.04 Fasm assembler', 0
  AppName      db ' AsmGges Snake',0
  LimitsMsg    db '< Snake outside >  ',0
  BiteMsg      db '< Bitten snake >  ',0
  MazeMsg      db '< Snake in wall >  ',0
  ScoreMsg     db 'Score:     Time:    mn   s ',0
  align 4

  SnakeFile    db 'res\snake.bmp',0
  align 4
  SysFile      db 'res\sys.bmp',0
  align 4
  ScoreFile    db 'res\snake.sna',0
  align 4

  szSound3     db 'sound\pop.wav',0
  align 4
  szSound6     db 'sound\end.wav',0
  align 4
  szSound7     db 'sound\tankyou.wav',0
  align 4
  MidiFile     db 'sound\music.mid',0
  align 4
...









=6=
...
align 4
;max transfer rates for UDMA modes
szUdmaTransf0   db "16.7",0
szUdmaTransf1   db "25.0",0
szUdmaTransf2   db "33.3",0
szUdmaTransf3   db "44.4",0
szUdmaTransf4   db "66.7",0
szUdmaTransf5   db "100.0",0
szUdmaTransf6   db "133.0",0
szUdmaModeStr   db "%d: Timing %d ns; Max. Transf. Rate %s Mb/s",0      ; Standard ATA/ATAPI-%d

align 4
;max transfer rates for PIO modes
szPioTransf0    db "3.3",0
szPioTransf1    db "5.2",0
szPioTransf2    db "8.3",0
szPioTransf3    db "IORDY 11.1",0
szPioTransf4    db "IORDY 16.6",0
;use szUdmaModeStr instead of szPioModeStr
;szPioModeStr   db "PIO mode %d: Cycle Time %d ns; Max Transfer Rate %s Mb/s",0

align 4
;media types
szRemovable             db "Removable",0
szFixed                 db "Fixed",0
szFloppy                db "Floppy Disk",0
szCleaner               db "Drive Cleaner",0
szUnknown               db "Unknown",0
szCDROM                 db "CD-ROM",0
szCDR                   db "CD-Recordable (Write Once)",0
szCDRW                  db "CD-Rewriteable",0
szDVDROM                db "DVD-ROM",0
szDVDR                  db "DVD-Recordable (Write Once)",0
szDVDRW                 db "DVD-Rewriteable",0
szIomZip                db "Iomega Zip Drive",0
szIomJaz                db "Iomega Jaz Drive",0
szMagnetic              db "Magnetic Disk",0
szDVDRAM                db "DVD-RAM",0
;gen strings
align 4
szRamDrive              db "RAM Drive",0
szRemoteDrive   db "Remote Drive",0
szNotSupport    db "Not supported",0
szChsGeom               db "%lu X %lu X %lu",0
szNum                   db "%lu",0
szScsiInqHead   db "%08X";"%08Xh",0
szHexNum                db "%08Xh",0
szLun                   db "%03d",0
szPad9Int               db "%09lu",0
szBusVersion    db "%lu.%04lu",0
szAdvapi                db "advapi32.dll",0
szChkToken              db "CheckTokenMembership",0
szNoAdmin               db "You don't have Administrator rights. Some informatin will be unavailable.",0
;bus types strings
align 4
szScsi                  db "SCSI",0     ;it's best to add " bus" to each of the following
szAtapi                 db "ATAPI",0
szAta                   db "ATA",0
sz1394                  db "1394",0
szSsa                   db "SSA",0
szFiber                 db "FIBER",0
szUsb                   db "USB",0
szRaid                  db "RAID",0
align 4
;device types
szController    db "Controller Device",0
szDiskDev               db "Disk Device",0
szFileSys               db "File System Device",0
szScsiDev               db "SCSI Device",0
szVirtualDisk   db "Virtual Disk Device",0
szMassStor              db "Mass Storage Device",0
szDVD                   db "DVD Device",0
;SCSI-2 device types
szDirAccess             db "Direct Access Device",0 ;00
szTape                  db "Tape Device",0              ;01
szPrinter               db "Printer Device",0   ;02
szProcessor             db "Processor Device",0 ;03
szWorm                  db "WORM Device",0              ;04
szCdrom                 db "CDROM Device",0             ;05
szScanner               db "Scanner Device",0   ;06
szOptical               db "Optical Disk",0             ;07
szMediaChng             db "Media Changer",0    ;08
szComm                  db "Comm. Device",0             ;09
szAscit8                db "ASCIT8",0                   ;0A
szAscit80               db "ASCIT8",0                   ;0B
szArray                 db "Array Device",0             ;0C
szEnclosure             db "Enclosure Device",0 ;0D
szRbc                   db "RBC Device",0               ;0E
;"Unknown Device"        // 0x0F
szDeviceNum             db " (%08Xh)",0

align 4
szTab1                  db "ATA ID",0;"ATA Information",0
szTab2                  db "Storage Query ID",0;"Storage Query Property",0
szTab3                  db "SCSI Inquiry",0;"SCSI Information",0
szTab4                  db "SCSI Miniport",0;"SMART Identify",0
szTab5                  db "Optical Devices",0
...








=7=
...
align 16
WndProc:
; RCX=hWnd, EDX=uMsg, R8=wParam, R9=lParam
        cmp     edx,WM_CREATE
        jz      WndProc_CREATE
        cmp     edx,WM_COMMAND
        jz      WndProc_COMMAND
        cmp     edx,WM_SIZE
        jz      WndProc_SIZE
        cmp     edx,WM_WINDOWPOSCHANGED
        jz      WndProc_WINDOWPOSCHANGED
        cmp     edx,WM_CLOSE
        jz      WndProc_CLOSE
        cmp     edx,WM_DESTROY
        jz      WndProc_DESTROY

align 16
; next routine is called very frequently, we should increase it's speed a bit by align 16
; routine common for both WndProc_DefFrameProcA as well WndProc_Def
WndProc_DefFrameProcA:
WndProc_Def:
        push    rcx rdx r8 r9
        sub     rsp,8*(4+1)
        mov     qword [rsp+8*4],r9              ; lParam
        mov     r9,r8                           ; wParam
        mov     r8d,edx                         ; uMsg
        mov     rdx,qword [hWndClient]          ; hWndMDIClient
;       mov     rcx,qword [rsp+8*(4+1+3)]       ; rcx=hWnd now
        call    qword [DefFrameProcA]
WndProc_COMMAND_end:
        add     rsp,8*(4+1)
        pop     r9 r8 rdx rcx
        ret

align 16
WndProc_CREATE:
        push    rcx rdx r8 r9
        sub     rsp,8*(4+11)
...








=8=
...
align 16
CMD_RUN:
; the easiest to do, just order DBG_CONTINUE to debug loop and resume thread owning debug loop
        mov     eax,DBG_CONTINUE
CMD_RUN_common_begin:
        push    rcx rdx r8 r9
        sub     rsp,8*(4+1)
...








=9=
...
align 16
ten_powers      dt      1.0
                dt      0.1
                dt      0.01
                dt      0.001
                dt      0.0001
                dt      0.00001
                dt      0.000001
                dt      0.0000001
                dt      0.00000001
                dt      0.000000001
                dt      0.0000000001
                dt      0.00000000001
                dt      0.000000000001
                dt      0.0000000000001
                dt      0.00000000000001
                dt      0.000000000000001
                dt      0.0000000000000001
                dt      0.00000000000000001
                dt      0.000000000000000001

                dw      0                       ; for align only
...

    

revolution 24 Jun 2010, 09:55
There are two competing things to consider when using aligns as in your examples above.

1) Align bloats the code and fills the precious caches faster, potentially harming performance.
2) Align allows the x86 decoder to generate uops more easily when instructions don't overlap a boundary, potentially improving performance.

It is very situation-specific and depends upon what you want to achieve. A good rule of thumb would probably be: if you don't know for sure that align is beneficial in your code, then it probably isn't something you should waste time caring about.
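
To put a number on the "bloat" side of that trade-off, here is a minimal sketch in C (not fasm) of how many filler bytes an align directive has to emit at a given offset. The names `align_padding` and `align_up` are hypothetical helpers, and the sketch assumes the alignment is a power of two, as in align 4 and align 16.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of filler bytes needed to bring `offset`
   up to the next multiple of `n`. Assumes n is a power of two. */
size_t align_padding(size_t offset, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);   /* power-of-two check */
    return (n - (offset & (n - 1))) & (n - 1);
}

/* Hypothetical helper: the offset after the padding has been emitted. */
size_t align_up(size_t offset, size_t n)
{
    return offset + align_padding(offset, n);
}
```

For the string table in example =5=, 'res\snake.bmp',0 is 14 bytes long and starts on a 4-byte boundary, so the align 4 that follows it costs align_padding(14, 4) = 2 bytes. An align 16 in front of a procedure can cost up to 15 bytes.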

edemko 24 Jun 2010, 10:09
Intel says that aligned dwords and other Xwords improve read performance, as the CPU needs half as many requests to fetch the data. That is OK, and speculative SSE instructions are OK too. When copying memory we can always use mov ecx,esi (or edi) // and ecx,sizeof.Xword-1 // rep movsb // rep movsX // etc., so there is no reason to align strings: a clever algorithm will take care of that.
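
A rough sketch in C of that "align first, then copy in bigger units" idea. The function name `copy_align_dst` is hypothetical, the unit is fixed at 4 bytes (a dword), and the head count is computed as the distance to the next boundary, which is an assumption about what the instruction sequence above is meant to achieve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, first bringing the destination
   up to a 4-byte boundary, then copying dwords, then the leftover bytes. */
void *copy_align_dst(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: bytes up to the next 4-byte boundary (the "rep movsb" part). */
    size_t head = (size_t)((0u - (uintptr_t)d) & 3u);
    if (head > n)
        head = n;
    n -= head;
    while (head--)
        *d++ = *s++;

    /* Body: one dword at a time (the "rep movsd" part).
       memcpy is used so that an unaligned source is still valid C. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, 4);
        memcpy(d, &w, 4);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: whatever is left, byte by byte. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

Only the destination is aligned here. If source and destination have different misalignments, one side of every dword access stays unaligned, which is one case where aligning the data itself can still matter.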

edemko 24 Jun 2010, 10:13
Procs: why is align applied to them?

revolution 24 Jun 2010, 10:18
edemko wrote:
Procs: why is align applied to them?
Because of the instruction decoder. It reads from cache in 16-byte chunks. If an instruction overlaps a chunk boundary, the decoder needs an extra cycle to decode the full instruction. Also, the target of a call or jmp might be in the middle of a 16-byte chunk, which costs extra cycles to read the subsequent instructions from cache. Anyhow, these are very esoteric things that would be tuned with lots of experimentation to get the right balance. Note also that each and every CPU model behaves differently: some may be faster one way while others might be faster another way. It all depends upon what you are trying to achieve.
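
Taking the 16-byte chunk model above at face value, both effects can be written as simple address arithmetic. This is a sketch in C with hypothetical helper names (`crosses_16`, `bytes_before_target`); it does not model any particular CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when an instruction of `len` bytes starting at `addr` spans two
   16-byte chunks. x86 instructions are 1 to 15 bytes long. */
bool crosses_16(size_t addr, size_t len)
{
    assert(len >= 1 && len <= 15);
    return (addr >> 4) != ((addr + len - 1) >> 4);
}

/* Bytes of the fetched 16-byte chunk that lie before a call/jmp target
   and are therefore not useful to the decoder after the branch. */
size_t bytes_before_target(size_t target)
{
    return target & 15;
}
```

A 3-byte instruction at offset 14 occupies bytes 14, 15 and 16, so crosses_16(14, 3) is true. A procedure placed after align 16 has bytes_before_target equal to 0.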
edfed



Joined: 20 Feb 2006
Posts: 4354
Location: Now
edfed 24 Jun 2010, 11:08
I am thinking about a feature related to align.

It could be possible, for speed and performance tests, to align every instruction on a dword boundary (filling the spare bytes with nop) and then see whether a part of the code gets faster when every instruction is aligned.

Intel says that nops can take 0 clocks to execute in some circumstances, which makes the idea not so dumb.

If streams of 3-byte instructions are padded with nops to align every instruction, maybe the code will be faster...

A directive of this sort could be interesting:

one to activate the instruction alignment, and one to deactivate it.

for example:
Code:

instalign 4

mov eax,[234234]
in al,dx

uninstalign
    

I don't know the instruction encoding exactly, but I think there are a lot of instructions in IA-32 whose length is not a byte, word or dword.

edemko 24 Jun 2010, 11:19
Code:
instro
instro
...
etc
more instro
align Xword
?
    

edemko 24 Jun 2010, 11:23
Quote:
Why use it so often? Seeing such code reminds me of the "Humorous web-pages" topic: instros come ODD & EVEN, making a 50/50 balance. This concerns many of you. I understand that some places need it, but... Or am I just sleepy?

eh

revolution 24 Jun 2010, 11:36
Aligning to 4 bytes is pointless. 16 bytes would be more sensible, since the decoders are based upon 16-byte chunks. If you can get three instructions into each 16-byte chunk then the AMD chips can perform quite well, but extra nops still take time to decode even if their execution time is zero, so liberally throwing in nops would probably not be best unless you really know the internal structure of the decoder well. The Intel chips behave differently, though: some of them can decode 4 instructions at a time, so putting 4 instructions into a 16-byte block can be beneficial there.

There is no one universal solution to a problem. You have to trade things off. IMO you will probably waste more time playing with it than you can ever hope to save by getting aligns and nops in perfect synchronism for all situations.
sinsi



Joined: 10 Aug 2007
Posts: 794
Location: Adelaide
sinsi 24 Jun 2010, 11:50
All you have to do is go to the masm32 forum, look in the Laboratory, and see how timings fluctuate even when running the code in a million loops.
Sometimes "align 4" is good, sometimes "align 8" is good, etc. The amount of task switching in today's OSes kills the idea of caches anyway. 64 bytes? pfff!

Some functions that were aligned to 16 and had inner loops aligned to 4 were a lot faster, but generally code alignment isn't a big deal.
Data alignment is, though, as is stack alignment in Win64 (yuk). Some SSE instructions assume alignment too.
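
As a small illustration of the data side, here is a C11 sketch of what "align 16" in front of a data definition buys. The names `is_aligned` and `vec4` are hypothetical. movaps, as used in example =1=, requires its memory operand to sit on a 16-byte boundary, and the check below is the property it relies on.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when p is a multiple of n (n a power of two). */
bool is_aligned(const void *p, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* C11 counterpart of "align 16" placed in front of a data definition. */
alignas(16) float vec4[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
```

Here is_aligned(vec4, 16) holds, while the address 4 bytes further in does not pass the same check.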

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.