Neither utility is definitive !!!
They can only reduce the number of stack alignment mistakes.
Neither of them can tell for certain that no misaligned
prologue remains.

1.
fta16.exe checks for stack misalignment at the time an API is called
rsp must be aligned on a dqword (16-byte) boundary
single-stepping a program is an extremely slow process
XP64 is recommended for running it
Vista is slower than XP because of its larger exception-handling code
as a result, fta16.exe shows the first address found
up to 131072 addresses are stored in the *.ftr file in binary format
it is suitable for small programs
testing a big program may take too long
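
The dqword test itself is a single mask operation. A tiny Python
illustration (not taken from fta16.exe; the function name and the sample
rsp values are made up here):

```python
def is_dqword_aligned(rsp):
    """True when rsp sits on a 16-byte (dqword) boundary."""
    return (rsp & 0xF) == 0


print(is_dqword_aligned(0x12FF50))  # low hex digit 0
print(is_dqword_aligned(0x12FF58))  # low hex digit 8
```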

timings:


2.
get fdbg for win64 and extract fdisasm.exe from the archive
fdisasm.exe your_prog.exe
and then walk through your_prog.d64 with a text editor

as the first step, check the stack alignment at the entry point

then search repeatedly until the end of the file for this pattern:
sub rsp,

count the pushed registers and use these 2 rules to check:

number of pushed registers is even (e.g. push rcx rdx r8 r9)
sub rsp,xxxxxxx8 is correct

number of pushed registers is odd (e.g. push rax rsi rdi)
sub rsp,xxxxxxx0 is correct
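
The two rules above follow from the win64 calling convention: on entry to
a function rsp is 8 bytes off a 16-byte boundary (the return address was
just pushed), and every push moves it by 8 more. A minimal sketch of that
arithmetic in Python, for illustration only; expected_low_nibble is a
made-up name and is not part of either utility:

```python
def expected_low_nibble(pushes):
    """Return the low hex digit that the 'sub rsp' operand must end in.

    Assumes rsp % 16 == 8 on function entry (return address on the stack)
    and that every push moves rsp by 8 bytes.
    """
    offset = 8 + 8 * pushes          # bytes used since the last boundary
    return (16 - offset % 16) % 16   # what 'sub rsp' still has to supply


print(expected_low_nibble(4))  # push rcx rdx r8 r9 -> 8
print(expected_low_nibble(3))  # push rax rsi rdi   -> 0
```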

fxa16.exe does the job automatically
it searches for sub rsp,...
then counts the preceding pushes
then verifies that the stack is aligned
if an error is found, the offset in the *.d64 file and the sub rsp instruction are displayed
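
The steps above can be sketched in Python as follows. This is not the
source of fxa16.exe. The input format is an assumption (one instruction
per line, mnemonic first, hexadecimal operand for sub rsp), and the real
*.d64 layout may differ. find_suspect_subs is a hypothetical name.

```python
import re

# Assumed line shapes, e.g. "push rbx" and "sub rsp,28" (operand in hex).
PUSH = re.compile(r"^\s*push\s+\w+", re.IGNORECASE)
SUB_RSP = re.compile(r"^\s*sub\s+rsp\s*,\s*([0-9a-f]+)h?\s*$", re.IGNORECASE)


def find_suspect_subs(lines):
    """Yield (line_index, text) for each 'sub rsp' that breaks the rule."""
    pushes = 0  # pushes seen directly before the current line
    for index, text in enumerate(lines):
        if PUSH.match(text):
            pushes += 1
            continue
        match = SUB_RSP.match(text)
        if match:
            amount = int(match.group(1), 16)
            # assumed: rsp is 8 off a 16-byte boundary on entry,
            # and each push moves it by 8 more
            if (8 + 8 * pushes + amount) % 16 != 0:
                yield index, text.strip()
        pushes = 0  # any instruction other than push resets the count


listing = ["push rax", "push rsi", "push rdi", "sub rsp,28", "ret"]
for index, text in find_suspect_subs(listing):
    print(index, text)
```

Resetting the count on every instruction other than push keeps the sketch
as simple as the check described here, and gives it the same blind spots.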

the check routine is the simplest possible and can be fooled by, e.g.

1. fake
sub rsp,28
sub rsp,8

2. fake
push rbx
mov rbx,rax
sub rsp,20

for the routine to work correctly, there must be no instruction between:
the last push
sub rsp
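
Why the two fakes defeat a simple check becomes visible when the total
stack adjustment of the prologue is added up, instead of looking only at
the pushes directly before one sub rsp. A sketch in Python, assuming hex
operands and rsp 8 bytes off a 16-byte boundary on entry; net_misalignment
is a hypothetical helper, not something fxa16.exe provides:

```python
import re

PUSH = re.compile(r"^\s*push\s+\w+", re.IGNORECASE)
SUB_RSP = re.compile(r"^\s*sub\s+rsp\s*,\s*([0-9a-f]+)h?\s*$", re.IGNORECASE)


def net_misalignment(prologue):
    """Return the stack offset modulo 16 after the prologue (0 = aligned)."""
    offset = 8  # assumed: return address already on the stack at entry
    for text in prologue:
        if PUSH.match(text):
            offset += 8
        else:
            match = SUB_RSP.match(text)
            if match:
                offset += int(match.group(1), 16)
        # all other instructions (e.g. mov rbx,rax) are ignored in this sketch
    return offset % 16


print(net_misalignment(["sub rsp,28", "sub rsp,8"]))                # 1. fake
print(net_misalignment(["push rbx", "mov rbx,rax", "sub rsp,20"]))  # 2. fake
```

Under these assumptions the first fake leaves rsp 8 bytes off although
each sub rsp looks fine on its own, and the second one is aligned although
a check that counts only directly preceding pushes would see none.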
