In low-level programming, you often have to wait a specific amount of time to let the hardware do its thing. Using a CPU loop for that is generally a very bad idea, because the CPU clock is not fixed, so your delay will vary between machines.
What you can do instead is poll the PS/2 control port (61h), which has a bit (bit 4, mask 10h) that toggles every 15 usecs. Count 67 of those toggles, and you get a close approximation: 67 * 15 = 1005 usecs, ca. 1 millisecond.
; delay 1 millisec using PS/2
mov cx, 67 ; loop counter
in al, 61h ; read in PS/2 port
and al, 10h ; get the oscillating bit
mov ah, al ; save current value
@@: in al, 61h ; read in PS/2 port again
and al, 10h
cmp al, ah ; did it change?
je @b
mov ah, al ; save current value
pause ; spin-wait hint, lets the CPU ease off while polling
dec cx
jnz @b ; loop if there's more
This works perfectly on real hardware and in most emulators, except the most common one (*khm* qemu).
So the next best thing you can do is use the PIT, which oscillates at 1,193,182 Hz, so 1193 cycles give ca. 1 millisec. One could set up a one-shot timer, but that's tricky and needs working interrupt handlers. The good news is, you can just poll the PIT registers instead:
; delay 1 millisec using PIT
; assumes channel 0 counts down from reload value 0 (65536) in mode 2;
; if it was left in mode 3, the count drops by 2 per cycle, so use 2386
mov cx, 1193 ; cycles to wait
xor al, al ; read in counter value into bx
out 43h, al ; send PIT the counter latch command
in al, 40h ; read in low byte
mov bl, al
in al, 40h ; read in high byte
mov bh, al ; bx = starting count
@@: xor al, al ; read in again into ax
out 43h, al
in al, 40h
mov ah, al ; low byte to ah
in al, 40h
xchg al, ah ; high byte in al, so swap
pause ; spin-wait hint, lets the CPU ease off while polling
neg ax ; the PIT counts down, so elapsed = start - current
add ax, bx ; ax = bx - ax, wraps correctly in 16 bits
cmp ax, cx ; if elapsed < delay, go again
jb @b
To wait more than one millisec, either multiply the constant in cx, or simply call the above code in a loop. The loop is more bullet-proof, because cx is only 16 bits: 1193 * 55 = 65615 no longer fits, so multiplying tops out at 54 millisecs.
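Calling the 1 millisec code in a loop could look roughly like the sketch below. The routine names delay_ms and delay_1ms are made up for illustration, the syntax assumes fasm (same @@ / @b labels as above), and delay_1ms is just the port 61h loop wrapped with a ret, so it inherits the same caveat about qemu.

```asm
; delay_ms: wait roughly cx milliseconds (hypothetical helper name)
; in: cx = number of millisecs, 0 returns immediately
delay_ms:
        test cx, cx         ; nothing to wait for?
        jz .done
.next:  push cx             ; delay_1ms uses cx as its own counter
        call delay_1ms
        pop cx
        dec cx
        jnz .next           ; one call per requested millisec
.done:  ret

; delay_1ms: the port 61h loop from above, wrapped as a routine
; clobbers cx, preserves ax
delay_1ms:
        push ax
        mov cx, 67          ; 67 toggles * 15 usecs = ca. 1 millisec
        in al, 61h          ; read in PS/2 port
        and al, 10h         ; get the oscillating bit
        mov ah, al          ; save current value
@@:     in al, 61h          ; read in PS/2 port again
        and al, 10h
        cmp al, ah          ; did it change?
        je @b
        mov ah, al          ; save current value
        dec cx
        jnz @b              ; loop if there's more
        pop ax
        ret
```

With this, "mov cx, 250" followed by "call delay_ms" should wait about a quarter of a second. The call overhead adds a little on every iteration, so treat the result as a lower bound rather than an exact figure.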
Hope this helps.