flat assembler
Message board for the users of flat assembler.
Index
> Main > How to recognize whether AL has a zero bit in each tetrad? Goto page 1, 2 Next |
Author |
|
revolution 15 Oct 2017, 10:59
SergeASM wrote: ... faster ... Also, it is probably worthless to test only sample code. You should try to test the entire code and see which option(s) work best when the whole application is running. |
|||
15 Oct 2017, 10:59 |
|
Furs 15 Oct 2017, 12:32
I had to Google what "tetrad" means (maybe I suck ) but this is usually called a "nibble" (4-bits).
The "pattern" here is that either the number mod 16 is 15, or it is higher than 239, it's the only time it will give out the answer "no". I'm not sure if you can use "ah" (i.e. rest of eax) as you didn't mention it, but if you can, let's assume ah is zero, try something like: Code: add eax, 16+1 dec ah and al, ah test al, 15 jz no Then we test for low 15 bits to see if they are zeros -- because they are zero only if they were 15 (we added '1' to them when we did first addition, that's why "16+1") or if the carry was produced (it zeroed al). Meaning they should be zero if number was >= 240 or if low 4 bits were all 1s. I only wanted to use one branch for obvious reasons. I have no idea if it's faster than a lookup table though, but you can't test if there is no alternative |
|||
15 Oct 2017, 12:32 |
|
revolution 15 Oct 2017, 12:39
Furs: That is a great analysis. But is it faster?
|
|||
15 Oct 2017, 12:39 |
|
Furs 15 Oct 2017, 12:40
I don't know, it probably is in some cases (it depends on surrounding code, since otherwise if branch gets predicted properly all the time it doesn't matter which one you use), probably not in others etc.
But still you need "different" ways to be able to test them, can't test something when you don't even know alternative |
|||
15 Oct 2017, 12:40 |
|
Tomasz Grysztar 15 Oct 2017, 12:41
Furst: I think your first two instructions may be contracted to:
Code: add eax,(-1) shl 8 + 16 + 1 |
|||
15 Oct 2017, 12:41 |
|
Furs 15 Oct 2017, 12:48
I was worried about the carry going into the al but it seems you are right! I just tested it, so now it's:
Code: add eax,(-1) shl 8 + 16 + 1 and al, ah test al, 15 jz no |
|||
15 Oct 2017, 12:48 |
|
Tomasz Grysztar 15 Oct 2017, 12:51
Furs wrote: I was worried about the carry going into the al |
|||
15 Oct 2017, 12:51 |
|
revolution 15 Oct 2017, 13:35
Furs wrote: I think now it's definitely "faster" or at least the "same" speed. Also much more friendly to the cache (no need for a lookup table using 256 bytes). |
|||
15 Oct 2017, 13:35 |
|
Tomasz Grysztar 15 Oct 2017, 13:49
Because this is just a yes/no test, you could even try BT instead of XLAT. Lookup table would then be 32 bytes instead of 256.
|
|||
15 Oct 2017, 13:49 |
|
Furs 15 Oct 2017, 14:01
Yeah but AFAIK bt with "mem, reg" operands is super slow. (only that combination)
|
|||
15 Oct 2017, 14:01 |
|
l_inc 15 Oct 2017, 16:14
This one does not need to assume an initial value of ah and seems almost 2 times faster on my machine (not taking the branch into account):
Code: add al,$11 lahf test ah,$11 jnz .no lahf may be unavailable in 64-bit mode on some machines and may also be slower on some other machines. _________________ Faith is a superposition of knowledge and fallacy |
|||
15 Oct 2017, 16:14 |
|
Furs 15 Oct 2017, 16:31
Nice, I completely forgot that the Auxiliary Flag even exists.
|
|||
15 Oct 2017, 16:31 |
|
Furs 16 Oct 2017, 12:00
BTW without using the Auxiliary Flag trick, I realized you don't have to assume or touch the upper parts of eax, as long as you know 'ah' is zero. i.e.
Code: add eax, 15 shl 8 + 16 + 1 and al, ah test al, 15 jz no |
|||
16 Oct 2017, 12:00 |
|
SergeASM 21 Oct 2017, 11:57
Furs wrote: I had to Google what "tetrad" means (maybe I suck ) but this is usually called a "nibble" (4-bits). Sorry, I'm not an English speaker. Google Translate gave me two results: tetrad and quaternion. Thanks, but I wrote my naive test Code: xor ah,ah shl eax,4 cmp al,$F0 je .no cmp ah,$0F je .no _________________ Serge |
|||
21 Oct 2017, 11:57 |
|
Furs 21 Oct 2017, 12:42
With random values? Two branches would be much worse in a non-simulated test, at least on newer CPUs, but I have no idea about Athlon II. I mean, make sure you test it with "reasonable" values you can expect in the program etc.
With two branches tell me if the following is faster on your CPU? Code: add al, 16 + 1 jc .no test al, 15 jz .no |
|||
21 Oct 2017, 12:42 |
|
revolution 21 Oct 2017, 13:30
Make your tests in the real code. No amount of simulation will help IMO.
|
|||
21 Oct 2017, 13:30 |
|
_shura 21 Oct 2017, 18:19
Code: not al and al, 0x0f jz yes and al, 0xf0 jz yes |
|||
21 Oct 2017, 18:19 |
|
Furs 21 Oct 2017, 20:28
@_shura You meant to use "test" right? Now the second one will always be 0. test is also better as it can micro-fuse on newer CPUs with the branch (speed may be the same, but it's one less micro-op, which can help in other situations).
I doubt it's faster than the add+jc I posted (also two branches) even on Athlon which probably doesn't even do micro-fusing (because add->jc is probably not micro-fused since it's not very "common" like test/cmp + branch) since it has 1 more insn so (at least) 1 more micro op. |
|||
21 Oct 2017, 20:28 |
|
_shura 21 Oct 2017, 20:35
nobody said al had to contain the initial value
what about Code: not al test al, 0x0f jz @f ;only one jump test al, 0xf0 @@: ;ZF set: at least one nibble is 1111. ;ZF cleared: each nibble contains at least one null. ret |
|||
21 Oct 2017, 20:35 |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.