flat assembler
Message board for the users of flat assembler.

Index > Main > 256 possible order. What is the fastest code?

Goto page Previous  1, 2, 3, 4, 5, 6  Next
Author
Thread Post new topic Reply to topic
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18841
Location: In your JS exploiting you and your system
revolution
Furs wrote:
And even in that case I doubt it's worse, because what you mentioned, branch prediction, actually helps binary search more, not goes against it.
You can't know that unless you also know the data pattern.

If the pattern is pathological it can make all predictions be wrong and make your code worse.

Like I said, there is no way to know without testing it. You can't look at these thing statically and "know" it is better.
Post 13 Aug 2022, 21:50
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 3471
Location: vpcmipstrm
bitRAKE
If we were going to build a test program, what would that look like?

What I would like is a simple dialog application with a menu. It would give options for the dispatch method, and also the source of data for the dispatcher. Data sources would be a PRNG, or a file. Selecting a dispatch method would try to create a thread on a different core - maybe, high priority or real-time. The dialog would report on total dispatches and dispatches per second.

Does this sound agreeable?

Should a workload be added to each branch? Maybe that should be another menu option - levels of code or data load.

Seems like a fairly easy thing.

_________________
¯\(°_o)/¯ unlicense.org
Post 14 Aug 2022, 00:41
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18841
Location: In your JS exploiting you and your system
revolution
bitRAKE wrote:
If we were going to build a test program, what would that look like?
It would look like the final app (i.e. it is the final app). Anything else is just guessing. This also has the advantage of telling you where the real hot spots are, and not waste time optimising the wrong bits.

If it is deemed good enough to make it "faster" then it is also good enough to make sure it really is faster, and not use simulated results and then hope the real thing is the same.
Post 14 Aug 2022, 02:01
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 3471
Location: vpcmipstrm
bitRAKE
Yeah, probably too many pitfalls to create a simulation that actually models what is happening.

_________________
¯\(°_o)/¯ unlicense.org
Post 14 Aug 2022, 04:08
View user's profile Send private message Visit poster's website Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 1885
Furs
macomics wrote:
Furs wrote:
Furs wrote:
... always use binary search.

But it takes way less branches so it's far easier to predict?

Here's a simple task for you to think about. Given 2 identical glass balls and a skyscraper on N floors. It is known for sure that the ball definitely breaks when thrown from the last floor, and also that both balls break when thrown from the same floor. Find the minimum floor for the minimum number of throws from which the ball will break when thrown?

Solving this problem, it may become clear to you why C/Pascal compilers do not generate a large number of jumps by creating a binary tree when generating a switch/case.
I have no idea what you're talking about. Most good optimizing compilers will always generate a binary search when doing sparse switch/case jumps, or an array if they're close to each other with not much gap, as long as the entries are reasonably enough (i.e. more than 4 or so).

I never seen them generate a LINEAR search with jumps.
Post 14 Aug 2022, 13:56
View user's profile Send private message Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 1885
Furs
revolution wrote:
Furs wrote:
And even in that case I doubt it's worse, because what you mentioned, branch prediction, actually helps binary search more, not goes against it.
You can't know that unless you also know the data pattern.

If the pattern is pathological it can make all predictions be wrong and make your code worse.

Like I said, there is no way to know without testing it. You can't look at these thing statically and "know" it is better.
You don't need to test it when you can predict it based on the situation.

The data pattern is irrelevant unless all of the data is in the first or second jump (in linear case). But as I said, that's only for weak CPUs with weak branch predictors. The better the branch prediction, the better off binary search will be compared to linear search.

Some things are just obvious.


Like the "obvious" pattern:
Code:
if(some_boolean_var && some_expensive_function())    
which is objectively better than
Code:
if(some_expensive_function() && some_boolean_var)    
No test needed Wink
Post 14 Aug 2022, 13:57
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18841
Location: In your JS exploiting you and your system
revolution
Furs wrote:
The data pattern is irrelevant ...
It is 100% about the data pattern. If you don't know what your patterns are is then you can't say if they are BTB friendly or BTB hostile.

You can't use lack of knowledge to make a claim that it will be fine. If you don't know your patterns, just say that, don't then go on to say the branch predictor will figure it out. It won't, it is dumb, and it is easy to have bad patterns in the data that choke the BTB with wrong predictions.
Post 14 Aug 2022, 14:12
View user's profile Send private message Visit poster's website Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 1885
Furs
You're still missing the point. You can THINK of possible data patterns in your head (or whatever) to figure out ways in which the jumps can be taken.

With some simple solutions you can literally even prove that B > A (algorithms) in all cases except one or two. You don't freaking need to test what's obvious.

As I said before, even with no branch predictor at all, unless most of your data is in the first or second jump you place linearly, you're better off with binary search.

And if you have such data but 256 possible entries I think you need to rethink your design in the first place... So linear still sucks because you shouldn't be using this data pattern in the first place. Use RLE or something. Smile
Post 16 Aug 2022, 12:50
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18841
Location: In your JS exploiting you and your system
revolution
I wish it was as "obvious" as you seems to suggest. You claimed "always use binary search. It's objectively better unless you have less than 4 entries.". It isn't obvious to me how you can know that, when you don't know the data patterns, and you don't know the misprediction penalty.

A single misprediction penalty can easily eat up the time of four correctly predicted cmp instructions (more actually) in all CPUs that I am aware of. So your claim of 4 entries being enough to "obviously" be better than everything else is not obvious to me in any way.
Post 16 Aug 2022, 15:42
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 3471
Location: vpcmipstrm
bitRAKE
These discussions devolve because there is no firm ground to stand on. That's why I optimize for size - it's either smaller or it isn't. With no example code, there is little point - we are adrift in the middle of an endless sea. Even with an example, we'd have a dozen usage scenarios.

Branch predictors have gotten so good. This adaptive ability is quite good to have.

_________________
¯\(°_o)/¯ unlicense.org
Post 16 Aug 2022, 18:53
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18841
Location: In your JS exploiting you and your system
revolution
Optimisation for size is unambiguous.

But sometimes that doesn't do what the boss wants, and we need to improve the performance, to reduce the server count, to save money. At that point we have to begin the arduous process of doing real testing and discovering that data patterns drive everything.
Post 16 Aug 2022, 22:08
View user's profile Send private message Visit poster's website Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 1885
Furs
revolution wrote:
I wish it was as "obvious" as you seems to suggest. You claimed "always use binary search. It's objectively better unless you have less than 4 entries.". It isn't obvious to me how you can know that, when you don't know the data patterns, and you don't know the misprediction penalty.

A single misprediction penalty can easily eat up the time of four correctly predicted cmp instructions (more actually) in all CPUs that I am aware of. So your claim of 4 entries being enough to "obviously" be better than everything else is not obvious to me in any way.
As I said, if your pattern is so whack that it benefits linear search then you need to rethink your data storage. Use RLE for far better performance (because you'll loop for the same value which happens a lot).

In any case, linear search is most definitely the wrong choice, unless the amount of branches is really low as I mentioned. It's as simple as that.
Post 17 Aug 2022, 13:13
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18841
Location: In your JS exploiting you and your system
revolution
If you have complete control over every aspect of the code and data then do whatever you want. But how will you "rethink your data storage" if 1) it isn't stored but delivered, and 2) you have no control over the delivery from external sources?

Since the OP never mentioned anything about the data format, or the source, then there are absolutely no assumptions we can make that will be "obviously" correct or "objectively better".
Post 17 Aug 2022, 13:56
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18841
Location: In your JS exploiting you and your system
revolution
Thanks to my colleague for helping me to test this code:
Code:
;to assemble: "fasm -d TEST=uselss_cmp_chain"
;to assemble: "fasm -d TEST=objectivy_better_binary"

macro uselss_cmp_chain {
        cmp     al,1
        jz      item1
        cmp     al,2
        jz      item2
        cmp     al,3
        jz      item3
        cmp     al,4
        jz      item4
}

macro objectivy_better_binary {
        cmp     al,2
        ja      high_half
        jb      item1
        jmp     item2
high_half:
        cmp     al,3
        je      item3
        jmp     item4
}

format elf executable

SYS_EXIT        = 1

        mov     ecx,1000000
outer_loop:
        mov     ebx,the_data_pattern_is_unimportant_len
        mov     esi,the_data_pattern_is_unimportant
inner_loop:
        lodsb
        match x,TEST {x}
next_item:
        dec     ebx
        jnz     inner_loop
        loop    outer_loop
        mov     eax,SYS_EXIT
        int     0x80

item1:  jmp     next_item
item2:  jmp     next_item
item3:  jmp     next_item
item4:  jmp     next_item

the_data_pattern_is_unimportant:
        while % <= 1000
                db 1
        end while
        db 2,3,4
the_data_pattern_is_unimportant_len = $ - the_data_pattern_is_unimportant    
Result:
Code:
~ fasm -d TEST=uselss_cmp_chain test.asm && time ./test
flat assembler  version 1.73.08  (4029940 kilobytes memory)
2 passes, 1139 bytes.

real    0m4.510s
user    0m4.464s
sys     0m0.000s
~ fasm -d TEST=objectivy_better_binary test.asm && time ./test
flat assembler  version 1.73.08  (4041296 kilobytes memory)
2 passes, 1137 bytes.

real    0m5.159s
user    0m5.108s
sys     0m0.000s    
Data patterns matter. Ignore them at your peril.
Post 18 Aug 2022, 04:14
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18841
Location: In your JS exploiting you and your system
revolution
It also shows this result: the worst case for the cmp chain:
Code:
the_data_pattern_is_unimportant:
        while % <= 1000
                db 4
        end while
        db 2,3,1
the_data_pattern_is_unimportant_len = $ - the_data_pattern_is_unimportant    
Result:
Code:
~ fasm -d TEST=uselss_cmp_chain test.asm && time ./test
flat assembler  version 1.73.08  (4043436 kilobytes memory)
2 passes, 1139 bytes.

real    0m6.561s
user    0m6.528s
sys     0m0.004s
~ fasm -d TEST=objectivy_better_binary test.asm && time ./test
flat assembler  version 1.73.08  (4027244 kilobytes memory)
2 passes, 1137 bytes.

real    0m7.789s
user    0m7.652s
sys     0m0.000s    
Post 18 Aug 2022, 04:27
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18841
Location: In your JS exploiting you and your system
revolution
Furs wrote:
You don't freaking need to test what's obvious.
I'm still not seeing the obvious here.

If I didn't test it then I would fool myself into thinking I had awesome code, when in reality I had crap code. So not too different from many of the HLL compilers. Laughing
Post 18 Aug 2022, 04:30
View user's profile Send private message Visit poster's website Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 1885
Furs
revolution wrote:
I'm still not seeing the obvious here.
You can count the jumps in each situation.

And nice try with your test, using only 4 items and coding the binary search with too many jumps. Wink
Post 18 Aug 2022, 12:26
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18841
Location: In your JS exploiting you and your system
revolution
Show your improvement.
Post 18 Aug 2022, 12:31
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8026
Location: Kraków, Poland
Tomasz Grysztar
revolution wrote:
Code:
macro uselss_cmp_chain {
        cmp     al,1
        jz      item1
        cmp     al,2
        jz      item2
        cmp     al,3
        jz      item3
        cmp     al,4
        jz      item4
}

macro objectivy_better_binary {
        cmp     al,2
        ja      high_half
        jb      item1
        jmp     item2
high_half:
        cmp     al,3
        je      item3
        jmp     item4
}    
Side-stepping the topic a little, this could be made into a fun 4-way junction:
Code:
; assume AL is in range 1-4
        cmp     al,3
        ja      item4
        je      item3
        jpe     item2
        jmp     item1    
Post 18 Aug 2022, 14:48
View user's profile Send private message Visit poster's website Reply with quote
ProMiNick



Joined: 24 Mar 2012
Posts: 688
Location: Russian Federation, Sochi
ProMiNick
fasm -d TEST=hack_1234_chain_1
fasm -d TEST=hack_1234_chain_2
fasm -d TEST=hack_1234_chain_3
Code:
hack_1234_chain_1 {
sub al,3 ;add al,-3
jz item3
jp item2
js item1
jmp item4 }

hack_1234_chain_2 {
add al,-3
jz item3
jp item2
ja item1
jmp item4 }

hack_1234_chain_3 {
sub al,3
jz item3
jp item2
jb item1
jmp item4 }    
Post 18 Aug 2022, 15:00
View user's profile Send private message Send e-mail Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page Previous  1, 2, 3, 4, 5, 6  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.