flat assembler
Message board for the users of flat assembler.
Index
> Main > 256 possible order. What is the fastest code? Goto page Previous 1, 2, 3, 4, 5, 6 Next |
Author |
|
bitRAKE 14 Aug 2022, 00:41
If we were going to build a test program, what would that look like?
What I would like is a simple dialog application with a menu. It would give options for the dispatch method, and also the source of data for the dispatcher. Data sources would be a PRNG, or a file. Selecting a dispatch method would try to create a thread on a different core - maybe, high priority or real-time. The dialog would report on total dispatches and dispatches per second. Does this sound agreeable? Should a workload be added to each branch? Maybe that should be another menu option - levels of code or data load. Seems like a fairly easy thing. _________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup |
|||
14 Aug 2022, 00:41 |
|
revolution 14 Aug 2022, 02:01
bitRAKE wrote: If we were going to build a test program, what would that look like? If it is deemed good enough to make it "faster" then it is also good enough to make sure it really is faster, and not use simulated results and then hope the real thing is the same. |
|||
14 Aug 2022, 02:01 |
|
bitRAKE 14 Aug 2022, 04:08
Yeah, probably too many pitfalls to create a simulation that actually models what is happening.
_________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup |
|||
14 Aug 2022, 04:08 |
|
Furs 14 Aug 2022, 13:56
macomics wrote:
I never seen them generate a LINEAR search with jumps. |
|||
14 Aug 2022, 13:56 |
|
Furs 14 Aug 2022, 13:57
revolution wrote:
The data pattern is irrelevant unless all of the data is in the first or second jump (in linear case). But as I said, that's only for weak CPUs with weak branch predictors. The better the branch prediction, the better off binary search will be compared to linear search. Some things are just obvious. Like the "obvious" pattern: Code: if(some_boolean_var && some_expensive_function()) Code: if(some_expensive_function() && some_boolean_var) |
|||
14 Aug 2022, 13:57 |
|
revolution 14 Aug 2022, 14:12
Furs wrote: The data pattern is irrelevant ... You can't use lack of knowledge to make a claim that it will be fine. If you don't know your patterns, just say that, don't then go on to say the branch predictor will figure it out. It won't, it is dumb, and it is easy to have bad patterns in the data that choke the BTB with wrong predictions. |
|||
14 Aug 2022, 14:12 |
|
Furs 16 Aug 2022, 12:50
You're still missing the point. You can THINK of possible data patterns in your head (or whatever) to figure out ways in which the jumps can be taken.
With some simple solutions you can literally even prove that B > A (algorithms) in all cases except one or two. You don't freaking need to test what's obvious. As I said before, even with no branch predictor at all, unless most of your data is in the first or second jump you place linearly, you're better off with binary search. And if you have such data but 256 possible entries I think you need to rethink your design in the first place... So linear still sucks because you shouldn't be using this data pattern in the first place. Use RLE or something. |
|||
16 Aug 2022, 12:50 |
|
revolution 16 Aug 2022, 15:42
I wish it was as "obvious" as you seems to suggest. You claimed "always use binary search. It's objectively better unless you have less than 4 entries.". It isn't obvious to me how you can know that, when you don't know the data patterns, and you don't know the misprediction penalty.
A single misprediction penalty can easily eat up the time of four correctly predicted cmp instructions (more actually) in all CPUs that I am aware of. So your claim of 4 entries being enough to "obviously" be better than everything else is not obvious to me in any way. |
|||
16 Aug 2022, 15:42 |
|
bitRAKE 16 Aug 2022, 18:53
These discussions devolve because there is no firm ground to stand on. That's why I optimize for size - it's either smaller or it isn't. With no example code, there is little point - we are adrift in the middle of an endless sea. Even with an example, we'd have a dozen usage scenarios.
Branch predictors have gotten so good. This adaptive ability is quite good to have. _________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup |
|||
16 Aug 2022, 18:53 |
|
revolution 16 Aug 2022, 22:08
Optimisation for size is unambiguous.
But sometimes that doesn't do what the boss wants, and we need to improve the performance, to reduce the server count, to save money. At that point we have to begin the arduous process of doing real testing and discovering that data patterns drive everything. |
|||
16 Aug 2022, 22:08 |
|
Furs 17 Aug 2022, 13:13
revolution wrote: I wish it was as "obvious" as you seems to suggest. You claimed "always use binary search. It's objectively better unless you have less than 4 entries.". It isn't obvious to me how you can know that, when you don't know the data patterns, and you don't know the misprediction penalty. In any case, linear search is most definitely the wrong choice, unless the amount of branches is really low as I mentioned. It's as simple as that. |
|||
17 Aug 2022, 13:13 |
|
revolution 17 Aug 2022, 13:56
If you have complete control over every aspect of the code and data then do whatever you want. But how will you "rethink your data storage" if 1) it isn't stored but delivered, and 2) you have no control over the delivery from external sources?
Since the OP never mentioned anything about the data format, or the source, then there are absolutely no assumptions we can make that will be "obviously" correct or "objectively better". |
|||
17 Aug 2022, 13:56 |
|
revolution 18 Aug 2022, 04:14
Thanks to my colleague for helping me to test this code:
Code: ;to assemble: "fasm -d TEST=uselss_cmp_chain" ;to assemble: "fasm -d TEST=objectivy_better_binary" macro uselss_cmp_chain { cmp al,1 jz item1 cmp al,2 jz item2 cmp al,3 jz item3 cmp al,4 jz item4 } macro objectivy_better_binary { cmp al,2 ja high_half jb item1 jmp item2 high_half: cmp al,3 je item3 jmp item4 } format elf executable SYS_EXIT = 1 mov ecx,1000000 outer_loop: mov ebx,the_data_pattern_is_unimportant_len mov esi,the_data_pattern_is_unimportant inner_loop: lodsb match x,TEST {x} next_item: dec ebx jnz inner_loop loop outer_loop mov eax,SYS_EXIT int 0x80 item1: jmp next_item item2: jmp next_item item3: jmp next_item item4: jmp next_item the_data_pattern_is_unimportant: while % <= 1000 db 1 end while db 2,3,4 the_data_pattern_is_unimportant_len = $ - the_data_pattern_is_unimportant Code: ~ fasm -d TEST=uselss_cmp_chain test.asm && time ./test flat assembler version 1.73.08 (4029940 kilobytes memory) 2 passes, 1139 bytes. real 0m4.510s user 0m4.464s sys 0m0.000s ~ fasm -d TEST=objectivy_better_binary test.asm && time ./test flat assembler version 1.73.08 (4041296 kilobytes memory) 2 passes, 1137 bytes. real 0m5.159s user 0m5.108s sys 0m0.000s |
|||
18 Aug 2022, 04:14 |
|
revolution 18 Aug 2022, 04:27
It also shows this result: the worst case for the cmp chain:
Code: the_data_pattern_is_unimportant: while % <= 1000 db 4 end while db 2,3,1 the_data_pattern_is_unimportant_len = $ - the_data_pattern_is_unimportant Code: ~ fasm -d TEST=uselss_cmp_chain test.asm && time ./test flat assembler version 1.73.08 (4043436 kilobytes memory) 2 passes, 1139 bytes. real 0m6.561s user 0m6.528s sys 0m0.004s ~ fasm -d TEST=objectivy_better_binary test.asm && time ./test flat assembler version 1.73.08 (4027244 kilobytes memory) 2 passes, 1137 bytes. real 0m7.789s user 0m7.652s sys 0m0.000s |
|||
18 Aug 2022, 04:27 |
|
revolution 18 Aug 2022, 04:30
Furs wrote: You don't freaking need to test what's obvious. If I didn't test it then I would fool myself into thinking I had awesome code, when in reality I had crap code. So not too different from many of the HLL compilers. |
|||
18 Aug 2022, 04:30 |
|
Furs 18 Aug 2022, 12:26
revolution wrote: I'm still not seeing the obvious here. And nice try with your test, using only 4 items and coding the binary search with too many jumps. |
|||
18 Aug 2022, 12:26 |
|
revolution 18 Aug 2022, 12:31
Show your improvement.
|
|||
18 Aug 2022, 12:31 |
|
Tomasz Grysztar 18 Aug 2022, 14:48
revolution wrote:
Code: ; assume AL is in range 1-4 cmp al,3 ja item4 je item3 jpe item2 jmp item1 |
|||
18 Aug 2022, 14:48 |
|
ProMiNick 18 Aug 2022, 15:00
fasm -d TEST=hack_1234_chain_1
fasm -d TEST=hack_1234_chain_2 fasm -d TEST=hack_1234_chain_3 Code: hack_1234_chain_1 { sub al,3 ;add al,-3 jz item3 jp item2 js item1 jmp item4 } hack_1234_chain_2 { add al,-3 jz item3 jp item2 ja item1 jmp item4 } hack_1234_chain_3 { sub al,3 jz item3 jp item2 jb item1 jmp item4 } |
|||
18 Aug 2022, 15:00 |
|
Goto page Previous 1, 2, 3, 4, 5, 6 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.