flat assembler
Message board for the users of flat assembler.

Intel plans doubling 16 general purpose registers to 32

revolution 07 Aug 2023, 03:19
There is more to making fast code than timing small code sections with rdtsc.

For any non-trivial program, the largest contributor to performance will almost certainly be memory access patterns and cache usage. Your code might be different, but without testing, how will you know?

It is kind of useless to have many super-fast tiny blocks of code, and then connect them together in a full program and discover that your cache gets filled and the whole thing thrashes, killing performance.

Test your whole code, the final app, as a whole, with everything working and running on normal data.

Uses for rdtsc should be limited to identifying hotspots, to direct the optimisation effort to where the time spent is most productive. But also remember that rdtsc itself induces penalties, possibly changing results that could otherwise be quite different.
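
A minimal C sketch of the memory-access point, not taken from the thread. The two functions below compute the same sum over a row-major matrix, but one walks memory sequentially and the other jumps a full row between accesses. The names `sum_row_major`, `sum_col_major`, and `time_traversal` are hypothetical, and `clock()` stands in for rdtsc only to keep the sketch portable. How large the gap is, if any, depends on the matrix size, the cache hierarchy, and the compiler, so it has to be measured on the target machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sum a rows x cols matrix stored row-major, touching memory sequentially. */
uint64_t sum_row_major(const uint32_t *m, size_t rows, size_t cols)
{
    uint64_t total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same sum, but consecutive accesses are cols elements apart,
   so each access may land on a different cache line. */
uint64_t sum_col_major(const uint32_t *m, size_t rows, size_t cols)
{
    uint64_t total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}

/* Run one traversal, store its result in *out, and return the
   elapsed CPU time in seconds as reported by clock(). */
double time_traversal(uint64_t (*fn)(const uint32_t *, size_t, size_t),
                      const uint32_t *m, size_t rows, size_t cols,
                      uint64_t *out)
{
    clock_t start = clock();
    *out = fn(m, rows, cols);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_traversal` once with each function on a matrix much larger than the last-level cache gives two timings to compare. Both calls must return the same sum, which is a cheap sanity check that the two loops really do the same work.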

tthsqe 07 Aug 2023, 04:41
revolution, you have a habit of giving this same generic, canned response every time the topic of code timing comes up. Memory, cache, data access, optimize the high level algorithm, ..., small code may be better than huge code that thrashes the code cache, ..., micro benchmarks are useless, test your whole program, ..., etc. I have heard all of this multiple times before and am well aware of where the hotspots are and how to find them. Sometimes the bottlenecks really do fall under the rubric of in-register throughput computation.
Now, I was simply musing (and revolution is not the intended audience) that at some point Intel's nominal add/mul latency ratio went from 3/5 to 4/4, i.e. that addition became relatively "slower". Since Agner Fog goes through the trouble of assigning values here and Intel even posts numbers in its Intrinsics Guide, they must be worth something.
And, before revolution jumps back in and tells me that I will know nothing until I time my whole app, I would just like to mention that the only real success I've had applying these latency numbers is in the case of these trivial micro loops. ha.
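
A minimal C sketch of the kind of trivial micro loop where latency figures apply directly, not taken from the post. It assumes the 3/5 and 4/4 figures refer to floating-point add and multiply, which the post does not state. The function names `chain_add` and `chain_mul` are hypothetical. Each iteration consumes the result of the previous one, so the loop cannot go faster than roughly one operation latency per iteration. Without options such as `-ffast-math`, a conforming compiler may not reassociate these floating-point operations, so the dependency chain should survive optimisation, but checking the generated assembly is still advisable.

```c
#include <stddef.h>

/* n dependent additions: every add waits for the previous sum. */
double chain_add(double x, double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x = x + k;
    return x;
}

/* n dependent multiplications: every mul waits for the previous product. */
double chain_mul(double x, double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x = x * k;
    return x;
}
```

Timing each function with a large `n` and dividing by `n` gives a per-iteration figure that can be compared against the published latency tables.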

revolution 07 Aug 2023, 05:54
I'm glad the message is starting to get through. :)

I hate to see people waste time on the wrong things and give up on assembly out of frustration when their code is no faster than a good C compiler's.

bitRAKE 07 Aug 2023, 13:12
I was born a C compiler then I had a growth spurt around Pascal.
tthsqe wrote:
why didn't I hear anyone complaining that add and mul now have the same latency? :)
Because most code is compiled and the aggregate result was an increase in performance. This is why these new instructions will work - because they are compiler puzzle pieces, and can be made faster than existing instructions. So, rather than find ways to combine existing instructions (which compilers struggle to navigate correctly), Intel invests silicon in better tools for the compiler.

If no bottlenecks exist in fetch/decode/retire then instruction binning into execution ports drives throughput. (Assuming no dependencies.)
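
A minimal C sketch of the "assuming no dependencies" case, not taken from the post. The function names `sum_one_chain` and `sum_four_chains` are hypothetical, and the choice of four accumulators is arbitrary. The first function forms a single dependency chain, so it is limited by add latency. The second keeps four independent partial sums, which lets the core overlap the adds and approach its throughput limit instead. Note that the two can differ in the last bits for general floating-point input, because the additions happen in a different order.

```c
#include <stddef.h>

/* One accumulator: each add depends on the previous one. */
double sum_one_chain(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: the four adds in each iteration are independent
   of one another, so they can be in flight at the same time. */
double sum_four_chains(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Handle the up to three leftover elements. */
    for (; i < n; i++)
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Whether the second version is actually faster depends on the core's execution ports and on what the compiler does with each loop, so it is worth timing both on the target machine.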

Copyright © 1999-2024, Tomasz Grysztar.