flat assembler
Message board for the users of flat assembler.

AVX-512 transitions

Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7783
Location: Kraków, Poland
A really interesting article about AVX-512 performance (taking an experimental approach) was posted today:
Gathering Intel on Intel AVX-512 Transitions.
Post 17 Jan 2020, 21:22
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17639
Location: In your JS exploiting you and your system
I think the takeaway is that if you need top performance, you should execute the wide AVX instructions in batches rather than interspersing them among other code.
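A minimal C sketch of what "in batches" can look like in code, assuming GCC or Clang on x86-64 (for `__attribute__((target("avx512f")))` and `__builtin_cpu_supports`). The function names are hypothetical. All 512-bit work is grouped into one contiguous loop, and the scalar follow-up work runs afterwards, instead of scattering isolated wide instructions through scalar code. Whether this actually reduces frequency transitions depends on the CPU; the sketch only shows the code structure.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical wide phase: every 512-bit instruction is issued here,
   back to back, in a single loop. */
__attribute__((target("avx512f")))
static void scale_batch_avx512(float *dst, const float *src, size_t n, float k)
{
    const __m512 vk = _mm512_set1_ps(k);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {            /* 16 floats per zmm register */
        __m512 v = _mm512_loadu_ps(src + i);
        _mm512_storeu_ps(dst + i, _mm512_mul_ps(v, vk));
    }
    for (; i < n; i++)                        /* scalar tail */
        dst[i] = src[i] * k;
}

/* Fallback for CPUs without AVX-512F. */
static void scale_scalar(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Run the wide pass once over the whole array, then do the narrow
   (scalar) work in a separate phase afterwards. */
static float scale_then_sum(float *dst, const float *src, size_t n, float k)
{
    if (__builtin_cpu_supports("avx512f"))
        scale_batch_avx512(dst, src, n, k);
    else
        scale_scalar(dst, src, n, k);

    float sum = 0.0f;                         /* scalar follow-up phase */
    for (size_t i = 0; i < n; i++)
        sum += dst[i];
    return sum;
}
```

The runtime check keeps the sketch usable on machines without AVX-512; on those it simply takes the scalar path and produces the same results.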
Post 18 Jan 2020, 03:14
Tomasz Grysztar
Another amazing article: AVX-512 Mask Registers, Again.
Post 26 May 2020, 16:22
revolution
Tomasz Grysztar wrote:
Another amazing article: AVX-512 Mask Registers, Again.
So we need to save the FPU state before using AVX-512? And the opposite, we need to save the AVX-512 state before using the FPU?

What happens if I mix FPU instructions with AVX-512 instructions in a single stream? Does the OS have to do a context save/restore each time the new instruction type is executed? So not only do we get a forced down-clock, we also get a forced context change. Yuck.
Post 27 May 2020, 05:08
Furs



Joined: 04 Mar 2016
Posts: 1514
revolution wrote:
So we need to save the FPU state before using AVX-512? And the opposite, we need to save the AVX-512 state before using the FPU?

What happens if I mix FPU instructions with AVX-512 instructions in a single stream? Does the OS have to do a context save/restore each time the new instruction type is executed? So not only do we get a forced down-clock, we also get a forced context change. Yuck.
That's not how I understood it. It's about the physical register file being shared, which is quite large (128 registers) because of renaming.

So you'll simply have fewer x87 registers available for renaming if you also use mask registers (and vice versa). This is a performance loss, of course, but it's far different from what you were thinking (context switches), and it is completely transparent to software.
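To make the shared-pool idea concrete, here is a toy C model, assuming a single pool of physical registers that both x87 writes and mask-register writes must allocate from when they are renamed. The 128-entry size is the figure quoted in the post; the type and function names are hypothetical, and entries holding committed architectural state are ignored. This illustrates the allocation argument only; it is not a model of any real microarchitecture.

```c
#include <stdbool.h>

/* Toy model only: the two register classes that, as described in the post,
   share one physical register file. Sizes are illustrative. */
typedef enum { X87 = 0, MASK = 1 } reg_class;

typedef struct {
    int total;        /* entries in the shared physical register file */
    int in_flight[2]; /* renamed-but-not-retired writes, per class */
} shared_prf;

/* Entries not currently held by an in-flight write of either class. */
static int free_entries(const shared_prf *p)
{
    return p->total - p->in_flight[X87] - p->in_flight[MASK];
}

/* Rename one write of class c. Returns false when the pool is empty,
   which in this model means renaming stalls until a write retires.
   No OS involvement or context switch is modeled or needed. */
static bool rename_write(shared_prf *p, reg_class c)
{
    if (free_entries(p) == 0)
        return false;
    p->in_flight[c]++;
    return true;
}

/* Retiring a write returns one entry to the shared pool. */
static void retire_write(shared_prf *p, reg_class c)
{
    if (p->in_flight[c] > 0)
        p->in_flight[c]--;
}
```

With 128 entries and 40 mask-register writes in flight, only 88 x87 writes can be renamed before the model stalls; retiring one mask write frees exactly one entry that an x87 write can then use.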
Post 27 May 2020, 15:41
revolution
Furs wrote:
That's not how I understood it. It's about the physical register file being shared, which is quite large (128 registers) because of renaming.

So you'll simply have fewer x87 registers available for renaming if you also use mask registers (and vice versa). This is a performance loss, of course, but it's far different from what you were thinking (context switches), and it is completely transparent to software.
Okay, yes. I guess that makes sense. Thanks for the explanation.
Post 28 May 2020, 11:18
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.