flat assembler
Message board for the users of flat assembler.

SSE mixing float with integer instructions

Overclick



Joined: 11 Jul 2020
Posts: 669
Location: Ukraine
Overclick 02 Jul 2022, 01:34
Hi guys!
As we know, mixing floating-point and integer SSE instructions carries some penalties, but some of the floating-point instructions are very convenient. Take MOVHPD, for example: isn't a small penalty better than an extra instruction to move the upper integer qword to or from a general-purpose register? Also, do we have to modify MXCSR beforehand to prevent NaN exceptions, or are those checked only for arithmetic instructions?
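A minimal sketch of the MOVHPD idea, written in C with SSE2 intrinsics rather than fasm source so it can be compiled and checked (assumes an x86 target with SSE2, which is baseline on x86-64). `_mm_storeh_pd` and `_mm_loadh_pd` normally compile to MOVHPD; the helper names are hypothetical. On the MXCSR question: Intel's manual lists MOVHPD with no SIMD floating-point exceptions, so the sketch clears the sticky exception flags, moves a qword whose bits form a signaling NaN, and then reads the flags back.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>   /* SSE2: _mm_storeh_pd, _mm_loadh_pd, float/int casts */
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr, _MM_EXCEPT_MASK */

/* Hypothetical helper: read the upper qword of an integer vector
 * with MOVHPD m64, xmm (no trip through a general-purpose register). */
static uint64_t high_qword_movhpd(__m128i v)
{
    double tmp;
    uint64_t out;
    _mm_storeh_pd(&tmp, _mm_castsi128_pd(v)); /* store bits 127:64 */
    memcpy(&out, &tmp, sizeof out);           /* reinterpret bits, no FP conversion */
    return out;
}

/* Hypothetical helper: replace the upper qword with MOVHPD xmm, m64. */
static __m128i set_high_qword_movhpd(__m128i v, uint64_t hi)
{
    double tmp;
    memcpy(&tmp, &hi, sizeof tmp);
    return _mm_castpd_si128(_mm_loadh_pd(_mm_castsi128_pd(v), &tmp));
}

/* Clear the sticky exception flags (MXCSR bits 0-5). */
static void clear_sse_flags(void)
{
    _mm_setcsr(_mm_getcsr() & ~_MM_EXCEPT_MASK);
}

/* Return whichever sticky exception flags are currently set. */
static unsigned sse_flags(void)
{
    return _mm_getcsr() & _MM_EXCEPT_MASK;
}
```

Usage: call `clear_sse_flags()`, move a signaling-NaN pattern such as `0x7FF0000000000001` in and out with the two helpers, and `sse_flags()` should still read 0 with the bits unchanged. The compiler may pick a different instruction for these intrinsics, so check the disassembly if the exact opcode matters.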
macgub



Joined: 11 Jan 2006
Posts: 350
Location: Poland
macgub 19 Sep 2022, 17:15
Personally, I often use instructions such as movhlps, movlhps and movhps on integer data. I guess overall performance depends on the surrounding instructions as well: whether the ones before and after them are floating-point or integer instructions.
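For the kind of use described above, a small sketch in C with SSE2 intrinsics (helper names are hypothetical; assumes an x86 target with SSE2): MOVHLPS brings the upper 64-bit integer lane down for a horizontal add, and MOVLHPS packs the low qwords of two vectors into one register. The cast intrinsics generate no instructions; they only tell the compiler to treat the register as float or integer data.

```c
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 integer intrinsics and float/int casts */

/* Hypothetical helper: add the two 64-bit lanes of v.
 * MOVHLPS is a "float" shuffle, but it only moves bits,
 * so the integer lanes pass through unchanged. */
static uint64_t hsum_epi64(__m128i v)
{
    __m128 f = _mm_castsi128_ps(v);
    __m128i hi = _mm_castps_si128(_mm_movehl_ps(f, f)); /* low qword = v's high qword */
    __m128i s = _mm_add_epi64(v, hi);                   /* PADDQ, wraps mod 2^64 */
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, s);
    return out[0];                                      /* low lane holds the sum */
}

/* Hypothetical helper: build {low: a.lo, high: b.lo} with MOVLHPS. */
static __m128i pack_low_qwords(__m128i a, __m128i b)
{
    return _mm_castps_si128(_mm_movelh_ps(_mm_castsi128_ps(a),
                                          _mm_castsi128_ps(b)));
}
```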
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20423
Location: In your JS exploiting you and your system
revolution 20 Sep 2022, 06:43
macgub wrote:
I guess overall performance depends on ...
Yes. It depends upon many factors.

If you have a performance problem then profile/benchmark and use that to direct your attention to the problem areas.
Furs



Joined: 04 Mar 2016
Posts: 2544
Furs 20 Sep 2022, 13:04
Overclick wrote:
Hi guys!
As we know, mixing floating-point and integer SSE instructions carries some penalties, but some of the floating-point instructions are very convenient. Take MOVHPD, for example: isn't a small penalty better than an extra instruction to move the upper integer qword to or from a general-purpose register? Also, do we have to modify MXCSR beforehand to prevent NaN exceptions, or are those checked only for arithmetic instructions?
I suggest you read Agner Fog's optimization manuals because it depends on the CPU in question.

They are far more useful than benchmarking, because (1) benchmarking only covers the CPUs you have access to, and (2) you're unlikely to get results as precise as someone else's meticulous analysis.

In short: it depends on the CPU. For example, some CPUs add extra latency (a bypass delay) when data crosses between the integer and floating-point "domains", and some don't. Usually newer ones don't, so you can mix freely.
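To make the domain point concrete: for the MOVHLPS/MOVLHPS shuffles mentioned earlier, SSE2 has integer-domain counterparts that produce exactly the same bits, so choosing between them is purely a question of bypass latency on the target CPU. A minimal sketch in C intrinsics (hypothetical helper names; assumes SSE2): `_mm_movehl_ps(a, b)` matches `_mm_unpackhi_epi64(b, a)` (PUNPCKHQDQ, operands swapped), and `_mm_movelh_ps(a, b)` matches `_mm_unpacklo_epi64(a, b)` (PUNPCKLQDQ). Compilers may swap one for the other on their own, so inspect the generated code when it matters.

```c
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 */

/* Float-domain shuffles applied to integer data. */
static __m128i hl_float(__m128i a, __m128i b)   /* MOVHLPS: {b.hi, a.hi} */
{
    return _mm_castps_si128(_mm_movehl_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b)));
}

static __m128i lh_float(__m128i a, __m128i b)   /* MOVLHPS: {a.lo, b.lo} */
{
    return _mm_castps_si128(_mm_movelh_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b)));
}

/* Integer-domain equivalents: same bits, no domain crossing. */
static __m128i hl_int(__m128i a, __m128i b)     /* PUNPCKHQDQ: {b.hi, a.hi} */
{
    return _mm_unpackhi_epi64(b, a);
}

static __m128i lh_int(__m128i a, __m128i b)     /* PUNPCKLQDQ: {a.lo, b.lo} */
{
    return _mm_unpacklo_epi64(a, b);
}

/* Nonzero if all 16 bytes of x and y are equal. */
static int same_bits(__m128i x, __m128i y)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) == 0xFFFF;
}
```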

Copyright © 1999-2024, Tomasz Grysztar.

Website powered by rwasa.