flat assembler
Message board for the users of flat assembler.

About vgatherdps

Roman



Joined: 21 Apr 2012
Posts: 1763
Roman 13 Dec 2022, 12:53
In the fasm docs (https://flatassembler.net/docs.php?article=manual):
Code:
vgatherdps xmm0{k1},[eax+xmm1]    ; gather four floats
    

In another place I found:
Code:
vgatherdps xmm0,[eax+xmm1],xmm3 ; gather four floats
    


Which variant is right?

I also don't clearly understand how the loops work, and why it uses eax+xmm1 and xmm3.
And what does "gather" mean? Does it add all the floats into one, or store them to memory?


Last edited by Roman on 14 Dec 2022, 14:41; edited 1 time in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20292
Location: In your JS exploiting you and your system
revolution 13 Dec 2022, 13:00
It's complicated.
The instruction conditionally loads up to 4 or 8 single-precision floating-point values from memory addresses specified by the memory operand (the second operand) and using dword indices. The memory operand uses the VSIB form of the SIB byte to specify a general purpose register operand as the common base, a vector register for an array of indices relative to the base and a constant scale factor.
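The addressing described above can be sketched as a scalar loop. This is only a rough model under stated assumptions: `gather4_ps` is a hypothetical helper name (not a real intrinsic), it covers the 128-bit case with every lane selected, and it leaves out the optional displacement. It shows that "gather" means loading one float per lane from `base + index * scale` into the destination register; nothing is summed and nothing is written to memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit gather with dword indices and every lane
   selected. gather4_ps is a hypothetical helper, not a real intrinsic. */
void gather4_ps(float dst[4], const void *base,
                const int32_t index[4], int scale)
{
    /* The VSIB scale field can only encode these factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (int lane = 0; lane < 4; lane++) {
        /* address = common base + sign-extended index * scale */
        const unsigned char *addr =
            (const unsigned char *)base + (ptrdiff_t)index[lane] * scale;
        memcpy(&dst[lane], addr, sizeof(float)); /* one float per lane */
    }
}
```

With `[eax+xmm1]` as written in the first post the scale is 1, so the indices in xmm1 act as byte offsets; with `[eax+xmm1*4]` they would act as element numbers into a float array.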

The mask operand (the third operand) specifies the conditional load operation from each memory address and the corresponding update of each data element of the destination operand (the first operand). Conditionality is specified by the most significant bit of each data element of the mask register. If an element’s mask bit is not set, the corresponding element of the destination register is left unchanged. The width of data element in the destination register and mask register are identical. The entire mask register will be set to zero by this instruction unless the instruction causes an exception.
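This paragraph describes the three-operand VEX (AVX2) form, which is the second variant quoted in the first post, with xmm3 as the mask. The `{k1}` variant is the EVEX (AVX-512) encoding, where an opmask register plays the mask role instead, so both spellings are valid for their respective instruction sets. A minimal scalar sketch of the AVX2 mask rule follows; `gather4_ps_masked` is a hypothetical name, and a scale of 4 is assumed so that indices count floats.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the mask rule for the 128-bit AVX2 form
   (vgatherdps xmm0,[base+xmm1*4],xmm3). Hypothetical helper name. */
void gather4_ps_masked(float dst[4], const float *base,
                       const int32_t index[4], uint32_t mask[4])
{
    assert(dst && base && index && mask);

    for (int lane = 0; lane < 4; lane++) {
        if (mask[lane] & 0x80000000u)       /* only the top bit matters */
            dst[lane] = base[index[lane]];  /* selected: load from memory */
        /* not selected: dst[lane] keeps its previous value */
        mask[lane] = 0;                     /* the mask is cleared either way */
    }
}
```

Because the mask register is overwritten, code that gathers in a loop has to rebuild the mask before every gather.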

Using qword indices, the instruction conditionally loads up to 2 or 4 single-precision floating-point values from the VSIB addressing memory operand, and updates the lower half of the destination register. The upper 128 or 256 bits of the destination register are zero’ed with qword indices.
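Note that the qword-index case belongs to the sibling mnemonic vgatherqps; vgatherdps itself always uses dword indices. As a hedged scalar sketch of the 128-bit qword-index case (the helper name `gatherq2_ps` is made up for illustration, and a scale of 4 is assumed): only two 64-bit indices fit in an xmm register, so only the two low lanes can be gathered and the upper half of the destination is zeroed.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of vgatherqps xmm0,[base+xmm1*4],xmm3 (128-bit form).
   Hypothetical helper name; two qword indices, two gathered floats. */
void gatherq2_ps(float dst[4], const float *base,
                 const int64_t index[2], uint32_t mask[4])
{
    assert(dst && base && index && mask);

    for (int lane = 0; lane < 2; lane++) {
        if (mask[lane] & 0x80000000u)
            dst[lane] = base[index[lane]];  /* selected low lane: load */
    }
    for (int lane = 0; lane < 4; lane++)
        mask[lane] = 0;                     /* the whole mask register is cleared */

    dst[2] = 0.0f;                          /* upper half has no matching index, */
    dst[3] = 0.0f;                          /* so it is zeroed                   */
}
```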

This instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination register and the mask operand are partially updated; those elements that have been gathered are placed into the destination register and have their mask bits set to zero. If any traps or interrupts are pending from already gathered elements, they will be delivered in lieu of the exception; in this case, EFLAGS.RF is set to one so an instruction breakpoint is not re-triggered when the instruction is continued.
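The partial-update behaviour is also the usual explanation for why the mask is written at all: it doubles as a progress record, so re-executing the instruction after a fault only fetches the lanes that are still pending. The sketch below simulates this under simplifying assumptions: `gather4_ps_restartable` and its `fault_lane` parameter are invented for illustration, lanes are processed from lowest to highest, and a "fault" is just an early return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of a gather that can be interrupted and resumed.
   fault_lane simulates a page fault on that lane (-1 means no fault).
   Returns true when the gather completed, false when it was suspended. */
bool gather4_ps_restartable(float dst[4], const float *base,
                            const int32_t index[4], uint32_t mask[4],
                            int fault_lane)
{
    assert(fault_lane >= -1 && fault_lane < 4);

    for (int lane = 0; lane < 4; lane++) {
        if (mask[lane] & 0x80000000u) {
            if (lane == fault_lane)
                return false;               /* suspend: pending mask bits survive */
            dst[lane] = base[index[lane]];  /* lane gathered */
        }
        mask[lane] = 0;                     /* mark this lane as done */
    }
    return true;
}
```

Calling the function a second time with the same arrays and `fault_lane` set to -1 models the operating system resolving the fault and restarting the instruction: the lanes gathered earlier are skipped because their mask elements are already zero.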

If the data size and index size are different, part of the destination register and part of the mask register do not correspond to any elements being gathered. This instruction sets those parts to zero. It may do this to one or both of those registers even if the instruction triggers an exception, and even if the instruction triggers the exception before gathering any elements.
Detailed operational pseudo-code in the link.
Roman



Joined: 21 Apr 2012
Posts: 1763
Roman 13 Dec 2022, 13:07
Show me simple x86 asm code that explains how vgatherdps works.
Then I will clearly understand it.
Roman



Joined: 21 Apr 2012
Posts: 1763
Roman 13 Dec 2022, 13:48
Furs



Joined: 04 Mar 2016
Posts: 2493
Furs 13 Dec 2022, 14:33
So basically what this does is fill your destination's elements (dwords, for example) with values from memory, but each element is loaded using a different index (taken from the vector register in the memory operand), so the addresses need not be sequential.

I never understood why it sets the mask to zeros, or why it modifies it in the first place. Couldn't they just make it an input operand?
Overclick



Joined: 11 Jul 2020
Posts: 669
Location: Ukraine
Overclick 13 Dec 2022, 17:14
Here is a much better explanation, especially for VSIB, but it is still complicated.

https://www.amd.com/system/files/TechDocs/26568.pdf

VSIB.base (Bits [2:0]). This field is concatenated with the complement of the VEX.B bit ({B,base}) to specify the general-purpose register (base GPR) that contains the base address to be used in the computation of each of the effective addresses.
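There is no default base register: the base GPR is whichever register is written in the memory operand (eax in `[eax+xmm1]`), and the assembler fills in these fields. A rough decoding sketch follows, assuming the usual SIB layout (scale in bits [7:6], index in bits [5:3], base in bits [2:0]). The type and function names are invented for illustration, and the special "no base" encodings that depend on ModRM are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Fields recovered from a VSIB byte plus the VEX.B bit as stored in the
   prefix. Hypothetical type and function names. */
typedef struct {
    unsigned scale;      /* 1, 2, 4 or 8 */
    unsigned index_low3; /* low three bits of the vector index register number */
    unsigned base_reg;   /* base GPR number, 0..15 */
} vsib_fields;

vsib_fields decode_vsib(uint8_t vsib, unsigned vex_b_stored)
{
    assert(vex_b_stored <= 1);

    vsib_fields f;
    f.scale      = 1u << (vsib >> 6);       /* bits [7:6] select the scale */
    f.index_low3 = (vsib >> 3) & 7u;        /* bits [5:3] */
    /* VEX.B is stored inverted; its complement becomes bit 3 of the base. */
    f.base_reg   = ((vex_b_stored ^ 1u) << 3) | (vsib & 7u);
    return f;
}
```

Under these assumptions, `[eax+xmm1]` would use the byte 0x08 (scale 1, index 1, base 0) with VEX.B stored as 1, which decodes to base register 0, that is eax.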

How do you modify it, or which general-purpose register is used by default?

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
