flat assembler
Message board for the users of flat assembler.
Intel released SSE4 documentation
Hunter 25 Sep 2007, 10:38
Intel has released the SSE4 documentation:
http://download.intel.com/design/processor/manuals/D91561.pdf
Are there plans to support it in FASM?
asmfan 25 Sep 2007, 12:25
Hunter wrote: Intel released SSE4 documentation
That's already old news, because AMD has released SSE5 :) See AMD64 Architecture Tech Docs: "AMD64 Technology 128-Bit SSE5 Instruction Set".
Tomasz Grysztar 25 Sep 2007, 12:44
Seems like a lot of fun to add the DREX support into fasm.
MazeGen 25 Sep 2007, 13:09
Yeah, it seems that AMD relishes changing the instruction encoding, which could kill SSE5 entirely.
tom tobias 25 Sep 2007, 17:29
Tomasz wrote: Seems like a lot of fun to add the DREX support into fasm
http://amdzone.com/index.php?name=PNphpBB2&file=viewtopic&t=11125&postdays=0&postorder=asc&start=60
From that reference: AMD wrote:
peter 23 Nov 2007, 14:51
Has anybody tried to learn how to program with SSE4? I tried to implement strcmp with SSE4.2 instructions:
http://smallcode.weblogs.us/2007/11/23/strcmp-and-strlen-using-sse-42-instructions/
The problem is that you cannot test it, because there are no processors with SSE4 support yet.
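Without SSE4.2 hardware, one way to at least check the logic of such a strcmp is to model the instruction in software. The sketch below is not the code from the article; it is a hypothetical Python model of PCMPISTRI with imm8 = 0x18 (unsigned bytes, "equal each" aggregation, negative polarity, least-significant index), following my reading of Intel's description of the flags, plus a strcmp loop built on it. The function names are made up for illustration, and the model ignores the page-boundary problem that real 16-byte loads past the end of a string have to handle.

```python
# Hypothetical model of PCMPISTRI with imm8 = 0x18:
#   bits 1:0 = 00 -> unsigned bytes
#   bits 3:2 = 10 -> "equal each" aggregation
#   bits 5:4 = 01 -> negative polarity (invert every result bit)
#   bit  6   = 0  -> return index of the least significant set bit


def pcmpistri_equal_each_neg(block_a: bytes, block_b: bytes):
    """Return (index, CF, ZF, SF) for two 16-byte blocks."""
    def implicit_length(block: bytes) -> int:
        # Bytes before the first NUL are "valid"; 16 if there is no NUL.
        pos = block.find(0)
        return 16 if pos < 0 else pos

    len_a = implicit_length(block_a)
    len_b = implicit_length(block_b)
    mask = 0
    for i in range(16):
        valid_a, valid_b = i < len_a, i < len_b
        if valid_a and valid_b:
            equal = block_a[i] == block_b[i]
        else:
            # Both invalid -> forced true; exactly one invalid -> forced false.
            equal = not valid_a and not valid_b
        if not equal:  # negative polarity
            mask |= 1 << i
    index = (mask & -mask).bit_length() - 1 if mask else 16
    cf = mask != 0     # some position differs (or one string ended first)
    zf = len_b < 16    # second operand contains a NUL
    sf = len_a < 16    # first operand contains a NUL
    return index, cf, zf, sf


def strcmp_sse42_model(a: bytes, b: bytes) -> int:
    """C-style strcmp on NUL-free byte strings, 16 bytes per step."""
    # NUL padding keeps every 16-byte slice full inside this model.
    buf_a = a + bytes(16)
    buf_b = b + bytes(16)
    offset = 0
    while True:
        index, cf, zf, sf = pcmpistri_equal_each_neg(
            buf_a[offset:offset + 16], buf_b[offset:offset + 16])
        if cf:  # first difference is at offset + index
            return buf_a[offset + index] - buf_b[offset + index]
        if zf or sf:  # a terminator was reached with no difference
            return 0
        offset += 16


print(strcmp_sse42_model(b"hello", b"help"))  # -4
print(strcmp_sse42_model(b"abc", b"abc"))     # 0
```

The loop exits on CF first because a set CF already covers the case where one string ends before the other (a valid byte against an invalid one counts as a mismatch); only when nothing differs do ZF/SF mean "both strings ended together".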
LocoDelAssembly 23 Nov 2007, 16:19
http://softwarecommunity.intel.com/articles/eng/1193.htm
Quote: SDK for 45nm Next Generation Intel® Core™2 Processor Family and Intel® SSE4 (Penryn SDK): A collection of documentation and tools for developing software for Penryn and Intel SSE4. The Penryn SDK includes presentations and whitepapers on Penryn and Intel SSE4, and an Intel SSE4 emulator that allows you to start developing applications with Intel SSE4 today!
peter 23 Nov 2007, 23:50
LocoDelAssembly, thanks for the link.
LocoDelAssembly 24 Nov 2007, 01:12
But unfortunately it will not emulate at the cycle level; in fact, the library just installs an invalid-opcode exception handler that emulates the SSE4 instructions. It is only useful for checking whether your algorithm meets its specification; for timings you will have to wait for Penryn.
PS: And actually Penryn will implement only a subset of SSE4.
peter 24 Nov 2007, 08:44
Unfortunately, the emulator does not support the string-processing instructions (SSE4.2). Only SSE4.1 is supported.
Well, processors with SSE4.2 will be available some day, and we should learn how to use them. It's not easy, as you can see from Intel's papers and my article, but the instructions look very tempting. Imagine CRC32 calculation or strlen in hardware.
asmfan 24 Nov 2007, 08:50
peter wrote: Imagine CRC32 calculation or strlen in hardware.
I doubt they will be fast. And again, it's RISC vs. CISC.
peter 24 Nov 2007, 10:07
There will be dedicated hardware for CRC32 and string-manipulation, so it should be really fast. See this discussion.
LocoDelAssembly 24 Nov 2007, 14:22
If I'm right, the CRC calculation it performs is not the most conventional one: it also reflects the bits first.
Code:
CRC32 instruction for 64-bit source operand and 64-bit destination operand:
    TEMP1[63-0] = BIT_REFLECT64 (SRC[63-0])
    TEMP2[31-0] = BIT_REFLECT32 (DEST[31-0])
    TEMP3[95-0] = TEMP1[63-0] << 32
    TEMP4[95-0] = TEMP2[31-0] << 64
    TEMP5[95-0] = TEMP3[95-0] XOR TEMP4[95-0]
    TEMP6[31-0] = TEMP5[95-0] MOD2 11EDC6F41H
    DEST[31-0]  = BIT_REFLECT (TEMP6[31-0])
    DEST[63-32] = 00000000H

CRC32 instruction for 32-bit source operand and 32-bit destination operand:
    TEMP1[31-0] = BIT_REFLECT32 (SRC[31-0])
    TEMP2[31-0] = BIT_REFLECT32 (DEST[31-0])
    TEMP3[63-0] = TEMP1[31-0] << 32
    TEMP4[63-0] = TEMP2[31-0] << 32
    TEMP5[63-0] = TEMP3[63-0] XOR TEMP4[63-0]
    TEMP6[31-0] = TEMP5[63-0] MOD2 11EDC6F41H
    DEST[31-0]  = BIT_REFLECT (TEMP6[31-0])

CRC32 instruction for 16-bit source operand and 32-bit destination operand:
    TEMP1[15-0] = BIT_REFLECT16 (SRC[15-0])
    TEMP2[31-0] = BIT_REFLECT32 (DEST[31-0])
    TEMP3[47-0] = TEMP1[15-0] << 32
    TEMP4[47-0] = TEMP2[31-0] << 16
    TEMP5[47-0] = TEMP3[47-0] XOR TEMP4[47-0]
    TEMP6[31-0] = TEMP5[47-0] MOD2 11EDC6F41H
    DEST[31-0]  = BIT_REFLECT (TEMP6[31-0])

CRC32 instruction for 8-bit source operand and 64-bit destination operand:
    TEMP1[7-0]  = BIT_REFLECT8 (SRC[7-0])
    TEMP2[31-0] = BIT_REFLECT32 (DEST[31-0])
    TEMP3[39-0] = TEMP1[7-0] << 32
    TEMP4[39-0] = TEMP2[31-0] << 8
    TEMP5[39-0] = TEMP3[39-0] XOR TEMP4[39-0]
    TEMP6[31-0] = TEMP5[39-0] MOD2 11EDC6F41H
    DEST[31-0]  = BIT_REFLECT (TEMP6[31-0])
    DEST[63-32] = 00000000H

CRC32 instruction for 8-bit source operand and 32-bit destination operand:
    TEMP1[7-0]  = BIT_REFLECT8 (SRC[7-0])
    TEMP2[31-0] = BIT_REFLECT32 (DEST[31-0])
    TEMP3[39-0] = TEMP1[7-0] << 32
    TEMP4[39-0] = TEMP2[31-0] << 8
    TEMP5[39-0] = TEMP3[39-0] XOR TEMP4[39-0]
    TEMP6[31-0] = TEMP5[39-0] MOD2 11EDC6F41H
    DEST[31-0]  = BIT_REFLECT (TEMP6[31-0])

Notes:
    BIT_REFLECT64: DST[63-0] = SRC[0-63]
    BIT_REFLECT32: DST[31-0] = SRC[0-31]
    BIT_REFLECT16: DST[15-0] = SRC[0-15]
    BIT_REFLECT8:  DST[7-0]  = SRC[0-7]
    MOD2: Remainder from Polynomial division modulus 2
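To check one's reading of this pseudocode, it can be transcribed almost line by line into Python. This is a sketch, not Intel's code; the helper names `bit_reflect`, `mod2`, `crc32_insn` and `crc32c` are hypothetical. The only generalization is that the shift of TEMP2 equals the source width (8, 16, 32 or 64), which is what all five cases above do. The instruction itself does no initial or final inversion, so `crc32c` adds the usual 0xFFFFFFFF pre/post XOR to reproduce the published CRC-32C check value.

```python
POLY = 0x11EDC6F41  # 33-bit iSCSI (CRC-32C) polynomial from the pseudocode


def bit_reflect(value: int, width: int) -> int:
    """BIT_REFLECTn: reverse the order of the low `width` bits."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def mod2(dividend: int, divisor: int = POLY) -> int:
    """MOD2: remainder of carry-less (GF(2)) polynomial division."""
    top = divisor.bit_length()
    while dividend.bit_length() >= top:
        dividend ^= divisor << (dividend.bit_length() - top)
    return dividend


def crc32_insn(dest: int, src: int, src_bits: int) -> int:
    """One CRC32 instruction with an 8/16/32/64-bit source operand."""
    temp1 = bit_reflect(src, src_bits)
    temp2 = bit_reflect(dest & 0xFFFFFFFF, 32)
    temp3 = temp1 << 32
    temp4 = temp2 << src_bits      # << 8, 16, 32 or 64, as in the pseudocode
    temp6 = mod2(temp3 ^ temp4)
    return bit_reflect(temp6, 32)  # upper half of a 64-bit DEST is zeroed


def crc32c(data: bytes) -> int:
    """CRC-32C of a buffer, feeding the instruction one byte at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32_insn(crc, byte, 8)
    return crc ^ 0xFFFFFFFF


print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Because of the reflection, feeding a little-endian dword with the 32-bit form gives the same result as feeding its four bytes in memory order with the 8-bit form, which is what makes the wider forms usable on ordinary buffers.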
f0dder 24 Nov 2007, 14:31
IIRC the PDF mentions things like software iSCSI as possible uses for the CRC32 instruction... so I hope/assume Intel hasn't fucked up this (very) CISCy instruction.
peter 25 Nov 2007, 02:11
Bit reflection is used in all CRC32 algorithms, so it's not unusual.
What's actually bad is that they used the polynomial 11EDC6F41 from the iSCSI standard. Another polynomial (04C11DB7, reflected EDB88320) is used in almost all programs (WinZip, WinRAR, WinHex, etc.) and in the Ethernet standard.
Last edited by peter on 25 Nov 2007, 02:34; edited 1 time in total
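The "reflected" form of a polynomial is just its 32 low bits in reverse order, which is the constant LSB-first (shift-right) CRC code actually uses. Note that 11EDC6F41 is written with its x^32 term, while 04C11DB7 leaves it implicit (its full form is 104C11DB7). A short sketch; the helper name `reflect` is hypothetical:

```python
def reflect(value: int, width: int) -> int:
    """Reverse the order of the low `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# Full 33-bit generator polynomials, including the x^32 term.
POLYS = {
    "CRC-32 (Ethernet, zip)": 0x104C11DB7,
    "CRC-32C (iSCSI, CRC32 instruction)": 0x11EDC6F41,
}

for name, poly in POLYS.items():
    normal = poly & 0xFFFFFFFF  # drop the implicit x^32 term
    print(f"{name}: normal {normal:08X}, reflected {reflect(normal, 32):08X}")
# CRC-32 (Ethernet, zip): normal 04C11DB7, reflected EDB88320
# CRC-32C (iSCSI, CRC32 instruction): normal 1EDC6F41, reflected 82F63B78
```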
f0dder 25 Nov 2007, 02:16
peter wrote: Bit reflection is used in all CRC32 algorithms, so it's not unusual.
I guess that shows what they intend the instructions for. Is the poly hardcoded in the CPU (along with LUTs), or can it be set programmatically?
peter 25 Nov 2007, 02:35
It's hardcoded.
A list of CRC32 polynomials: http://homepages.tesco.net/~rainstorm/crc-catalogue.htm#crc.cat.crc-32
Madis731 26 Nov 2007, 08:17
I actually hoped the poly could be set somewhere, or at least passed as a DWORD parameter to the instruction. Oh well.
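Since the polynomial is fixed in hardware, anything that needs the WinZip/Ethernet CRC-32 still has to compute it in software. A minimal table-driven sketch with a caller-chosen reflected polynomial follows; `make_table` and `SoftCrc32` are hypothetical names, and Python's standard `zlib.crc32` is used only to cross-check the Ethernet-polynomial result.

```python
import zlib


def make_table(poly_reflected: int) -> list:
    """256-entry lookup table for a reflected (LSB-first) CRC-32."""
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ poly_reflected if crc & 1 else crc >> 1
        table.append(crc)
    return table


class SoftCrc32:
    """Software CRC-32 with a configurable polynomial (hypothetical helper)."""

    def __init__(self, poly_reflected: int):
        self.table = make_table(poly_reflected)

    def update(self, data: bytes, crc: int = 0) -> int:
        # Same chaining convention as zlib.crc32: pass the previous result back in.
        crc ^= 0xFFFFFFFF
        for byte in data:
            crc = self.table[(crc ^ byte) & 0xFF] ^ (crc >> 8)
        return crc ^ 0xFFFFFFFF


ieee = SoftCrc32(0xEDB88320)        # WinZip/Ethernet polynomial, reflected
castagnoli = SoftCrc32(0x82F63B78)  # iSCSI polynomial, reflected

data = b"123456789"
assert ieee.update(data) == zlib.crc32(data)
print(hex(ieee.update(data)), hex(castagnoli.update(data)))  # 0xcbf43926 0xe3069283
```

One table lookup per byte is usually much slower than a hardware instruction that consumes 8 bytes at a time, which is why a configurable polynomial would have been attractive.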
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.