flat assembler
Message board for the users of flat assembler.
Hunter 25 Sep 2007, 10:38
Intel has released the SSE4 documentation:
http://download.intel.com/design/processor/manuals/D91561.pdf
Is there a plan to support it in FASM?
asmfan 25 Sep 2007, 12:25
Hunter wrote: Intel released SSE4 documentation
That is already yesterday's news... because AMD has released SSE5 )))
AMD64 Architecture Tech Docs
AMD64 Technology 128-Bit SSE5 Instruction Set
_________________
Any offers?
Tomasz Grysztar 25 Sep 2007, 12:44
Seems like a lot of fun to add the DREX support into fasm.
MazeGen 25 Sep 2007, 13:09
Yeah, it seems that AMD relishes changing the instruction encoding, which could kill SSE5 as a whole.
tom tobias 25 Sep 2007, 17:29
Tomasz wrote: Seems like a lot of fun to add the DREX support into fasm
http://amdzone.com/index.php?name=PNphpBB2&file=viewtopic&t=11125&postdays=0&postorder=asc&start=60
from that reference:
AMD wrote:
peter 23 Nov 2007, 14:51
Has anybody tried to learn how to program for SSE4? I tried to implement strcmp with SSE4.2 instructions:
http://smallcode.weblogs.us/2007/11/23/strcmp-and-strlen-using-sse-42-instructions/
The problem is that you cannot test it, because there are no processors with SSE4 support yet.
LocoDelAssembly 23 Nov 2007, 16:19
http://softwarecommunity.intel.com/articles/eng/1193.htm
Quote:
SDK for 45nm Next Generation Intel® Core™2 Processor Family and Intel® SSE4 (Penryn SDK): A collection of documentation and tools for developing software for Penryn and Intel SSE4. The Penryn SDK includes presentations and whitepapers on Penryn and Intel SSE4, and an Intel SSE4 emulator that allows you to start developing applications with Intel SSE4 today!
peter 23 Nov 2007, 23:50
LocoDelAssembly, thanks for the link.
LocoDelAssembly 24 Nov 2007, 01:12
But unfortunately it does not emulate at the cycle level; in fact, the library just installs an invalid-opcode exception handler to emulate SSE4 instructions. It is only useful for checking that your algorithm meets its specification; for timings you will have to wait for Penryn.
PS: And actually Penryn will implement only a subset of SSE4.
peter 24 Nov 2007, 08:44
Unfortunately, the emulator does not support the string-processing instructions (SSE4.2). Only SSE4 is supported.
Well, processors with SSE4.2 will be available some day. We should learn how to use them. It's not easy, as you can see from the Intel papers and my article, but the instructions look very tempting. Imagine CRC32 calculation or strlen in hardware.
asmfan 24 Nov 2007, 08:50
peter wrote: Imagine CRC32 calculation or strlen in hardware
I doubt they will have good speed. And again RISC vs. CISC.
_________________
Any offers?
peter 24 Nov 2007, 10:07
There will be dedicated hardware for CRC32 and string manipulation, so it should be really fast. See this discussion.
LocoDelAssembly 24 Nov 2007, 14:22
If I'm right, the CRC calculation it performs is not the most conventional one: it also reflects the bits first.
Code:
CRC32 instruction for 64-bit source operand and 64-bit destination operand:
TEMP1[63-0] = BIT_REFLECT64 (SRC[63-0])
TEMP2[31-0] = BIT_REFLECT32 (DEST[31-0])
TEMP3[95-0] = TEMP1[63-0] << 32
TEMP4[95-0] = TEMP2[31-0] << 64
TEMP5[95-0] = TEMP3[95-0] XOR TEMP4[95-0]
TEMP6[31-0] = TEMP5[95-0] MOD2 11EDC6F41H
DEST[31-0] = BIT_REFLECT (TEMP6[31-0])
DEST[63-32] = 00000000H

CRC32 instruction for 32-bit source operand and 32-bit destination operand:
TEMP1[31-0] = BIT_REFLECT32 (SRC[31-0])
TEMP2[31-0] = BIT_REFLECT32 (DEST[31-0])
TEMP3[63-0] = TEMP1[31-0] << 32
TEMP4[63-0] = TEMP2[31-0] << 32
TEMP5[63-0] = TEMP3[63-0] XOR TEMP4[63-0]
TEMP6[31-0] = TEMP5[63-0] MOD2 11EDC6F41H
DEST[31-0] = BIT_REFLECT (TEMP6[31-0])

CRC32 instruction for 16-bit source operand and 32-bit destination operand:
TEMP1[15-0] = BIT_REFLECT16 (SRC[15-0])
TEMP2[31-0] = BIT_REFLECT32 (DEST[31-0])
TEMP3[47-0] = TEMP1[15-0] << 32
TEMP4[47-0] = TEMP2[31-0] << 16
TEMP5[47-0] = TEMP3[47-0] XOR TEMP4[47-0]
TEMP6[31-0] = TEMP5[47-0] MOD2 11EDC6F41H
DEST[31-0] = BIT_REFLECT (TEMP6[31-0])

CRC32 instruction for 8-bit source operand and 64-bit destination operand:
TEMP1[7-0] = BIT_REFLECT8(SRC[7-0])
TEMP2[31-0] = BIT_REFLECT32 (DEST[31-0])
TEMP3[39-0] = TEMP1[7-0] << 32
TEMP4[39-0] = TEMP2[31-0] << 8
TEMP5[39-0] = TEMP3[39-0] XOR TEMP4[39-0]
TEMP6[31-0] = TEMP5[39-0] MOD2 11EDC6F41H
DEST[31-0] = BIT_REFLECT (TEMP6[31-0])
DEST[63-32] = 00000000H

CRC32 instruction for 8-bit source operand and 32-bit destination operand:
TEMP1[7-0] = BIT_REFLECT8(SRC[7-0])
TEMP2[31-0] = BIT_REFLECT32 (DEST[31-0])
TEMP3[39-0] = TEMP1[7-0] << 32
TEMP4[39-0] = TEMP2[31-0] << 8
TEMP5[39-0] = TEMP3[39-0] XOR TEMP4[39-0]
TEMP6[31-0] = TEMP5[39-0] MOD2 11EDC6F41H
DEST[31-0] = BIT_REFLECT (TEMP6[31-0])

Notes:
BIT_REFLECT64: DST[63-0] = SRC[0-63]
BIT_REFLECT32: DST[31-0] = SRC[0-31]
BIT_REFLECT16: DST[15-0] = SRC[0-15]
BIT_REFLECT8: DST[7-0] = SRC[0-7]
MOD2: Remainder from Polynomial division modulus 2
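For anyone without the hardware, the pseudocode above can be followed step by step in software. The Python below is only a rough sketch of that pseudocode, not Intel's implementation: the names `bit_reflect`, `mod2`, `crc32_step` and `crc32c` are made up for illustration, and the 0xFFFFFFFF start value and final inversion in `crc32c` are the usual CRC-32C convention applied around the instruction, not something the instruction does itself.

```python
POLY = 0x11EDC6F41  # 33-bit polynomial taken from the pseudocode


def bit_reflect(value, width):
    """Reverse the order of the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def mod2(dividend, divisor=POLY):
    """Remainder of polynomial division over GF(2) (XOR instead of subtract)."""
    divisor_bits = divisor.bit_length()
    while dividend.bit_length() >= divisor_bits:
        dividend ^= divisor << (dividend.bit_length() - divisor_bits)
    return dividend


def crc32_step(dest, src, src_bits):
    """One step of the pseudocode for an 8-, 16-, 32- or 64-bit source."""
    temp1 = bit_reflect(src, src_bits)          # TEMP1 = BIT_REFLECT(SRC)
    temp2 = bit_reflect(dest & 0xFFFFFFFF, 32)  # TEMP2 = BIT_REFLECT32(DEST)
    temp5 = (temp1 << 32) ^ (temp2 << src_bits) # TEMP3 XOR TEMP4
    temp6 = mod2(temp5)                         # TEMP6 = TEMP5 MOD2 poly
    return bit_reflect(temp6, 32)               # DEST = BIT_REFLECT(TEMP6)


def crc32c(data):
    """CRC-32C of a bytes object, one 8-bit step per byte (assumed convention)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32_step(crc, byte, 8)
    return crc ^ 0xFFFFFFFF
```

With this sketch, `crc32c(b"123456789")` gives 0xE3069283, the published check value for CRC-32C, and one 32-bit step over a little-endian dword gives the same result as four 8-bit steps over its bytes.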
f0dder 24 Nov 2007, 14:31
IIRC the PDF mentions things like software iSCSI as possible uses for the CRC32 instructions... so I hope/assume Intel hasn't fucked up this (very) CISCy instruction.
peter 25 Nov 2007, 02:11
Bit reflection is used in all CRC32 algorithms, so it's not unusual.
What's actually bad is that they used the polynomial 11EDC6F41 from the iSCSI standard. Another polynomial (04C11DB7, reflected EDB88320) is used in almost all programs (WinZip, WinRAR, WinHex, etc.) and in the Ethernet standard.
Last edited by peter on 25 Nov 2007, 02:34; edited 1 time in total
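To make the difference between the two polynomials concrete, here is a small Python sketch. It assumes the common bit-at-a-time reflected algorithm with a 0xFFFFFFFF start value and final inversion; `reflect32` and `crc32_reflected` are illustrative names, and `zlib.crc32` from the Python standard library is used only as a reference for the 04C11DB7 polynomial.

```python
import zlib


def reflect32(value):
    """Reverse the bit order of a 32-bit value."""
    return int(f"{value & 0xFFFFFFFF:032b}"[::-1], 2)


def crc32_reflected(data, reflected_poly):
    """Bit-at-a-time reflected CRC-32 with the given reflected polynomial."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when a 1 bit falls out.
            crc = (crc >> 1) ^ (reflected_poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


IEEE = reflect32(0x04C11DB7)   # Ethernet/zip polynomial, reflected: 0xEDB88320
ISCSI = reflect32(0x1EDC6F41)  # iSCSI polynomial (11EDC6F41 without the top bit), reflected: 0x82F63B78

sample = b"123456789"
ieee_crc = crc32_reflected(sample, IEEE)    # matches zlib.crc32(sample)
iscsi_crc = crc32_reflected(sample, ISCSI)  # a different checksum for the same data
```

For the sample string the first result is 0xCBF43926 (the same as `zlib.crc32`) and the second is 0xE3069283, so a checksum produced with the iSCSI polynomial cannot stand in for the one used by zip tools or Ethernet.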
f0dder 25 Nov 2007, 02:16
peter wrote: Bit reflection is used in all CRC32 algorithms, so it's not unusual.
I guess that shows what they intend the instructions for.
Is the poly hardcoded in the CPU (along with LUTs), or can it be set programmatically?
peter 25 Nov 2007, 02:35
It's hardcoded.
The list of CRC32 polys: http://homepages.tesco.net/~rainstorm/crc-catalogue.htm#crc.cat.crc-32
Madis731 26 Nov 2007, 08:17
I actually hoped that the poly could be set up somewhere, or at least given as a DWORD parameter with the instruction. Oh well.
Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.