flat assembler
Message board for the users of flat assembler.
Implementing SFMD functions the RISC way
mattst88 14 May 2008, 05:11
I'm learning Alpha assembly, and in keeping with the RISC way of doing things, there are only very basic SIMD instructions: minimum and maximum of signed/unsigned words/bytes. That is, {min,max}{u,s}{b,w}
Whereas SSE provides instructions for everything and walks your dog for you, complicated instructions have to be built from basic instructions on the Alpha. For instance, the SSE/MMX instructions paddusb and psubusb can be implemented as

Code:
static __inline uint64_t __paddusb8(uint64_t m1, uint64_t m2)
{
    return m1 + __minub8(m2, ~m1);
}

static __inline uint64_t __psubusb8(uint64_t m1, uint64_t m2)
{
    return m1 - __minub8(m2, m1);
}

When I saw these, I thought they were kind of clever. Another interesting one (that I have not confirmed as working) is paddw:

Code:
uint64_t __paddw(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000;
    m1 &= ~signs;
    m2 &= ~signs;
    m1 += m2;
    m1 ^= signs;
    return m1;
}

Here's an exercise for you: using only the following instructions, implement a SIMD instruction (SSE/MMX) that operates on byte or word integers.
Available instructions: and, or, xor, not, SIMD min/max for bytes/words, addition/subtraction.
Note that the available add/subtract instructions are not SIMD aware and treat their operands as single integers.
_________________
My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions. Assembly Programmer's Journal
Last edited by mattst88 on 20 Jun 2008, 17:17; edited 1 time in total
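For readers without an Alpha at hand, the two saturating helpers above can be tried with a plain C stand-in for the byte-wise minimum. This is only a sketch: `minub8_ref`, `paddusb8`, `psubusb8` and `check_saturating_ops` are hypothetical names chosen for this example, and the loop merely imitates what `__minub8` is assumed to do (unsigned minimum of each of the eight bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for the byte-wise unsigned minimum
   (the role __minub8 plays in the post above). */
static uint64_t minub8_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t x = (a >> (8 * i)) & 0xFF;
        uint64_t y = (b >> (8 * i)) & 0xFF;
        r |= (x < y ? x : y) << (8 * i);
    }
    return r;
}

/* Saturating unsigned byte add: each byte of m2 is clamped to
   255 minus the matching byte of m1, so no byte sum exceeds 255
   and no carry crosses into the next byte. */
static uint64_t paddusb8(uint64_t m1, uint64_t m2)
{
    return m1 + minub8_ref(m2, ~m1);
}

/* Saturating unsigned byte subtract: each byte of m2 is clamped to
   the matching byte of m1, so no byte difference drops below 0
   and no borrow crosses into the next byte. */
static uint64_t psubusb8(uint64_t m1, uint64_t m2)
{
    return m1 - minub8_ref(m2, m1);
}

/* Hand-computed spot checks. */
void check_saturating_ops(void)
{
    assert(paddusb8(0x00F0, 0x0020) == 0x00FF); /* 0xF0 + 0x20 saturates   */
    assert(paddusb8(0x01FF, 0x0101) == 0x02FF); /* only the low byte clamps */
    assert(psubusb8(0x0510, 0x0620) == 0x0000); /* both bytes clamp to 0    */
    assert(psubusb8(0x1005, 0x0106) == 0x0F00); /* only the low byte clamps */
}
```

Calling `check_saturating_ops()` from a `main` is enough to exercise it. The clamp happens before the plain 64-bit add or subtract, which is why the non-SIMD arithmetic never sees a carry or borrow between bytes.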
LocoDelAssembly 14 May 2008, 16:55
Here is one: http://board.flatassembler.net/topic.php?p=65768#65768
bitRAKE 15 May 2008, 02:31
Have you seen HAKMEM?
Fun read for any programmer, imho.
_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
mattst88 17 May 2008, 18:51
Code:
uint64_t __paddw(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000;
    m1 &= ~signs;
    m2 &= ~signs;
    m1 += m2;
    m1 ^= signs;
    return m1;
}

Sure enough, this does seem to work. This function is the equivalent of paddw, even on systems without MMX.
Just one question: can anyone explain the logic to me?
I'm guessing here, but I think it figures out what the resulting words' signs will be, saves them, erases them from the operands, adds the operands, and restores the signs.
bitRAKE 18 May 2008, 02:59
mattst88 wrote: Just one question, can anyone explain the logic to me?

Code:
m2 &= ~signs;

These only clear the sign bits when the sign bits of m1 and m2 differ, so a carry still takes place if both are set, which corrupts the result in the higher-order words (i.e. it doesn't work exactly). Using SpeQ I wrote a script to test values:

Code:
m1=Hex(0xFFF0)
m1 = 0xFFF0
m2=Hex(0xFFF7)
m2 = 0xFFF7
signs=Hex((m1 || m2) & 0x8000)
signs = 0x0
m1=Hex(m1 & (signs||0xFFFF))
m1 = 0xFFF0
m2=Hex(m2 & (signs||0xFFFF))
m2 = 0xFFF7
m1=Hex((m1+m2)||signs)
m1 = 0x1FFE7

Code:
; match other code names
m1 equ eax
m2 equ edx
signs equ ecx
; some test values
mov m1,$FFF0FFF0
mov m2,$FFF7FFF7
; signs = (m1 ^ m2) & 0x8000800080008000;
mov signs,m1
xor signs,m2
and signs,$80008000
; m1 &= ~signs;
not signs
and m1,signs
; m2 &= ~signs;
and m2,signs
; m1 += m2;
add m1,m2
; m1 ^= signs;
not signs ; reverse above NOT
xor m1,signs

This works:

Code:
uint64_t __paddw(uint64_t m1, uint64_t m2)
{
    uint64_t d1 = m2 & 0x0000FFFF0000FFFF;
    uint64_t d2 = m1 & 0xFFFF0000FFFF0000;
    m1 = m1 & 0x0000FFFF0000FFFF;
    m2 = m2 & 0xFFFF0000FFFF0000;
    m1 += d1;
    m2 += d2;
    m1 = m1 & 0x0000FFFF0000FFFF;
    m2 = m2 & 0xFFFF0000FFFF0000;
    return m1+m2;
}
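The test values used above can be replayed in C against a lane-by-lane reference. This is a rough sketch under stated assumptions: `paddw_ref`, `paddw_first`, `paddw_split` and `check_paddw_split` are names invented for this example, `paddw_first` repeats the logic from the first post, and `paddw_split` is a condensed form of the even/odd masking idea posted as working.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: add four 16-bit lanes independently, wrapping like paddw. */
static uint64_t paddw_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        r |= (uint64_t)(uint16_t)(x + y) << (16 * i);
    }
    return r;
}

/* Same logic as the version in the first post. */
static uint64_t paddw_first(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000ULL;
    m1 &= ~signs;
    m2 &= ~signs;
    m1 += m2;
    m1 ^= signs;
    return m1;
}

/* Even/odd lane split: add lanes 0 and 2 together, lanes 1 and 3
   together, and mask away whatever carried into the neighbour. */
static uint64_t paddw_split(uint64_t m1, uint64_t m2)
{
    const uint64_t even = 0x0000FFFF0000FFFFULL;
    const uint64_t odd  = 0xFFFF0000FFFF0000ULL;
    uint64_t lo = ((m1 & even) + (m2 & even)) & even;
    uint64_t hi = ((m1 & odd) + (m2 & odd)) & odd;
    return lo | hi;
}

void check_paddw_split(void)
{
    uint64_t a = 0xFFF0FFF0ULL, b = 0xFFF7FFF7ULL;
    assert(paddw_ref(a, b) == 0xFFE7FFE7ULL);
    assert(paddw_split(a, b) == 0xFFE7FFE7ULL);
    /* The first version lets the carry from lane 0 leak into lane 1,
       and the carry from lane 1 into lane 2. */
    assert(paddw_first(a, b) == 0x1FFE8FFE7ULL);
}
```

With 64-bit operands the leaked carry shows up as `0x1FFE8FFE7`; in 32-bit registers, as in the assembly above, the top carry would simply fall off the end and lane 1 would still read `0xFFE8` instead of `0xFFE7`.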
LocoDelAssembly 18 May 2008, 17:27
mattst88, from the link I posted check the version previous to bitRAKE's suggestion, maybe that one is clearer to see how and why it works. Adder (maybe helpful to understand the rounding part).
[edit]BTW, should I move this thread to Heap or another subforum?[/edit]
mattst88 18 May 2008, 17:32
LocoDelAssembly wrote: mattst88, from the link I posted check the version previous to bitRAKE's suggestion, maybe that one is clearer to see how and why it works. Adder (maybe helpful to understand the rounding part).
Thanks for the code.
Quote: [edit]BTW, should I move this thread to Heap or another subforum?[/edit]
I didn't post it there because the Heap has become where all the bs philosophy and clueless physics topics are posted.
LocoDelAssembly 18 May 2008, 17:51
OK, moved to Main instead but please try to not keep it in a complete Cish way because it will need to be moved again then.
mattst88 18 May 2008, 19:03
LocoDelAssembly wrote: OK, moved to Main instead but please try to not keep it in a complete Cish way because it will need to be moved again then.
Ehh.. what?
mattst88 18 May 2008, 21:17
If I'd posted assembly, it would have been for alpha, and therefore useless for the vast majority of this user group...
revolution 19 May 2008, 02:56
mattst88 wrote: If I'd posted assembly, it would have been for alpha, and therefore useless for the vast majority of this user group...
mattst88 14 Jun 2008, 21:43
I talked with the guy who wrote the original code and pointed out the overflow bug. He suggested changing one line, which I believe fixes the issue.
Code:
uint64_t __paddw(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000L;
    m1 &= ~0x8000800080008000L;
    m2 &= ~signs;
    m1 += m2;
    m1 ^= signs;
    return m1;
}

See if you can find any bugs, or otherwise confirm that it is working.
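One way to take up that invitation is to compare the one-line change against a lane-by-lane reference. The sketch below is not the original author's test: `paddw_ref`, `paddw_patched` and `check_paddw_patched` are names made up here. As far as I can tell from working the cases by hand, the patched version is fine when the two sign bits of a lane differ, but when both are set only m1's copy is cleared, so the lane's top bit comes out wrong and a carry can still reach the next lane.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: add four 16-bit lanes independently, wrapping like paddw. */
static uint64_t paddw_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        r |= (uint64_t)(uint16_t)(x + y) << (16 * i);
    }
    return r;
}

/* Same logic as the version with the one changed line. */
static uint64_t paddw_patched(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000ULL;
    m1 &= ~0x8000800080008000ULL;
    m2 &= ~signs;
    m1 += m2;
    m1 ^= signs;
    return m1;
}

void check_paddw_patched(void)
{
    /* Sign bits differ: patched version and reference agree. */
    assert(paddw_patched(0x0003, 0x8002) == paddw_ref(0x0003, 0x8002));

    /* Both sign bits set, no carry from the low 15 bits:
       the lane should wrap to 0 but keeps its top bit. */
    assert(paddw_ref(0x8000, 0x8000) == 0x0000);
    assert(paddw_patched(0x8000, 0x8000) == 0x8000);

    /* Both sign bits set plus a carry from the low 15 bits:
       the carry still leaks into the next lane. */
    assert(paddw_ref(0xFFF0, 0xFFF7) == 0xFFE7);
    assert(paddw_patched(0xFFF0, 0xFFF7) == 0x17FE7);
}
```

Clearing the sign bit in both operands, rather than in m1 only, avoids both failure cases.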
vid 15 Jun 2008, 08:27
As long as we discuss algorithms or general ideas, I think posting C code is okay. Things are sometimes a bit more readable that way.
AlexP 19 Jun 2008, 02:40
Quote: Have you seen HAKMEM?
edfed 19 Jun 2008, 10:48
Code:
Single Instruction, Multiple Data.
Vs
Code:
Reduced Instruction Set Computer.
They are not the same thing. If you try to implement it in a RISC way, then it is not SIMD. It is possible to do Single Function, Multiple Data in a RISC way, but not SIMD in RISC. OKAY?
Only boolean instructions can work on multiple data:
OP1 and OP2
OP1 or OP2
OP1 xor OP2
not OP1
The others are math instructions and behave like an IIR filter: each element (bit) depends on the state of its neighbourhood.
OP1 + OP2
OP1 - OP2
neg OP1
OP1 * OP2
OP1 / OP2
These instructions cannot be SIMDed in the RISC way. Speaking about SIMD in RISC like this:
Code:
mov signs,m1
xor signs,m2
and signs,$80008000
is a lack of culture and an error. Words exist for what they mean. There, you can only speak about multiple data threaded together. Sorry for my boring attitude, but it is very important not to change the meaning of words and definitions.
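The distinction between the boolean and the arithmetic operations can be seen on a two-byte example. The following is only an illustrative sketch: `per_byte`, `op_xor`, `op_add` and `check_lane_independence` are hypothetical helpers written for this example, with `per_byte` applying an 8-bit operation to each byte lane separately.

```c
#include <assert.h>
#include <stdint.h>

/* An 8-bit operation applied to one byte lane. */
typedef uint8_t (*byte_op)(uint8_t, uint8_t);

static uint8_t op_xor(uint8_t a, uint8_t b) { return (uint8_t)(a ^ b); }
static uint8_t op_add(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }

/* Reference: apply op to each of the eight byte lanes independently. */
static uint64_t per_byte(byte_op op, uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)op(x, y) << (8 * i);
    }
    return r;
}

void check_lane_independence(void)
{
    uint64_t a = 0x00FF, b = 0x0001;

    /* XOR on the whole register equals XOR applied byte by byte:
       no bit depends on its neighbours. */
    assert((a ^ b) == per_byte(op_xor, a, b));

    /* Plain addition carries from byte 0 into byte 1, so the whole
       register result differs from the byte-by-byte result. */
    assert(a + b == 0x0100);
    assert(per_byte(op_add, a, b) == 0x0000);
}
```

The emulation tricks in this thread all come down to stopping or masking that one carry at each lane boundary.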
mattst88 20 Jun 2008, 15:08
edfed wrote:
I know these are different, but thanks for clarifying ...?
edfed wrote: if you try to implement it in a risc way, then, it is not SIMD.
That's getting kind of picky. Pedantic, even. But yes, I'm interested in implementing functions using multiple instructions to emulate an MMX/SSE style SIMD instruction for a performance benefit.
edfed wrote: OKAY?
OKAY!
edfed wrote: ... is a lack of culture and an error.
I beg to differ. I think I'm quite cultured. In fact, my 10th grade English teacher told me so when I played The Ballad of John and Yoko for my project on poetry.
Aside: For fuck's sake, it's a Beatles song. It hit #1 on the charts. It's not some obscure song or genre that I'm introducing the world to.
edfed wrote: words exist for what they mean.
Would changing the topic to one about SFMD (single function, multiple data) appease you?
edfed 20 Jun 2008, 16:03
Quote: Would changing the topic to one about SFMD (single function, multiple data) appease you?
yes, and sorry for the fun.
baldr 26 Jul 2008, 19:15
The general idea is to prevent a carry from the MSB of one [packed] word into the LSB of the adjacent one (I mean, from bit 15 to 16, bit 31 to 32 and bit 47 to 48). So the code
1) adds the MSBs (without carry);
Code:
signs = (m1 ^ m2) & 0x8000800080008000L;
2) masks them out;
Code:
m1 &= 0x7FFF7FFF7FFF7FFFL;
m2 &= 0x7FFF7FFF7FFF7FFFL;
3) adds the remaining bits with carry;
Code:
m1 += m2;
4) adds (1) to (3) without carry;
Code:
m1 ^= signs;
Seems to me, everything is correct.
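The four steps above can be put together and checked against a lane-by-lane reference. This is a minimal sketch, assuming 64-bit operands holding four 16-bit lanes; `paddw_ref`, `paddw_masked` and `check_paddw_masked` are names chosen for this example, and the pseudo-random loop is just a cheap way to try many inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: add four 16-bit lanes independently, wrapping like paddw. */
static uint64_t paddw_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        r |= (uint64_t)(uint16_t)(x + y) << (16 * i);
    }
    return r;
}

static uint64_t paddw_masked(uint64_t m1, uint64_t m2)
{
    /* 1) carry-less sum of the top bit of every lane */
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000ULL;
    /* 2) clear the top bit of every lane in BOTH operands */
    m1 &= 0x7FFF7FFF7FFF7FFFULL;
    m2 &= 0x7FFF7FFF7FFF7FFFULL;
    /* 3) two 15-bit values sum to less than 2^16, so any carry stops
          at bit 15 of its own lane and never reaches the neighbour */
    m1 += m2;
    /* 4) fold the saved top bits back in, again without carry */
    m1 ^= signs;
    return m1;
}

void check_paddw_masked(void)
{
    assert(paddw_masked(0x0003, 0x8002) == 0x8005);
    assert(paddw_masked(0x8000, 0x8000) == 0x0000);
    assert(paddw_masked(0xFFF0FFF0ULL, 0xFFF7FFF7ULL) == 0xFFE7FFE7ULL);

    /* Compare against the reference on a stream of pseudo-random inputs. */
    uint64_t x = 1, y = 2;
    for (int i = 0; i < 1000; i++) {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        y = y * 6364136223846793005ULL + 1442695040888963407ULL;
        assert(paddw_masked(x, y) == paddw_ref(x, y));
    }
}
```

The difference from the earlier versions is step 2: the top bit is cleared unconditionally in both operands, so bit 15 of each lane is free to absorb the carry from the lower 15 bits.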
MCD 13 Aug 2008, 13:41
bitRAKE wrote:
I can confirm this both mattst88 wrote:
doesn't seem to work for me when I set m1 = 0x0000 0000 0000 0003 and m2 = 0x0000 0000 0000 8002, for example. The result should be 0x0000 0000 0000 8005, but I'm getting 0x0000 0000 0000 0000. Can anyone confirm this, or are my computations erroneous?
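Since the quoted code did not survive, it is unclear which version produced the zero. Working the example by hand, the variants posted in this thread all appear to give 0x8005 for these inputs; the sketch below rechecks that. The function names (`paddw_first`, `paddw_patched`, `paddw_masked`, `check_example_inputs`) are invented here, and each body is meant to mirror the corresponding post.

```c
#include <assert.h>
#include <stdint.h>

#define SIGNS 0x8000800080008000ULL

/* Logic of the version in the first post. */
static uint64_t paddw_first(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & SIGNS;
    m1 &= ~signs;
    m2 &= ~signs;
    return (m1 + m2) ^ signs;
}

/* Logic of the version with the one changed line. */
static uint64_t paddw_patched(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & SIGNS;
    m1 &= ~SIGNS;
    m2 &= ~signs;
    return (m1 + m2) ^ signs;
}

/* Logic of the four-step description: both top bits always cleared. */
static uint64_t paddw_masked(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & SIGNS;
    m1 &= ~SIGNS;
    m2 &= ~SIGNS;
    return (m1 + m2) ^ signs;
}

void check_example_inputs(void)
{
    const uint64_t m1 = 0x0000000000000003ULL;
    const uint64_t m2 = 0x0000000000008002ULL;

    /* signs = 0x8000, the cleared operands are 3 and 2, the sum is 5,
       and XOR-ing the saved sign bit back in gives 0x8005. */
    assert(paddw_first(m1, m2) == 0x8005);
    assert(paddw_patched(m1, m2) == 0x8005);
    assert(paddw_masked(m1, m2) == 0x8005);
}
```

A zero result for these inputs therefore more likely points at the calculator or script used than at the algorithm, although that is a guess without the missing quotes.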
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.