flat assembler
Message board for the users of flat assembler.

Implementing SFMD functions the RISC way
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 14 May 2008, 05:11
I'm learning Alpha assembly, and in keeping with the RISC way of doing things, there are only very basic SIMD instructions: minimum and maximum of signed/unsigned words/bytes. That is, {min,max}{u,s}{b,w}

Whereas SSE provides instructions for everything and walks your dog for you, complicated instructions have to be built from basic instructions on the Alpha.

For instance, the SSE/MMX instructions paddusb and psubusb can be implemented as:

Code:
static __inline uint64_t
__paddusb8(uint64_t m1, uint64_t m2) {
        return m1 + __minub8(m2, ~m1);
}

static __inline uint64_t
__psubusb8(uint64_t m1, uint64_t m2) {
        return m1 - __minub8(m2, m1);
}    
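For readers without an Alpha at hand, here is a minimal portable sketch of why these two identities hold. `minub8_emul` is a hypothetical stand-in for `__minub8` (per-byte unsigned minimum), written as a plain loop; it is an assumption for illustration, not the compiler builtin. In every byte, min(m2, ~m1) is at most 255 - m1, so the ordinary 64-bit add cannot carry across a byte boundary. Likewise min(m2, m1) is at most m1, so the subtract cannot borrow.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Alpha per-byte unsigned minimum:
   compares the eight byte lanes of a and b independently. */
static uint64_t minub8_emul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t x = (a >> (8 * i)) & 0xFF;
        uint64_t y = (b >> (8 * i)) & 0xFF;
        r |= (x < y ? x : y) << (8 * i);
    }
    return r;
}

/* Saturating unsigned byte add: each byte of m1 grows by at most
   (255 - m1), so no carry leaves its lane. */
static uint64_t paddusb8_emul(uint64_t m1, uint64_t m2)
{
    return m1 + minub8_emul(m2, ~m1);
}

/* Saturating unsigned byte subtract: each byte of m1 shrinks by at
   most m1 itself, so no borrow leaves its lane. */
static uint64_t psubusb8_emul(uint64_t m1, uint64_t m2)
{
    return m1 - minub8_emul(m2, m1);
}
```

For example, adding 0x2020 to 0x10F0 saturates the low byte at 0xFF while the next byte becomes 0x30.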


When I saw these, I thought they were kind of clever. Another interesting one (that I have not confirmed as working) is paddw:

Code:
static __inline uint64_t
__paddw(uint64_t m1, uint64_t m2) {
        uint64_t signs = (m1 ^ m2) & 0x8000800080008000;
        m1 &= ~signs;
        m2 &= ~signs;
        m1 += m2;
        m1 ^= signs;
        return m1;
}    


Here's an exercise for you: using only the following instructions, implement a SIMD instruction (SSE/MMX) that operates on byte or word integers.

Available instructions: and, or, xor, not, SIMD min/max for bytes/words, addition/subtraction. Note that the available add/subtract instructions are not SIMD aware and treat operands as single integers.
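
As one possible answer to the exercise (a sketch, not taken from the thread): the per-byte unsigned absolute difference |a - b|, the building block of psadbw, needs only max, min and one ordinary subtraction. `minub8_emul` and `maxub8_emul` are hypothetical loop-based stand-ins for the Alpha byte min/max instructions. Since max >= min in every byte, the 64-bit subtraction never borrows across a byte boundary.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-byte unsigned minimum. */
static uint64_t minub8_emul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t x = (a >> (8 * i)) & 0xFF;
        uint64_t y = (b >> (8 * i)) & 0xFF;
        r |= (x < y ? x : y) << (8 * i);
    }
    return r;
}

/* Hypothetical stand-in for the per-byte unsigned maximum. */
static uint64_t maxub8_emul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t x = (a >> (8 * i)) & 0xFF;
        uint64_t y = (b >> (8 * i)) & 0xFF;
        r |= (x > y ? x : y) << (8 * i);
    }
    return r;
}

/* Per-byte |a - b|: the larger byte minus the smaller one,
   done with a single non-SIMD subtraction. */
static uint64_t absdiffub8(uint64_t a, uint64_t b)
{
    return maxub8_emul(a, b) - minub8_emul(a, b);
}
```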

_________________
My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions.
Assembly Programmer's Journal


Last edited by mattst88 on 20 Jun 2008, 17:17; edited 1 time in total
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 14 May 2008, 16:55
bitRAKE



Joined: 21 Jul 2003
Posts: 4075
Location: vpcmpistri
bitRAKE 15 May 2008, 02:31
Have you seen HACKMEM?
Fun read for any programmer, imho.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 17 May 2008, 18:51
Code:
static __inline uint64_t
__paddw(uint64_t m1, uint64_t m2) {
        uint64_t signs = (m1 ^ m2) & 0x8000800080008000;
        m1 &= ~signs;
        m2 &= ~signs;
        m1 += m2;
        m1 ^= signs;
        return m1;
}    


Sure enough, this does seem to work. This function is the equivalent of paddw, even on systems without MMX.

Just one question, can anyone explain the logic to me?

I'm guessing here, but I think it figures out what the resulting words' signs will be, saves them, erases them from the operands, adds the operands, and restores the signs.

bitRAKE



Joined: 21 Jul 2003
Posts: 4075
Location: vpcmpistri
bitRAKE 18 May 2008, 02:59
mattst88 wrote:
Just one question, can anyone explain the logic to me?
m1 &= ~signs;
m2 &= ~signs;

These only clear the sign bits where m1 and m2 differ. Where both sign bits are set they stay in place, so the add carries out of that word and corrupts the next higher word (i.e. it doesn't work exactly).

Using SpeQ I wrote a script to test values:
Code:
m1=Hex(0xFFF0)
  m1 = 0xFFF0
m2=Hex(0xFFF7)
       m2 = 0xFFF7
signs=Hex((m1 || m2) & 0x8000)
   signs = 0x0
m1=Hex(m1 & (signs||0xFFFF))
     m1 = 0xFFF0
m2=Hex(m2 & (signs||0xFFFF))
     m2 = 0xFFF7
m1=Hex((m1+m2)||signs)
       m1 = 0x1FFE7    
I don't like it and would rather just program in x86:
Code:
; match other code names
m1 equ eax
m2 equ edx
signs equ ecx

; some test values
mov m1,$FFF0FFF0
mov m2,$FFF7FFF7

; signs = (m1 ^ m2) & 0x8000800080008000;
mov signs,m1
xor signs,m2
and signs,$80008000
; m1 &= ~signs;
not signs
and m1,signs
; m2 &= ~signs;
and m2,signs
; m1 += m2;
add m1,m2
; m1 ^= signs;
not signs ; reverse above NOT
xor m1,signs    
...the high and low words of the result are different (FFE8 and FFE7), although both should be FFE7.
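
The same failure can be reproduced in C. This is only a sketch: `paddw_v1` is the first __paddw from the thread with a return type added, and `paddw_ref` is a hypothetical lane-by-lane reference written for comparison. With the test values above widened to 64 bits, the carry out of the low word turns the second word into FFE8 and a further carry leaks a 1 into the third word.

```c
#include <stdint.h>

/* Hypothetical reference: add the four 16-bit lanes one at a time,
   each wrapping modulo 2^16. */
static uint64_t paddw_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = ((a >> (16 * i)) + (b >> (16 * i))) & 0xFFFF;
        r |= s << (16 * i);
    }
    return r;
}

/* First version from the thread: the top bits are cleared only
   where they differ, so two set top bits still produce a carry. */
static uint64_t paddw_v1(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000ULL;
    m1 &= ~signs;
    m2 &= ~signs;
    m1 += m2;
    m1 ^= signs;
    return m1;
}
```

Calling both with 0xFFF0FFF0 and 0xFFF7FFF7 gives 0xFFE7FFE7 from the reference and 0x1FFE8FFE7 from `paddw_v1`.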

This works:
Code:
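static __inline uint64_t
__paddw(uint64_t m1, uint64_t m2) {
        /* d1: even words of m2, d2: odd words of m1 */
        uint64_t d1 = m2 & 0x0000FFFF0000FFFF;
        uint64_t d2 = m1 & 0xFFFF0000FFFF0000;
        m1 = m1 & 0x0000FFFF0000FFFF;
        m2 = m2 & 0xFFFF0000FFFF0000;
        m1 += d1;       /* even words; carries land in the zeroed odd words */
        m2 += d2;       /* odd words; carries land in the zeroed even words */

        /* drop the carries */
        m1 = m1 & 0x0000FFFF0000FFFF;
        m2 = m2 & 0xFFFF0000FFFF0000;
        return m1+m2;
}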
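
A sketch of why the split works: even and odd words are added in separate passes, so every word has a zeroed neighbour above it to absorb its carry, and the final masks throw those carries away. `paddw_split` restates the code above in a more compact form (the names `even` and `odd` are mine), and `paddw_ref` is a hypothetical lane-by-lane reference used only for comparison.

```c
#include <stdint.h>

/* Hypothetical reference: add the four 16-bit lanes one at a time. */
static uint64_t paddw_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = ((a >> (16 * i)) + (b >> (16 * i))) & 0xFFFF;
        r |= s << (16 * i);
    }
    return r;
}

/* Split approach: add words 0 and 2 in one pass, words 1 and 3 in another. */
static uint64_t paddw_split(uint64_t m1, uint64_t m2)
{
    const uint64_t even = 0x0000FFFF0000FFFFULL; /* words 0 and 2 */
    const uint64_t odd  = 0xFFFF0000FFFF0000ULL; /* words 1 and 3 */

    /* Carries out of words 0 and 2 land in bits 16 and 48. */
    uint64_t lo = (m1 & even) + (m2 & even);
    /* The carry out of word 1 lands in bit 32; word 3's falls off the top. */
    uint64_t hi = (m1 & odd) + (m2 & odd);

    /* Mask the stray carries and merge the two disjoint halves. */
    return (lo & even) | (hi & odd);
}
```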

LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 18 May 2008, 17:27
mattst88, from the link I posted check the version previous to bitRAKE's suggestion, maybe that one is clearer to see how and why it works. Adder (maybe helpful to understand the rounding part).

[edit]BTW, should I move this thread to Heap or another subforum?[/edit]
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 18 May 2008, 17:32
LocoDelAssembly wrote:
mattst88, from the link I posted check the version previous to bitRAKE's suggestion, maybe that one is clearer to see how and why it works. Adder (maybe helpful to understand the rounding part).


Thanks for the code. Smile

Quote:
[edit]BTW, should I move this thread to Heap or another subforum?[/edit]


I didn't post it there because the Heap has become where all the bs philosophy and clueless physics topics are posted.

LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 18 May 2008, 17:51
OK, moved to Main instead but please try to not keep it in a complete Cish way because it will need to be moved again then.
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 18 May 2008, 19:03
LocoDelAssembly wrote:
OK, moved to Main instead but please try to not keep it in a complete Cish way because it will need to be moved again then.


Ehh.. what?

LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 18 May 2008, 21:01
mattst88 wrote:
LocoDelAssembly wrote:
OK, moved to Main instead but please try to not keep it in a complete Cish way because it will need to be moved again then.


Ehh.. what?


I wonder the same Confused

I wanted to say that since this subforum is for general ASSEMBLY programming stuff, it would be best to provide Assembly code that only uses non-SIMD instructions and keep HLL contributions to a minimum. No problem if the HLL code is paired with its Assembly counterpart though.
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 18 May 2008, 21:17
If I'd posted assembly, it would have been for alpha, and therefore useless for the vast majority of this user group...
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20451
Location: In your JS exploiting you and your system
revolution 19 May 2008, 02:56
mattst88 wrote:
If I'd posted assembly, it would have been for alpha, and therefore useless for the vast majority of this user group...
Hehe, if you had posted C, it would have been for HLL, and therefore useless for the vast majority of this user group..
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 14 Jun 2008, 21:43
I talked with the guy who wrote the original code and pointed out the overflow bug. He suggested changing one line, which I believe fixes the issue.

Code:
static __inline uint64_t
__paddw(uint64_t m1, uint64_t m2) {
        uint64_t signs = (m1 ^ m2) & 0x8000800080008000L;
        m1 &= ~0x8000800080008000L;
        m2 &= ~signs;
        m1 += m2;
        m1 ^= signs;
        return m1;
}    


See if you can find any bugs, or otherwise confirm it is working Smile
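
One way to take up that invitation is to compare against a lane-by-lane reference. In the sketch below, `paddw_v2` is the code above with a return type added and `paddw_ref` is a hypothetical reference; neither comes from the original author. Worked by hand, words whose top bits are both set still look wrong: for 0x8000 + 0x8000, signs is 0, so m2 keeps its top bit and the result is 0x8000 rather than 0.

```c
#include <stdint.h>

/* Hypothetical reference: add the four 16-bit lanes one at a time. */
static uint64_t paddw_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = ((a >> (16 * i)) + (b >> (16 * i))) & 0xFFFF;
        r |= s << (16 * i);
    }
    return r;
}

/* One-line fix from the post: every top bit of m1 is cleared, but
   m2 keeps its top bit wherever both operands had it set. */
static uint64_t paddw_v2(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000ULL;
    m1 &= ~0x8000800080008000ULL;
    m2 &= ~signs;
    m1 += m2;
    m1 ^= signs;
    return m1;
}

/* Returns 1 when the candidate agrees with the reference for (a, b). */
static int paddw_v2_agrees(uint64_t a, uint64_t b)
{
    return paddw_v2(a, b) == paddw_ref(a, b);
}
```

Inputs whose top bits differ or are both clear still agree with the reference, which may be why the problem is easy to miss in casual testing.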
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 15 Jun 2008, 08:27
As long as we discuss algorithms or general ideas, I think posting C code is okay. Things are sometimes a bit more readable that way.
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP 19 Jun 2008, 02:40
Quote:
Have you seen HACKMEM?
I love how their articles use giant pictures out of ASCII characters to make math symbols, like a root or sigma! That's a very nice reference.
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 19 Jun 2008, 10:48
Code:
Single Instruction, Multiple Data.
    

Vs
Code:
Reduced Instruction Set Computer.
    


they are not the same thing.

if you try to implement it in a risc way, then, it is not SIMD.
it is possible to do Single Function Multiple Datas in RISC way.
but not SIMD in RISC.

OKAY?

only boolean instructions can be for MD.

OP1 and OP2
OP1 or OP2
OP1 xor OP2
not OP1

the rest are math instructions and behave like an IIR filter: each element (bit) depends on the state of its neighbours.
OP1 + OP2
OP1 - OP2
neg OP1
OP1 * OP2
OP1 / OP2

these instructions cannot be SIMDed in the RISC way.

speaking about SIMD in RISC like this:
Code:
mov signs,m1 
xor signs,m2 
and signs,$80008000 
    

is a lack of culture and an error.

words exist for what they mean.
there, you can only speak about multiple datas threaded together.

Laughing
sorry for my boring attitude, but it is very important to not change the meaning of some words and definitions
mattst88



Joined: 12 May 2006
Posts: 260
Location: South Carolina
mattst88 20 Jun 2008, 15:08
edfed wrote:
Code:
Single Instruction, Multiple Data.
    

Vs
Code:
Reduced Instruction Set Computer.
    


they are not the same thing.


I know these are different, but thanks for clarifying ...?

edfed wrote:
if you try to implement it in a risc way, then, it is not SIMD.
it is possible to do Single Function Multiple Datas in RISC way.
but not SIMD in RISC.


That's getting kind of picky. Pedantic, even. But yes, I'm interested in implementing functions using multiple instructions to emulate an MMX/SSE style SIMD instruction for a performance benefit.

edfed wrote:
OKAY?


OKAY!

edfed wrote:
... is a lack of culture and an error.

I beg to differ. I think I'm quite cultured. In fact, my 10th grade English teacher told me so when I played The Ballad of John and Yoko for my project on poetry. Aside: For fuck's sake, it's a Beatles song. It hit #1 on the charts. It's not some obscure song or genre that I'm introducing the world to.

edfed wrote:
words exist for what they mean.


Would changing the topic to one about SFMD (single function, multiple data) appease you?

edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 20 Jun 2008, 16:03
Quote:
Would changing the topic to one about SFMD (single function, multiple data) appease you?


yes, and sorry for the fun.

Smile
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr 26 Jul 2008, 19:15
The general idea is to prevent a carry from the MSB of one [packed] word into the LSB of the adjacent one (that is, from bit 15 to 16, bit 31 to 32 and bit 47 to 48). So the code
1) adds the MSBs without carry;
Code:
signs = (m1 ^ m2) & 0x8000800080008000L;    

2) masks them out;
Code:
m1 &= 0x7FFF7FFF7FFF7FFFL;
m2 &= 0x7FFF7FFF7FFF7FFFL;    

3) adds the remaining bits with carry;
Code:
m1 += m2;    

4) adds (1) to (3) without carry;
Code:
m1 ^= signs;    

Seems to me, everything is correct.
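
Put together, the four steps above can be sketched as one function. Note that step 2 as written here masks the top bit out of both operands, which is not the same as the one-line fix posted earlier in the thread. `paddw_masked` and `paddw_ref` are hypothetical names introduced for this sketch; `paddw_ref` is a lane-by-lane reference for comparison.

```c
#include <stdint.h>

/* Hypothetical reference: add the four 16-bit lanes one at a time. */
static uint64_t paddw_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = ((a >> (16 * i)) + (b >> (16 * i))) & 0xFFFF;
        r |= s << (16 * i);
    }
    return r;
}

/* The four steps as a single function. */
static uint64_t paddw_masked(uint64_t m1, uint64_t m2)
{
    /* 1) carry-less sum of the top bits of each word */
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000ULL;
    /* 2) clear the top bit of every word in both operands */
    m1 &= 0x7FFF7FFF7FFF7FFFULL;
    m2 &= 0x7FFF7FFF7FFF7FFFULL;
    /* 3) add the low 15 bits; a carry can reach bit 15 but no further */
    m1 += m2;
    /* 4) fold the top-bit sum back in without carrying */
    m1 ^= signs;
    return m1;
}
```

Because bit 15 of each masked operand is zero, the only thing that can arrive there in step 3 is the carry from the lower bits, and the XOR in step 4 cannot generate a carry into the next word.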
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 13 Aug 2008, 13:41
bitRAKE wrote:

This works:
Code:
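static __inline uint64_t
__paddw(uint64_t m1, uint64_t m2) {
        /* d1: even words of m2, d2: odd words of m1 */
        uint64_t d1 = m2 & 0x0000FFFF0000FFFF;
        uint64_t d2 = m1 & 0xFFFF0000FFFF0000;
        m1 = m1 & 0x0000FFFF0000FFFF;
        m2 = m2 & 0xFFFF0000FFFF0000;
        m1 += d1;       /* even words; carries land in the zeroed odd words */
        m2 += d2;       /* odd words; carries land in the zeroed even words */

        /* drop the carries */
        m1 = m1 & 0x0000FFFF0000FFFF;
        m2 = m2 & 0xFFFF0000FFFF0000;
        return m1+m2;
}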

I can confirm this

both
mattst88 wrote:

Code:
static __inline uint64_t
__paddw(uint64_t m1, uint64_t m2) {
        uint64_t signs = (m1 ^ m2) & 0x8000800080008000;
        m1 &= ~signs;
        m2 &= ~signs;
        m1 += m2;
        m1 ^= signs;
        return m1;
}    


Code:
static __inline uint64_t
__paddw(uint64_t m1, uint64_t m2) {
        uint64_t signs = (m1 ^ m2) & 0x8000800080008000L;
        m1 &= ~0x8000800080008000L;
        m2 &= ~signs;
        m1 += m2;
        m1 ^= signs;
        return m1;
}    


doesn't seem to work for me, when I set m1= 0x0000 0000 0000 0003 and m2= 0x0000 0000 0000 8002, for example.
The result should be 0x0000 0000 0000 8005, but I'm getting 0x0000 0000 0000 0000?

Can anyone confirm this or are my computations erroneous?
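
One way to check this example is to run both quoted versions on it. In the sketch below, `paddw_v1` and `paddw_v2` restate the two quoted functions with a return type added; the names are mine. Worked by hand, both appear to give 0x8005 for these inputs: signs is 0x8000, both operands end up with the top bit clear, 3 + 2 = 5, and the final XOR puts 0x8000 back.

```c
#include <stdint.h>

/* First quoted version: clears the top bits only where they differ. */
static uint64_t paddw_v1(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000ULL;
    m1 &= ~signs;
    m2 &= ~signs;
    m1 += m2;
    m1 ^= signs;
    return m1;
}

/* Second quoted version: clears every top bit of m1. */
static uint64_t paddw_v2(uint64_t m1, uint64_t m2)
{
    uint64_t signs = (m1 ^ m2) & 0x8000800080008000ULL;
    m1 &= ~0x8000800080008000ULL;
    m2 &= ~signs;
    m1 += m2;
    m1 ^= signs;
    return m1;
}
```

This input has differing top bits in the low word, which is the case both versions handle; the inputs that break them are those with both top bits set in the same word.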
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
