flat assembler
Message board for the users of flat assembler.
  
DOS > Shortest code to spread a WORD to SSE reg

Tomasz Grysztar, 18 Oct 2019, 09:26
What is your target CPU generation? With AVX2 this becomes trivial:
Code:
    mov [si],ax
    vpbroadcastw xmm0,[si]

Kuemmel, 18 Oct 2019, 12:39
As far as I know, the target (FreeDOS, or any DOS) in real mode doesn't seem to support AVX even if the CPU supports it. I'll test again when I have access to my computer, but that's what I found when googling, IIRC.

Tomasz Grysztar, 18 Oct 2019, 12:45
Oh, right, I forgot that for 16-bit code the VEX prefix only works in protected mode, not in real mode or V86.

Kuemmel, 18 Oct 2019, 12:48
I read something here about activating AVX, but it seems to add a lot of overhead for sizecoding:
http://masm32.com/board/index.php?topic=4134.0

Tomasz Grysztar, 18 Oct 2019, 12:55
And even when you enable AVX for protected mode, the VEX-prefixed instructions are still not going to work in real mode, by design.

Tomasz Grysztar, 18 Oct 2019, 13:03
Anyway, as you seem to have at least SSE2, this is what comes to my mind:
Code:
    mov di,si            ; 2 bytes
    stosw                ; 1 byte
    stosw                ; 1 byte
    pshufd xmm0,[si],0   ; 5 bytes

Kuemmel, 18 Oct 2019, 15:56
Thanks for the hint! With shufps (which can also be used for integer data and is 1 byte shorter than pshufd) we can go down one byte, as I need to preserve DI:
Code:
    push di              ; 1 byte
    mov di,si            ; 2 bytes
    stosw                ; 1 byte
    stosw                ; 1 byte
    shufps xmm0,[si],0   ; 4 bytes
    pop di               ; 1 byte

Tomasz Grysztar, 18 Oct 2019, 17:03
I'm afraid SHUFPS is not going to work here, since it can put values from the second operand only into the high half of the destination register; the low half is always selected from the destination itself.
But, since you need to preserve DI, I have another idea:
Code:
    push ax              ; 1 byte
    push ax              ; 1 byte
    pop dword [si]       ; 3 bytes
    pshufd xmm0,[si],0   ; 5 bytes

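The difference between the two shuffles is easy to see in a small model. The following Python sketch is not from the thread; it only mimics the dword selection rules of PSHUFD and SHUFPS on lists of four dwords (index 0 is the lowest), and the register and memory contents are made-up values for illustration.

```python
def pshufd(src, imm8):
    """Model of PSHUFD dst,src,imm8: every result dword is selected from src."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]


def shufps(dst, src, imm8):
    """Model of SHUFPS dst,src,imm8: low two dwords come from dst, high two from src."""
    sel = [(imm8 >> (2 * i)) & 3 for i in range(4)]
    return [dst[sel[0]], dst[sel[1]], src[sel[2]], src[sel[3]]]


ax = 0x1234
# 16 bytes at [si]: only the first dword holds AX:AX, the rest is arbitrary (assumed values)
mem = [(ax << 16) | ax, 0xAAAAAAAA, 0xBBBBBBBB, 0xCCCCCCCC]
# Hypothetical stale contents of xmm0 before the shuffle
old_xmm0 = [0xDEADBEEF, 0x11111111, 0x22222222, 0x33333333]

broadcast = pshufd(mem, 0)        # all four dwords are AX:AX
mixed = shufps(old_xmm0, mem, 0)  # low half still comes from the old xmm0

print([hex(d) for d in broadcast])
print([hex(d) for d in mixed])
```

In this model SHUFPS with an immediate of 0 fills only the upper two dwords from memory, so the word ends up in four of the eight word lanes unless xmm0 already held it in its lowest dword.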
Kuemmel, 18 Oct 2019, 17:57
Thanks! I totally forgot about the issue with shufps. Just one remark: "pop dword [si]" assembles to 3 bytes in real mode (0x66 prefix), but that is still one byte saved.
EDIT: I did some speed profiling. The push/pop version seems a bit slower than the following, which uses the same number of bytes in total:
Code:
    mov word [si],ax
    mov word [si+2],ax
    pshufd xmm0,[si],0

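For reference, the byte counts discussed so far can be tallied as below. This is only a bookkeeping sketch: the per-instruction sizes are the ones quoted in the posts, except the two `mov` sizes (2 and 3 bytes), which are my own count of the 16-bit encodings and should be treated as an assumption; the variant names are made up for this example.

```python
# Instruction sizes in bytes for each variant discussed in the thread.
variants = {
    "stosw + pshufd (DI clobbered)": [
        ("mov di,si", 2), ("stosw", 1), ("stosw", 1), ("pshufd xmm0,[si],0", 5),
    ],
    "stosw + pshufd (DI preserved)": [
        ("push di", 1), ("mov di,si", 2), ("stosw", 1), ("stosw", 1),
        ("pshufd xmm0,[si],0", 5), ("pop di", 1),
    ],
    "push/push/pop dword + pshufd": [
        ("push ax", 1), ("push ax", 1), ("pop dword [si]", 3), ("pshufd xmm0,[si],0", 5),
    ],
    "two mov + pshufd": [
        # Assumed sizes: mov [si],ax = 2 bytes, mov [si+2],ax = 3 bytes (8-bit displacement)
        ("mov word [si],ax", 2), ("mov word [si+2],ax", 3), ("pshufd xmm0,[si],0", 5),
    ],
}

totals = {name: sum(size for _, size in seq) for name, seq in variants.items()}

for name, total in totals.items():
    print(f"{total:2d} bytes  {name}")
```

Under these assumptions both DI-preserving alternatives come to 10 bytes, one less than wrapping the STOSW version in push/pop.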
bitRAKE, 19 Oct 2019, 01:55
A rare corner case would allow the following code:
Code:
    movd xmm0,eax
    pshufb xmm0,[si]

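My reading of the corner case (an assumption, the post does not spell it out) is that the 16 bytes at [si] must already happen to form a suitable PSHUFB control mask, and that the CPU supports SSSE3. The Python sketch below models PSHUFB on 16-byte values to show what such a mask looks like; it is an illustration, not code from the thread.

```python
def pshufb(dst, mask):
    """Model of PSHUFB dst,mask: byte i is 0 if mask[i] has bit 7 set, else dst[mask[i] & 15]."""
    return bytes(0 if m & 0x80 else dst[m & 0x0F] for m in mask)


ax = 0x1234
eax = 0xFFFF0000 | ax                         # upper half of EAX is arbitrary (assumed value)
xmm0 = eax.to_bytes(4, "little") + bytes(12)  # movd xmm0,eax zero-extends into the register

# A mask that repeats byte indices 0,1 copies the low word into every word lane.
good_mask = bytes([0, 1] * 8)
result = pshufb(xmm0, good_mask)
words = [int.from_bytes(result[i:i + 2], "little") for i in range(0, 16, 2)]

print([hex(w) for w in words])
```

Any 16 bytes whose low nibbles alternate 0, 1 with bit 7 clear would behave the same way in this model, which is why the trick only pays off when such data is already sitting at [si].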
Copyright © 1999-2025, Tomasz Grysztar.