[IDEA] Abusing PCLMULQDQ to improve algorithms

Index > Projects and Ideas > [IDEA] Abusing PCLMULQDQ to improve algorithms

Author

Thread

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 25 Aug 2008, 09:05

Intro:
There's a new AVX coming out in erm...2010? AVX
Okay its all fine to move/shuffle/add 256 bits of data at once, but I was thinking about something completely new.
Now to the point:
Lets take this idea of magic numbers and develop it (Terje's algorithm)
There are other algorithm samples here on the board that I care not to search for now. What is important is to see
the carryless multiplication (PCLMULQDQ) as a "supershuffling agent under cover". When a regular multiplication
shifts the result far from the original location, the carryless variant might save us a few shifts or rotations (or might not?).

Some examples (updating from time to time):

Code:

; 1) http://www.jjj.de/fxt/ the pdf chapters 1.3 and 1.4 and related.
; Detecting bitstream transitions 0>1, 1>0 when multiplying carrylessly with
; 3 (11b) the answer will tell you the edges where the stream of bits toggle.
movdqa xmm0, const_111100001101101000b
pclmulqdq xmm0,const_11b
          ; xmm0= 1000100010110111000b

; 2) Using carryless multiply to square the number (multiply with itself) the
; answer will be a bitwise unpack doubling the number of bits.
; (Compare with PUNPCKBW/WD/DQ (x)mm,(x)mm/m64/128)
movdqa xmm0, const_111100001101101000b
pclmulqdq xmm0,const_11b
;xmm0= 010101010000000001010001010001000000b

; 3) The parity check can be achieved by (carryless-) multiplying with all the
; interesting bits set to one and then shifting:
movdqa xmm0, const_111100001101101000b
pclmulqdq xmm0,const_11111111b
psrldq xmm0,8-1 ; interesting bits - 1
;if this instruction used GPRs then it would be possible to shift into carry!
;counting only lower 8 bits which are 01101000b
;xmm0[0] = 0 xor 1 xor 1 xor 0 xor 1 xor 0 xor 0 xor 0 = 1

_________________
My updated idol Very Happy

http://www.agner.org/optimize/

Last edited by Madis731 on 28 Aug 2008, 10:37; edited 2 times in total

25 Aug 2008, 09:05

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20700
Location: In your JS exploiting you and your system

revolution 25 Aug 2008, 11:18

Carryless means it is used for polynomial multiplications. That means each individual bit has no carry to the next bit. Good for CRC and Base-2 Elliptic curve computations. I doubt it can be used for a replacement of regular multiplication. But if I am wrong then I would be happy to learn how to use it for general multiply also.

25 Aug 2008, 11:18

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 25 Aug 2008, 11:32

The problem with its use is that I can't find any description of how to use it or how its used other than "for doing some encryption & stuff".
I'd like to see some pseudocode or some VHDL to describe it. I haven't seen carryless multiply mentioned ever before Intel Sad

25 Aug 2008, 11:32

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20700
Location: In your JS exploiting you and your system

revolution 25 Aug 2008, 11:43

Carryless means no carry, yes, really, it is simpler than normal multiply. All the bits are just XORed since the carry is not needed. You can write out the normal long hand multiply but just "forget" to carry over anything into the next column and then you have carryless multiply, or base-2 polynomial multiply for the fancy name.

25 Aug 2008, 11:43

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 25 Aug 2008, 16:07

I will give it a try:

Code:

   *101
   1001
 ======
+   101
   000
  000
 101
 ======
 101101 ;normal and carryless are the same

   *101
   1101
 ======
+   101
   000
  101
 101
 ======
1000001 ;normal
 111101 ;carryless?

Wow, from APRIL Neutral

link
Sorry about "no info on the topic" Sad

EDIT2:
This image was created by applying "display if CLMUL(A,B) != MUL(A,B)"

EDIT3:
I won't show you the binary output here, but it will reveal some magic already. For example:
1) When you CLMUL with 3 (11b) then the answer will tell you the edges where the stream of bits will toggle 0=>1 or 1=>0.
2) CLMUL with the number itself yields a bitwise unpack
Compare ~ PUNPCKbit-twobit xmm0,xmm0

Code:

;Ex.1:
 111100001101101000b CLMUL 11b=
1000100010110111000b
;Ex.2:
 1111010111b CLMUL 1111010111b=
 01010101000100010101b
;ASCII 
;   1101 CLMUL 1101
;  / || \
;  | | | |
; 01010001

Last edited by Madis731 on 25 Aug 2008, 18:46; edited 1 time in total

25 Aug 2008, 16:07

LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4623
Location: Argentina

LocoDelAssembly 25 Aug 2008, 18:18

Code:

xor  eax, eax
mov  edx, multiplicand
mov  ecx, multiplier

.loop:
  shr  ecx, 1
  sbb  ebx, ebx
  and  ebx, edx
  xor  eax, ebx
  shl  edx, 1

  test ecx, ecx
  jnz  .loop

; EAX = Result (mod 2^32)

25 Aug 2008, 18:18

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20700
Location: In your JS exploiting you and your system

revolution 25 Aug 2008, 20:19

Madis731 & LocoDelAssembly: Both of your suggestions look perfectly fine to me. You both seem to have grasped the concept quickly.

Also, you might like to read about Galios Field (GF), since that is what you two have implemented above.

Once the fancy names like GF are added many people seem to switch off the brain and think "ooh, this is getting too hard". But once the basic idea is grasped the fancy names are no longer scary.

25 Aug 2008, 20:19

edfed

Joined: 20 Feb 2006
Posts: 4350
Location: Now

edfed 26 Aug 2008, 19:48

one more communication security algorithm?

26 Aug 2008, 19:48

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20700
Location: In your JS exploiting you and your system

revolution 26 Aug 2008, 19:54

edfed wrote:

one more communication security algorithm?

What do you mean?

26 Aug 2008, 19:54

edfed

Joined: 20 Feb 2006
Posts: 4350
Location: Now

edfed 26 Aug 2008, 20:01

CRC, MD5, and CLMUL.

as it will signal the edges of a frame.

nowadays, communication is always made with serial signals.
theses signals can be securized using a machester code.
or a CLMUL3?

i don't know more about, but it limits the possible errors.

MANCHESTER CODE

Quote:

00=0101
01=0110
10=1001
11=1010

26 Aug 2008, 20:01

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20700
Location: In your JS exploiting you and your system

revolution 26 Aug 2008, 20:15

I thought Manchester Code was primarily used to increase the reliability of the signal, at the cost of bandwidth. AFAIK there is no error check defined as part of Manchester Code. But there are many types of error codes that can be added to a data stream. Depending upon the characteristics of the communication channel different schemes would be used to maximise the efficiency of signal recovery.

26 Aug 2008, 20:15

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 27 Aug 2008, 07:23

edfed wrote:

CRC, MD5, and CLMUL.

as it will signal the edges of a frame.

nowadays, communication is always made with serial signals.
theses signals can be securized using a machester code.
or a CLMUL3?

i don't know more about, but it limits the possible errors.

MANCHESTER CODE

Quote:

00=0101
01=0110
10=1001
11=1010

I don't get where does the security come from and where does error-tolerancy come from?

You can't even definitely say upto one bit changed which one of these four it should be. 0111 - is it 0101 or is it 0110?
Usually redundancy is applied through streams of uneven number of the same bit (like 000 or 0 or 111 for 1). 010 and 001 can be corrected to 0.

EDIT: Will update first post from now on!

_________________
My updated idol Very Happy

http://www.agner.org/optimize/

27 Aug 2008, 07:23

< Last Thread | Next Thread >

Forum Rules:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum