flat assembler
Message board for the users of flat assembler.

Index > Projects and Ideas > [IDEA] Abusing PCLMULQDQ to improve algorithms

Author
Thread Post new topic Reply to topic
Madis731



Joined: 25 Sep 2003
Posts: 2145
Location: Estonia
Madis731
Intro:
There's a new AVX coming out in erm...2010? AVX
Okay its all fine to move/shuffle/add 256 bits of data at once, but I was thinking about something completely new.
Now to the point:
Lets take this idea of magic numbers and develop it (Terje's algorithm)
There are other algorithm samples here on the board that I care not to search for now. What is important is to see
the carryless multiplication (PCLMULQDQ) as a "supershuffling agent under cover". When a regular multiplication
shifts the result far from the original location, the carryless variant might save us a few shifts or rotations (or might not?).

Some examples (updating from time to time):
Code:
; 1) http://www.jjj.de/fxt/ the pdf chapters 1.3 and 1.4 and related.
; Detecting bitstream transitions 0>1, 1>0 when multiplying carrylessly with
; 3 (11b) the answer will tell you the edges where the stream of bits toggle.
movdqa xmm0, const_111100001101101000b
pclmulqdq xmm0,const_11b
          ; xmm0= 1000100010110111000b

; 2) Using carryless multiply to square the number (multiply with itself) the
; answer will be a bitwise unpack doubling the number of bits.
; (Compare with PUNPCKBW/WD/DQ (x)mm,(x)mm/m64/128)
movdqa xmm0, const_111100001101101000b
pclmulqdq xmm0,const_11b
;xmm0= 010101010000000001010001010001000000b

; 3) The parity check can be achieved by (carryless-) multiplying with all the
; interesting bits set to one and then shifting:
movdqa xmm0, const_111100001101101000b
pclmulqdq xmm0,const_11111111b
psrldq xmm0,8-1 ; interesting bits - 1
;if this instruction used GPRs then it would be possible to shift into carry!
;counting only lower 8 bits which are 01101000b
;xmm0[0] = 0 xor 1 xor 1 xor 0 xor 1 xor 0 xor 0 xor 0 = 1

    

_________________
My updated idol Very Happy http://www.agner.org/optimize/


Last edited by Madis731 on 28 Aug 2008, 10:37; edited 2 times in total
Post 25 Aug 2008, 09:05
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16892
Location: In your JS exploiting you and your system
revolution
Carryless means it is used for polynomial multiplications. That means each individual bit has no carry to the next bit. Good for CRC and Base-2 Elliptic curve computations. I doubt it can be used for a replacement of regular multiplication. But if I am wrong then I would be happy to learn how to use it for general multiply also.
Post 25 Aug 2008, 11:18
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2145
Location: Estonia
Madis731
The problem with its use is that I can't find any description of how to use it or how its used other than "for doing some encryption & stuff".
I'd like to see some pseudocode or some VHDL to describe it. I haven't seen carryless multiply mentioned ever before Intel Sad
Post 25 Aug 2008, 11:32
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16892
Location: In your JS exploiting you and your system
revolution
Carryless means no carry, yes, really, it is simpler than normal multiply. All the bits are just XORed since the carry is not needed. You can write out the normal long hand multiply but just "forget" to carry over anything into the next column and then you have carryless multiply, or base-2 polynomial multiply for the fancy name.
Post 25 Aug 2008, 11:43
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2145
Location: Estonia
Madis731
I will give it a try:
Code:
   *101
   1001
 ======
+   101
   000
  000
 101
 ======
 101101 ;normal and carryless are the same

   *101
   1101
 ======
+   101
   000
  101
 101
 ======
1000001 ;normal
 111101 ;carryless?
    

Wow, from APRIL Neutral
link
Sorry about "no info on the topic" Sad

EDIT2:
This image was created by applying "display if CLMUL(A,B) != MUL(A,B)"
Image

EDIT3:
I won't show you the binary output here, but it will reveal some magic already. For example:
1) When you CLMUL with 3 (11b) then the answer will tell you the edges where the stream of bits will toggle 0=>1 or 1=>0.
2) CLMUL with the number itself yields a bitwise unpack
Compare ~ PUNPCKbit-twobit xmm0,xmm0
Code:
;Ex.1:
 111100001101101000b CLMUL 11b=
1000100010110111000b
;Ex.2:
 1111010111b CLMUL 1111010111b=
 01010101000100010101b
;ASCII Smile
;   1101 CLMUL 1101
;  / || \
;  | | | |
; 01010001
    


Last edited by Madis731 on 25 Aug 2008, 18:46; edited 1 time in total
Post 25 Aug 2008, 16:07
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Code:
xor  eax, eax
mov  edx, multiplicand
mov  ecx, multiplier

.loop:
  shr  ecx, 1
  sbb  ebx, ebx
  and  ebx, edx
  xor  eax, ebx
  shl  edx, 1

  test ecx, ecx
  jnz  .loop

; EAX = Result (mod 2^32)
    
Question
Post 25 Aug 2008, 18:18
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16892
Location: In your JS exploiting you and your system
revolution
Madis731 & LocoDelAssembly: Both of your suggestions look perfectly fine to me. You both seem to have grasped the concept quickly.

Also, you might like to read about Galios Field (GF), since that is what you two have implemented above.

Once the fancy names like GF are added many people seem to switch off the brain and think "ooh, this is getting too hard". But once the basic idea is grasped the fancy names are no longer scary.
Post 25 Aug 2008, 20:19
View user's profile Send private message Visit poster's website Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4225
Location: 2018
edfed
one more communication security algorithm?
Post 26 Aug 2008, 19:48
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16892
Location: In your JS exploiting you and your system
revolution
edfed wrote:
one more communication security algorithm?
What do you mean?
Post 26 Aug 2008, 19:54
View user's profile Send private message Visit poster's website Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4225
Location: 2018
edfed
CRC, MD5, and CLMUL.

as it will signal the edges of a frame.

nowadays, communication is always made with serial signals.
theses signals can be securized using a machester code.
or a CLMUL3?

i don't know more about, but it limits the possible errors.

MANCHESTER CODE
Quote:

00=0101
01=0110
10=1001
11=1010
Post 26 Aug 2008, 20:01
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16892
Location: In your JS exploiting you and your system
revolution
I thought Manchester Code was primarily used to increase the reliability of the signal, at the cost of bandwidth. AFAIK there is no error check defined as part of Manchester Code. But there are many types of error codes that can be added to a data stream. Depending upon the characteristics of the communication channel different schemes would be used to maximise the efficiency of signal recovery.
Post 26 Aug 2008, 20:15
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2145
Location: Estonia
Madis731
edfed wrote:
CRC, MD5, and CLMUL.

as it will signal the edges of a frame.

nowadays, communication is always made with serial signals.
theses signals can be securized using a machester code.
or a CLMUL3?

i don't know more about, but it limits the possible errors.

MANCHESTER CODE
Quote:

00=0101
01=0110
10=1001
11=1010

I don't get where does the security come from and where does error-tolerancy come from?

You can't even definitely say upto one bit changed which one of these four it should be. 0111 - is it 0101 or is it 0110?
Usually redundancy is applied through streams of uneven number of the same bit (like 000 or 0 or 111 for 1). 010 and 001 can be corrected to 0.


EDIT: Will update first post from now on!

_________________
My updated idol Very Happy http://www.agner.org/optimize/
Post 27 Aug 2008, 07:23
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2019, Tomasz Grysztar.

Powered by rwasa.