flat assembler
Message board for the users of flat assembler.

What is faster Div or SSE2 divss ?

Roman



Joined: 21 Apr 2012
Posts: 561
Which is faster, DIV or SSE2 divss?
And how do I write asm code that works like DIV?
And is there asm code that is faster than DIV (the asm instruction)?
Post 09 Aug 2013, 09:42
tthsqe



Joined: 20 May 2009
Posts: 721
The timings of division instructions (integer, single, double) depend on the divisor. Why don't you run some tests on your own computer?
Post 09 Aug 2013, 10:10
Roman



Joined: 21 Apr 2012
Posts: 561
tthsqe
OK!
PS: And I was so hoping to talk and chat a bit. :)
Post 09 Aug 2013, 11:18
bitRAKE



Joined: 21 Jul 2003
Posts: 2829
Location: dank orb
A) Which is faster, DIV or SSE2 divss?

They use the same execution units.

B) And how do I write asm code that works like DIV?

a. Binary: Shift and subtract until zero.
b. Approximate and check/modify. (Newton's method)
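
Method (a) can be sketched as follows. This is only an illustration in C rather than asm, so it is easy to compile and check; the name shift_sub_div is made up for this sketch and does not come from the thread. It produces one quotient bit per step, so it loops a fixed 32 times instead of stopping early.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift-and-subtract (restoring) division of two
   32-bit unsigned values. Returns the quotient, stores the remainder. */
static uint32_t shift_sub_div(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;              /* 64-bit so that r << 1 cannot overflow */

    assert(d != 0);              /* like DIV, a zero divisor is an error */

    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down the next bit of n */
        if (r >= d) {                    /* divisor fits: subtract it */
            r -= d;
            q |= (uint32_t)1 << i;       /* and set this quotient bit */
        }
    }
    *rem = (uint32_t)r;
    return q;
}
```

A loop like this runs far longer than a hardware DIV on current CPUs; it shows what the instruction does, not a way to beat it.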

C) And is there asm code that is faster than DIV (the asm instruction)?

Only possible when the divisor is a constant. At run time it might still be useful to calculate 1/x once and multiply. Algorithms can often be set up in that fashion, to eliminate or reduce the use of division.

Easiest method:
Code:
mul [_1dX] ; _1dX holds 2^32 / X, so EDX:EAX = EAX * 2^32 / X
; EDX is the integer part, EAX is the fraction


Obviously, these also apply to remainder or modulus.
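
A rough C sketch of the multiply-by-reciprocal idea, including the remainder. The names recip32 and div_by_recip are invented for this example, and it assumes the reciprocal is stored as floor(2^32 / d), which the post does not specify. With that rounding the high half of the product can come out one too small (for example 3 / 3 gives 0), so the sketch checks and corrects once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: precompute floor(2^32 / d) once per divisor. */
static uint32_t recip32(uint32_t d)
{
    assert(d >= 2);   /* d == 1 would need 2^32, which does not fit in 32 bits */
    return (uint32_t)((UINT64_C(1) << 32) / d);
}

/* Hypothetical helper: n / d and n % d using the precomputed reciprocal. */
static uint32_t div_by_recip(uint32_t n, uint32_t d, uint32_t recip,
                             uint32_t *rem)
{
    /* High 32 bits of the 64-bit product: what MUL leaves in EDX. */
    uint32_t q = (uint32_t)(((uint64_t)n * recip) >> 32);
    uint32_t r = n - q * d;      /* q is never too large, so no underflow */

    if (r >= d) {                /* estimate was one too small: fix it up */
        q += 1;
        r -= d;
    }
    *rem = r;
    return q;
}
```

This only pays off when the same divisor is reused many times, since computing the reciprocal itself costs one division.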
Post 09 Aug 2013, 14:02
Madis731



Joined: 25 Sep 2003
Posts: 2145
Location: Estonia
Your question cannot be answered directly, because DIV/IDIV work on integers while DIVSS/DIVPS work on floating-point numbers. They are in the same ballpark: the integer ones take 20-27 clocks, while the floating-point ones take 10-14 clocks to execute (these are the latencies for a Sandy Bridge CPU). Comparing DIVSS to FDIV, you lose precision with DIVSS. FDIV takes 10-24 clocks. So you see they are competing in the same class, measured in "tens of clocks".

There are some tricks (with integer arithmetic):
http://www.azillionmonkeys.com/qed/adiv.html
For example, I often find myself using (x*0x55555556) shr 32 instead of x/3
because in simple cases it works and is a lot faster. You can easily extend this
to division by 6, 12, etc. by shifting. Useful in 24-bit / 32-bit RGB / RGBA conversions.
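
To make "in simple cases" concrete, here is a small C sketch of the same trick for unsigned values. The function names div3, div6 and div12 are made up for illustration. Working through the arithmetic, the result matches x/3 for every x below 2^31 and is first off by one at x = 2^31, so the sketch asserts that range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x / 3 via multiply and shift, for x < 2^31.
   0x55555556 is (2^32 + 2) / 3. */
static uint32_t div3(uint32_t x)
{
    assert(x < UINT32_C(0x80000000));   /* the trick is exact only in this range */
    return (uint32_t)(((uint64_t)x * UINT32_C(0x55555556)) >> 32);
}

/* Division by 6 and 12 is division by 3 followed by a shift. */
static uint32_t div6(uint32_t x)  { return div3(x) >> 1; }
static uint32_t div12(uint32_t x) { return div3(x) >> 2; }
```

Pixel components and small buffer offsets sit far below 2^31, which is probably why the trick is safe in the RGB / RGBA case mentioned above.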
Post 12 Aug 2013, 05:29

Copyright © 1999-2020, Tomasz Grysztar.
