flat assembler
Message board for the users of flat assembler.

Exponential instruction?

Mino



Joined: 14 Jan 2018
Posts: 163
Mino 30 Apr 2018, 00:21
Hello,
I would like to know if there is an instruction dedicated to exponent calculations. There are add, sub, div, ... but is there any instruction for exponentiation?

_________________
The best way to predict the future is to invent it.
alexfru



Joined: 23 Mar 2014
Posts: 80
alexfru 30 Apr 2018, 01:18
f2xm1?
donn



Joined: 05 Mar 2010
Posts: 321
donn 30 Apr 2018, 01:36
Yeah, I don't think there is a single exponential instruction on newer instruction sets?

x87 had the exponential and logarithmic instructions f2xm1, fscale, fyl2x and fyl2xp1, as mentioned, but I think the x87 instructions are usually discouraged.

f2xm1:
Quote:
This instruction, when used in conjunction with the FYL2X instruction, can be applied to calculate z = x^y by taking advantage of the log property x^y = 2^(y*log2(x)).


When the exponent is an integer, mul can be used in an accumulating loop. There are square root and reciprocal instructions for exponents such as .5 and negative exponents; rsqrtss is an example. It returns a fast approximation, which can be refined with the Newton-Raphson method. I was just reading about Newton-Raphson in an elasticity/mechanics book by J.T. Oden, and want to try to compute it. Aside from x87, and values of .5, I'm not sure if there is a fractional exponent instruction?
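As a rough illustration of the two ideas in this paragraph, here is a minimal Python sketch (not assembly, and not taken from any manual). The function names int_pow and rsqrt_newton are made up for this example, and the starting estimate passed to rsqrt_newton stands in for the kind of coarse value an approximate reciprocal square root instruction would supply.

```python
def int_pow(base, exponent):
    """Raise base to a non-negative integer exponent by repeated multiplication."""
    result = 1.0
    for _ in range(exponent):
        result *= base  # one multiply per step, like a mul in an accumulating loop
    return result


def rsqrt_newton(a, estimate, steps=3):
    """Refine an estimate of 1/sqrt(a) with Newton-Raphson iterations.

    Each step applies y <- y * (1.5 - 0.5 * a * y * y), which is Newton's
    method for f(y) = 1/y**2 - a. The estimate must already be reasonably
    close for the iteration to converge.
    """
    y = estimate
    for _ in range(steps):
        y = y * (1.5 - 0.5 * a * y * y)
    return y


if __name__ == "__main__":
    print(int_pow(2.0, 10))
    print(rsqrt_newton(4.0, 0.4, steps=6))
```

Each Newton-Raphson step roughly squares the error, so a coarse estimate usually needs only a few steps.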

If you didn't mean exponential functions, but Euler's number, I guess you could just load it as a constant, or calculate that as well with factorials or other approaches.
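The "with factorials" approach can be sketched as the series e = 1/0! + 1/1! + 1/2! + ... Below is a small Python version; the function name euler_number and the default of 20 terms are choices made for this example only.

```python
import math


def euler_number(terms=20):
    """Approximate e by summing 1/k! for k = 0 .. terms-1."""
    total = 0.0
    term = 1.0  # current value of 1/k!, starting with 1/0!
    for k in range(terms):
        total += term
        term /= (k + 1)  # turn 1/k! into 1/(k+1)! without computing factorials
    return total


if __name__ == "__main__":
    print(euler_number(), math.e)
```

In practice loading the constant is simpler; the series is mostly of interest when the value has to be derived rather than stored.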
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20306
Location: In your JS exploiting you and your system
revolution 30 Apr 2018, 02:37
donn wrote:
... I think the x87 instructions are usually discouraged.
I can't see them being dropped from the CPUs any time soon. And even if they are dropped from the hardware sometime in the future, all the major OSes will emulate them. Emulation is not hard, it was being done a long time ago when the x87 chips were an optional extra. So I don't think there is any issue with using the x87 instructions.
donn



Joined: 05 Mar 2010
Posts: 321
donn 30 Apr 2018, 04:23
In terms of compatibility, the x87 instructions are probably safe to use way down the road. They also support double-extended precision, an 80-bit format with a 64-bit significand, on scalar types.

These special x87 features are good reasons to use x87 instead of SSE:

From Intel's optimization manual:
Quote:
Assembly/Compiler Coding Rule 62. (M impact, M generality) Use Streaming SIMD Extensions 2 or Streaming SIMD Extensions unless you need an x87 feature. Most SSE2 arithmetic operations have shorter latency then their X87 counterpart and they eliminate the overhead associated with the management of the X87 register stack.


Aside from the compatibility consideration, there is the performance consideration also. Maybe this is why the transcendental (trigonometric and exponential) functions were not carried over to the SSE/AVX instruction sets in full. You can optimize for latency or throughput differently with different accuracy.

Intel says:

Quote:
Although x87 supports transcendental instructions, software library implementation of transcendental function can be faster in many cases.
They also recommend inlining instead of calling the function for performance. Someone provided a neat lookup-table based method of computing sine and cosine here, and I've implemented sine, cosine, tangent and arctangent with Taylor/Maclaurin series in only a couple of instructions, but with less accuracy.
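To give an idea of the accuracy trade-off with a short Maclaurin series, here is a Python sketch of sine truncated after the x^7 term. It is not the implementation mentioned in this post; the name sin_maclaurin and the choice of four terms are assumptions for illustration.

```python
import math


def sin_maclaurin(x):
    """Approximate sin(x) as x - x^3/3! + x^5/5! - x^7/7! in Horner form.

    Intended for inputs already reduced to about [-pi/2, pi/2]; outside
    that range the truncation error grows quickly.
    """
    x2 = x * x
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)))


if __name__ == "__main__":
    for angle in (0.1, 0.5, 1.0, math.pi / 2):
        print(angle, sin_maclaurin(angle), math.sin(angle))
```

The Horner form needs only a handful of multiplies and adds, which is why such a series maps onto very few SSE instructions. The largest error on the reduced range is at the ends, bounded by the first omitted term, (pi/2)^9/9!, or about 1.6e-4.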
bitRAKE



Joined: 21 Jul 2003
Posts: 4024
Location: vpcmpistri
bitRAKE 30 Apr 2018, 05:08
Another example of what donn is saying:

https://stackoverflow.com/questions/47025373/fastest-implementation-of-exponential-function-using-sse
(see njuffa's post for different precision routines)
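For readers who do not follow the link, the general shape of such software routines can be sketched in Python as below. This is not the code from the linked answer: it only shows the common pattern of range reduction, a short polynomial, and a final scaling by a power of two. The name fast_exp and the degree-5 Taylor polynomial are choices made here; tuned routines typically use minimax coefficients instead.

```python
import math

LOG2_E = 1.4426950408889634  # log2(e)
LN_2 = 0.6931471805599453    # ln(2)


def fast_exp(x):
    """Approximate e**x as 2**i * 2**f with i an integer and f in [-0.5, 0.5]."""
    t = x * LOG2_E        # e**x == 2**t
    i = round(t)          # integer part, applied at the end as a power of two
    f = t - i             # fractional part left for the polynomial
    u = f * LN_2          # 2**f == e**u, with |u| <= about 0.347
    # Degree-5 Taylor polynomial of e**u, evaluated in Horner form.
    p = 1.0 + u * (1.0 + u * (0.5 + u * (1.0 / 6.0 + u * (1.0 / 24.0 + u / 120.0))))
    return math.ldexp(p, i)  # multiply by 2**i (done with exponent bits in SIMD code)


if __name__ == "__main__":
    for value in (-5.0, 0.0, 0.5, 3.0, 10.0):
        print(value, fast_exp(value), math.exp(value))
```

With this polynomial the relative error stays below about 1e-5; trading polynomial degree against accuracy is the tuning knob such routines expose.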
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20306
Location: In your JS exploiting you and your system
revolution 30 Apr 2018, 05:21
bitRAKE wrote:
Another example of what donn is saying:

https://stackoverflow.com/questions/47025373/fastest-implementation-of-exponential-function-using-sse
(see njuffa's post for different precision routines)
That is good. But it is complex and uses more of the Icache. If you are doing millions of them each second then it might be worthwhile to invest the time to code it. Else I'd just stick with the x87 single instruction.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 30 Apr 2018, 07:21
AVX-512 ER comes with VEXP2PS/VEXP2PD - described as "Approximation to the Exponential 2^x of Packed Double-Precision Floating-Point Values with Less Than 2^-23 Relative Error".
Mino



Joined: 14 Jan 2018
Posts: 163
Mino 30 Apr 2018, 07:41
Thank you for all your answers, it's great!

So, if I understood correctly: FASM does not have arithmetic instructions to calculate exponents. However, you can use x87 instructions for such uses, for example this one:
f2xm1
However, does it only compute the power for a base fixed at 2?

Would you recommend that I create a dedicated function ( https://pastebin.com/r9TUmUuT ), as shown in the links of your answers, or use this instruction (or others)?

DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 30 Apr 2018, 11:13
Mino wrote:
FASM does not have arithmetic instructions to calculate exponents.

Instructions are implemented in the CPU; FASM doesn’t really have an xadd.

Mino wrote:
Would you recommend that I create a dedicated function ( https://pastebin.com/r9TUmUuT ), as shown in the links of your answers, or use this instruction (or others)?

There will always be things that are not readily available.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20306
Location: In your JS exploiting you and your system
revolution 30 Apr 2018, 11:16
DimonSoft wrote:
Mino wrote:
FASM does not have arithmetic instructions to calculate exponents.

Instructions are implemented in the CPU; FASM doesn’t really have an xadd.
I expect Mino meant "FASM does not have arithmetic operators to calculate exponents".
donn



Joined: 05 Mar 2010
Posts: 321
donn 30 Apr 2018, 15:36
Right, the quickest and simplest way to get an exponential function implemented (mul iterations could be used if you are just squaring, for instance) is:

Quote:
Else I'd just stick with the x87


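You can put your base (x) in st0 and your exponent (y) in st1, execute fyl2x, execute f2xm1, and then add 1, I believe: x^y = z. One caveat: f2xm1 only accepts inputs from -1 to +1, so for larger values of y*log2(x) the integer part has to be split off first (frndint) and applied afterwards with fscale.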

I'm a bit tired today, but I think those are the steps, which could be verified (or disproved) with a calculator or testing. If you want a base other than 2, you can use these two instructions in conjunction, as the AMD docs mention.
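For anyone who wants to check the steps with a calculator-style test first, here is a Python sketch that mirrors the x87 sequence one operation at a time. It is an emulation under stated assumptions, not assembly: the name x87_style_pow is made up, x must be positive, and each line is annotated with the instruction it stands in for. Because of f2xm1's limited input range (-1 to +1), the sketch separates the integer part of y*log2(x) and reapplies it at the end.

```python
import math


def x87_style_pow(x, y):
    """Compute x**y for x > 0 via 2**(y*log2(x)), mimicking the x87 steps."""
    t = y * math.log2(x)                    # fyl2x: st1 * log2(st0)
    n = round(t)                            # frndint: integer part of t
    f = t - n                               # fsub: remainder, within [-0.5, 0.5]
    two_f_minus_1 = math.expm1(f * math.log(2.0))  # f2xm1: 2**f - 1
    mantissa = two_f_minus_1 + 1.0          # fld1 / faddp: add the 1 back
    return math.ldexp(mantissa, n)          # fscale: multiply by 2**n


if __name__ == "__main__":
    print(x87_style_pow(2.0, 10.0))
    print(x87_style_pow(3.0, 2.5), 3.0 ** 2.5)
```

The base never has to be 2: fyl2x converts any positive base into a base-2 exponent, which is why the two instructions are used together.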

If you want to tune or improve performance, inlining could be the next step, or the AVX-512 ER VEXP2PS/VEXP2PD instructions if you have those.

Again, I'm a bit tired today so someone may correct me if what I'm saying is off, but I've received a lot of good advice from these guys here, so their recommendations are pretty safe bets for learning and in practice. Also, I've found the cvtss2si instructions helpful (there are a few combinations documented in the manuals) for debugging if you don't have a way to view floating point numbers yet.
Mino



Joined: 14 Jan 2018
Posts: 163
Mino 30 Apr 2018, 20:01
Thank you very much for those explanations and clarifications. This will probably be very useful for the rest of my project.
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 02 May 2018, 02:07
Check Free Pascal's rtl/i386/math.inc for function fpc_exp_real (etc etc).
Furs



Joined: 04 Mar 2016
Posts: 2493
Furs 02 May 2018, 20:08
FYI x87 is part of the ABI on both x86 and x64 (for the latter it's when you use the long double type in C). I don't think it will be dropped. It would be a real shame if they did drop it, since 80-bit precision is really useful for certain cases (especially when working with math on 64-bit integers, which double can't -- I mean "math" like sqrt on them and stuff like that).
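The point about 64-bit integers can be checked quickly. The Python sketch below assumes that Python's float is an IEEE 754 double (53-bit significand), which holds on common platforms; the helper name survives_double is made up for this example. The x87 extended format has a 64-bit significand, so it can hold every 64-bit integer exactly, which double cannot.

```python
def survives_double(n):
    """Return True if the integer n converts to a double and back unchanged."""
    return int(float(n)) == n


if __name__ == "__main__":
    print(survives_double(2**53))      # still exactly representable
    print(survives_double(2**53 + 1))  # needs 54 significant bits
    print(survives_double(2**64 - 1))  # largest unsigned 64-bit value
```

Any 64-bit integer that needs more than 53 significant bits gets rounded when stored in a double, so operations such as sqrt on it start from an already inexact input.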

It's also unique with the register stack, and compact in encoding, which is pretty cool concept for me (stack-machines are interesting also). I know I'm against the "trend" or pissing against the wind since everyone seems to hate the register stack and prefer straight (and bloated encoding) of registers with 3 or more operands... because of "simplicity" (that's arguable, to me stack-based machine is MUCH simpler to implement, and x87 is far simpler than SSE, but whatever) or due to retarded compilers' generated code... meh
DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 02 May 2018, 22:03
Furs wrote:
FYI x87 is part of the ABI on both x86 and x64 (for the latter it's when you use the long double type in C). I don't think it will be dropped. It would be a real shame if they did drop it, since 80-bit precision is really useful for certain cases (especially when working with math on 64-bit integers, which double can't -- I mean "math" like sqrt on them and stuff like that).

It's also unique with the register stack, and compact in encoding, which is pretty cool concept for me (stack-machines are interesting also). I know I'm against the "trend" or pissing against the wind since everyone seems to hate the register stack and prefer straight (and bloated encoding) of registers with 3 or more operands... because of "simplicity" (that's arguable, to me stack-based machine is MUCH simpler to implement, and x87 is far simpler than SSE, but whatever) or due to retarded compilers' generated code... meh

It won’t be dropped for compatibility reasons. They may try some day, but that is going to be a big failure.

And, speaking about SSE, let me say that instruction names are overbloated there. It’s always a pain to look for the right instruction, especially if you want to target, say, SSE2–, not the very recent version.
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 05 May 2018, 00:17
I don't think "standard" C supports "long double". IIRC, anything using MSVCRT.DLL (e.g. TinyC) only supports "long double" same as "double" (64-bit). OpenWatcom too, IIRC, but I could be remembering incorrectly. Not sure if all standard(s) are the same on this, though (C99 vs. C11 or C17). Things do change sometimes. Just because GCC supports it doesn't mean everyone else does. (Not sure about MinGW either, which probably still "mostly" relies on MSVCRT.DLL.)

Another problem with the FPU is that it's always misaligned, but I guess the OS can (sometimes) be smart enough to use (late P2-era) FXSAVE, which is reputedly faster than FNSAVE.

IIRC, many architectures (DEC Alpha?) didn't support beyond "double" (64-bit) anyways. IIRC, Oberon usually supports "REAL" and "LONGREAL" but nothing else. Turbo Pascal/Delphi/FPC all have Extended, but I don't know how (or if) it fully supports that on AMD64/SSE2 or whatever other platforms.

I always got the feeling that Intel wanted to deprecate/remove x87 entirely in favor of SSE2. Nowadays, with AVX-512, who knows?? GCC always assumed an FPU, but I swear many compilers have become "SSE2 only" anyways. So while I may not agree, I do think it's definitely deprecated and shunned and won't be supported forever.
Furs



Joined: 04 Mar 2016
Posts: 2493
Furs 05 May 2018, 23:37
long double has existed since C89 and was improved in C99, which even added standard library functions for it (the 'l' suffix versions of the math functions). As usual, it's msvcrt which is non-compliant, but that's nothing new.

long double is part of the Linux ABI and calling convention in both 32-bit and 64-bit. For example in x64 (which assumes SSE2, that's why I'm mentioning it), if you make a function returning long double, it's required to place the value in x87. Even returning a "complex" long double is defined in the ABI -- where it's mandated to return the real part in st0 and imaginary in st1.

But the type itself also usually forces code to use x87. Yeah, GCC is great here since you can fine-tune the fpmath to use with a setting, even without long double, but I think it works at least with Clang and ICC, no idea about Visual Studio's compiler.
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 06 May 2018, 01:11
Mino



Joined: 14 Jan 2018
Posts: 163
Mino 06 May 2018, 09:42
Furs wrote:
As usual, it's msvcrt which is non-compliant, but that's nothing new.


Is it then a "bad" thing to use it in our projects? Is there another library (compatible with Linux by the way) that is more "powerful" and compliant?

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
