flat assembler
Message board for the users of flat assembler.
Exponential instruction ?
alexfru 30 Apr 2018, 01:18
f2xm1?
donn 30 Apr 2018, 01:36
Yeah, I don't think there is a single exponential instruction in the newer instruction sets.
x87 has the exponential and logarithmic instructions f2xm1, fscale, fyl2x and fyl2xp1, as mentioned, but I think the x87 instructions are usually discouraged. About f2xm1:
Quote: This instruction, when used in conjunction with the FYL2X instruction, can be applied to calculate z = x^y by taking advantage of the log property x^y = 2^(y*log2(x)).
When the exponent is an integer, mul can be used in an accumulating loop. There are square root and reciprocal instructions for exponents of .5 and for negative exponents; rsqrtss is an example. These return a fast approximation, which is commonly refined with a Newton-Raphson step. I was just reading about Newton-Raphson in an elasticity/mechanics book by J.T. Oden, and want to try to compute it.
Aside from x87, and exponents of .5, I'm not sure if there is a fractional exponent instruction.
If you didn't mean exponential functions but Euler's number, I guess you could just load it as a constant, or calculate it with factorials or other approaches.
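The three ideas in this post (the log2 identity, the multiply loop for integer exponents, and a Newton-Raphson refinement of a reciprocal square root estimate) can be sketched in a few lines. This is only an illustration in Python, not assembly; the function names are made up for the example, and the starting estimate of 0.7 stands in for whatever rough value an approximation instruction would return.

```python
import math

def pow_via_log2(x: float, y: float) -> float:
    """x**y for x > 0, using the identity x^y = 2^(y * log2(x))."""
    if x <= 0.0:
        raise ValueError("this identity needs a positive base")
    return 2.0 ** (y * math.log2(x))

def pow_int(x: float, n: int) -> float:
    """x**n for an integer n >= 0 by repeated multiplication."""
    result = 1.0
    for _ in range(n):
        result *= x
    return result

def rsqrt_refine(a: float, estimate: float) -> float:
    """One Newton-Raphson step towards 1/sqrt(a): e' = e * (1.5 - 0.5*a*e*e)."""
    return estimate * (1.5 - 0.5 * a * estimate * estimate)

print(pow_via_log2(3.0, 2.5), 3.0 ** 2.5)
print(pow_int(1.5, 4))

e = 0.7  # deliberately rough guess for 1/sqrt(2)
for _ in range(3):
    e = rsqrt_refine(2.0, e)
print(e, 1.0 / math.sqrt(2.0))
```

Each Newton-Raphson step roughly doubles the number of correct digits, which is why one or two steps after a coarse hardware estimate are usually enough.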
revolution 30 Apr 2018, 02:37
donn wrote: ... I think the x87 instructions are usually discouraged.
donn 30 Apr 2018, 04:23
In terms of compatibility, the x87 instructions are probably safe to use way down the road. They also support double-extended precision on scalar types: an 80-bit format with a 64-bit significand.
Special x87 features like these are good reasons to use x87 instead of SSE. From Intel's optimization manual:
Quote: Assembly/Compiler Coding Rule 62. (M impact, M generality) Use Streaming SIMD Extensions 2 or Streaming SIMD Extensions unless you need an x87 feature. Most SSE2 arithmetic operations have shorter latency then their X87 counterpart and they eliminate the overhead associated with the management of the X87 register stack.
Aside from the compatibility consideration, there is also the performance consideration. Maybe this is why the transcendental (trigonometric and exponential) functions were not carried over to the SSE/AVX instruction sets in full. You can optimize for latency or throughput differently with different accuracy. Intel says:
Quote: Although x87 supports transcendental instructions, software library implementation of transcendental function can be faster in many cases.
bitRAKE 30 Apr 2018, 05:08
Another example of what donn is saying:
https://stackoverflow.com/questions/47025373/fastest-implementation-of-exponential-function-using-sse (see njuffa's post for routines of different precision)
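To show the general shape of such a software routine, here is a minimal sketch of 2^x computed by range reduction plus a polynomial. It is not the code from the linked answer: that one uses SSE intrinsics and tuned coefficients, whereas this sketch assumes a plain Taylor polynomial of degree 7 and the made-up name exp2_poly, purely to illustrate the structure.

```python
import math

LN2 = math.log(2.0)

def exp2_poly(x: float, degree: int = 7) -> float:
    """Approximate 2**x as 2**n * 2**f with n = round(x) and |f| <= 0.5."""
    n = round(x)          # integer part, handled exactly by scaling
    f = x - n             # reduced argument
    t = f * LN2           # 2**f == e**t
    # Horner evaluation of the Taylor polynomial sum(t**k / k!, k = 0..degree)
    p = 1.0
    for k in range(degree, 0, -1):
        p = 1.0 + p * t / k
    return math.ldexp(p, n)   # multiply by 2**n

for x in (-3.75, 0.5, 10.3):
    print(x, exp2_poly(x), 2.0 ** x)
```

Lowering the degree trades accuracy for speed, which is the latency/throughput versus accuracy choice mentioned above.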
revolution 30 Apr 2018, 05:21
bitRAKE wrote: Another example of what donn is saying:
Tomasz Grysztar 30 Apr 2018, 07:21
AVX-512 ER comes with VEXP2PS/VEXP2PD - described as "Approximation to the Exponential 2^x of Packed Double-Precision Floating-Point Values with Less Than 2^-23 Relative Error".
Mino 30 Apr 2018, 07:41
Thank you for all your answers, it's great.
So, if I understood correctly: FASM does not have arithmetic instructions to calculate exponents. However, you can use x86 instructions for this, for example f2xm1. But does it only compute powers of a fixed base, 2? Would you recommend that I write a dedicated function ( https://pastebin.com/r9TUmUuT ), as shown in the links in your answers, or use this instruction (or others)?
_________________
The best way to predict the future is to invent it.
DimonSoft 30 Apr 2018, 11:13
Mino wrote: FASM does not have arithmetic instructions to calculate exponents.
Instructions are implemented in the CPU, not in the assembler; in that sense FASM doesn't really "have" an xadd either.
Mino wrote: Would you recommend me to create a dedicated function ( https://pastebin.com/r9TUmUuT ), as shown in the links of your answers, or use this instruction (or other(s) )?
There will always be things that are not readily available.
revolution 30 Apr 2018, 11:16
DimonSoft wrote:
donn 30 Apr 2018, 15:36
Right, the quickest and simplest way to get an exponential function implemented (a loop of multiplications could be used if you are just squaring, for instance) is:
Quote: Else I'd just stick with the x87
You can put your base (x) in st0 and your exponent (y) in st1, execute fyl2x, then f2xm1, and then add 1, I believe: x^y = z. Note that f2xm1 only accepts a source operand between -1 and +1, so in general y*log2(x) has to be split first: round it to an integer with frndint, run f2xm1 on the remaining fraction, add 1, and scale the result by the integer part with fscale. I'm a bit tired today, but I think those are the steps, which could be verified (or disproved) with a calculator or testing.
If you want a base other than 2, you can use these two instructions in conjunction, as the AMD docs mention. If you want to tune or improve performance, inlining could be the next step, or the AVX-512 ER VEXP2PS/VEXP2PD instructions if you have those.
Again, I'm a bit tired today so someone may correct me if what I'm saying is off, but I've received a lot of good advice from these guys here, so their recommendations are pretty safe bets for learning and in practice.
Also, I've found the cvtss2si instructions helpful (there are a few combinations documented in the manuals) for debugging if you don't have a way to view floating point numbers yet.
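The x87 sequence described here can be checked numerically without touching the FPU. The sketch below emulates each instruction with ordinary double arithmetic in Python; the helper names are invented for the example, the mnemonics in the comments show which instruction each line stands in for, and the split into integer and fractional parts is there because F2XM1 is only defined for source operands in [-1, +1].

```python
import math

def f2xm1(v: float) -> float:
    """Stand-in for F2XM1: 2**v - 1, only defined for -1 <= v <= +1."""
    if not -1.0 <= v <= 1.0:
        raise ValueError("F2XM1 source operand must lie in [-1, +1]")
    return 2.0 ** v - 1.0

def pow_x87_style(x: float, y: float) -> float:
    """x**y for x > 0, following the fyl2x / f2xm1 / fscale recipe."""
    t = y * math.log2(x)      # fyl2x: st1 * log2(st0), with x in st0, y in st1
    n = round(t)              # frndint (round to nearest, the default mode)
    f = t - n                 # fsub: fractional part, |f| <= 0.5
    m = f2xm1(f) + 1.0        # f2xm1, then fld1 / faddp
    return math.ldexp(m, n)   # fscale: multiply by 2**n

print(pow_x87_style(3.0, 2.5), 3.0 ** 2.5)
print(pow_x87_style(2.0, 10.0))
```

Skipping the split and feeding y*log2(x) straight into f2xm1 only works when that product already lies between -1 and +1.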
Mino 30 Apr 2018, 20:01
Thank you very much for those explanations and clarifications. This will probably be very useful for the rest of my project
rugxulo 02 May 2018, 02:07
Check Free Pascal's rtl/i386/math.inc for function fpc_exp_real (etc etc).
Furs 02 May 2018, 20:08
FYI x87 is part of the ABI on both x86 and x64 (for the latter, it comes into play when you use the long double type in C). I don't think it will be dropped. It would be a real shame if they did drop it, since 80-bit precision is really useful for certain cases (especially when doing math on 64-bit integers, which double can't hold exactly -- I mean "math" like sqrt on them and stuff like that).
It's also unique with its register stack, and compact in encoding, which is a pretty cool concept to me (stack machines are interesting too). I know I'm going against the "trend" here, since everyone seems to hate the register stack and prefer plain registers with 3 or more operands (and a bloated encoding)... because of "simplicity" (that's arguable; to me a stack-based machine is MUCH simpler to implement, and x87 is far simpler than SSE, but whatever) or because of the poor code compilers generate for it... meh
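The point about 64-bit integers is easy to demonstrate. A double has a 53-bit significand, so it cannot hold every 64-bit integer, while the 80-bit x87 format has a 64-bit significand and can. The sketch below only shows the double side of that (Python floats are doubles), using the standard library's exact integer square root for comparison.

```python
import math

# 2**53 + 1 is the first positive integer a double cannot represent.
big = 2**53 + 1
print(float(big) == float(2**53))   # True: the +1 is lost in conversion

# Square root of the largest 64-bit unsigned integer.
n = 2**64 - 1
print(int(math.sqrt(n)))   # 4294967296: n was rounded up to 2**64 first
print(math.isqrt(n))       # 4294967295: exact integer square root
```

The off-by-one in the second result comes purely from converting n to a double before taking the root.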
DimonSoft 02 May 2018, 22:03
Furs wrote: FYI x87 is part of the ABI on both x86 and x64 (for the latter it's when you use the long double type in C). I don't think it will be dropped. It would be a real shame if they did drop it, since 80-bit precision is really useful for certain cases (especially when working with math on 64-bit integers, which double can't -- I mean "math" like sqrt on them and stuff like that).
It won’t be dropped, for compatibility reasons. They may try some day, but that is going to be a big failure.
And, speaking of SSE, let me say that the instruction names there are bloated. It’s always a pain to look for the right instruction, especially if you want to target, say, SSE2 or earlier rather than the most recent version.
rugxulo 05 May 2018, 00:17
I don't think "standard" C supports "long double". IIRC, anything using MSVCRT.DLL (e.g. TinyC) only supports "long double" same as "double" (64-bit). OpenWatcom too, IIRC, but I could be remembering incorrectly. Not sure if all standard(s) are the same on this, though (C99 vs. C11 or C17). Things do change sometimes. Just because GCC supports it doesn't mean everyone else does. (Not sure about MinGW either, which probably still "mostly" relies on MSVCRT.DLL.)
Another problem with the FPU is that it's always misaligned, but I guess the OS can (sometimes) be smart enough to use (late P2-era) FXSAVE, which is reputedly faster than FNSAVE.
IIRC, many architectures (DEC Alpha?) didn't support anything beyond "double" (64-bit) anyway. IIRC, Oberon usually supports "REAL" and "LONGREAL" but nothing else. Turbo Pascal/Delphi/FPC all have Extended, but I don't know how (or if) they fully support it on AMD64/SSE2 or other platforms.
I always got the feeling that Intel wanted to deprecate/remove x87 entirely in favor of SSE2. Nowadays, with AVX-512, who knows?? GCC always assumed an FPU, but I swear many compilers have become "SSE2 only" anyway. So while I may not agree, I do think it's definitely deprecated and shunned and won't be supported forever.
Furs 05 May 2018, 23:37
long double has existed since C89 and was improved in C99, which even added standard library functions for it (the 'l' suffix versions of the math functions). As usual, it's msvcrt which is non-compliant, but that's nothing new.
long double is part of the Linux ABI and calling convention in both 32-bit and 64-bit. For example, in x64 (which assumes SSE2, that's why I'm mentioning it), a function returning long double is required to place the value in x87. Even returning a "complex" long double is defined in the ABI -- it's mandated to return the real part in st0 and the imaginary part in st1. But the type itself also usually forces code to use x87.
Yeah, GCC is great here since you can fine-tune the fpmath to use with a setting, even without long double, but I think long double itself works at least with Clang and ICC too; no idea about Visual Studio's compiler.
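A quick way to see what long double looks like on a given machine, without writing any C, is to ask Python's ctypes module for the sizes used by the C compiler it was built with. This is only a rough probe: it reports storage size, not precision, so an 80-bit x87 value typically shows up padded to 12 or 16 bytes, while builds that treat long double as double typically report 8.

```python
import ctypes
import sys

# Significand width of the platform's double (53 for IEEE 754 binary64).
print("double significand bits:", sys.float_info.mant_dig)

# Storage sizes as seen by the C compiler that built this Python.
print("sizeof(double):     ", ctypes.sizeof(ctypes.c_double))
print("sizeof(long double):", ctypes.sizeof(ctypes.c_longdouble))
```

No expected output is given because the last line differs between platforms, which is exactly the disagreement in the posts above.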
rugxulo 06 May 2018, 01:11
Mino 06 May 2018, 09:42
Furs wrote: As usual, it's msvcrt which is non-compliant, but that's nothing new.
Is it then a "bad" thing to use it in our projects? Is there another library (compatible with Linux, by the way) that is more "powerful" and compliant?
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.