flat assembler
Message board for the users of flat assembler.

How to math with XMM REGS ?

catafest



Joined: 05 Aug 2010
Posts: 111
I am trying to compute this: cos(x)+cos(x)
The goal is to understand how to build math formulas with XMM registers.
I used a value of 50 radians and 11 digits of precision.
The output of my source code is:
Code:
[mythcat@desk fasm]$ ./ss
0.56944480223    

-
This is my raw source code :
Code:
format elf64
extrn printf
extrn cos

section '.data' writeable align 16
rad dq 50.0
fmt db "%.11lf",0ah,0

section '.text' executable align 16
public main
main:
    push rbp
    mov rbp,rsp

    pxor xmm0,xmm0
    movq xmm0,[rad]
    call cos
    movdqa xmm3,xmm0
    pxor xmm1,xmm1
    movq xmm1,[rad]
    call cos
    addss xmm1,xmm3
    mov rax,1
    mov rdi,fmt
    call printf

    mov rsp,rbp
    pop rbp
    ret
    
Post 25 Oct 2018, 16:22
donn



Joined: 05 Mar 2010
Posts: 125
What are you expecting to get, 1.9299320569?
cos(50) + cos(50)?

Also, which cos() library function are you linking with, gcc, c std lib? I believe they work on double-precision numbers, so

Code:
addsd    

is probably the appropriate add instruction.

There's also the cvtsd2ss set of instructions, which may be helpful. Once I'm at a Linux machine, I may have a chance to try assembling this.
Post 25 Oct 2018, 17:46
catafest



Joined: 05 Aug 2010
Posts: 111
1. I expect the correct result of cos()+cos(). The issue is how to keep the result of the first call to cos and then compute the sum.
A single call to cos works well:
Code:
format elf64
extrn printf
extrn cos

section '.data' writeable align 16
rad dq 50.0
fmt db "%.11lf",0ah,0

section '.text' executable align 16
public main
main:
    push rbp
    mov rbp,rsp

    pxor xmm0,xmm0
    movq xmm0,[rad]
    call cos
    mov rax,1
    mov rdi,fmt
    call printf

    mov rsp,rbp
    pop rbp
    ret    

2. I use gcc:
Code:
$./fasm file.asm
$gcc -s file.o -o file -lm    
Post 29 Oct 2018, 21:48
donn



Joined: 05 Mar 2010
Posts: 125
Cool, I think I understand where you're trying to get to. My Linux setup is not very reliable at the moment, which is why I have delayed replying. The Linux 64-bit calling conventions are detailed in Agner's docs:
https://www.agner.org/optimize/calling_conventions.pdf (See Table 6)

In 64-bit mode, floating point params are passed in xmm0 and higher, and returned in xmm0. Across a function call only some registers are preserved by the callee, such as rbx, rbp, r12-r15; none of the xmm registers are preserved on Linux. You can move your initial result into one of the preserved registers, save it on the stack, or save it in memory.


Then, you can add the result with the next cos() call with

Code:
movsd xmm1, (place you stored the first result safely)
addsd xmm0, xmm1

The sum will then be in xmm0.
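A minimal, untested sketch of the stack variant of this, reusing the rad and fmt definitions from the first post. The frame layout (one 8-byte slot at [rbp-8]) and the lea used to load the format string are assumptions made for illustration, not something posted in this thread:

```asm
format elf64
extrn printf
extrn cos

section '.data' writeable align 16
rad dq 50.0
fmt db "%.11lf",0ah,0

section '.text' executable align 16
public main
main:
    push rbp
    mov rbp,rsp
    sub rsp,16            ; room for one 8-byte local; rsp stays 16-byte aligned

    movsd xmm0,[rad]      ; argument for the first cos()
    call cos
    movsd [rbp-8],xmm0    ; save the first result in the stack slot

    movsd xmm0,[rad]      ; argument for the second cos()
    call cos
    addsd xmm0,[rbp-8]    ; xmm0 = cos(rad) + cos(rad)

    lea rdi,[fmt]         ; format string (RIP-relative in fasm long mode)
    mov eax,1             ; one xmm register carries a variadic argument
    call printf

    xor eax,eax           ; return 0 from main
    leave
    ret
```

It should build with the same fasm and gcc ... -lm steps shown earlier and print roughly 1.92993. On toolchains that link position-independent executables by default, passing -no-pie to gcc may be needed; that is an assumption about the local setup.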
Post 31 Oct 2018, 01:28
catafest



Joined: 05 Aug 2010
Posts: 111
The goal of this post is to work out how to use xmm registers under certain conditions.
So far I have run many tests, and I need to follow the parameter-passing rules.

1. Calling extern functions (Linux) creates some problems with xmm regs:
- the call clobbers some xmm regs;
- the result is put in xmm0;
2. The gdb and edb debuggers cannot work with the executable built from the fasm output (I use Fedora); I don't know why.

3. The data flow through the xmm regs depends on how the program is written ...

Only these lines are correct:

donn wrote:
Cool, I think
...
The sum will then be in xmm0.
Post 07 Nov 2018, 08:39
donn



Joined: 05 Mar 2010
Posts: 125
OK, well my computer is completely busted now and needs to be replaced so I can't assemble this locally. An online example is here:

https://gcc.godbolt.org/z/QD0Zss

As you can see, xmm0 is the double precision return and the first parameter. Calling functions will probably overwrite the register contents so you need to save them somewhere. The AMD and Intel processor docs list the register, memory, and datatype combinations. Some options are movsd and movq. Agner's doc has information on the registers that may be overwritten.
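As a rough, untested sketch of the register option: the first result can be parked in a callee-saved general-purpose register with movq while the second call runs. The choice of rbx, the push/pop pair and the lea for the format string are assumptions for illustration; the data definitions follow the first post:

```asm
format elf64
extrn printf
extrn cos

section '.data' writeable align 16
rad dq 50.0
fmt db "%.11lf",0ah,0

section '.text' executable align 16
public main
main:
    push rbx              ; rbx is callee-saved; the push also realigns rsp to 16

    movsd xmm0,[rad]      ; argument for the first cos()
    call cos
    movq rbx,xmm0         ; keep the raw 64 bits of the first result in rbx

    movsd xmm0,[rad]      ; argument for the second cos()
    call cos
    movq xmm1,rbx         ; bring the first result back into an xmm register
    addsd xmm0,xmm1       ; xmm0 = cos(rad) + cos(rad)

    lea rdi,[fmt]         ; format string
    mov eax,1             ; one xmm register is used by the variadic call
    call printf

    xor eax,eax           ; return 0 from main
    pop rbx
    ret
```

cos() has to restore rbx before it returns, so the saved bits survive the call, which is not true of any xmm register under the Linux calling convention.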


If that's not addressing what you are trying to solve, then maybe someone else can step in.
Post 07 Nov 2018, 13:50
catafest



Joined: 05 Aug 2010
Posts: 111
1. I tried to test with that example, https://gcc.godbolt.org/z/QD0Zss, using the stack (24 and 8 precision; see the SSE manual) and the default example without input data:
Code:
format elf64
extrn printf
extrn cos
section '.data' writeable align 16
rad dq 10.0
rad2 dq 90.0
rez dq 0.0
fmt db "%.30lf",0ah,0

section '.text' executable align 16
public main
main:
    push rbp
    mov rbp,rsp
    sub rsp,32
    movsd QWORD PTR [rbp-24],xmm0
    movsd xmm0,QWORD PTR [rdp-24]
    call cos
    movq rax,xmm0
    mov QWORD PTR [rbp-8],rax
    movsd xmm0,QWORD PTR [rbp-24]
    call cos 
    addsd xmm0,QWORD PTR [rbp-8]

    mov rdi,fmt
    call printf

    mov rsp,rbp
    pop rbp
    leave
    ret          


the result is:

Code:
flat assembler  version 1.73.04  (16384 kilobytes memory)
test_005.asm [16]:
    mov QWORD PTR [rbp-24],xmm0
processed: mov QWORD PTR[rbp-24],xmm0    


---

I tried to use the ebx register (see the SSE manual examples), but I got an error at run time.

The gdb debugger tells me:

Program received signal SIGSEGV, Segmentation fault.

same for
Code:
section '.text' executable      


2. According to the fasm manual (2.1.15 SSE instructions, see the ebx examples):

it seems the xmm registers cannot be accessed through stack addressing when an external function is called on Linux, even though the calling convention says:
Quote:
The stack is aligned by 4 in 32-bit Windows.
The 64 bit systems keep the stack aligned by 16. The stack word size is 8 bytes, but the stack must be aligned by 16 before any call instruction. Consequently, the value of the stack pointer is always 8 modulo 16 at the entry of a procedure. A procedure must subtract an odd multiple of 8 from the stack pointer before any call instruction. A procedure can rely on these rules when storing XMM data that require 16-byte alignment. This applies to all 64 bit systems (Windows, Linux, BSD).


Without a good example I stop here.
Post 07 Nov 2018, 17:40
Furs



Joined: 04 Mar 2016
Posts: 1293
Remove the PTR, that's not Fasm syntax (and a store from an xmm register needs movsd or movq, not mov):
Code:
movsd qword [rbp-24], xmm0

What's the value of RIP when the segfault happens? What instruction does it point to?

Note that for printf you need to specify the number of xmm regs used in rax (strictly, in al) AFAIK, since it's a vararg function.
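A small sketch of such a printf call on its own, with the xmm register count in eax and the stack aligned before the call. The val constant and the "%.2f" format are made up for this example:

```asm
format elf64
extrn printf

section '.data' writeable align 16
val dq 1.5
fmt db "%.2f",0ah,0

section '.text' executable align 16
public main
main:
    sub rsp,8             ; rsp is 8 mod 16 on entry; make it 16-byte aligned
    lea rdi,[fmt]         ; first integer argument: the format string
    movsd xmm0,[val]      ; first floating-point argument
    mov eax,1             ; al = number of xmm registers used by this call
    call printf
    xor eax,eax           ; return 0 from main
    add rsp,8
    ret
```

This prints 1.50. A call that passes no floating-point arguments would set eax to 0 instead.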
Post 07 Nov 2018, 18:13
fasmnewbie



Joined: 01 Mar 2011
Posts: 540
Catafest,

You cannot keep your temporary result in xmm3, because it may be overwritten by the second call to cos. Under the AMD64 calling convention used on Linux, no xmm register is preserved across a call. Save your first result to a variable, then add it to the second result in xmm0.
Post 07 Nov 2018, 23:53
donn



Joined: 05 Mar 2010
Posts: 125
Yes, and remember the godbolt example was running inside a minimal callable function, not your standard main() function:

Code:
double cosSum(double num)    


Using variables in the data section, like you originally had, was good:

Code:
movq xmm1,[rad]    


The point of the godbolt example was to show how to minimally chain multiple cos() calls together, while preserving xmm register values in gcc.

Also, if you don't have an instruction reference, https://www.amd.com/system/files/TechDocs/26568.pdf is great. Page 222 MOVQ shows some ways you can get scalar values into xmm registers (scalar values do not take up the whole xmm register).

When you said:
Quote:
The result of my source code is :
Code:
[mythcat@desk fasm]$ ./ss
0.56944480223
it sounded like things were initially running. If that's not the case, then that's another story.
Post 08 Nov 2018, 16:54
fasmnewbie



Joined: 01 Mar 2011
Posts: 540
I think cos(50.0), with the argument in radians, gives 0.96xxx, not 0.569xxxx.

Here's my old example (using SIN);
https://board.flatassembler.net/topic.php?t=20426

The OP almost got it the first time, if it weren't for saving to XMM3. Suggested solution (not tested, because I don't have Linux right now):

Code:
        format elf64
        public main

        extrn cos
        extrn printf

        section '.data' writeable
rad     dq 50.0
rez     dq 0.0
fmt     db '%.11lf',0ah,0

        section '.text' executable
main:
        sub     rsp,8
        movq    xmm0,[rad]
        call    cos
        movq     [rez],xmm0   ;Don't save to XMM3 because...

        movq    xmm0,[rad]
        call    cos          ;it will be scratched by this

        addsd   xmm0,[rez]   ;sum it up to XMM0
        mov     eax,1
        mov     rdi,fmt
        call    printf
        add     rsp,8
        ret    


Hope that helps
Post 11 Nov 2018, 05:19
Copyright © 1999-2018, Tomasz Grysztar.
