flat assembler
Message board for the users of flat assembler.
LocoDelAssembly 26 Feb 2013, 21:11
Do you use the exact same asm code when you pass the structures by value?
The emms instruction shouldn't be there; you need it for MMX code, but you are using SSE. PS: Consider adding a fourth dummy field to the structure. A 16-byte load from a 12-byte structure reads 4 bytes past its end, so if the structure is not aligned to 16 bytes, the load could cross a page boundary and cause a page fault if no memory is allocated there.
jmcclane 26 Feb 2013, 21:43
Yes, I use that code...
It is not optimized for alignment yet; I still have to do that... Thanks for the emms tip!
r22 27 Feb 2013, 17:45
My guess is that when you use ByVal for the structure instead of ByRef, VB.NET is sending some runtime-optimized/mangled version of the structure to your DLL. Applying the StructLayout attribute with LayoutKind.Sequential might help.
Code:
    Imports System.Runtime.InteropServices

    ' VB.NET attributes use angle brackets (square brackets are C# syntax).
    <StructLayout(LayoutKind.Sequential)> _
    Public Structure vec3_t
        Public x As Single
        Public y As Single
        Public z As Single
        Private pad As Single
    End Structure
jmcclane 27 Feb 2013, 20:38
I tried it, but nothing changed... thanks anyway.
I think the same problem exists in C#.
ProphetOfDoom 28 Feb 2013, 00:45
Hi,
I only use 64-bit Linux and am not familiar with VB or the "proc" macro so I might be completely wrong but... As I understand it, passing by value means the entire structure is copied onto the stack. Thus your assembly function should expect ( 4 + 12 + 12 = 28 ) bytes on the stack. I doubt the "proc" macro takes this into account? Assuming the old EBP and the return address are also placed on the stack, I suspect you'd find v1 at [ESP + 12] and v2 at [ESP + 24]. So try:
Code:
    lea ecx, [esp + 24]
    lea eax, [esp + 12]
Not sure tho.
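The arithmetic above can be checked with a small C sketch that models the 32-bit stack as a struct. This is only an illustration under the assumptions stated in the post (a 4-byte destination pointer, two 12-byte vectors copied by value, and both the old EBP and the return address already on the stack); the names `frame_t`, `arg_bytes`, and `check_layout` are made up for this example and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 12-byte vector: three 4-byte floats, no padding field. */
typedef struct { float x, y, z; } vec3_t;

/* Hypothetical model of the stack as seen after "push ebp".
   The first member sits at the lowest address, i.e. at [ESP + 0]. */
typedef struct {
    uint32_t old_ebp;   /* saved by the function prologue      */
    uint32_t ret_addr;  /* pushed by the call instruction      */
    uint32_t p_dest;    /* ByRef argument: a 32-bit pointer    */
    vec3_t   v1;        /* ByVal argument: the full 12 bytes   */
    vec3_t   v2;        /* ByVal argument: the full 12 bytes   */
} frame_t;

/* Number of argument bytes a stdcall callee has to pop on return. */
size_t arg_bytes(void) {
    return sizeof(frame_t) - offsetof(frame_t, p_dest);
}

/* Confirms the offsets and the byte count quoted in the post. */
void check_layout(void) {
    assert(offsetof(frame_t, v1) == 12);  /* v1 at [ESP + 12] */
    assert(offsetof(frame_t, v2) == 24);  /* v2 at [ESP + 24] */
    assert(arg_bytes() == 28);            /* 4 + 12 + 12      */
}
```

If the vectors carry a fourth padding field, each grows to 16 bytes and the offsets and byte count change accordingly.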
LocoDelAssembly 28 Feb 2013, 03:00
I just checked that it is as ProphetOfDoom says, the struct is copied to the stack instead of giving you a pointer to a copy (just by the copy alone I'd recommend you stick to ByRef unless you have a very good reason not to).
In case you want to get this working like this anyway here is how I did it:
Code:
    Option Strict On
    Imports System.Runtime.InteropServices

    Module Main
        Public Structure vec3_t
            Public x As Single
            Public y As Single
            Public z As Single
            Private pad As Single

            Public Overrides Function ToString() As String
                Return String.Format("<{0},{1},{2}>", Me.x, Me.y, Me.z)
            End Function
        End Structure

        <DllImport("math.dll", EntryPoint:="VectorAdd", CharSet:=CharSet.Auto, ExactSpelling:=True, CallingConvention:=CallingConvention.StdCall)> _
        Public Sub VectorAdd(ByRef vD As vec3_t, ByVal vA As vec3_t, ByVal vB As vec3_t)
        End Sub

        Sub Main()
            Dim a, b, d As vec3_t

            a.x = 1
            a.y = 2
            a.z = 3

            b.x = 0
            b.y = -2
            b.z = 7

            VectorAdd(d, a, b)

            Console.WriteLine("a = " & a.ToString())
            Console.WriteLine("b = " & b.ToString())
            Console.WriteLine("d = " & d.ToString() & " (a + b)")
            Console.ReadLine()
        End Sub
    End Module
Code:
    format PE GUI 4.0 DLL
    include 'win32wxp.inc'

    section '.text' code readable executable

    proc DllEntryPoint hinstDLL,fdwReason,lpvReserved
            mov     eax,TRUE
            ret
    endp

    struct vec3_t
      x dd ?
      y dd ?
      z dd ?
        rd 1
    ends

    VectorAdd:
    virtual at esp
      .oldIP dd ?
      .pDest dd ?
      .vA    vec3_t
      .vB    vec3_t
      .size = $-$$
    end virtual
            mov     edx, [.pDest]
            lea     ecx, [.vB]
            lea     eax, [.vA]
            movups  xmm0, [ecx]
            movups  xmm1, [eax]
            addps   xmm0, xmm1
            movups  [edx], xmm0
            ret     .size - 4

    dd VectorAdd ; To force relocations since I don't remember now the magic code to force the relocs section to be non empty.

    section '.edata' export data readable
    export 'Math.dll',\
           VectorAdd, 'VectorAdd'

    section '.reloc' fixups data readable discardable

    .end DllEntryPoint
Console output:
Code:
    a = <1,2,3>
    b = <0,-2,7>
    d = <1,0,10> (a + b)
jmcclane 28 Feb 2013, 15:29
Thanks a lot for the help...
Here is another sample, but now I use a vector4 struct:
Code:
    <StructLayout(LayoutKind.Sequential)> _
    Public Structure vec4_t
        Public x As Single
        Public y As Single
        Public z As Single
        Public w As Single
    End Structure

    <DllImport("C:\math.dll", EntryPoint:="VectorAdd4", CharSet:=CharSet.Auto, ExactSpelling:=True, CallingConvention:=CallingConvention.StdCall)> _
    Public Shared Sub VectorAdd4(ByRef vD As vec4_t, ByRef vA As vec4_t, ByRef vB As vec4_t)
    End Sub
Code:
    proc VectorAdd4 vD, vA, vB
            mov     edx, [vD]       ;dest
            mov     ecx, [vB]       ;v2
            mov     eax, [vA]       ;v1
            movaps  XMM0, [ecx]
            movaps  XMM1, [eax]
            addps   XMM0, XMM1
            movaps  [edx], XMM0
            ret
    endp
When I replace movaps with movups it works, otherwise it doesn't...?? But the vec4 struct is aligned... [EDIT by Loco]Added code tags[/edit]
Last edited by jmcclane on 28 Feb 2013, 18:13; edited 1 time in total
LocoDelAssembly 28 Feb 2013, 16:18
How are you guaranteeing alignment? Having a structure of 16 bytes doesn't imply its base address is a multiple of 16.
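The difference between size and address alignment can be shown with a short C sketch. It is a hedged illustration only: `is_aligned16` and `vec4_at_offset_is_aligned` are hypothetical helper names, and the static buffer merely stands in for wherever the caller happens to place the structure.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the vec4_t in the thread: four floats, 16 bytes in size. */
typedef struct { float x, y, z, w; } vec4_t;

/* Returns 1 when p is a multiple of 16, which is what aligned
   16-byte accesses such as movaps require of a memory operand. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Reports whether a vec4_t placed at the given byte offset inside a
   16-byte-aligned buffer would itself be 16-byte aligned. */
int vec4_at_offset_is_aligned(unsigned offset) {
    static _Alignas(16) unsigned char buf[64];
    assert(offset + sizeof(vec4_t) <= sizeof buf);
    return is_aligned16(buf + offset);
}
```

A 16-byte structure fits at offset 4 just as well as at offset 0 or 16, but only the latter two pass the check. Nothing shown in the thread forces the caller to pick such an address, which would be consistent with movups working where movaps faults.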
jmcclane 28 Feb 2013, 18:17
Try to put align 16...? don't know
LocoDelAssembly 28 Feb 2013, 19:14
jmcclane wrote: Try to put align 16...? don't know
In case it is not obvious, you'll have to make procedures that accept arrays as parameters, otherwise you are very likely to be making your code slower than just doing addition from VB.Net side because of the pinvoke overhead.
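The shape of such an array-based procedure might look like the following C sketch. It is an assumption-laden illustration, not code from the thread: `vector_add_array` is a hypothetical name, and a plain scalar loop stands in for the SSE body so that the example stays portable. The point is only the interface, one call covering `n` vectors.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } vec4_t;

/* Adds n pairs of vectors in a single call, so the cost of crossing
   from managed to native code is paid once per batch rather than
   once per vector. */
void vector_add_array(vec4_t *dst, const vec4_t *a, const vec4_t *b, size_t n) {
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        dst[i].x = a[i].x + b[i].x;
        dst[i].y = a[i].y + b[i].y;
        dst[i].z = a[i].z + b[i].z;
        dst[i].w = a[i].w + b[i].w;
    }
}
```

In an assembly version the loop body would be the movups/addps/movups sequence from the earlier listings, advancing each pointer by 16 bytes per iteration.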
jmcclane 28 Feb 2013, 21:58
Thanks a lot LocoDelAssembly!!!
I'm already looking for instructions on the net... but I think it's a problem for VB and C#. Maybe I will find something... thanks again.
comrade 01 Mar 2013, 10:51
Yippee-ki-yay, motherfucker!
jmcclane 01 Mar 2013, 21:23
Bruce Willis will align:)
jmcclane1 16 Aug 2021, 23:55
Why doesn't it work in x64?
Code:
    format PE64 GUI 4.0 DLL
    entry DllEntryPoint

    include '\include\win64a.inc'

    section '.bss' data readable writeable
    align 16
    hi: dd 0x00000000,0x00000000,0x00000000,0xffffffff
    lo: dd 0xffffffff,0xffffffff,0xffffffff,0x00000000

    section '.text' code readable executable

    proc DllEntryPoint hinstDLL,fdwReason,lpvReserved
            mov     eax,TRUE
            ret
    endp

    proc AddSse vD, vA, vB
            mov     rax,[vD]        ;Vector Destination
            mov     rdx,[vB]        ;Vector B
            mov     rcx,[vA]        ;Vector A
            movups  xmm2,[rax]
            movups  xmm0,[rdx]      ;vB.xyz# {# Bz By Bx}
            movups  xmm1,[rcx]      ;vA.xyz# {# Az Ay Ax}
            andps   xmm2,[hi]       ;{ # 0 0 0 }
            addps   xmm0,xmm1       ;{#+# Az+Bz Ay+By Ax+Bx}
            andps   xmm0,[lo]       ;{0 Az+Bz Ay+By Ax+Bx}
            orps    xmm0,xmm2       ;{# Az+Bz Ay+By Ax+Bx}
            movups  [rax],xmm0      ;{# Az+Bz Ay+By Ax+Bx}
            ret
    endp

    section '.idata' import data readable writeable
    ;library kernel32,'KERNEL32.DLL'

    section '.edata' export data readable
    export 'sse.dll',\
           AddSse,'AddSse'

    section '.reloc' fixups data readable discardable
    if $=$$
      dd 0,8 ; if there are no fixups, generate dummy entry
    end if
revolution 17 Aug 2021, 00:39
What code do you use to call AddSse? What do you mean by "doesn't work"?
If you use the normal fastcall to call AddSse then the first four parameters are not on the stack; they are in RCX, RDX, R8 and R9, in that order.
revolution 17 Aug 2021, 00:41
So maybe something like:
Code:
    ;...
    proc AddSse vD, vA, vB
            mov     rax, rcx        ;Vector Destination (1st argument)
            mov     rdx, rdx        ;Vector A (2nd argument, already in RDX)
            mov     rcx, r8         ;Vector B (3rd argument)
            movups  xmm2,[rax]
    ;...
jmcclane1 17 Aug 2021, 09:57
This is how I call both functions, and now they work OK.
Thanks a lot for the advice! If you have any more advice on how to speed this up, please tell me.
    ;Public Const AsmLib As String = "......sse64.dll"
    ;<DllImport(AsmLib, EntryPoint:="AddSse", CharSet:=CharSet.Auto, CallingConvention:=CallingConvention.StdCall)>
    ;Public Shared Sub AddSse(ByRef dst As Vector3, ByVal V1 As Vector3, ByVal V2 As Vector3)
    ;End Sub
    ;-Or-
    ;<DllImport(AsmLib, EntryPoint:="AddSse", CharSet:=CharSet.Auto, CallingConvention:=CallingConvention.StdCall)>
    ;Public Shared Function AddSse(ByVal V1 As Vector3, ByVal V2 As Vector3) As Vector3
    ;End Function
Code:
    format PE64 GUI 4.0 DLL
    entry DllEntryPoint

    include '\include\win64a.inc'

    section '.bss' data readable writeable
    align 16
    hi: dd 0x00000000,0x00000000,0x00000000,0xffffffff
    lo: dd 0xffffffff,0xffffffff,0xffffffff,0x00000000

    section '.text' code readable executable

    proc DllEntryPoint hinstDLL,fdwReason,lpvReserved
            mov     eax,TRUE
            ret
    endp

    proc AddSse vD, vA, vB
            mov     rax, rcx        ;Vector Destination
            mov     rdx, rdx        ;Vector A (2nd argument, already in RDX)
            mov     rcx, r8         ;Vector B (3rd argument)
            movups  xmm2,[rax]
            movups  xmm0,[rdx]      ;vA.xyz# {# Az Ay Ax}
            movups  xmm1,[rcx]      ;vB.xyz# {# Bz By Bx}
            andps   xmm2,[hi]       ;{ # 0 0 0 }
            addps   xmm0,xmm1       ;{#+# Az+Bz Ay+By Ax+Bx}
            andps   xmm0,[lo]       ;{0 Az+Bz Ay+By Ax+Bx}
            orps    xmm0,xmm2       ;{# Az+Bz Ay+By Ax+Bx}
            movups  [rax],xmm0      ;{# Az+Bz Ay+By Ax+Bx}
            ret
    endp

    section '.idata' import data readable writeable
    ;library kernel32,'KERNEL32.DLL'

    section '.edata' export data readable
    export 'sse64.dll',\
           AddSse,'AddSse'

    section '.reloc' fixups data readable discardable
    if $=$$
      dd 0,8 ; if there are no fixups, generate dummy entry
    end if
Last edited by jmcclane1 on 17 Aug 2021, 10:01; edited 1 time in total
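What the andps/orps sequence in the listing computes can be modelled lane by lane in C. This is a rough sketch rather than the author's code: `add_keep_lane3` is a hypothetical name, the masks are copied from the `hi` and `lo` constants, and the stated purpose (leaving the 4 bytes that follow a 12-byte Vector3 unchanged when 16 bytes are written back) is an inference from the comments in the listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Per-lane masks with the same values as 'hi' and 'lo' in the listing. */
static const uint32_t HI[4] = {0x00000000u, 0x00000000u, 0x00000000u, 0xffffffffu};
static const uint32_t LO[4] = {0xffffffffu, 0xffffffffu, 0xffffffffu, 0x00000000u};

/* dst[0..2] = a + b, while dst[3] keeps whatever bits it already held. */
void add_keep_lane3(float dst[4], const float a[4], const float b[4]) {
    assert(sizeof(float) == sizeof(uint32_t));
    for (int i = 0; i < 4; ++i) {
        float sum = a[i] + b[i];                 /* addps                 */
        uint32_t s, d, r;
        memcpy(&s, &sum, sizeof s);              /* reinterpret the bits  */
        memcpy(&d, &dst[i], sizeof d);
        r = (s & LO[i]) | (d & HI[i]);           /* andps, andps, orps    */
        memcpy(&dst[i], &r, sizeof r);           /* movups store          */
    }
}
```

With the vectors from the earlier console output, the first three lanes of the destination become 1, 0 and 10, and the fourth lane is left exactly as it was before the call.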
revolution 17 Aug 2021, 10:09
Glad you got it working.
Now you can "optimise" it by eliminating the extraneous register assignments.
Code:
    ;       mov     rax, rcx        ;Vector Destination
    ;       mov     rdx, rdx        ;Vector A
    ;       mov     rcx, r8         ;Vector B
    ; use the registers directly below
            movups  xmm2,[rcx]
            movups  xmm0,[r8]       ;vB.xyz# {# Bz By Bx}
            movups  xmm1,[rdx]      ;vA.xyz# {# Az Ay Ax}
jmcclane1 17 Aug 2021, 10:25
Thanks! And the return would be:
    movups  [rcx],xmm0      ;{# Az+Bz Ay+By Ax+Bx}
    ret
How do I do the same thing in 32-bit?
    movups  xmm2,[rcx]
    movups  xmm0,[r8]       ;vB.xyz# {# Bz By Bx}
    movups  xmm1,[rdx]      ;vA.xyz# {# Az Ay Ax}
32-bit code ....
    mov     eax,[vD]        ;Vector Destination
    mov     edx,[vB]        ;Vector B
    mov     ecx,[vA]        ;Vector A
    movups  xmm2,[eax]
    movups  xmm0,[edx]      ;vB.xyz# {# Bz By Bx}
    movups  xmm1,[ecx]      ;vA.xyz# {# Az Ay Ax}
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.