flat assembler
Message board for the users of flat assembler.

fasm & vb
jmcclane



Joined: 17 Feb 2013
Posts: 14
I have a problem, so please help.
Here's the code, a simple vector3 add:

proc VectorAdd vD, vA, vB

mov edx, [vD] ; dest
mov ecx, [vB] ; vec b ; lea ecx, vB
mov eax, [vA] ; vec a ; lea eax, vA

movups XMM0, [ecx]
movups XMM1, [eax]

addps XMM0, XMM1

movups [edx], XMM0

emms
ret
endp

After I compile the .dll and call the function from VB.NET like this (ByRef, ByRef, ByRef), it works:

<DllImport("c:\math.dll", EntryPoint:="VectorAdd", CharSet:=CharSet.Auto, ExactSpelling:=True, CallingConvention:=CallingConvention.StdCall)> _
Public Shared Sub VectorAdd(ByRef vD As vec3_t, ByRef vA As vec3_t, ByRef vB As vec3_t)
End Sub

But if I change the declaration to ByRef, ByVal, ByVal, I get a (stack) error and I don't know what to do. I tried replacing mov with lea in the asm code;
then I don't get the stack error, but the result is garbage:

<DllImport("c:\math.dll", EntryPoint:="VectorAdd", CharSet:=CharSet.Auto, ExactSpelling:=True, CallingConvention:=CallingConvention.StdCall)> _
Public Shared Sub VectorAdd(ByRef vD As vec3_t, ByVal vA As vec3_t, ByVal vB As vec3_t)
End Sub

Here is the function I want, written in VB:

Public Structure vec3_t
Public x As Single
Public y As Single
Public z As Single
End Structure

Public Sub VectorAdd (ByRef vOut As vec3_t, ByVal v1 As vec3_t, ByVal v2 As vec3_t)

vOut.x = v1.x + v2.x
vOut.y = v1.y + v2.y
vOut.z = v1.z + v2.z

End Sub
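For comparison, here is a plain C sketch of the same scalar reference function. The names `vec3_t` and `vector_add` simply mirror the VB code; they are not from any library. Note that the struct is only 12 bytes, while `movups` always loads and stores 16.

```c
#include <assert.h>

/* Same layout as the VB.NET vec3_t: three 4-byte floats, no padding. */
typedef struct {
    float x, y, z;
} vec3_t;

/* Only 12 bytes, while one movups/addps works on 16 bytes (4 floats). */
_Static_assert(sizeof(vec3_t) == 12, "vec3_t is expected to be 12 bytes");

/* Scalar reference: out = v1 + v2, component by component.
   v1 and v2 are passed by value (ByVal), out through a pointer (ByRef). */
void vector_add(vec3_t *out, vec3_t v1, vec3_t v2)
{
    out->x = v1.x + v2.x;
    out->y = v1.y + v2.y;
    out->z = v1.z + v2.z;
}
```

With a = <1,2,3> and b = <0,-2,7> this gives <1,0,10>.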


Please help and give me some tips
Thanks in advance
Post 26 Feb 2013, 18:32
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
Do you use the exact same asm code when you pass the structures by value?

The emms instruction shouldn't be there, you need it for MMX stuff but you are using SSE.

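PS: Consider adding a fourth dummy field to the structure. movups loads and stores 16 bytes but the structure is only 12, so every access also touches the 4 bytes after it. If the structure is not aligned to 16 bytes, those extra bytes could cross a page boundary and cause a page fault if there is no memory allocated there, and the store overwrites whatever follows vD.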
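Here is a C sketch of why the dummy field helps, assuming the asm keeps using one 16-byte `movups` per operand. With a fourth float, the struct is exactly 16 bytes, so each load and the store stay inside a struct. The names `vec3p_t` and `vec3p_add` are hypothetical. The `memcpy` calls only stand in for the unaligned loads and the store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* vec3_t plus a fourth dummy field: exactly one XMM register of data. */
typedef struct {
    float x, y, z;
    float pad;   /* never used; only makes the struct 16 bytes */
} vec3p_t;

_Static_assert(sizeof(vec3p_t) == 16, "one movups worth of data");
_Static_assert(offsetof(vec3p_t, pad) == 12, "pad is the 4th lane");

/* Models movups/addps/movups: copy 16 bytes from each operand, add all
   four lanes, copy 16 bytes back. Every access stays inside a struct. */
void vec3p_add(vec3p_t *dst, const vec3p_t *a, const vec3p_t *b)
{
    float la[4], lb[4], r[4];
    memcpy(la, a, sizeof la);   /* movups xmm1, [a] */
    memcpy(lb, b, sizeof lb);   /* movups xmm0, [b] */
    for (int i = 0; i < 4; i++)
        r[i] = la[i] + lb[i];   /* addps: lane-wise, pad lane included */
    memcpy(dst, r, sizeof r);   /* movups [dst], xmm0 */
}
```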
Post 26 Feb 2013, 21:11
jmcclane



Joined: 17 Feb 2013
Posts: 14
Yes, I use that same code.
It is not handled for the boundary yet; I still have to do that.
Thanks for the emms tip!
Post 26 Feb 2013, 21:43
r22



Joined: 27 Dec 2004
Posts: 805
My guess is that when you use ByVal instead of ByRef for the structure, VB.NET sends some runtime-optimized/mangled version of the structure to your DLL. Using the LayoutKind.Sequential attribute property might help.
Code:
Imports System.Runtime.InteropServices ' needed for StructLayout

<StructLayout(LayoutKind.Sequential)>
Public Structure vec3_t
    Public x As Single
    Public y As Single
    Public z As Single
    Private pad As Single
End Structure
    
Post 27 Feb 2013, 17:45
jmcclane



Joined: 17 Feb 2013
Posts: 14
I tried it, but nothing changed. Thanks anyway.
I think C# has the same problem.
Post 27 Feb 2013, 20:38
ProphetOfDoom



Joined: 08 Aug 2008
Posts: 120
Location: UK
Hi,

I only use 64-bit Linux and am not familiar with VB or the "proc" macro so I might be completely wrong but...

As I understand it, passing by value means the entire structure is copied onto the stack. Thus your assembly function should expect 28 bytes of arguments on the stack (4 for the vD pointer + 12 + 12 for the two structures). I doubt the "proc" macro takes this into account.

Assuming the old EBP and the return address are also placed on the stack, I suspect you'd find v1 at [ESP + 12] and v2 at [ESP + 24].

So try:

Code:
lea ecx, [esp + 24]
lea eax, [esp + 12]
    


Not sure tho.
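ProphetOfDoom's arithmetic can be written as a small C sketch. It assumes 32-bit stdcall with the structs copied by value, a standard `push ebp` / `mov ebp, esp` prologue, and 4-byte argument slots. The names `frame_layout` and `stdcall_layout` are made up for illustration. With 12-byte structs it reproduces the 28 bytes and the +12/+24 offsets above. With a 16-byte padded struct, the second copy moves to +28.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets relative to ESP right after "push ebp" (== EBP after
   "mov ebp, esp") for VectorAdd(vec3 *vD, vec3 vA, vec3 vB) with
   vA and vB copied onto the stack by the caller. */
typedef struct {
    size_t pDest;  /* offset of the vD pointer */
    size_t vA;     /* offset of the first struct copy */
    size_t vB;     /* offset of the second struct copy */
    size_t args;   /* total argument bytes, i.e. the N in "ret N" */
} frame_layout;

/* Stack arguments occupy whole 4-byte slots. */
static size_t round4(size_t n) { return (n + 3) & ~(size_t)3; }

frame_layout stdcall_layout(size_t struct_size)
{
    const size_t saved_ebp = 4, return_address = 4, pointer = 4;
    frame_layout f;
    f.pDest = saved_ebp + return_address;           /* 8 */
    f.vA    = f.pDest + pointer;                    /* 12 */
    f.vB    = f.vA + round4(struct_size);
    f.args  = pointer + 2 * round4(struct_size);
    return f;
}
```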
Post 28 Feb 2013, 00:45
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
I just checked, and it is as ProphetOfDoom says: the struct is copied onto the stack instead of you being given a pointer to a copy. The copying alone is reason enough to stick with ByRef unless you have a very good reason not to.

In case you want to get it working this way anyway, here is how I did it:
Code:
Option Strict On
Imports System.Runtime.InteropServices

Module Main
        Public Structure vec3_t
                Public x As Single
                Public y As Single
                Public z As Single
                Private pad As Single

                Public Overrides Function ToString() As String
                        Return String.Format("<{0},{1},{2}>", Me.x, Me.y, Me.z)
                End Function
        End Structure

        <DllImport("math.dll", EntryPoint:="VectorAdd", CharSet:=CharSet.Auto, ExactSpelling:=True, CallingConvention:=CallingConvention.StdCall)> _
        Public Sub VectorAdd(ByRef vD As vec3_t, ByVal vA As vec3_t, ByVal vB As vec3_t)
        End Sub

        Sub Main()
                Dim a, b, d As vec3_t
                a.x = 1
                a.y = 2
                a.z = 3
                b.x = 0
                b.y = -2
                b.z = 7
                VectorAdd(d, a, b)

                Console.WriteLine("a = " & a.ToString())
                Console.WriteLine("b = " & b.ToString())
                Console.WriteLine("d = " & d.ToString() & " (a + b)")

                Console.ReadLine()
        End Sub

End Module    

Code:
format PE GUI 4.0 DLL

include 'win32wxp.inc'

section '.text' code readable executable

proc DllEntryPoint hinstDLL,fdwReason,lpvReserved
        mov     eax,TRUE
        ret
endp

struct vec3_t
  x dd ?
  y dd ?
  z dd ?
    rd 1
ends

VectorAdd:
virtual at esp
  .oldIP dd ?
  .pDest dd ?
  .vA vec3_t
  .vB vec3_t
.size = $-$$
end virtual

  mov edx, [.pDest]
  lea ecx, [.vB]
  lea eax, [.vA]

  movups xmm0, [ecx]
  movups xmm1, [eax]

  addps xmm0, xmm1

  movups [edx], xmm0

  ret .size - 4

dd VectorAdd ; To force relocations since I don't remember now the magic code to force the relocs section to be non empty.

section '.edata' export data readable

  export 'Math.dll',\
         VectorAdd, 'VectorAdd'

section '.reloc' fixups data readable discardable

.end DllEntryPoint    


Console output:
Code:
a = <1,2,3>
b = <0,-2,7>
d = <1,0,10> (a + b)    
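The `virtual at esp` block is just a struct overlaid on the stack. As a rough cross-check, the same layout can be written as a C struct using only 32-bit fields, so it is identical on any host. The names `entry_frame` and `ret_imm` are made up for this sketch. `offsetof` then gives the displacements fasm computes: `.pDest` at +4, `.vA` at +8, `.vB` at +24, and `ret .size - 4` becomes `ret 36`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same as the fasm "struct vec3_t" with the extra "rd 1": 16 bytes. */
typedef struct {
    float x, y, z;
    uint32_t reserved;
} vec3_t;

/* Mirror of the "virtual at esp" block: what ESP points at on entry to
   VectorAdd (no "push ebp", so the return address is at [esp]). */
typedef struct {
    uint32_t oldIP;   /* return address pushed by CALL */
    uint32_t pDest;   /* ByRef vD: a 32-bit pointer */
    vec3_t   vA;      /* ByVal vA: the whole struct is copied here */
    vec3_t   vB;      /* ByVal vB */
} entry_frame;

/* "ret .size - 4": pop all arguments but not the return address. */
static const size_t ret_imm = sizeof(entry_frame) - sizeof(uint32_t);
```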
Post 28 Feb 2013, 03:00
jmcclane



Joined: 17 Feb 2013
Posts: 14
Thanks a lot for the help!

Here is another sample, but now I use a vector4 struct:

Code:
<StructLayout(LayoutKind.Sequential)>
    Public Structure vec4_t
        Public x As Single
        Public y As Single
        Public z As Single
        Public w As Single
    End Structure

  <DllImport("C:\math.dll", EntryPoint:="VectorAdd4", CharSet:=CharSet.Auto, ExactSpelling:=True, CallingConvention:=CallingConvention.StdCall)> _
    Public Shared Sub VectorAdd4(ByRef vD As vec4_t, ByRef vA As vec4_t, ByRef vB As vec4_t)
    End Sub    



Code:
proc VectorAdd4 vD, vA, vB

             mov  edx, [vD]   ;dest
             mov  ecx, [vB]    ;v2
             mov  eax, [vA]   ;v1

             movaps  XMM0, [ecx]
             movaps  XMM1, [eax]

             addps   XMM0,  XMM1

             movaps  [edx], XMM0
     ret
endp    


When I replace movaps with movups it works, otherwise it doesn't. Why?
The vec4 struct is aligned...

[EDIT by Loco]Added code tags[/edit]


Attachment: Test.zip (65.01 KB)



Last edited by jmcclane on 28 Feb 2013, 18:13; edited 1 time in total
Post 28 Feb 2013, 15:29
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
How are you guaranteeing alignment? A structure of 16 bytes doesn't imply that its base address is a multiple of 16.
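Here is a C sketch of that point. It assumes the usual 4-byte alignment of `float`, and `is_aligned16` and `vec4_aligned_t` are hypothetical names. A plain struct of four floats only requires 4-byte alignment, so a 16-byte size says nothing about a 16-byte address. A `movaps`-safe address has to be requested explicitly. How to do the equivalent on the VB.NET side is a separate question that this does not answer.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

typedef struct {
    float x, y, z, w;
} vec4_t;

/* movaps faults unless the address is a multiple of 16. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* 16 bytes in size, but only 4-byte alignment is required: the compiler
   (or the runtime) may place one at any multiple of 4. */
_Static_assert(sizeof(vec4_t) == 16, "16 bytes in size");
_Static_assert(alignof(vec4_t) == 4, "plain struct: 4-byte alignment");

/* Requesting the alignment explicitly. */
typedef struct {
    alignas(16) float v[4];
} vec4_aligned_t;

_Static_assert(alignof(vec4_aligned_t) == 16, "forced 16-byte alignment");
```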
Post 28 Feb 2013, 16:18
jmcclane



Joined: 17 Feb 2013
Posts: 14
Try to put align 16...? don't know
Post 28 Feb 2013, 18:17
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
jmcclane wrote:
Try to put align 16...? don't know
I mean from the VB.Net side. I see you are passing stack variables, so unless I'm missing something you are only guaranteed 4-byte alignment there. You need to consult the VB.Net documentation on how to force specific alignment and packing.

In case it is not obvious, you'll have to make procedures that accept arrays as parameters. Otherwise, because of the P/Invoke overhead, you are very likely making your code slower than just doing the addition on the VB.Net side.
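Here is a minimal C sketch of what a procedure that accepts arrays could look like. `vec4_add_array` is a hypothetical name. The plain loop stands in for the SSE body: a compiler may vectorize it, or the loop body could be the `movups`/`addps` sequence. The point is that the managed-to-native transition is paid once per array instead of once per vector.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float x, y, z, w;
} vec4_t;

/* dst[i] = a[i] + b[i] for i in [0, n).
   One call handles n vectors, so the fixed per-call cost of crossing
   from managed code into the DLL is paid only once.
   dst may be the same array as a or b: each field is read before it is
   written. */
void vec4_add_array(vec4_t *dst, const vec4_t *a, const vec4_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i].x = a[i].x + b[i].x;
        dst[i].y = a[i].y + b[i].y;
        dst[i].z = a[i].z + b[i].z;
        dst[i].w = a[i].w + b[i].w;
    }
}
```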
Post 28 Feb 2013, 19:14
jmcclane



Joined: 17 Feb 2013
Posts: 14
Thanks a lot, LocoDelAssembly!!!
I'm already looking for instructions on the net, but I think it's a problem for both VB and C#.
Maybe I will find something.
Thanks again
Post 28 Feb 2013, 21:58
comrade



Joined: 16 Jun 2003
Posts: 1138
Location: Russian Federation
Yippee-ki-yay, motherfucker!

Post 01 Mar 2013, 10:51
jmcclane



Joined: 17 Feb 2013
Posts: 14
Bruce Willis will align:)
Post 01 Mar 2013, 21:23
jmcclane1



Joined: 16 Aug 2021
Posts: 6
Why doesn't it work in x64?


Code:
format PE64 GUI 4.0 DLL
entry DllEntryPoint


include '\include\win64a.inc'


section '.bss' data readable writeable

   align 16
   hi: dd    0x00000000,0x00000000,0x00000000,0xffffffff
   lo: dd    0xffffffff,0xffffffff,0xffffffff,0x00000000

section '.text' code readable executable

proc DllEntryPoint hinstDLL,fdwReason,lpvReserved
        mov     eax,TRUE
        ret
endp


proc AddSse vD, vA, vB

        mov    rax,[vD]           ;Vector Destination
        mov    rdx,[vB]           ;Vector B
        mov    rcx,[vA]           ;Vector A

        movups  xmm2,[rax]
        movups  xmm0,[rdx]         ;vB.xyz# {# Bz By Bx}
        movups  xmm1,[rcx]         ;vA.xyz# {# Az Ay Ax}

        andps   xmm2,[hi]     ;{ #    0     0    0   }
        addps   xmm0,xmm1          ;{#+# Az+Bz Ay+By Ax+Bx}

        andps   xmm0,[lo]     ;{0 Az+Bz Ay+By Ax+Bx}

        orps    xmm0,xmm2          ;{# Az+Bz Ay+By Ax+Bx}

        movups  [rax],xmm0         ;{# Az+Bz Ay+By Ax+Bx}

        ret
endp



section '.idata' import data readable writeable

  ;library kernel32,'KERNEL32.DLL'



section '.edata' export data readable

  export 'sse.dll',\
         AddSse,'AddSse'


section '.reloc' fixups data readable discardable

  if $=$$
       dd 0,8              ; if there are no fixups, generate dummy entry
  end if    
Edit by revolution: Added code tags
Post 16 Aug 2021, 23:55
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18222
Location: In your JS exploiting you and your system
What code do you use to call AddSse? What do you mean by "doesn't work"?

If you use the normal fastcall to call AddSse then the first four parameters are not on the stack, they are in RCX, RDX, R8 and R9 in that order.
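To make the register assignment concrete, here is a C sketch of the Microsoft x64 rules for integer/pointer arguments. The helper names are hypothetical. It also encodes the Win64 rule that a struct whose size is not 1, 2, 4 or 8 bytes is passed as a pointer to a caller-made copy. That is likely why a ByVal 12-byte Vector3 arrives in RDX/R8 as a pointer on x64, unlike the 32-bit case earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Microsoft x64 calling convention, integer/pointer arguments only
   (float/double arguments go in XMM0-XMM3 by position instead). */
const char *win64_arg_location(int index)
{
    static const char *regs[4] = { "rcx", "rdx", "r8", "r9" };
    return (index >= 0 && index < 4) ? regs[index] : "stack";
}

/* Offset from RSP at function entry for argument `index` (0-based):
   [rsp] holds the return address, then 8 bytes per argument slot.
   For index 0-3 this is the argument's home (shadow) slot, which the
   caller reserves but does not fill. */
size_t win64_stack_arg_offset(int index)
{
    return 8 + 8 * (size_t)index;   /* index 4 -> [rsp+40] */
}

/* Structs that are not 1, 2, 4 or 8 bytes are not passed in a register:
   the caller makes a copy and passes a pointer to it. */
int win64_struct_by_pointer(size_t size)
{
    return !(size == 1 || size == 2 || size == 4 || size == 8);
}
```

For `proc AddSse vD, vA, vB` this gives vD in RCX, vA in RDX and vB in R8.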
Post 17 Aug 2021, 00:39
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18222
Location: In your JS exploiting you and your system
So maybe something like:
Code:
;...
proc AddSse vD, vA, vB

        mov    rax, rcx           ;Vector Destination (1st arg, vD)
        mov    rdx, rdx           ;Vector A (2nd arg, vA; already in RDX, so this mov is a no-op)
        mov    rcx, r8            ;Vector B (3rd arg, vB)

        movups  xmm2,[rax]
;...    
Post 17 Aug 2021, 00:41
jmcclane1



Joined: 16 Aug 2021
Posts: 6
This is how I call both functions, and now they work OK.
Thanks a lot for the advice!
If you have any more tips on how to speed it up, please tell me.



;Public Const AsmLib As String = "......sse64.dll"

;<DllImport(AsmLib, EntryPoint:="AddSse", CharSet:=CharSet.Auto, CallingConvention:=CallingConvention.StdCall)>
;Public Shared Sub AddSse(ByRef dst As Vector3, ByVal V1 As Vector3, ByVal V2 As Vector3)
;End Sub

;-Or-

;<DllImport(AsmLib, EntryPoint:="AddSse", CharSet:=CharSet.Auto, CallingConvention:=CallingConvention.StdCall)>
;Public Shared Function AddSse(ByVal V1 As Vector3, ByVal V2 As Vector3) As Vector3
;End Function



Code:
format PE64 GUI 4.0 DLL
entry DllEntryPoint

include '\include\win64a.inc'



section '.bss' data readable writeable

   align 16
   hi: dd    0x00000000,0x00000000,0x00000000,0xffffffff
   lo: dd    0xffffffff,0xffffffff,0xffffffff,0x00000000



section '.text' code readable executable

proc DllEntryPoint hinstDLL,fdwReason,lpvReserved
        mov     eax,TRUE
        ret
endp

proc AddSse vD, vA, vB

        mov    rax, rcx            ;Vector Destination
        mov    rdx, rdx            ;Vector B
        mov    rcx, r8             ;Vector A

        movups  xmm2,[rax]
        movups  xmm0,[rdx]         ;vB.xyz# {# Bz By Bx}
        movups  xmm1,[rcx]         ;vA.xyz# {# Az Ay Ax}

        andps   xmm2,[hi]     ;{ #    0     0    0   }
        addps   xmm0,xmm1          ;{#+# Az+Bz Ay+By Ax+Bx}

        andps   xmm0,[lo]     ;{0 Az+Bz Ay+By Ax+Bx}

        orps    xmm0,xmm2          ;{# Az+Bz Ay+By Ax+Bx}

        movups  [rax],xmm0         ;{# Az+Bz Ay+By Ax+Bx}

        ret                                             
endp



section '.idata' import data readable writeable

  ;library kernel32,'KERNEL32.DLL'



section '.edata' export data readable

  export 'sse64.dll',\
          AddSse,'AddSse'


section '.reloc' fixups data readable discardable

  if $=$$
       dd 0,8      ; if there are no fixups, generate dummy entry
  end if    
Edit by revolution: Added code tags
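The `andps`/`orps` sequence in AddSse is a bitwise blend: lanes 0-2 come from the sum, and lane 3 is written back with the destination's original bits. Presumably this is so that the 16-byte store does not change the 4 bytes that follow a 12-byte Vector3. Below is a scalar C model of that blend on the raw 32-bit lane patterns. `add_sse_model` is a hypothetical name, and plain arrays of four floats stand in for the XMM registers and memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same masks as the "hi" and "lo" data in the DLL. */
static const uint32_t hi[4] = { 0, 0, 0, 0xffffffffu };   /* keep lane 3 */
static const uint32_t lo[4] = { 0xffffffffu, 0xffffffffu, 0xffffffffu, 0 };

/* dst[3] models the 4 bytes just past a 12-byte Vector3 in memory. */
void add_sse_model(float dst[4], const float a[4], const float b[4])
{
    uint32_t d[4], s[4];
    float sum[4];
    for (int i = 0; i < 4; i++)
        sum[i] = a[i] + b[i];                   /* addps xmm0, xmm1 */
    memcpy(d, dst, sizeof d);                   /* movups xmm2, [dst] */
    memcpy(s, sum, sizeof s);                   /* bit patterns of the sum */
    for (int i = 0; i < 4; i++)
        d[i] = (s[i] & lo[i]) | (d[i] & hi[i]); /* andps, andps, orps */
    memcpy(dst, d, sizeof d);                   /* movups [dst], xmm0 */
}
```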


Attachment: sse64.asm (2.04 KB)



Last edited by jmcclane1 on 17 Aug 2021, 10:01; edited 1 time in total
Post 17 Aug 2021, 09:57
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18222
Location: In your JS exploiting you and your system
Glad you got it working.

Now you can "optimise" it by eliminating the extraneous register assignments.
Code:
;        mov    rax, rcx            ;Vector Destination
;        mov    rdx, rdx            ;Vector B
;        mov    rcx, r8             ;Vector A

; use the registers directly below
        movups  xmm2,[rcx]
        movups  xmm0,[r8]          ;vB.xyz# {# Bz By Bx}
        movups  xmm1,[rdx]         ;vA.xyz# {# Az Ay Ax}
    
Post 17 Aug 2021, 10:09
jmcclane1



Joined: 16 Aug 2021
Posts: 6
Thanks! And the store before the return would be:

movups [rcx],xmm0 ;{# Az+Bz Ay+By Ax+Bx}

ret

How do I write this in 32-bit?
movups xmm2,[rcx]
movups xmm0,[r8] ;vB.xyz# {# Bz By Bx}
movups xmm1,[rdx] ;vA.xyz# {# Az Ay Ax}

32-bit code:
....
mov eax,[vD] ;Vector Destination
mov edx,[vB] ;Vector B
mov ecx,[vA] ;Vector A

movups xmm2,[eax]
movups xmm0,[edx] ;vB.xyz# {# Bz By Bx}
movups xmm1,[ecx] ;vA.xyz# {# Az Ay Ax}
Post 17 Aug 2021, 10:25


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.