flat assembler
Message board for the users of flat assembler.

Flexer (FASM Lexer)

yakupcemilk



Joined: 23 Aug 2024
Posts: 1
yakupcemilk 24 Aug 2024, 18:29
Thank you.


Last edited by yakupcemilk on 24 Aug 2024, 19:27; edited 2 times in total
ProMiNick



Joined: 24 Mar 2012
Posts: 798
Location: Russian Federation, Sochi
ProMiNick 24 Aug 2024, 19:17
When you have no brain, AI will not help.
AI-generated sources look like assembly code, but they differ from real asm code:
1. useful logic is absent
2. processor requirements for instructions and operands are not met
Code:
mov dword [tokens], 'NUMBER'    

3. format directives make no sense
Code:
format binary as "lexer.bin"    
and
Code:
section '.text' code readable executable    

AI belongs in the garbage bin.
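On point 2, a quick size check shows why that `mov` cannot encode: the quoted constant has six characters, and a dword operand holds four bytes. This is only a sketch in Python; the helper names are made up for illustration, and packing the first character into the lowest byte is an assumption about how the assembler turns a string into a number.

```python
def fits_in_operand(text, operand_bytes):
    """True if the quoted constant has no more bytes than the operand."""
    return len(text.encode("ascii")) <= operand_bytes

def as_le_value(text):
    """Numeric value of the constant, first character in the lowest byte (assumed)."""
    return int.from_bytes(text.encode("ascii"), "little")

# 'NUMBER' is 6 bytes, a dword is 4 bytes
print(fits_in_operand("NUMBER", 4))
# a 4-character constant does fit
print(fits_in_operand("NUMB", 4), hex(as_le_value("NUMB")))
```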
Post 24 Aug 2024, 19:17
View user's profile Send private message Send e-mail Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4024
Location: vpcmpistri
bitRAKE 24 Aug 2024, 21:23
Claude 3.5 Sonnet can do some basic functions:
Code:
; Function to calculate the dot product of two 3D vectors
; and return the result as a floating-point number
; 
; Arguments:
;   RCX: Pointer to the first vector (3 double-precision floats)
;   RDX: Pointer to the second vector (3 double-precision floats)
;
; Returns:
;   XMM0: Dot product result (double-precision float)

dot_product_3d:
    ; Adjust stack to preserve XMM6-XMM8 (16 bytes each)
    sub rsp, 48
    movdqu [rsp], xmm6
    movdqu [rsp + 16], xmm7
    movdqu [rsp + 32], xmm8

    ; Load vector components
    movsd xmm0, [rcx]           ; x1
    movsd xmm1, [rcx + 8]       ; y1
    movsd xmm2, [rcx + 16]      ; z1
    movsd xmm6, [rdx]           ; x2
    movsd xmm7, [rdx + 8]       ; y2
    movsd xmm8, [rdx + 16]      ; z2

    ; Perform element-wise multiplication and accumulate
    mulsd xmm0, xmm6            ; x1 * x2
    mulsd xmm1, xmm7            ; y1 * y2
    mulsd xmm2, xmm8            ; z1 * z2
    addsd xmm0, xmm1
    addsd xmm0, xmm2

    ; Result is now in xmm0

    ; Restore preserved registers and adjust stack
    movdqu xmm8, [rsp + 32]
    movdqu xmm7, [rsp + 16]
    movdqu xmm6, [rsp]
    add rsp, 48
    ret    
... you really need to babysit the AI though - it will generate absolute garbage code. Look how pretty the comments are though. I type one sentence and the AI gives 50 lines of usable code? (* sometimes *)

The x86 output has gotten much better over the last year.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup

bitRAKE 24 Aug 2024, 21:29
Me wrote:
What about a sphere-sphere intersection test?
Claude:
Code:
; Function to test intersection between two spheres
; 
; Arguments:
;   RCX: Pointer to first sphere (x, y, z, radius - 4 doubles)
;   RDX: Pointer to second sphere (x, y, z, radius - 4 doubles)
;
; Returns:
;   RAX: 1 if spheres intersect, 0 otherwise

sphere_intersection:
    ; Preserve XMM6-XMM9 as per Win64 ABI
    sub rsp, 64
    movdqu [rsp], xmm6
    movdqu [rsp + 16], xmm7
    movdqu [rsp + 32], xmm8
    movdqu [rsp + 48], xmm9

    ; Load sphere 1 data
    movsd xmm0, [rcx]       ; x1
    movsd xmm1, [rcx + 8]   ; y1
    movsd xmm2, [rcx + 16]  ; z1
    movsd xmm3, [rcx + 24]  ; r1

    ; Load sphere 2 data
    movsd xmm4, [rdx]       ; x2
    movsd xmm5, [rdx + 8]   ; y2
    movsd xmm6, [rdx + 16]  ; z2
    movsd xmm7, [rdx + 24]  ; r2

    ; Calculate distance vector
    subsd xmm0, xmm4        ; dx = x1 - x2
    subsd xmm1, xmm5        ; dy = y1 - y2
    subsd xmm2, xmm6        ; dz = z1 - z2

    ; Calculate squared distance
    mulsd xmm0, xmm0        ; dx^2
    mulsd xmm1, xmm1        ; dy^2
    mulsd xmm2, xmm2        ; dz^2
    addsd xmm0, xmm1
    addsd xmm0, xmm2        ; xmm0 now contains squared distance

    ; Calculate sum of radii
    addsd xmm3, xmm7        ; r1 + r2
    mulsd xmm3, xmm3        ; (r1 + r2)^2

    ; Compare squared distance with squared sum of radii
    comisd xmm0, xmm3
    setbe al                ; Set AL to 1 if distance <= sum of radii, 0 otherwise
    movzx rax, al           ; Zero-extend AL to RAX

    ; Restore preserved XMM registers
    movdqu xmm9, [rsp + 48]
    movdqu xmm8, [rsp + 32]
    movdqu xmm7, [rsp + 16]
    movdqu xmm6, [rsp]
    add rsp, 64

    ret    
... obviously, I'm asking it for stuff with a billion examples on the internet. It doesn't create code.

bitRAKE 24 Aug 2024, 21:41
Imagine the flood of code with random errors at every level. But then there is https://www.swebench.com/, a benchmark for AI resolving software errors. The best current systems resolve about 20% and are improving. It is mostly high-level languages, which the models are more accurate in.

bitRAKE 25 Aug 2024, 03:30
The important thing to understand about the LLMs is that language use drives the output. If the question is a beginner question or the terminology is incorrect, this warps the perspective of the conversation.

Above I specifically asked for a demonstration of the Windows 64-bit ABI - the model does that regardless of efficiency. Then I refined the context to just create isolated functions.

Once the problem is refined, the user might need to reset the interface to clear that perspective, especially with the long-context models.

Usually, I can prime the model with the first few lines of AVX2 code (how I want to load the registers, the order I want the data to be processed in) and the model will continue using those constraints.
Code:
vmovapd ymm0, [rdi]           ; Load x, y, z, and radius1 into ymm0
vmovapd ymm1, [rsi]           ; Load x', y', z', and radius2 into ymm1

vaddsd xmm3, xmm0, xmm1       ; radius1 + radius2 in lower part of ymm7
vsubpd ymm0, ymm0, ymm1       ; Compute x1-x2, y1-y2, z1-z2, (r1-r2 is discarded)

vmulpd ymm0, ymm0, ymm0       ; Square the differences

; Horizontal addition to sum squared differences for the distance
vextractf128 xmm1, ymm0, 1    ; Extract upper half of ymm0 into xmm2
vaddpd xmm0, xmm0, xmm1       ; Add the high and low parts of ymm0
vpermilpd xmm1, xmm0, 0b01    ; Shuffle to get the z component into lower xmm0
vaddsd xmm0, xmm0, xmm1       ; Final sum: x^2 + y^2 + z^2 in xmm0

vmulsd xmm3, xmm3, xmm3       ; Square (radius1 + radius2)

vucomisd xmm0, xmm3           ; Compare distance squared (xmm0) with radius squared (xmm7)
setbe al                      ; Set AL if distance squared is less than or equal    
(I'm sure you'd test the above fragment - it's wrong, but easy to fix.)
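One possible reading of what is wrong, assuming the {x, y, z, radius} memory layout that the comments state: the low lane holds x, so `vaddsd` adds x1 + x2 instead of the radii, and the horizontal sum also picks up the squared radius difference. A lane-by-lane Python model (the function names are mine, not from the thread) makes that visible next to the intended test.

```python
def fragment_as_written(s1, s2):
    """Model of the AVX fragment; s1 and s2 are [x, y, z, r], lowest lane first."""
    ymm0, ymm1 = list(s1), list(s2)
    xmm3 = ymm0[0] + ymm1[0]                          # vaddsd: low lane is x, not the radius
    ymm0 = [a - b for a, b in zip(ymm0, ymm1)]        # vsubpd: dx, dy, dz, r1-r2
    ymm0 = [a * a for a in ymm0]                      # vmulpd: square every lane
    xmm1 = ymm0[2:4]                                  # vextractf128 ..., 1: upper half
    xmm0 = [ymm0[0] + xmm1[0], ymm0[1] + xmm1[1]]     # vaddpd: low + high halves
    xmm1 = [xmm0[1], xmm0[0]]                         # vpermilpd ..., 01b: swap the two lanes
    dist2 = xmm0[0] + xmm1[0]                         # vaddsd: includes (r1-r2)^2
    return dist2 <= xmm3 * xmm3                       # vmulsd, vucomisd, setbe

def intended(s1, s2):
    """Squared center distance against squared sum of radii."""
    dist2 = sum((a - b) ** 2 for a, b in zip(s1[:3], s2[:3]))
    return dist2 <= (s1[3] + s2[3]) ** 2

a = [0.0, 0.0, 0.0, 1.0]
b = [0.0, 1.0, 0.0, 1.0]   # centers 1 apart, radii sum to 2: overlapping
print(intended(a, b), fragment_as_written(a, b))
```

Under that reading the fix is to take the radius from the top lane and keep it out of the distance sum.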

bitRAKE 06 Sep 2024, 11:54
Anthropic wrote:
Some of Anthropic's prompt engineering experts—Amanda Askell (Alignment Finetuning), Alex Albert (Developer Relations), David Hershey (Applied AI), and Zack Witten (Prompt Engineering)—reflect on how prompt engineering has evolved, practical tips, and thoughts on how prompting might change as AI capabilities grow.
AI prompt engineering: A deep dive
... many interesting perspectives on getting better responses from the models.
Roman



Joined: 21 Apr 2012
Posts: 1769
Roman 06 Sep 2024, 15:41
Sphere-sphere intersection with SSE:
Code:
;data
align 16
        Sfer1    dd  5.0,4.0,6.0,0
        Sfer2    dd -5.0,4.0,4.0,0
        radius   dd 9.0,2.0          ;radius1 & radius2

; Load sphere 1 center and subtract sphere 2 center (w lanes are 0 in the data above)
    movaps xmm0,dqword [ecx]
    subps xmm0,dqword [edx]
    movss xmm1, [radius]  ; r1

    ; Add radius2 and square the sum
    addss xmm1, [radius+4]  ; r1+r2
    mulss xmm1, xmm1        ; (r1 + r2)^2

    ; Squared distance: haddps (SSE3) twice sums all four lanes
    mulps xmm0,xmm0
    haddps xmm0,xmm0
    haddps xmm0,xmm0

    ; Compare squared distance with squared sum of radii
    comiss xmm0, xmm1
    setbe al                ; Set AL to 1 if distance <= sum of radii, 0 otherwise
    movzx eax, al           ; Zero-extend AL to EAX
    

bitRAKE 07 Sep 2024, 14:14
SSE
Code:
{const:16} .A dd 1.0,-1.0,-1.0,-1.0

movaps xmm0,dqword [rcx] ; {r1, x1, y1, z1}
movaps xmm1,dqword [rdx] ; {r2, x2, y2, z2}

mulps xmm1, dqword[.A]  ; {r2, -x2, -y2, -z2}
addps xmm0, xmm1        ; {r1+r2, x1-x2, y1-y2, z1-z2}
mulps xmm0, xmm0        ; Square all elements
dpps xmm0, dqword[.A], 11110001b ; (r1+r2)^2 - dx^2 - dy^2 - dz^2 in the low element
:D

... this code works very well because we want to unroll and gather many intersections, for example if we had millions of spheres: we do AABB partitioning into subgroups and then intersection testing. 25% less memory bandwidth.

(Of course, we have vfmadd231ps on later processors.)
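For anyone checking the SSE fragment above by hand, here is a scalar Python model of what it computes, assuming the {r, x, y, z} lane order from the comments. The function name and the final `>= 0` reading are mine: the fragment itself stops at the dot product and leaves the comparison to the caller.

```python
SIGNS = (1.0, -1.0, -1.0, -1.0)   # the .A constant

def overlap_margin(s1, s2):
    """s1, s2 are (r, x, y, z). Returns (r1+r2)^2 - |c1-c2|^2."""
    # mulps by .A then addps: radius lanes add, coordinate lanes subtract
    lanes = [a + sign * b for a, b, sign in zip(s1, s2, SIGNS)]
    # mulps xmm0, xmm0: square all four lanes
    lanes = [v * v for v in lanes]
    # dpps with 11110001b: multiply all four lanes by .A, sum into the low lane
    return sum(v * sign for v, sign in zip(lanes, SIGNS))

# Roman's data from the earlier post: radii 9 and 2
margin = overlap_margin((9.0, 5.0, 4.0, 6.0), (2.0, -5.0, 4.0, 4.0))
print(margin, margin >= 0)   # non-negative margin means the spheres touch or overlap
```

Folding the radius into the same vector is what lets one multiply, one add, one square and one dot product cover the whole test.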



Last edited by bitRAKE on 09 Sep 2024, 00:37; edited 2 times in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20303
Location: In your JS exploiting you and your system
revolution 07 Sep 2024, 14:25
In case anyone isn't aware: dpps requires SSE4.1 support.
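A small sketch of how that support can be checked. SSE4.1 is reported in CPUID leaf 1, ECX bit 19, and Linux lists it as `sse4_1` in /proc/cpuinfo. The helper names below are hypothetical, and the code only decodes values you pass in; it does not execute CPUID itself.

```python
SSE41_BIT = 19   # CPUID leaf 1, ECX bit 19

def ecx_has_sse41(ecx):
    """Decode an ECX value obtained from CPUID with EAX=1."""
    return bool((ecx >> SSE41_BIT) & 1)

def cpuinfo_has_flag(cpuinfo_text, flag="sse4_1"):
    """Look for a feature flag in text shaped like Linux /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return flag in value.split()
    return False

sample = "model name\t: example cpu\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2"
print(ecx_has_sse41(1 << 19), cpuinfo_has_flag(sample))
```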

Roman 07 Sep 2024, 16:25
dpps is a little slow,
but nice for coding.
It is a pity that SSE has no single-instruction cross product.
Intel should have created a cross instruction instead of dpps.
Just my opinion.
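For reference, the cross product is usually assembled from lane shuffles, two multiplies and a subtract. Below is a lane-level Python model of that common formulation (not code from this thread; `shuffle` just stands in for a shufps-style lane permutation).

```python
def shuffle(v, order):
    """Pick lanes of v in the given order, like a register shuffle."""
    return [v[i] for i in order]

def cross(a, b):
    """a x b computed as a.yzx * b.zxy - a.zxy * b.yzx."""
    yzx, zxy = (1, 2, 0), (2, 0, 1)
    left = [p * q for p, q in zip(shuffle(a, yzx), shuffle(b, zxy))]
    right = [p * q for p, q in zip(shuffle(a, zxy), shuffle(b, yzx))]
    return [l - r for l, r in zip(left, right)]

print(cross([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```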

revolution 08 Sep 2024, 01:16
Roman wrote:
Dpps little slow.
It might, or might not, be slow. It depends upon the usage and the system it runs on.

For example, it can help with I-cache thrashing because it can make the code smaller. Also, if it gets used more, Intel/AMD might allocate more silicon for it to improve the performance in future CPUs.

There isn't any way to know from simply reading a line of code whether it will be "slow" or not. Always test your assumptions.

Roman 08 Sep 2024, 01:22

revolution 08 Sep 2024, 01:37
You can't judge performance by looking at timings for a single instruction. Don't be misled by the numbers, they mean nothing in isolation.

Always test your assumptions. Don't blindly read a number and assume it is valid for everything everywhere.
uu



Joined: 20 Jul 2024
Posts: 39
uu 08 Sep 2024, 08:27
I admire those who can write code in SSE and AVX.

I have only used two SSE instructions in my programs so far.

Code:
    xorps xmm1,xmm1
    movups [rsi],xmm1     


That was because I could move 16 bytes of data with a single operation.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.