flat assembler
Message board for the users of flat assembler.
Roman 03 Aug 2023, 11:19
Code:
vminps zmm0, zmm0, qword [edx+ebp*8 + x]
vminps zmm1, zmm2, dword [rax] {1to16}
Jin X 03 Aug 2023, 11:29
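Roman, yes, your code doesn't work either, but it shouldn't: the first line has a type mismatch (a qword operand where vminps on zmm registers expects a 512-bit or broadcast dword operand), and the second uses [rax], which is not valid in 32-bit mode.
NASM compiles my code fine, but FASM can't!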
Roman 03 Aug 2023, 11:36
This doesn't work either. AMD Ryzen 3500.
Code:
;aa dd 1.0
movss xmm0, dword [aa]
vminps zmm1, zmm2, zmm0 ;this crashes my program
IDA Pro shows the instruction as vminps zmm1, zmm2, zmm0.
Last edited by Roman on 03 Aug 2023, 13:04; edited 2 times in total
Jin X 03 Aug 2023, 11:56
Roman, "AA" and "aa" are different names.
Your crash may be because your CPU doesn't support AVX-512. I just tried your vminps using Intel SDE (with the -future option), and it works fine.
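On Linux, one quick way to rule out missing hardware support is to look for the avx512f (AVX-512 Foundation) flag in /proc/cpuinfo. For reference, AMD first added AVX-512 with Zen 4, so a Ryzen 3500-series part would not report it. A minimal sketch, assuming a Linux system (the file does not exist on Windows, where a CPUID-based tool is needed instead); the function names are hypothetical:

```python
# Minimal sketch, assuming Linux: the kernel lists CPU feature flags in
# /proc/cpuinfo, and "avx512f" marks AVX-512 Foundation support.
from __future__ import annotations

from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512f(cpuinfo_text: str) -> bool:
    # AVX-512F is the baseline subset every AVX-512 CPU reports; without it,
    # EVEX-encoded zmm instructions such as vminps typically raise #UD.
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AVX-512F supported:", has_avx512f(path.read_text()))
    else:
        print("no /proc/cpuinfo here (not Linux?)")
```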
Tomasz Grysztar 03 Aug 2023, 12:17
Jin X wrote: This code cannot be generated:
This is an example of an oscillator problem. You can find more general information on what it is and how to deal with it in my articles about multi-pass assembly (even including a similar AVX-512 example, and also some comparison of different assemblers). There are a couple of possible approaches, although you might need to use fasmg to be able to tweak the instruction encoder itself if you need to assemble this exact source without moving things around.
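To make the oscillation concrete, here is a simplified model in Python, not fasm's actual resolver. It assumes a 32-bit flat binary at origin 0 where x directly follows the vminps and a ret, with the vminps taking 8 bytes with a compressed disp8 or 11 bytes with a disp32; the constant names and the first-pass guess (long form) are hypothetical:

```python
# Simplified model of the oscillator, not fasm's actual resolver.
# Assumed layout (hypothetical): use32, flat binary at origin 0,
#   vminps zmm0, zmm0, [edx+ebp*8 + x] {1to16}   ; 8 bytes (disp8) or 11 (disp32)
#   ret                                          ; 1 byte
#   x:
# With a {1to16} dword broadcast, EVEX scales disp8 by N = 4 (disp8*N).
# x is never 0 here, so the "no displacement" form never applies.

N = 4            # disp8 scale for a broadcast dword element
LEN_DISP8 = 8    # EVEX(4) + opcode(1) + ModRM(1) + SIB(1) + disp8(1)
LEN_DISP32 = 11  # same, with a 4-byte displacement
RET_LEN = 1


def compressible(disp: int, n: int = N) -> bool:
    """True if disp can be stored as a compressed disp8 (disp8*N)."""
    return disp % n == 0 and -128 <= disp // n <= 127


def resolve(max_passes: int = 10):
    """Re-encode until x stops moving; return (converged, history of x)."""
    x = None  # unknown before the first pass
    history = []
    for _ in range(max_passes):
        # Without a known value, assume the long (disp32) form.
        length = LEN_DISP8 if x is not None and compressible(x) else LEN_DISP32
        new_x = length + RET_LEN  # x follows the instruction and the ret
        history.append(new_x)
        if new_x == x:
            return True, history
        x = new_x
    return False, history


converged, xs = resolve()
print(converged, xs)
```

In this model x flips between 12 and 9 on every pass: at 12 the displacement compresses, which shortens the instruction and moves x to 9, which no longer compresses and pushes x back to 12. The loop never settles, which is the kind of cycle that exhausts the pass limit.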
revolution 03 Aug 2023, 15:13
I think the easiest way to solve this is to align x:
Code:
use32
vminps zmm0, zmm0, [edx+ebp*8 + x] {1to16}
ret
align 4
x:
Tomasz Grysztar 03 Aug 2023, 17:23
Good point, this is the most logical solution, as it ensures that the displacement can be optimized ("compressed").
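The compression here is EVEX's disp8*N scheme: the one-byte displacement is implicitly multiplied by N, so it only applies when the displacement is a multiple of N and the quotient fits in a signed byte. A minimal sketch covering the two N values in the comments below; the helper names are hypothetical:

```python
# Sketch of EVEX compressed displacement (disp8*N) for two operand forms:
#   {1to16} broadcast of a dword   -> N = 4 (element size)
#   full 512-bit memory operand    -> N = 64 (vector size)
N_BCST_DWORD = 4
N_FULL_ZMM = 64


def disp8_form(disp: int, n: int):
    """Return the stored disp8 value if disp compresses, else None."""
    if disp % n != 0:
        return None  # not a multiple of N: needs a full disp32
    scaled = disp // n
    return scaled if -128 <= scaled <= 127 else None


def disp_bytes(disp: int, n: int) -> int:
    """Displacement bytes needed (ignoring the disp = 0 no-displacement form)."""
    return 1 if disp8_form(disp, n) is not None else 4


for d in (12, 13, 508, 512):
    print(d, disp_bytes(d, N_BCST_DWORD), disp_bytes(d, N_FULL_ZMM))
```

With N = 4 the compressed range is -512..508 in steps of 4, so an aligned x such as 12 fits in one byte while 13 always needs four.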
revolution 03 Aug 2023, 19:56
It might also help to expand or change the error message.
Code:
error: pass count exceeded, code cannot be generated
Tomasz Grysztar 03 Aug 2023, 20:49
fasmg/fasm2 signals it like this:
Code:
flat assembler version g.k4v8
Error: could not generate code within the allowed number of passes.
Jin X 04 Aug 2023, 12:25
revolution, yes, this is not a bad solution, but edx may not be a multiple of 4 in some exotic cases, so it is not universal.
Tomasz Grysztar, what is fasm2? I will read your articles, thanks!
revolution 04 Aug 2023, 12:30
Jin X wrote: revolution, yes, this is not a bad solution, but edx may not be a multiple of 4 in some exotic cases. So this solution is not universal.
Jin X 04 Aug 2023, 12:46
revolution wrote:
revolution 04 Aug 2023, 12:47
If you use align, x is guaranteed to be a multiple of 4.
revolution 04 Aug 2023, 12:50
Actually, the multiple-of-4 thing is perhaps a distraction from the real problem. The root of the problem isn't the alignment but the changing value of x. For example, this is also fine:
Code:
use32
vminps zmm0, zmm0, [edx+ebp*8 + x] {1to16}
ret
align 4
rb 1 ; make x unaligned
x:
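A quick way to see this point is to compute x for both possible encodings of the vminps and check whether the value changes. A minimal sketch, assuming a 32-bit flat binary at origin 0, a vminps of 8 bytes (disp8) or 11 bytes (disp32), and a 1-byte ret; align_up and x_after are hypothetical helpers that mimic align and rb at origin 0:

```python
# Sketch: does the address of x depend on the instruction's length?
# Assumptions (hypothetical): use32, origin 0, vminps is 8 or 11 bytes, ret is 1.
LENGTHS = (8, 11)  # possible encodings of the vminps
RET_LEN = 1


def align_up(addr: int, boundary: int) -> int:
    """Round addr up to a multiple of boundary (like align at origin 0)."""
    return -(-addr // boundary) * boundary


def x_after(length: int, align: int = 1, pad: int = 0) -> int:
    """Address of x for one encoding length: code, optional align, rb pad."""
    end = length + RET_LEN
    return align_up(end, align) + pad


def layouts():
    cases = {
        "no align": {},
        "align 4": {"align": 4},
        "align 4 / rb 1": {"align": 4, "pad": 1},
    }
    result = {}
    for name, opts in cases.items():
        # A single value means x no longer depends on the encoding: stable.
        result[name] = sorted({x_after(length, **opts) for length in LENGTHS})
    return result


for name, values in layouts().items():
    verdict = "stable" if len(values) == 1 else "may oscillate"
    print(f"{name:15} x in {values} -> {verdict}")
```

Without the align, x is 9 or 12 depending on the encoding. With align 4 alone it is always 12 (compressible), and with the extra rb 1 it is always 13 (never compressible). Either way the value is fixed, so the encoder's choice can no longer feed back into x.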
Copyright © 1999-2025, Tomasz Grysztar.