flat assembler
Message board for the users of flat assembler.
AVX-512 - error: code cannot be generated (BUG?)
Roman 03 Aug 2023, 11:19
Code:
vminps zmm0, zmm0, qword [edx+ebp*8 + x]
vminps zmm1, zmm2, dword [rax] {1to16}
Jin X 03 Aug 2023, 11:29
Roman, yes, your code doesn't assemble either. But it shouldn't, because of the type mismatch in the 1st instruction and the wrong bit mode in the 2nd.
NASM assembles my code fine, but FASM can't!
Roman 03 Aug 2023, 11:36
This doesn't work either. AMD Ryzen 3500.
Code:
;aa dd 1.0
movss xmm0, dword [aa]
vminps zmm1, zmm2, zmm0 ; this crashes my program. IDA Pro shows vminps zmm1, zmm2, zmm0
Last edited by Roman on 03 Aug 2023, 13:04; edited 2 times in total
Jin X 03 Aug 2023, 11:56
Roman, "AA" and "aa" are different names
Your crash may be because your CPU doesn't support AVX-512. I just tried your 'vminps' using Intel SDE (with the -future option), and it works fine.
Tomasz Grysztar 03 Aug 2023, 12:17
Jin X wrote: This code cannot be generated:
This is an example of an oscillator problem. You can find more general information on what it is and how to deal with it in my articles about multi-pass assembly (which even include a similar AVX-512 example, and also some comparison of different assemblers). There are a couple of possible approaches, although you might need to use fasmg to be able to tweak the instruction encoder itself, if you need to assemble this exact source without moving things around.
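To make the oscillation concrete, here is a minimal Python sketch (not fasm's actual algorithm) of a multi-pass resolver for the 32-bit layout discussed in this thread: one vminps with a {1to16} memory operand, a ret, then the label x. It assumes an origin of 0, 7 bytes of instruction before the displacement (4-byte EVEX prefix, opcode, ModRM, SIB), and that the displacement gets the 1-byte form only when it is a multiple of 4 whose quotient fits in a signed byte. The function names and the pass limit are made up for illustration.

```python
def instruction_length(x):
    """Length of the vminps instruction when its displacement is x (assumed model)."""
    # Assumed: EVEX prefix (4) + opcode (1) + ModRM (1) + SIB (1) = 7 bytes,
    # followed by a 1-byte (compressed) or 4-byte displacement.
    compressible = x % 4 == 0 and -128 <= x // 4 <= 127
    return 7 + (1 if compressible else 4)


def resolve(max_passes=10):
    """Re-predict x on each pass until it stops changing or the pass limit is hit."""
    x = 0  # first-pass guess for the not-yet-defined label
    history = []
    for _ in range(max_passes):
        new_x = instruction_length(x) + 1  # +1 for the ret that precedes x
        history.append(new_x)
        if new_x == x:
            return True, history
        x = new_x
    return False, history


converged, history = resolve()
print(converged, history)
```

Under these assumptions x alternates between 9 and 12: at 12 the short form fits, which shrinks the code and moves x to 9; at 9 it does not fit, which grows the code and moves x back to 12.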
revolution 03 Aug 2023, 15:13
I think the easiest way to solve this is to align x:
Code:
use32
vminps zmm0, zmm0, [edx+ebp*8 + x] {1to16}
ret
align 4
x:
Tomasz Grysztar 03 Aug 2023, 17:23
Good point, this is the most logical solution, as it ensures that the displacement may be optimized ("compressed").
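As a rough illustration of what "compressed" means here: EVEX-encoded instructions can store a 1-byte displacement that is implicitly multiplied by a factor N (4 for a 32-bit broadcast operand such as {1to16}, 64 for a full zmm memory operand). The following is a minimal Python sketch of that size decision. The function name is hypothetical, and the model ignores the case where a zero displacement needs no displacement byte at all.

```python
def displacement_bytes(disp, n):
    """Bytes needed for a displacement under disp8*N compression (simplified model)."""
    # The 1-byte form stores disp // n, so disp must be an exact multiple of n
    # and the quotient must fit in a signed 8-bit value.
    if disp % n == 0 and -128 <= disp // n <= 127:
        return 1
    return 4


# N = 4 models a dword broadcast operand such as {1to16}.
for disp in (9, 12, 508, 512):
    print(disp, displacement_bytes(disp, 4))
```

With N = 4, an aligned x within range takes the 1-byte form, while an unaligned or out-of-range x falls back to 4 bytes.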
revolution 03 Aug 2023, 19:56
It might also help to expand or change the error message.
Code:
error: pass count exceeded, code cannot be generated
Tomasz Grysztar 03 Aug 2023, 20:49
fasmg/fasm2 signals it like this:
Code:
flat assembler version g.k4v8
Error: could not generate code within the allowed number of passes.
Jin X 04 Aug 2023, 12:25
revolution, yes, this is not a bad solution, but edx may not be a multiple of 4 in some exotic cases. So this solution is not universal.
Tomasz Grysztar, what is fasm2? I will read your articles, thanks!
revolution 04 Aug 2023, 12:30
Jin X wrote: revolution, yes, this is not a bad solution, but edx may not be multiple of 4 in some exotic cases. So this solution is not universal.
Jin X 04 Aug 2023, 12:46
revolution wrote:
revolution 04 Aug 2023, 12:47
If you use align, then x is guaranteed to be a multiple of 4.
revolution 04 Aug 2023, 12:50
Actually the multiple of 4 thing is perhaps distracting from the real problem. The root of the problem isn't the alignment, but is in fact the changing value of x. For example, this is also fine:
Code:
use32
vminps zmm0, zmm0, [edx+ebp*8 + x] {1to16}
ret
align 4
rb 1 ; make x unaligned
x:
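The point that a stable value of x matters more than its alignment can be checked with a small model. This is a minimal Python sketch, not a description of how fasm is implemented. It assumes an origin of 0, 7 bytes of instruction before the displacement, a 1-byte ret, and a 1-byte displacement only for multiples of 4 in the disp8*4 range. All helper names are hypothetical.

```python
def instruction_length(x):
    """Assumed length of the vminps instruction for displacement x."""
    compressible = x % 4 == 0 and -128 <= x // 4 <= 127
    return 7 + (1 if compressible else 4)


def align_up(offset, boundary):
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary


# Each layout maps the offset just after ret to the resulting address of x.
def x_plain(code_end):
    return code_end  # x: directly after ret


def x_aligned(code_end):
    return align_up(code_end, 4)  # align 4, then x:


def x_aligned_plus_one(code_end):
    return align_up(code_end, 4) + 1  # align 4, rb 1, then x:


def settle(place_x, max_passes=20):
    """Return the stable value of x, or None if it never stops changing."""
    x = 0  # first-pass guess
    for _ in range(max_passes):
        new_x = place_x(instruction_length(x) + 1)  # +1 for ret
        if new_x == x:
            return x
        x = new_x
    return None


for name, layout in [("plain", x_plain),
                     ("align 4", x_aligned),
                     ("align 4 + rb 1", x_aligned_plus_one)]:
    print(name, settle(layout))
```

In this model x settles as soon as its value no longer depends on which displacement form was chosen, whether the result is aligned (12) or unaligned (13). Only the plain layout never settles.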
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.