flat assembler
Message board for the users of flat assembler.

AVX-512 - error: code cannot be generated (BUG?)

Jin X



Joined: 06 Mar 2004
Posts: 133
Location: Russia
Jin X 03 Aug 2023, 11:08
This code cannot be generated:
Code:
use32
vminps zmm0, zmm0, [edx+ebp*8 + x] {1to16}
ret
x dd 1.0    
If I remove 'ret' (or replace 'x' in vminps with a number), then it compiles.
Roman



Joined: 21 Apr 2012
Posts: 1821
Roman 03 Aug 2023, 11:19
Code:
vminps zmm0, zmm0, qword [edx+ebp*8 + x]
vminps zmm1, zmm2, dword [rax] {1to16}
    
Jin X 03 Aug 2023, 11:29
Roman, yes, your code doesn't assemble either. But it shouldn't, because of a type mismatch in the 1st line and the wrong bit mode in the 2nd (rax is a 64-bit register).
NASM assembles my code fine, but FASM can't!
Roman 03 Aug 2023, 11:36
This doesn't work either. AMD Ryzen 3500.
Code:
;aa dd 1.0
movss  xmm0, dword [aa]
vminps zmm1, zmm2, zmm0 ;this crashes my program. IDA Pro shows vminps zmm1, zmm2, zmm0


Last edited by Roman on 03 Aug 2023, 13:04; edited 2 times in total
Jin X 03 Aug 2023, 11:56
Roman, "AA" and "aa" are different names :)
The crash is probably because your CPU doesn't support AVX-512. I just tried your 'vminps' using Intel SDE (with the -future option), and it works fine.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 03 Aug 2023, 12:17
Jin X wrote:
This code cannot be generated:
Code:
use32
vminps zmm0, zmm0, [edx+ebp*8 + x] {1to16}
ret
x dd 1.0    
If I remove 'ret' (or replace 'x' in vminps with a number), then it compiles.

This is an example of an oscillator problem. You can find more general information on what it is and how to deal with it in my articles about multi-pass assembly (even including a similar AVX-512 example, and also some comparison of different assemblers).

There are a couple of possible approaches, although you might need to use fasmg to tweak the instruction encoder itself if you need to assemble this exact source without moving things around.
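To make the oscillation concrete, here is a rough Python model of the passes for the snippet from the first post. It is only a sketch, not fasm's actual resolver: the byte counts (7 fixed bytes for the EVEX prefix, opcode, ModRM and SIB, then a 1-byte or 4-byte displacement) are my reading of the encoding, and `vminps_size`, `run_passes` and the initial guess of 0 for x are made-up names and assumptions.

```python
# Toy model of multi-pass assembly for: vminps ... [edx+ebp*8 + x] {1to16} / ret / x dd 1.0
# Assumed sizes: 4-byte EVEX prefix + opcode + ModRM + SIB = 7 bytes,
# followed by a 1-byte (compressed) or 4-byte displacement; 'ret' is 1 byte.

def vminps_size(x_addr):
    """Size of the vminps instruction, given the predicted address of x."""
    # With {1to16} on dword elements the 8-bit displacement is scaled by 4,
    # so the short form needs x to be a multiple of 4 with x/4 in -128..127.
    compressible = x_addr % 4 == 0 and -128 <= x_addr // 4 <= 127
    return 7 + (1 if compressible else 4)

def run_passes(ret_size, first_guess=0, max_passes=10):
    """Return the address of x computed on each pass, stopping once it repeats."""
    history = []
    x_addr = first_guess                 # hypothetical starting prediction
    for _ in range(max_passes):
        new_x = vminps_size(x_addr) + ret_size   # x follows vminps and ret
        history.append(new_x)
        if new_x == x_addr:              # prediction confirmed: resolved
            break
        x_addr = new_x
    return history

with_ret = run_passes(ret_size=1)        # the original source
without_ret = run_passes(ret_size=0)     # 'ret' removed
print("with ret:   ", with_ret)
print("without ret:", without_ret)
```

In this model, with 'ret' present the address of x alternates between 9 and 12 until the pass limit is hit: the short encoding puts x at an address that needs the long encoding, and the long encoding puts x at an address that allows the short one. Without 'ret' the address settles at 8 on the second pass.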
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20410
Location: In your JS exploiting you and your system
revolution 03 Aug 2023, 15:13
I think the easiest way to solve this is to align x:
Code:
use32
vminps zmm0, zmm0, [edx+ebp*8 + x] {1to16}
ret
align 4
x:    
Tomasz Grysztar 03 Aug 2023, 17:23
Good point, this is the most logical solution, as it ensures that the immediate may be optimized ("compressed").
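For anyone wondering what "compressed" means here: EVEX instructions store an 8-bit displacement scaled by a factor N, so the short form is only available when the displacement is a multiple of N. A small Python sketch of that rule follows. `encode_displacement` is a made-up helper, it assumes a signed 32-bit displacement, and it ignores the case where no displacement byte is emitted at all.

```python
import struct

def encode_displacement(disp, n):
    """Return the displacement bytes for an EVEX memory operand.

    n is the disp8*N scale: 4 for a dword broadcast such as {1to16},
    64 for a full 512-bit memory operand.
    """
    scaled, remainder = divmod(disp, n)
    if remainder == 0 and -128 <= scaled <= 127:
        return struct.pack("<b", scaled)   # compressed form: one byte holding disp/N
    return struct.pack("<i", disp)         # full form: four bytes holding disp

# Candidate addresses for x, encoded with the {1to16} scale of 4.
for address in (8, 9, 12, 13):
    print(address, encode_displacement(address, 4).hex())
```

So with a scale of 4, addresses 8 and 12 fit in one byte while 9 and 13 need four, which is why the alignment of x changes the length of the instruction that refers to it.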
revolution 03 Aug 2023, 19:56
It might also help to expand or change the error message.
Code:
error: pass count exceeded, code cannot be generated    
Tomasz Grysztar 03 Aug 2023, 20:49
fasmg/fasm2 signals it like this:
Code:
flat assembler  version g.k4v8
Error: could not generate code within the allowed number of passes.    
Jin X 04 Aug 2023, 12:25
revolution, yes, this is not a bad solution, but edx may not be a multiple of 4 in some exotic cases. So this solution is not universal.

Tomasz Grysztar, what is fasm2?
I will read your articles, thanks!
revolution 04 Aug 2023, 12:30
Jin X wrote:
revolution, yes, this is not a bad solution, but edx may not be a multiple of 4 in some exotic cases. So this solution is not universal.
The assembler output is not sensitive to the value of edx (the assembler has no idea what values you put in edx). Only the address of x is important.
Jin X 04 Aug 2023, 12:46
revolution wrote:
Jin X wrote:
revolution, yes, this is not a bad solution, but edx may not be a multiple of 4 in some exotic cases. So this solution is not universal.
The assembler output is not sensitive to the value of edx (the assembler has no idea what values you put in edx). Only the address of x is important.
Yes, but we have a problem when x is not a multiple of 4 (though sometimes we don't).
revolution 04 Aug 2023, 12:47
If you use align, x is guaranteed to be a multiple of 4.
revolution 04 Aug 2023, 12:50
Actually the multiple of 4 thing is perhaps distracting from the real problem. The root of the problem isn't the alignment, but is in fact the changing value of x. For example, this is also fine:
Code:
use32
vminps zmm0, zmm0, [edx+ebp*8 + x] {1to16}
ret
align 4
rb 1 ; make x unaligned
x:    
Adding align 4 was to stabilise the value of x and prevent the value changing on each pass.
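The stabilising effect of align can be checked with a small Python model of the passes. This is a sketch under assumptions rather than fasm's real logic: the instruction is taken to be 7 fixed bytes plus a 1-byte compressed or 4-byte full displacement, 'ret' is 1 byte, and `vminps_size`, `align_up`, `resolve` and the initial guess of 0 are hypothetical.

```python
def vminps_size(x_addr):
    """7 fixed bytes plus a 1-byte compressed or 4-byte full displacement."""
    compressible = x_addr % 4 == 0 and -128 <= x_addr // 4 <= 127
    return 7 + (1 if compressible else 4)

def align_up(offset, boundary):
    """Round offset up to the next multiple of boundary (the effect of 'align')."""
    return (offset + boundary - 1) // boundary * boundary

def resolve(padding_after_align, first_guess=0, max_passes=10):
    """Iterate passes for: vminps / ret / align 4 / rb padding / x:

    Returns (address of x, instruction size) once stable, or None.
    """
    x_addr = first_guess
    for _ in range(max_passes):
        size = vminps_size(x_addr)
        new_x = align_up(size + 1, 4) + padding_after_align   # +1 for 'ret'
        if new_x == x_addr:
            return x_addr, size
        x_addr = new_x
    return None

print(resolve(padding_after_align=0))   # align 4, then x:
print(resolve(padding_after_align=1))   # align 4, rb 1, then x:
```

Both cases settle in this model: x lands at 12 with the short 8-byte instruction, or at 13 with the long 11-byte one. The reason is that the code before the align ends at offset 9 or 12 depending on the encoding, and both round up to 12, so the address of x no longer depends on which encoding was picked.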
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.