flat assembler
Message board for the users of flat assembler.

Rust generated assembly on M2 Mac - find variable

deito



Joined: 26 Oct 2023
Posts: 5
deito 26 Oct 2023, 08:47
I have the following simple Rust code in `main.rs`:

Code:
pub fn is_printable_ascii(name: &str) -> bool {
    name.chars().all(|c| matches!(c, ' '..='\x7d'))
}
fn main() {
    print!("{}", is_printable_ascii("Long text goes here that needs to be validated"));
    println!()
}
    


The question is: does passing ' '..='\x7d' to the matches!(...) macro create a new RangeInclusive<char> every time?

To answer that, I wanted to look at the generated assembly and see whether the bounds of the pattern are moved into general-purpose registers, which, as far as I understand, would indicate that the compiler optimised the code and the RangeInclusive<char> is only created once.

So I ran `rustc --emit asm main.rs` on the Mac (Apple Silicon M2 Pro), and, even better, ran it explicitly without optimisations as `rustc -C opt-level=0 --emit asm main.rs`. The generated assembly is attached (see main.s).

My knowledge of assembly is very limited. I know I should look for the registers x0..x30 and probably a mov instruction, but I can't work out how to actually find what I need.

Thanks!


Description: main with rustc -C opt-level=3 --emit asm main.s
Filename: rumain with stc -C opt-level=3 --emit asm main.rs.s
Filesize: 5.09 KB

Description: rustc --emit asm main.rs
Filename: with_optimization.s
Filesize: 23.98 KB

Description: file generated with `rustc -C opt-level=0 --emit asm main.rs`
Filename: main.s
Filesize: 23.98 KB



Last edited by deito on 27 Oct 2023, 12:38; edited 2 times in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20299
Location: In your JS exploiting you and your system
revolution 26 Oct 2023, 14:38
It does a range test for values between 32 and 125 inclusive.
Code:
__ZN4main19is_printable_ascii228_$u7b$$u7b$closure$u7d$$u7d$17he91f4583a6508710E:
        .cfi_startproc
        sub     sp, sp, #16
        .cfi_def_cfa_offset 16
        str     w1, [sp, #8]
        subs    w8, w1, #32
        cset    w8, hs
        tbnz    w8, #0, LBB23_2
        b       LBB23_1
LBB23_1:
        strb    wzr, [sp, #15]
        b       LBB23_3
LBB23_2:
        ldr     w8, [sp, #8]
        subs    w8, w8, #125
        cset    w8, ls
        and     w8, w8, #0x1
        strb    w8, [sp, #15]
        b       LBB23_3
LBB23_3:
        ldrb    w8, [sp, #15]
        and     w0, w8, #0x1
        add     sp, sp, #16
        .cfi_def_cfa_offset 0
        ret
        .cfi_endproc
    
32 == ' ' and 125 == 0x7d
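
Read back into Rust, the unoptimised closure amounts to two unsigned comparisons on the code point. A minimal sketch of that reading follows; the name `in_range_like_asm` is made up for illustration and does not appear in the attached files.

```rust
/// Hypothetical helper mirroring the two compares in the listing above.
fn in_range_like_asm(c: char) -> bool {
    let w1 = c as u32; // the char arrives in w1 as a 32-bit code point
    if w1 < 32 {
        // subs w8, w1, #32 ; cset w8, hs ; branch not taken -> LBB23_1
        return false; // strb wzr, [sp, #15]
    }
    // LBB23_2: subs w8, w8, #125 ; cset w8, ls
    w1 <= 125
}

fn main() {
    // Compare against the original pattern for the first 256 code points.
    for c in '\0'..='\u{ff}' {
        assert_eq!(in_range_like_asm(c), matches!(c, ' '..='\x7d'));
    }
    println!("range check matches the pattern");
}
```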

BTW: The code is awful. The whole thing can be reduced to just a few lines, and it doesn't need to use the stack at all.

deito 26 Oct 2023, 19:09
Thanks @revolution!
I noticed that you mentioned the stack; I assume that is because optimisation was turned off. I will post the version with optimisation turned on, where I assume we should get a better version of this.

The ultimate question I am trying to answer: does Rust optimise this code for me, or does it blindly re-evaluate the same pattern O(n) times? I am pretty sure it doesn't, but it would be nice to pinpoint where the optimisation actually happens.

deito 26 Oct 2023, 19:10
Attached the optimised "rustc --emit asm main.rs " version.

revolution 26 Oct 2023, 19:42
The "optimised" code is exactly the same.

deito 27 Oct 2023, 12:41
Sorry, my bad.
I have now posted the file generated with optimisation level 3.

The valid options are:

0: no optimizations
1: basic optimizations
2: some optimizations
3: all optimizations
"s": optimize for binary size
"z": optimize for binary size, but also turn off loop vectorization.

src: https://doc.rust-lang.org/cargo/reference/profiles.html


Just a side note: if the project has a Cargo.toml file that doesn't specify the opt-level, these are the defaults (my project has no Cargo.toml):
Filename: Cargo.toml

[profile.dev]
opt-level = 0

[profile.release]
opt-level = 3
src: https://runebook.dev/en/docs/rust/book/ch14-01-release-profiles#:~:text=For%20example%2C%20here%20are%20the,optimizations%20extends%20compiling%20time

revolution 27 Oct 2023, 17:20
The compiler has evaluated the check on the static text at compile time and deleted your function. It just prints bool 1.
Code:
        mov     w19, #1 ; bool 1 (true)
        stp     x8, x19, [sp, #32]    
So the answer to "The question is: does passing ' '..='\x7d' to the matches!(...) macro create a new RangeInclusive<char> every time?" is: No, it never creates it.
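
As far as I know, `matches!` simply expands to a `match` expression, so the range here is a pattern baked into the comparison rather than a `RangeInclusive<char>` value. A sketch of the hand-expanded form next to a version that does build an explicit range value (both function names are made up for illustration):

```rust
use std::ops::RangeInclusive;

// Roughly what matches!(c, ' '..='\x7d') expands to: a plain match on a range pattern.
fn with_pattern(c: char) -> bool {
    match c {
        ' '..='\x7d' => true,
        _ => false,
    }
}

// For contrast, this version explicitly builds a RangeInclusive<char> value.
fn with_range_value(c: char) -> bool {
    let range: RangeInclusive<char> = ' '..='\x7d';
    range.contains(&c)
}

fn main() {
    // Sample characters below, inside, and above the range.
    for c in "\x1f A}~\u{e9}".chars() {
        assert_eq!(with_pattern(c), with_range_value(c));
    }
    println!("both forms agree");
}
```

Both forms should give the same results; the difference is only in whether a range object exists in the source at all.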

deito 27 Oct 2023, 17:51
lol. I am surprised, but I shouldn't be. I guess my example was pretty bad and got optimised out. If the input were more dynamic, e.g. something random or user input, then I presume it might be different. I see Rust also uses LLVM, so I guess the code goes through the IR stage, which is where this optimisation happens.
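
A minimal sketch of what a more dynamic input could look like, assuming the text is passed as the first command-line argument. `std::hint::black_box` is only a best-effort hint to the optimiser, so whether the function survives in the assembly would still need checking in the emitted output.

```rust
use std::env;
use std::hint::black_box;

pub fn is_printable_ascii(name: &str) -> bool {
    name.chars().all(|c| matches!(c, ' '..='\x7d'))
}

fn main() {
    // Take the text from the first command-line argument so its value is
    // unknown at compile time; fall back to a fixed string otherwise.
    let input = env::args()
        .nth(1)
        .unwrap_or_else(|| String::from("Long text goes here"));
    // black_box asks the optimiser to treat the value as opaque (best effort).
    println!("{}", is_printable_ascii(black_box(input.as_str())));
}
```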
thanks @revolution

revolution 27 Oct 2023, 18:10
You can get a hint from the "no optimizations" output, where it doesn't create a table, or bitmap, or LUT; it does a numeric check for [32, 125].

Maybe if you make the range check more complex with holes and gaps, then it might do something more interesting.
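
Something like the following could serve as a test case with gaps between the ranges. It is only a sketch; the function name and the choice of ranges are made up for illustration, and what the compiler emits for it would have to be checked in the assembly.

```rust
// Hypothetical variant: ASCII letters and digits only, with gaps between the ranges.
pub fn is_ascii_alnum(name: &str) -> bool {
    name.chars()
        .all(|c| matches!(c, '0'..='9' | 'A'..='Z' | 'a'..='z'))
}

fn main() {
    println!("{}", is_ascii_alnum("abcXYZ019"));
    println!("{}", is_ascii_alnum("hello world"));
}
```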
bitRAKE



Joined: 21 Jul 2003
Posts: 4020
Location: vpcmpistri
bitRAKE 27 Oct 2023, 18:53
Code:
pub fn is_printable_ascii(name: &str) -> bool {
    name.as_bytes().iter().all(|&c| c >= b' ' && c <= b'\x7d')
}    
... perhaps bypassing the Unicode processing might simplify the generated code. [Compiler Explorer]
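
The byte version should give the same answer as the `chars()` version for any `&str`, since every byte of a multi-byte UTF-8 sequence is 0x80 or above and so falls outside [0x20, 0x7d]. A quick sketch to check that on a few samples (the `_chars` and `_bytes` suffixes are only for illustration):

```rust
fn is_printable_ascii_chars(name: &str) -> bool {
    name.chars().all(|c| matches!(c, ' '..='\x7d'))
}

fn is_printable_ascii_bytes(name: &str) -> bool {
    name.as_bytes().iter().all(|&c| c >= b' ' && c <= b'\x7d')
}

fn main() {
    // Mix of empty, plain ASCII, control characters, and non-ASCII text.
    let samples = ["", "plain text", "tab\there", "tilde~", "caf\u{e9}", "\u{1F600}"];
    for s in samples.iter() {
        assert_eq!(is_printable_ascii_chars(s), is_printable_ascii_bytes(s));
    }
    println!("chars and bytes versions agree on all samples");
}
```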

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.