flat assembler
Message board for the users of flat assembler.

Index > High Level Languages > Compiler optimizations

Author
Thread Post new topic Reply to topic
FlierMate1



Joined: 31 May 2022
Posts: 118
FlierMate1 21 Aug 2022, 06:52
What are some tricks used by compiler optimizations that you know of?

For example, given the following C code:
Code:
int isHtmlWhitespace(int ch) {
   return ch == 0x0009 || ch == 0x000A ||
           ch == 0x000C || ch == 0x000D ||
           ch == 0x0020;
}    


GCC 12.1 with -O3 switch is able to produce a rather cryptic Assembly code:
Code:
isHtmlWhitespace:
        cmp     edi, 32
        ja      .L3
        movabs  rax, 4294981120
        bt      rax, rdi
        setc    al
        movzx   eax, al
        ret
.L3:
        xor     eax, eax
        ret    


It is, however, easier to comprehend if without optmization:
Code:
isHtmlWhitespace:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        cmp     DWORD PTR [rbp-4], 9
        je      .L2
        cmp     DWORD PTR [rbp-4], 10
        je      .L2
        cmp     DWORD PTR [rbp-4], 12
        je      .L2
        cmp     DWORD PTR [rbp-4], 13
        je      .L2
        cmp     DWORD PTR [rbp-4], 32
        jne     .L3
.L2:
        mov     eax, 1
        jmp     .L5
.L3:
        mov     eax, 0
.L5:
        pop     rbp
        ret    


How to learn all the tricks used by compiler?
Post 21 Aug 2022, 06:52
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18941
Location: In your JS exploiting you and your system
revolution 21 Aug 2022, 07:06
Hint: The magic number 4294981120 is 0x100003600
Post 21 Aug 2022, 07:06
View user's profile Send private message Visit poster's website Reply with quote
FlierMate1



Joined: 31 May 2022
Posts: 118
FlierMate1 21 Aug 2022, 07:23
revolution wrote:
Hint: The magic number 4294981120 is 0x100003600


I try to analyze it given the magic number.

BT wrote:
Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset (specified by the second operand) and stores the value of the bit in the CF flag.


Code:
0001 0000 0000 0000 0000 0011 0110 0000 0000
    


0x9=9th bit position
0xA=10th bit position
0xC=12th bit position
0xD=13th bit position
0x20=32th bit position

So if the selected bit is one, it will set the carry flag and stores in EAX through SETC.

Clever, isn't it? Is this designed by GCC developer, or somehow discovered by "machine learning" itself?


Last edited by FlierMate1 on 21 Aug 2022, 07:29; edited 1 time in total
Post 21 Aug 2022, 07:23
View user's profile Send private message Reply with quote
FlierMate1



Joined: 31 May 2022
Posts: 118
FlierMate1 21 Aug 2022, 07:26
Note the old Assembly code by GCC with optimization is the following, a slightly different version:
Code:
isHtmlWhitespace(int):
       cmp     edi, 32
       ja      .L42 
       movabs  rax, 4294981120
       mov     ecx, edi
       shr     rax, cl
       and     eax, 1
       ret
.L42:
       xor     eax, eax
       ret    
Post 21 Aug 2022, 07:26
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18941
Location: In your JS exploiting you and your system
revolution 21 Aug 2022, 07:37
FlierMate1 wrote:
Clever, isn't it? Is this designed by GCC developer, or somehow discovered by "machine learning" itself?
Human brain power only.

It is a very old "trick". GCC didn't invent it. It goes way back into the dim dark history of computers.
Post 21 Aug 2022, 07:37
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 3511
Location: vpcmipstrm
bitRAKE 21 Aug 2022, 15:59
Here is a simple way that [*some] compiler(s) fail:
Code:
bool os_file_exists(wchar_t *name) {
    auto attrib = GetFileAttributesW(name);
    if (attrib == INVALID_FILE_ATTRIBUTES) return false;
    if (attrib & FILE_ATTRIBUTE_DIRECTORY) return false;
    return true;
}    
Simple enough, we don't want directories, but first we exclude invalid API result. We know that INVALID_FILE_ATTRIBUTES = -1 though! The second condition is sufficient but the compiler can't see it.

*gcc or msvc: on any architecture I've looked at. clang fixed it from 3.7 to present (even -O1, it's simple code elimination).


FYI: I have a whitespace test with my bit vector helper macro (following post).
The macro is more general than the compiler is capable of. Smile
Code:
align 32
ValidFileNameChars BITVECTOR 256,"!#$%&'()-@^_`{}~+,.;=[]0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

        bt [ValidFileNameChars],eax
        jnc .fail    
... seems easier to comprehend to me.

_________________
¯\(°_o)/¯ unlicense.org
Post 21 Aug 2022, 15:59
View user's profile Send private message Visit poster's website Reply with quote
sylware



Joined: 23 Oct 2020
Posts: 180
Location: Marseille/France
sylware 23 Aug 2022, 11:30
uhoh! This is nice. I shall remember that and use it.

Ofc the least amount of branches is naively better for the speculation logic.

There is a lot of boolean tricks like that, a properly indexed and readable library of those would be actually more than appropriate.
Post 23 Aug 2022, 11:30
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 3511
Location: vpcmipstrm
bitRAKE 27 Aug 2022, 05:28
Okay, the next place compilers fail: This one is a size optimization for amd64 code, that would cost zero in terms of performance.

Often we see the preservation of registers in the shadow space, and on the stack, imagine if you will (or grab a debugger and trace through any compiled code):
Code:
        mov    [rsp+8h],rbx
        mov    [rsp+10h],rsi
        mov    [rsp+18h],rdi
        mov    [rsp+20h],r9
        push   rbp r12 r13 r14 r15    
... we see it all the time, right?

Well, why isn't the compiler pushing legacy registers as a priority and store R8-R15 in the shadow space? Legacy registers don't cost a prefix byte when being pushed, but all 64-bit memory moves need a prefix byte.

No cost beyond changing the model for the machine. Yet, petabytes saved across the planet ... I don't know, maybe not - baby steps though.

_________________
¯\(°_o)/¯ unlicense.org
Post 27 Aug 2022, 05:28
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18941
Location: In your JS exploiting you and your system
revolution 27 Aug 2022, 05:56
Why not this?
Code:
pop rax
add rsp, 8 * 4
push r9 rdi rsi rbx rax rbp r12 r13 r14 r15    
Post 27 Aug 2022, 05:56
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18941
Location: In your JS exploiting you and your system
revolution 27 Aug 2022, 07:49
If your calling protocol uses rax then an alternative is this.
Code:
add rsp, 8 * 5
push r9 rdi rsi rbx
sub rsp, 8
push rbp r12 r13 r14 r15    
Post 27 Aug 2022, 07:49
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 3511
Location: vpcmipstrm
bitRAKE 27 Aug 2022, 11:36
For speed, we wouldn't want to modify RSP and then PUSH - this is why MOV used in the shadow space MS ABI. Yet, certainly for size those are superior options. My point was that there is no performance cost for this size reduction.

_________________
¯\(°_o)/¯ unlicense.org
Post 27 Aug 2022, 11:36
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18941
Location: In your JS exploiting you and your system
revolution 27 Aug 2022, 12:26
bitRAKE wrote:
For speed, we wouldn't want to modify RSP and then PUSH ...
Depends upon the situation.
Post 27 Aug 2022, 12:26
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.