flat assembler
Message board for the users of flat assembler.

Index > Main > Online C/C++ -> assembler

redsock



Joined: 09 Oct 2009
Posts: 434
Location: Australia
redsock 15 Dec 2016, 20:57
Saw this on HN this morning during my news reading:

http://godbolt.org ... very useful IMO for seeing different compilers' ideas about optimisations, etc. (and an excellent follow-on/plaything for all of the various threads here on the board about HLLs/compilers being better than hand-coded asm, etc.).

_________________
2 Ton Digital - https://2ton.com.au/
Furs



Joined: 04 Mar 2016
Posts: 2543
Furs 16 Dec 2016, 13:34
At first I thought it only supports non-optimized output but then I saw the compiler options part...

BTW, to get 32-bit code from GCC just add -m32, in case anyone else is interested.
Roman



Joined: 21 Apr 2012
Posts: 1821
Roman 19 Dec 2016, 12:40
Seconded, though it generates so many asm instructions!

My example:
Code:
mov eax,Number
mov ebx,eax
imul ebx
    

That's all!
But thank you for this. Sometimes it is useful to look at the output code of a C++ example.
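For comparison, the C/C++ source behind this would presumably be something like the snippet below (hypothetical; the thread doesn't show what was pasted into Compiler Explorer). With optimization enabled (-O2) most compilers reduce it to a single imul plus a register move, so a long listing usually means the output was generated at -O0.
Code:
/* Hypothetical source matching the three-instruction example above:
   square a number. At -O2, GCC/Clang typically emit roughly
   "imul edi, edi / mov eax, edi / ret". */
int square(int Number)
{
    return Number * Number;
}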
l4m2



Joined: 15 Jan 2015
Posts: 674
l4m2 25 Jan 2017, 13:07
Code:
int sgn(int n) {
    return n<0?-1:n>0;
}    

O3:
Code:
 cmp    DWORD PTR [esp+0x4],0x0
 jl     8048420 <sgn(int)+0x10>
 setg   al
 movzx  eax,al
 ret    
 xchg   ax,ax
 mov    eax,0xffffffff
 ret     

My solution:
Code:
cmp   edi, 0        ; compare n (in edi) with zero
setg  al            ; al = 1 if n > 0, else 0
setl  ah            ; ah = 1 if n < 0, else 0
sub   al, ah        ; al = 1, 0 or -1
movsx eax, al       ; sign-extend the result into eax    
Or alternatively:
Code:
mov   eax, esi      ; eax = n (here in esi)
sar   eax, 31       ; eax = -1 if n < 0, else 0
neg   esi           ; CF = 1 if n != 0
adc   eax, eax      ; -1 if n < 0, 1 if n > 0, 0 if n = 0    


Last edited by l4m2 on 25 Jan 2017, 16:26; edited 4 times in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20414
Location: In your JS exploiting you and your system
revolution 25 Jan 2017, 13:12
Your solution incorrectly uses OR instead of ADD. So it is not the same; the FLAGS can be different.

Edit: Oh wait, the ZF flag is undefined after MUL, so your code is totally wrong!

Edit2: And even if ZF were defined, you might find that, because of the latency of MUL, the final result isn't any "faster". You would have to check it in your app to see whether it makes a measurable difference.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 28 Jan 2017, 23:38
Well, modern compilers are certainly already capable of compensating for some of the shortcomings of HLL syntax by performing optimizations similar in effect to what a programmer thinking in assembly might write. My classic example of such "assembly programmer thinking" was a snippet of mine from the '90s that computed the sum of the proper divisors of a number. It looked, as far as I remember, like this:
Code:
sum_divisors: ; in: edi = number, out: eax = sum of proper divisors
        mov     ecx,1
        xor     esi,esi
     add_divisor:
        add     esi,ecx
     next_divisor:
        inc     ecx
        mov     eax,edi
        xor     edx,edx
        div     ecx
        cmp     eax,ecx
        jb      done
        je      square_root
        test    edx,edx
        jnz     next_divisor
        add     esi,eax
        jmp     add_divisor
     done:
        mov     eax,esi
        ret
     square_root:
        test    edx,edx
        jnz     done
        add     eax,esi
        ret    
There are a couple of assembly-specific constructions there, like triple branching (two conditional jumps in a row) and the efficient use of both the quotient and the remainder obtained from a single division.
Out of curiosity I checked whether I'd be able to write C code that would get optimized into something similar, and what I got was often really close, though it varied wildly between compilers and their versions. I think icc 16 produced the fastest version (though not the nicest-looking):
Code:
sum_divisors(unsigned int):
        mov       esi, 1                                        #2.20
        mov       ecx, esi                                      #2.30
..B1.2:                         # Preds ..B1.3 ..B1.1
        inc       esi                                           #4.7
        mov       eax, edi                                      #5.20
        xor       edx, edx                                      #5.20
        div       esi                                           #5.20
        cmp       esi, eax                                      #7.16
        je        ..B1.6        # Prob 20%                      #7.16
        lea       r8d, DWORD PTR [rcx+rax]                      #2.30
        add       r8d, esi                                      #4.7
        test      edx, edx                                      #13.9
        cmove     ecx, r8d                                      #13.9
        cmp       esi, eax                                      #15.18
        jb        ..B1.2        # Prob 82%                      #15.18
..B1.5:                         # Preds ..B1.3 ..B1.6
        mov       eax, ecx                                      #16.12
        ret                                                     #16.12
..B1.6:                         # Preds ..B1.2                  # Infreq
        test      edx, edx                                      #8.15
        jne       ..B1.5        # Prob 50%                      #8.15
        add       ecx, esi                                      #8.29
        mov       eax, ecx                                      #8.29
        ret                                                     #8.29    
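The C source that was fed to the compilers is not quoted in the thread; a rough reconstruction that follows the semantics of the assembly routine above (the name and parameter type are taken from the icc listing, the rest is a guess) might look like this:
Code:
/* Sum of the proper divisors of n: walk candidate divisors d up to the
   square root of n and, for every hit, add both d and the cofactor n/d,
   counting the square root only once. Illustrative sketch only. */
unsigned int sum_divisors(unsigned int n)
{
    unsigned int sum = 1;                 /* 1 divides everything */
    for (unsigned int d = 2; ; d++) {
        unsigned int q = n / d;           /* one division yields both ... */
        unsigned int r = n % d;           /* ... quotient and remainder   */
        if (q < d)
            return sum;                   /* past the square root: done   */
        if (q == d) {                     /* d is the square root of n    */
            if (r == 0)
                sum += d;                 /* count it once                */
            return sum;
        }
        if (r == 0)
            sum += d + q;                 /* add the divisor pair         */
    }
}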
The icc version appears to be even slightly faster than mine, perhaps thanks to the use of CMOV. Note that my snippet was only naïvely optimized, without taking into account any instruction-scheduling rules of modern processors; it was simply meant to demonstrate the method of thinking in assembly and structuring the code accordingly. I still believe that for larger programs the assembly-programming mindset allows one to create efficient and beautiful code that would be very hard to obtain from compilers (and that this may pay off even when no processor-specific optimizations are applied). But I acknowledge that, at least at the level of functions like the one above, compilers are already capable of generating code that looks almost as nice.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20414
Location: In your JS exploiting you and your system
revolution 29 Jan 2017, 00:12
As an aside: In ARM32 code this type of code is really easy to do nicely and concisely. All that jumping about can be eliminated with conditional predicates.

But for ARM64 code it is not so nice any more. It seems that the compilers were incapable of making good use of the ARM32 conditional predicates, so ARM decided to remove most of them from the 64-bit instruction set (since they were "never" used). It seems that hand-crafted assembly code is a dying art and, worse, that CPU instruction sets are now being designed to match the HLL compilers' capabilities.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 29 Jan 2017, 16:03
revolution wrote:
As an aside: In ARM32 code this type of code is really easy to do nicely and concisely. All that jumping about can be eliminated with conditional predicates.
And this is exactly what was always the most appealing to me about this architecture. It's a pity that this potential was wasted.
TheRaven



Joined: 22 Apr 2008
Posts: 91
Location: U.S.A.
TheRaven 07 Jun 2017, 21:58
Compilers get better all the time; to paraphrase Ozzy Osbourne, "moving forward in reverse" epitomizes the newer compiler-design philosophies, where they keep working their way back toward assembler.

Kids today get spit out of college with half the story, and these simplistic viewpoints (more opinion than anything else) are the result: "HLL compilers are better than assemblers", totally ignorant of the fact that compilers still assemble.

Assembler is more about using the CPU itself as a development library (API), and knowing the opcodes is not really all that optional.

GCC and GAS are garbage. I read someone's post saying that GCC puts out really optimized code and just about fell out of my seat (that was funny sh!t right there).

My picks:
Clang is taking over, with good reason.
fasm is insanely powerful at such a tiny size, and cross-platform as hell.


But, yeah, that Compiler Explorer looks fun as hell. Nice post!
Furs



Joined: 04 Mar 2016
Posts: 2543
Furs 09 Jun 2017, 11:35
The reason compilers don't optimize better than humans is the attitude of the developers. They don't find many "minor" improvements worth it: even if someone submits a patch, they prefer "simpler" code or other bullshit like that and won't accept it because it "makes the code more complex for minor stuff", lmfao. And you wonder why hand-written asm is superior? Because compiler developers don't prioritize optimizations.

But a massive pile of minor improvements tends to add up, even if no single one is a "bottleneck" by itself (it seems bad programmers only think in terms of easy fixes to bottlenecks). After what, two or three decades, they still can't do basic optimizations in some cases; it's laughable.


Also, GCC is bad and has developers with crappy attitudes, but Clang is worse. I mean, someone patched an error message to be explicit like GCC's (GCC was superior there, back then), and it showed something like "const char* foo", which makes perfect sense.

However, one moron deliberately changed it to his so-called "correct" position of the asterisk, "const char *foo" (never mind that it's not even correct and he's full of shit). I mean, Clang is even written in C++, where pointers are distinct types, so it makes sense for the * to sit next to the type (just as you write it in casts). Who the fuck does he think he is? This is just one example, and an insignificant one, but Clang is run by even worse people than GCC.

Also the name is shit.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20414
Location: In your JS exploiting you and your system
revolution 09 Jun 2017, 12:05
Admittedly it is a hard task to have the dev express the intent of the code through an HLL and then have the compiler try to determine the intent from the HLL code to create good assembly code.

But another problem I see people complain about is that an HLL compiler at its maximum optimisation level will occasionally prune code that it incorrectly thinks isn't used or can never be reached. That creates quite a headache for the dev, who has to figure out what is going on and how to fix it. Crazy times.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 09 Jun 2017, 12:11
revolution wrote:
But another problem I see people complain about is that an HLL compiler at its maximum optimisation level will occasionally prune code that it incorrectly thinks isn't used or can never be reached. That creates quite a headache for the dev, who has to figure out what is going on and how to fix it. Crazy times.
Crazy. Wouldn't that be considered an actual bug?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20414
Location: In your JS exploiting you and your system
revolution 09 Jun 2017, 12:12
Yes, it is a bug. It is not new either.

Works on O1 and O2, fails on O3.
Furs



Joined: 04 Mar 2016
Posts: 2543
Furs 09 Jun 2017, 16:43
revolution wrote:
Admittedly it is a hard task to have the dev express the intent of the code through an HLL and then have the compiler try to determine the intent from the HLL code to create good assembly code.
I'm talking about the developers of the compiler, not the person who wants optimized code from his HLL source. When I said they want "simpler code", I didn't mean the end users who use the compiler; I meant the compiler's own code!

People who say "compilers most likely do a better job than you at optimizing", while at the same time never compiling with all optimization settings on (and even if they do, the compiler is deliberately kept dumb to keep its own code simpler, as if that mattered), or who advocate for the "debug experience" instead... simply disgust me.

They need to STFU in optimization-related topics, since they obviously treat optimization as second-rate, and stop spreading the bullshit that compilers optimize better, when it isn't true, partly because of compiler developers who don't care.
l_inc



Joined: 23 Oct 2009
Posts: 881
l_inc 11 Jun 2017, 10:35
revolution
Quote:
Yes, it is a bug. It is not new either.

Works on O1 and O2, fails on O3.

Compiler bugs aren't impossible, of course, and they do tend to be triggered by more aggressive optimizations, but my experience with C tells me that such situations are much more likely to be caused by a sloppily coded C program relying on undefined or unspecified behaviour. Most C coders (not to mention C++ coders) just don't know the language well enough.
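A classic illustration of this (a generic example, not one from the thread): a signed-overflow check written in terms of the overflow itself is undefined behaviour, so an optimizing compiler is allowed to delete it, and the "pruned code" is the program's fault rather than the compiler's.
Code:
#include <limits.h>

/* Relies on signed overflow, which is undefined behaviour in C.
   At -O2/-O3, GCC and Clang may assume x + 1 > x always holds for
   signed x and fold this check to 0, effectively removing it. */
int will_overflow(int x)
{
    return x + 1 < x;
}

/* A well-defined way to express the same check: */
int will_overflow_safe(int x)
{
    return x == INT_MAX;
}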

_________________
Faith is a superposition of knowledge and fallacy
Furs



Joined: 04 Mar 2016
Posts: 2543
Furs 11 Jun 2017, 11:34
Yeah, a famous one is "strict aliasing", which refers to type-based alias analysis. GCC does have a workaround for this that doesn't require disabling the optimization, though: you can apply the may_alias attribute to a type, which forces GCC to treat it like "any type" and thus assume it can alias anything (unless it can prove otherwise from the access range). A classic example is reading a float's bits as an int:
Code:
#include <stdint.h>

uint32_t foo(float f)
{
  /* may_alias tells GCC this typedef may alias objects of any type,
     so the cast below is exempt from strict-aliasing assumptions. */
  typedef uint32_t __attribute__((__may_alias__)) bar;
  return *(bar*)(&f);
}    
Without the attribute this wouldn't be safe, since GCC assumes uint32_t and float can never alias (they are different types). (In this tiny function it happens not to matter, but once the code gets inlined into something larger it will.)
l_inc



Joined: 23 Oct 2009
Posts: 881
l_inc 11 Jun 2017, 11:41
Furs
The more or less standard-compliant way of aliasing these is to have a union. But it's still implementation-defined, of course, because the standard does not define the binary format used for floating-point numbers.
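A minimal sketch of that union approach (the function name is mine, not from the thread); reading a member other than the one last written is explicitly allowed in C99/C11, while in C++ it is formally undefined and memcpy is the portable route:
Code:
#include <stdint.h>

/* Reinterpret the bit pattern of a float as a 32-bit integer via a union.
   The resulting value is implementation-defined, as it depends on the
   floating-point representation and endianness of the target. */
uint32_t float_bits(float f)
{
    union {
        float    f;
        uint32_t u;
    } pun;

    pun.f = f;
    return pun.u;
}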

_________________
Faith is a superposition of knowledge and fallacy