flat assembler
Message board for the users of flat assembler.

Code optimization (AKA C vs. Asm)

bitRAKE



Joined: 21 Jul 2003
Posts: 4081
Location: vpcmpistri
bitRAKE 18 May 2009, 04:40
Borsuc wrote:
I thought "Whole Program Optimization" was a linker's job, not the compiler's?
To be done well, it would require the cooperation of both. I'm not certain which code generation is delayed until link time (don't some linkers operate on bytecode representations?). Or, as in the case of the x86 coder, it would require a good model for implementing the application and strong low-level optimization skills, ideally complementing each other.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20460
Location: In your JS exploiting you and your system
revolution 18 May 2009, 10:23
Madis731 wrote:
2) You tell the same to the compiler, but you replace 1000000 with user-input.
You know that the user only inputs 1000000 and never anything else, but the compiler doesn't know. Compilers can do whatever optimizations, but it will always remain O(N). You can make it O(1) and beat it.

This is of course never the case, but there are problems like converting floating point to fixed because you know you won't need that precision, replacing 64-bit registers with 32- or even 16-bit ones because you don't need more values etc.
This is not really fair to the HLL compilers. You are essentially holding back secret information from the compiler. A compiler can't read your thoughts; you have to tell it what you want within the limits of its language constructs.

If you know that the user will always enter 1000000 then tell the compiler that to give it the best chance to create the proper code. If you know that you only need integers then tell the compiler that also. Once the compiler has all the relevant information that you know then you can take the next step to judge the quality of the output.
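
As a rough illustration of that point, here is a minimal C sketch. The summing workload and the names sum_runtime, sum_known, sum_closed and KNOWN_N are assumptions made up for this example; the thread does not show the loop that was originally being discussed. The first function only learns the bound at run time, the second states it in the source, and the third is the O(1) rewrite a programmer can do by hand.

```c
#include <stdint.h>

/* Hypothetical bound that the programmer "knows" in advance. */
enum { KNOWN_N = 1000000 };

/* Bound arrives at run time: the generated code must work for any n. */
uint64_t sum_runtime(uint32_t n)
{
    uint64_t total = 0;
    for (uint64_t i = 1; i <= n; ++i)
        total += i;
    return total;
}

/* Bound is stated in the source: an optimizer is free to precompute
   the result, although nothing guarantees that it will. */
uint64_t sum_known(void)
{
    uint64_t total = 0;
    for (uint64_t i = 1; i <= KNOWN_N; ++i)
        total += i;
    return total;
}

/* Hand-written O(1) version using the closed form n*(n+1)/2. */
uint64_t sum_closed(uint32_t n)
{
    return (uint64_t)n * ((uint64_t)n + 1) / 2;
}
```

Whether a particular compiler actually folds sum_known into a constant depends on the compiler and its flags, so the generated code is worth inspecting before drawing conclusions.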
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 18 May 2009, 17:30
bitRAKE wrote:
To be done well, it would require the cooperation of both. I'm not certain which code generation is delayed until link time (don't some linkers operate on bytecode representations?). Or, as in the case of the x86 coder, it would require a good model for implementing the application and strong low-level optimization skills, ideally complementing each other.
You mean you'll need different obj files or something?

revolution wrote:
This is not really fair to the HLL compilers. You are essentially holding back secret information from the compiler. A compiler can't read your thoughts; you have to tell it what you want within the limits of its language constructs.

If you know that the user will always enter 1000000 then tell the compiler that to give it the best chance to create the proper code. If you know that you only need integers then tell the compiler that also. Once the compiler has all the relevant information that you know then you can take the next step to judge the quality of the output.
That applies a lot more to, let's say, memory, or counters that you know fall within a range. And yes, a lot of it can be applied directly in an HLL (though it makes the code more obscure), but there are more possibilities in asm.

_________________
Previously known as The_Grey_Beast



bitRAKE 18 May 2009, 19:08
(I meant instruction level by "low level" skills.
Above that are levels of algorithmic optimization, imho.)
Borsuc wrote:
You mean you'll need different obj files or something?
This article on Whole Program Optimization with Visual C++ .NET might help.
It isn't strictly x86 object code the linker is working with:
Quote:
Link time code generation (LTCG), the Visual C++ .NET framework that makes whole program optimization possible, mitigates the difficulty a compiler has in performing optimizations. As the name implies, code generation does not occur until the linking stage.
...my suggestion for the future of assembly language development is more advanced tools that provide real-time feedback about cache effects as well as pipeline contention, allowing the programmer to make more informed decisions, especially with regard to loops.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup


revolution 18 May 2009, 19:22
bitRAKE wrote:
It isn't strictly x86 object code the linker is working with:
Quite the understatement! It isn't x86 code it is working with at all.
MS .NET nonsense wrote:
... mitigates the difficulty a compiler has in performing optimizations ...
Um, no, it transfers the difficulty to yet another layer, the .NET layer.
manfred



Joined: 28 Feb 2009
Posts: 43
Location: Racibórz, Poland
manfred 18 May 2009, 20:07
revolution wrote:
Um, no, it transfers the difficulty to yet another layer, the .NET layer.
Are you sure? LTCG is for native applications, not for .NET ones.

_________________
Sorry for my English...


revolution 18 May 2009, 20:19
manfred wrote:
Are you sure? LTCG is for native applications, not for .NET ones.
Okay, I don't know for sure. I just thought that VC++ .NET used its internal bytecode format before generating code. This happens whether you delay the code generation until runtime (the .NET managed code app) or generate code immediately at compile time (the native app). The only difference is that you do the extra step locally rather than on the target. However, if it does not actually work that way, then I am at a loss to imagine how .NET would represent the compiled code before generating executable code.



bitRAKE 18 May 2009, 23:05
Hmm... interesting. There would be a lot of overlap in functionality if the intermediate form were DotNET byte code. Though the lack of portability has me doubting... Of course, I completely agree that another abstraction layer does not help optimization.



manfred 19 May 2009, 06:16
All (well... most) compilers generate intermediate code before native code...
The code generated by Visual (2008) is not bad. In many cases that compiler generates good code, and when it does not, you can use the __asm keyword to put hand-written assembly into the program.

_________________
Sorry for my English...
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 19 May 2009, 08:47
Visual doesn't support 64-bit assembly!!! That means it's unusable. You still need to use FASM or give up and go to GCC or ICL.



manfred 19 May 2009, 09:36
Madis731 wrote:
Visual doesn't support 64-bit assembly!!! That means it's unusable. You still need to use FASM or give up and go to GCC or ICL.
Oh really? There is a command-line switch named /MACHINE:X64.

_________________
Sorry for my English...


revolution 19 May 2009, 09:39
Has anyone written an SSSO version of C yet? For any other language?



Madis731 19 May 2009, 19:31
manfred wrote:
Madis731 wrote:
Visual doesn't support 64-bit assembly!!! That means it's unusable. You still need to use FASM or give up and go to GCC or ICL.
Oh really? There is a command-line switch named /MACHINE:X64.

Last time I checked...no!
Visual C++ Team Blog wrote:

Inline asm is not supported by Visual C++ on 64-bit machines. Therefore, if you want your code to be 64-bit compatible, you need to use intrinsics.

If intrinsics are what you mean, then okay, but 64-bit assembly was a big no-no when the new Visual Studios came out. There was even a public announcement.
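
For anyone who has not used intrinsics, a small sketch of what the blog quote is pointing at: instead of an inline asm block, you call compiler-provided functions that map to individual instructions. The function name add4_u32 and the macro HAVE_SSE2 are made up for this example. The intrinsics themselves (_mm_loadu_si128, _mm_add_epi32, _mm_storeu_si128) are the SSE2 ones declared in <emmintrin.h>. A plain C fallback is included on the assumption that the code may also be built for a non-x86 target.

```c
#include <stdint.h>

/* Use SSE2 when the target provides it (GCC/Clang define __SSE2__,
   MSVC defines _M_X64 for 64-bit builds); otherwise fall back to C. */
#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Adds four 32-bit integers element-wise: out[i] = a[i] + b[i]. */
void add4_u32(const uint32_t *a, const uint32_t *b, uint32_t *out)
{
#if HAVE_SSE2
    /* Unaligned loads, one packed 32-bit add, unaligned store. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

The compiler still does the register allocation and scheduling here, which is exactly the control an inline asm block would have given to the programmer.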

_________________
My updated idol: http://www.agner.org/optimize/
NEOAethyr



Joined: 20 Aug 2007
Posts: 19
NEOAethyr 20 May 2009, 22:13
I don't like C, but I know it's not going away.
I hate C# though.

100+ megs of library-like files for a program that has been precompiled on a different machine.
Screw that.
Windows has tons of APIs/functions, so why do you need a ton more functions that are probably the same darn thing but slower?



Madis731 21 May 2009, 05:14
I do like C (but hate C#), but I'm not comfortable with their non-SSSO politics. For the first days of browsing through some code, all you do is fix compiler incompatibilities. I could never get the sources to compile because there was always something missing.

I know why they are 100+ megs. It is because with WinAPI you have to test (oh, the horror) on all possible Windows versions. When you inject your own strcpy or memcpy routine, you know that it works. Of course that is not an excuse for the average assembly programmer, but C programmers are usually very lazy.



Borsuc 21 May 2009, 19:04
Madis731 wrote:
I do like C (but hate C#), but I'm not comfortable with their non-SSSO politics. For the first days of browsing through some code, all you do is fix compiler incompatibilities. I could never get the sources to compile because there was always something missing.
I was never able to compile anything from others, and sometimes, if there's a small modification I have to make, I disassemble a program even if it has sources available.

Why the hell is it so big? How can FASM compile so easily and flawlessly, while with C you have to set up tons of crap, even if you use GCC, mind you?

What the freaking hell?

_________________
Previously known as The_Grey_Beast
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 01 Aug 2009, 16:35
Borsuc wrote:
Madis731 wrote:
I do like C (but hate C#), but I'm not comfortable with their non-SSSO politics. For the first days of browsing through some code, all you do is fix compiler incompatibilities. I could never get the sources to compile because there was always something missing.
I was never able to compile anything from others, and sometimes, if there's a small modification I have to make, I disassemble a program even if it has sources available.

Why the hell is it so big? How can FASM compile so easily and flawlessly, while with C you have to set up tons of crap, even if you use GCC, mind you?

What the freaking hell?
Maybe they are self compiled? That would explain their slowness and hugeness.
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 29 Dec 2009, 00:51
Even with the stupid recursive Fibonacci, it is possible to beat the compilers.

unsigned f(unsigned n)
{
    if (n <= 1) return n;
    return f(n - 1) + f(n - 2);
}

I've tried it with VC and the Intel compiler. The Intel compiler produces faster and longer code, but in both size and speed neither compares to:

Code:
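f:     ; argument passed in eax, result returned in eax
    cmp eax,1
    jbe .1          ; base case: f(0)=0, f(1)=1
    push ebx        ; ebx is preserved for the caller
    lea ebx,[eax-2] ; ebx = n-2
    dec eax         ; eax = n-1
    call f          ; eax = f(n-1)
    xchg eax,ebx    ; eax = n-2, ebx = f(n-1)
    call f          ; eax = f(n-2)
    add eax,ebx     ; eax = f(n-2) + f(n-1)
    pop ebx
.1: ret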
    
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 29 Dec 2009, 01:22
I once wrote some code about fib to show something to a friend (which is why it is in Spanish, sorry):
Code:
; Athlon64 2.0 GHz (running in 32-bit mode), using XCHG
;---------------------------
;Test de fib(50)
;Tiempo = 321375 ms ; Resultado = 3996334433 (note: overflow occurred)
;---------------------------
;Test de fib(47)
;Tiempo = 75891 ms ; Resultado = 2971215073
;---------------------------
;Test de fib(40)
;Tiempo = 2625 ms ; Resultado = 102334155
;---------------------------

; Athlon64 2.0 GHz (running in 32-bit mode), using the replacement for XCHG
;---------------------------
;Test de fib(50)
;---------------------------
;Tiempo = 157297 ms ; Resultado = 3996334433
;---------------------------
;Test de fib(47)
;---------------------------
;Tiempo = 37235 ms ; Resultado = 2971215073
;---------------------------
;Test de fib(40)
;---------------------------
;Tiempo = 1282 ms ; Resultado = 102334155
;---------------------------

N equ 47
include 'win32ax.inc'

start:
  invoke  GetTickCount
  push    eax

  mov     eax, N
  call    fib

  pop     edx
  push    eax ; Argument for the wsprintf call made without the macro
  push    edx

  invoke  GetTickCount
  pop     edx

  mov     edi, buff
  sub     eax, edx
  cinvoke wsprintf, edi, fmt, eax

  xor     eax, eax

@@:
  scasb
  jnz     @b

  dec     edi
  mov     esi, resultado

@@:
  lodsb
  stosb
  test    al, al
  jnz     @b

  dec     edi

  push    fmt
  push    edi
  call    dword [wsprintf]
  add     esp, 4*3

  invoke  MessageBox, 0, salida, titulo, 0

  invoke  ExitProcess, 0

align 16
fib:
  cmp     eax, 1 ; Base case?
  jbe     .retornar_eax

  push    ebx

  mov     ebx, eax
  dec     eax
  call    fib

; xchg    ebx, eax (Rubbish: it at least doubles the execution time)
; sub     eax, 2

  lea     ecx, [ebx-2]
  mov     ebx, eax
  mov     eax, ecx
  call    fib

; eax = Fib(n-2) ; ebx = Fib(n-1)

  add     eax, ebx ; eax = Fib(n-1) + Fib(n-2)

  pop     ebx

.retornar_eax:
  ret

f:     ;argument passed in eax
    cmp eax,1
    jbe .1
    push ebx
    lea ebx,[eax-2]
    dec eax
    call f
    xchg eax,ebx
    call f
    add eax,ebx
    pop ebx
.1:  ret

match n, N
{
  titulo    db "Test de fib(" # `n # ")", 0
}
  fmt       db "%u", 0
  resultado db " ms ; Resultado = ", 0
  salida    db "Tiempo = "
  buff      rb 256 ; Honestly more than needed, but it doesn't matter because
                   ; Windows will allocate 4 KB in total anyway
.end start    


Adding your code just below the "fib" label gave this timing:
Quote:
---------------------------
Test de fib(47)
---------------------------
Tiempo = 79390 ms ; Resultado = 2971215073
---------------------------
Aceptar
---------------------------

(Tiempo = time; Resultado = result; Aceptar = OK)

Your code seems to take a little longer than my commented-out version using XCHG, while the code as posted takes "only" 37 seconds with N=47.

When you said the code produced by Intel and VC was bad, did you actually measure the time or just guess based on the code length?
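
For the C side of such a comparison, a timing harness might look roughly like the sketch below. It is not the code anyone in this thread ran: fib_rec and time_fib_ms are hypothetical names, and it uses the standard clock() function rather than GetTickCount, so it reports whatever clock() measures on the platform (processor time on most systems) instead of tick-count wall time.

```c
#include <stdint.h>
#include <time.h>

/* Naive doubly recursive Fibonacci; 32-bit arithmetic wraps like the asm. */
static uint32_t fib_rec(uint32_t n)
{
    return n <= 1 ? n : fib_rec(n - 1) + fib_rec(n - 2);
}

/* Computes fib_rec(n), stores it in *result, and returns the elapsed
   time in milliseconds as reported by clock(). */
double time_fib_ms(uint32_t n, uint32_t *result)
{
    clock_t begin = clock();
    *result = fib_rec(n);
    clock_t end = clock();
    return 1000.0 * (double)(end - begin) / CLOCKS_PER_SEC;
}
```

Storing the result through the pointer also gives the caller something to print, which makes it harder for an optimizer to discard the computation being timed.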

PS: Yes, I know fib is one of the best examples of "never use recursion", but I did it this way because it was needed to demonstrate something I can't recall now. There is more about fib here: http://board.flatassembler.net/topic.php?t=4807
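
To make the "never use recursion" remark concrete, here is a minimal iterative version in C. The name fib_iter is made up for this sketch. It uses 32-bit unsigned arithmetic so that it wraps the same way the 32-bit registers do in the listing above.

```c
#include <stdint.h>

/* Iterative Fibonacci: n additions instead of an exponential number of
   calls. Results wrap modulo 2^32, as in the 32-bit asm version. */
uint32_t fib_iter(uint32_t n)
{
    uint32_t a = 0; /* fib(i)   */
    uint32_t b = 1; /* fib(i+1) */
    while (n-- > 0) {
        uint32_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

With wrapping arithmetic, fib_iter(50) gives the same overflowed value 3996334433 that appears in the timing comments, since the true fib(50) = 12586269025 reduced modulo 2^32 is 3996334433.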



Borsuc 29 Dec 2009, 03:45
Recursion by function calling is NEVER needed. Why push the return address, at the very least, every time when you know it won't change? I never understood the idea behind it.

If you need hierarchical recursion, then just use a custom array to hold it -- why pass it around as duplicated parameters and return addresses?
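
One way to read the "custom array" suggestion, sketched in C: keep the pending subproblems in an explicit array and loop over it, so that no return addresses are pushed. This is only an interpretation of the idea, not code from the thread; fib_explicit_stack and FIB_STACK_MAX are hypothetical names, and the function does the same exponential amount of work as the recursive version.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity. For input n the array never holds more than
   n/2 + 1 entries, so any n below this bound fits. */
#define FIB_STACK_MAX 64

/* Fibonacci with the "recursion" kept in an explicit array of pending
   arguments. Returns 0 for n >= FIB_STACK_MAX (this sketch's guard). */
uint32_t fib_explicit_stack(uint32_t n)
{
    uint32_t pending[FIB_STACK_MAX];
    size_t top = 0;
    uint32_t sum = 0;

    if (n >= FIB_STACK_MAX)
        return 0;

    pending[top++] = n;
    while (top > 0) {
        uint32_t k = pending[--top];
        if (k <= 1) {
            sum += k;               /* base case contributes fib(k) = k */
        } else {
            pending[top++] = k - 1; /* replaces the call f(n-1) */
            pending[top++] = k - 2; /* replaces the call f(n-2) */
        }
    }
    return sum;
}
```

Each array entry is a single 32-bit value, whereas the recursive asm version pushes a return address plus a saved ebx per level.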

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
