flat assembler
Message board for the users of flat assembler.
dogman 28 Nov 2013, 15:28
Yep! 300 lines.
AsmGuru62 28 Nov 2013, 16:25
Maybe GCC inlined every call to f()?
bitRAKE 28 Nov 2013, 19:16
dogman 29 Nov 2013, 07:17
Intel makes good compilers. I'm real happy with their Fortran.
bitRAKE, nice site you found! I'll compile it under a few other compilers and post the results later. I have some that are not on that site.

TmX 29 Nov 2013, 10:15
dogman wrote: Intel makes good compilers. I'm real happy with their Fortran.
AFAIK Intel C++ is also praised. Fortunately there are non-commercial versions of their compiler. Unfortunately, no Windows version. Quite pricey, though.

ASM-Man 30 Nov 2013, 15:17
dogman wrote: Intel makes good compilers. I'm real happy with their Fortran.
And I with their C/C++. Very nice really.
_________________
I'm not a native speaker of the english language. So, if you find any mistake what I have written, you are free to fix for me or tell me on.

dogman 30 Nov 2013, 18:12
Here's the output from 3 other compilers.
CC (Solaris Studio 12.3)
Code:
    .file "test.c"
    .code32
    .set .simple_nop, 0x90
    .globl __1cBf6Fi_i_
    .type __1cBf6Fi_i_, @function
    .local Dlrodata.lrodata
    .local Dldata.ldata
    .ident "iropt: Sun Compiler Common 12.3 Linux_i386 2011/11/16"
    .ident "ir2hf: Sun Compiler Common 12.3 Linux_i386 2011/11/16"
    .ident "ube: Sun Compiler Common 12.3 Linux_i386 2011/11/16"
    .section .text,"ax"
    { 1 } .align 16,.simple_nop
    /================================================================================
    / FUNCTION __1cBf6Fi_i_
    / %rbp - used as frame pointer.
    / arg0 "n": 4(%esp),
    __1cBf6Fi_i_:
    / BLOCK: 3, pred: 1, succ: 7 8, count: 17.3027
    / FILE test.c
    / 1 !int f(int n) {
    { int[ 3] 1 } push %ebp
    { int[ 1] 1 } movl %esp,%ebp
    { int[ 3] 1 } push %ebx
    { int[ 3] 1 } push %esi
    / 2 ! if (n<2)
    { int[ 4] 2 } movl 8(%ebp),%ebx / sym=n
    { int[ 1] 2 } cmpl $2,%ebx
    { 2 } jl .CG2.14
    { 5 } .CG3.15:
    / BLOCK: 8, pred: 3, succ: 10 11, count: 12.4580
    / 3 ! return n;
    / 4 ! else
    / 5 ! return f(n-1)+f(n-2);
    { int[ 1] 5 } leal -2(%ebx),%esi
    { int[ 1] 2 } cmpl $2,%esi
    { 2 } jl .CG4.16
    { 5 } .CG5.17:
    / BLOCK: 11, pred: 8, succ: 13 14, count: 11.2266
    { int[ 1] } leal -4(%ebx),%esi
    { int[ 1] 2 } cmpl $2,%esi
    { 2 } jl .CG6.18
    { 5 } .CG7.19:
    / BLOCK: 14, pred: 11, succ: 16 17, count: 10.1169
    { int[ 1] } leal -6(%ebx),%esi
    { int[ 1] 2 } cmpl $2,%esi
    { 2 } jl .CG8.20
    { 5 } .CG9.21:
    / BLOCK: 17, pred: 14, succ: 19 20, count: 9.11688
    { int[ 1] } leal -8(%ebx),%esi
    { int[ 1] 2 } cmpl $2,%esi
    { 2 } jl .CGA.22
    { 5 } .CGB.23:
    / BLOCK: 20, pred: 17, succ: 9, count: 4.55844
    { int[ 1] 5 } subl $12,%esp
    { int[ 1] } leal -9(%ebx),%eax
    { int[ 3] 5 } push %eax
    { 5 } call __1cBf6Fi_i_
    / BLOCK: 9, pred: 20, succ: 6, count: 4.55844
    { int[ 1] 5 } addl $4,%esp
    { int[ 1] 5 } movl %eax,%esi
    { int[ 1] } leal -10(%ebx),%eax
    { int[ 3] 5 } push %eax
    { 5 } call __1cBf6Fi_i_
    { int[ 1] 5 } addl $16,%esp
    / BLOCK: 6, pred: 9, succ: 19, count: 4.55844
    { int[ 1] 5 } addl %eax,%esi
    { 3 } .CGA.22:
    { 5 } .CGC.24:
    / BLOCK: 21, pred: 19, succ: 5, count: 9.11688
    { int[ 1] 5 } subl $12,%esp
    { int[ 1] } leal -7(%ebx),%eax
    { int[ 3] 5 } push %eax
    { 5 } call __1cBf6Fi_i_
    { int[ 1] 5 } addl $16,%esp
    / BLOCK: 5, pred: 21, succ: 16, count: 9.11688
    { int[ 1] 5 } addl %eax,%esi
    { 3 } .CG8.20:
    { 5 } .CGD.25:
    / BLOCK: 22, pred: 16, succ: 18, count: 10.1169
    { int[ 1] 5 } subl $12,%esp
    { int[ 1] } leal -5(%ebx),%eax
    { int[ 3] 5 } push %eax
    { 5 } call __1cBf6Fi_i_
    { int[ 1] 5 } addl $16,%esp
    / BLOCK: 18, pred: 22, succ: 13, count: 10.1169
    { int[ 1] 5 } addl %eax,%esi
    { 3 } .CG6.18:
    { 5 } .CGE.26:
    / BLOCK: 23, pred: 13, succ: 4, count: 11.2266
    { int[ 1] 5 } subl $12,%esp
    { int[ 1] } leal -3(%ebx),%eax
    { int[ 3] 5 } push %eax
    { 5 } call __1cBf6Fi_i_
    { int[ 1] 5 } addl $16,%esp
    / BLOCK: 4, pred: 23, succ: 10, count: 11.2266
    { int[ 1] 5 } addl %eax,%esi
    { 3 } .CG4.16:
    { 5 } .CGF.27:
    / BLOCK: 24, pred: 10, succ: 26, count: 12.4580
    { int[ 1] 5 } subl $12,%esp
    { int[ 1] 5 } addl $-1,%ebx
    { int[ 3] 5 } push %ebx
    { 5 } call __1cBf6Fi_i_
    { int[ 1] 5 } addl $16,%esp
    / BLOCK: 26, pred: 24, succ: 7, count: 12.4580
    { int[ 1] 5 } leal (%eax,%esi),%ebx
    { 3 } .CG2.14:
    { 5 } .CG10.28:
    / BLOCK: 25, pred: 7, succ: 2, count: 4.84477
    { int[ 1] 5 } movl %ebx,%eax
    { int[ 3] 5 } pop %esi
    { int[ 3] 5 } pop %ebx
    { 5 } leave
    { 5 } ret
    .size __1cBf6Fi_i_, . - __1cBf6Fi_i_
    .CG0:
    .section .data,"aw"
    Ddata.data: / Offset 0
    .section .bss,"aw"
    Bbss.bss:
    .section .bssf,"aw"
    .section .rodata,"a"
    Drodata.rodata: / Offset 0
    .section .picdata,"aw"
    Dpicdata.picdata: / Offset 0
    .section .lbss,"awh"
    .type Blbss.lbss, @object
    Blbss.lbss:
    .section .ldata,"awh"
    Dldata.ldata: / Offset 0
    .type Dldata.ldata, @object
    .section .lrodata,"ah"
    Dlrodata.lrodata: / Offset 0
    .type Dlrodata.lrodata, @object

OpenUH (University of Houston version of Open64 compiler)
Code:
    # /opt/openuh-3.0.29/lib/gcc-lib/x86_64-open64-linux/5.0/be::5.0
    #-----------------------------------------------------------
    # Compiling test.c (/tmp/ccI#.s4ieOz)
    #-----------------------------------------------------------
    #-----------------------------------------------------------
    # Options:
    #-----------------------------------------------------------
    # Target:Wolfdale, ISA:ISA_1, Endian:little, Pointer Size:32
    # -O3 (Optimization level)
    # -g0 (Debug level)
    # -m2 (Report advisories)
    #-----------------------------------------------------------
    .file 1 "/tmp/test.c"
    .text
    .align 2
    .section .except_table_supp, "a",@progbits
    .section .except_table, "a",@progbits
    .section .text
    .p2align 5,,
    # Program Unit: _Z1fi
    .globl _Z1fi
    .type _Z1fi, @function
    _Z1fi: # 0x0
    # .frame %esp, 20, %esp
    # _temp_gra_spill1 = 8
    .loc 1 1 0
    # 1 int f(int n) {
    .LBB1__Z1fi:
    .LEH_adjustsp__Z1fi:
    addl $-20,%esp # [0]
    .L_0_1282:
    .loc 1 2 0
    # 2 if (n<2)
    movl 24(%esp),%edx # [0] n
    cmpl $1,%edx # [3]
    jg .Lt_0_770 # [4]
    .LBB3__Z1fi:
    .loc 1 3 0
    # 3 return n;
    movl %edx,%eax # [0]
    addl $20,%esp # [0]
    ret # [0]
    .p2align 4,,15
    .Lt_0_770:
    .loc 1 5 0
    # 4 else
    # 5 return f(n-1)+f(n-2);
    movl 24(%esp),%eax # [0] n
    addl $-1,%eax # [3]
    movl %eax,0(%esp) # [4] id:8
    call _Z1fi # [4] _Z1fi
    .LBB5__Z1fi:
    movl %eax,8(%esp) # [0] _temp_gra_spill1
    movl 24(%esp),%eax # [0] n
    addl $-2,%eax # [3]
    movl %eax,0(%esp) # [4] id:9
    call _Z1fi # [4] _Z1fi
    .LBB6__Z1fi:
    movl %eax,%edx # [0]
    movl 8(%esp),%eax # [0] _temp_gra_spill1
    addl %edx,%eax # [3]
    addl $20,%esp # [3]
    ret # [3]
    .L_0_1538:
    .LDWend__Z1fi:
    .size _Z1fi, .LDWend__Z1fi-_Z1fi
    .section .except_table
    .align 0
    .type .range_table._Z1fi, @object
    .range_table._Z1fi: # 0x0
    # offset 0
    .byte 255
    # offset 1
    .byte 0
    .uleb128 .LSDATTYPEB1-.LSDATTYPED1
    .LSDATTYPED1:
    # offset 6
    .byte 1
    .uleb128 .LSDACSE1-.LSDACSB1
    .LSDACSB1:
    .uleb128 .L_0_1282-_Z1fi
    .uleb128 .L_0_1538-.L_0_1282
    # offset 17
    .uleb128 0
    # offset 21
    .uleb128 0
    .LSDACSE1:
    # offset 25
    .sleb128 0
    # offset 29
    .sleb128 0
    .LSDATTYPEB1:
    # end of initialization for .range_table._Z1fi
    .section .text
    .align 4
    .section .except_table_supp
    .align 4
    .section .except_table
    .align 4
    .section .eh_frame, "a",@progbits
    .LEHCIE:
    .4byte .LEHCIE_end - .LEHCIE_begin
    .LEHCIE_begin:
    .4byte 0x0
    .byte 0x01, 0x7a, 0x50, 0x4c, 0x00, 0x01, 0x7c, 0x08
    .byte 0x06, 0x00
    .4byte __gxx_personality_v0
    .byte 0x00, 0x0c, 0x04, 0x04, 0x88, 0x01
    .align 4
    .LEHCIE_end:
    .4byte .LFDE1_end - .LFDE1_begin
    .LFDE1_begin:
    .4byte .LFDE1_begin - .LEHCIE
    .4byte .LBB1__Z1fi
    .4byte .LDWend__Z1fi - .LBB1__Z1fi
    .byte 0x04
    .4byte .range_table._Z1fi
    .byte 0x04
    .4byte .LEH_adjustsp__Z1fi - .LBB1__Z1fi
    .byte 0x0e, 0x18
    .align 4
    .LFDE1_end:
    .section .debug_line, ""
    .section .note.GNU-stack,"",@progbits
    .ident "#Open64 Compiler Version 5.0 : test.c compiled with : -g0 -O3 -march=wolfdale -msse2 -msse3 -mno-3dnow -mno-sse4a -mssse3 -mno-sse41 -mno-sse42 -mno-aes -mno-pclmul -mno-avx -mno-xop -mno-fma -mno-fma4 -m32"

Open64
Code:
    # /opt/open64-5.0/lib/gcc-lib/x86_64-open64-linux/5.0/be::5.0
    #-----------------------------------------------------------
    # Compiling test.c (/tmp/ccI#.Bc1ETQ)
    #-----------------------------------------------------------
    #-----------------------------------------------------------
    # Options:
    #-----------------------------------------------------------
    # Target:Wolfdale, ISA:ISA_1, Endian:little, Pointer Size:32
    # -O3 (Optimization level)
    # -g0 (Debug level)
    # -m2 (Report advisories)
    #-----------------------------------------------------------
    .file 1 "test.c"
    .text
    .align 2
    .section .except_table_supp, "a",@progbits
    .section .except_table, "a",@progbits
    .section .text
    .p2align 5,,
    # Program Unit: _Z1fi
    .globl _Z1fi
    .type _Z1fi, @function
    _Z1fi: # 0x0
    # .frame %esp, 20, %esp
    # _temp_gra_spill1 = 8
    .loc 1 1 0
    # 1 int f(int n) {
    .LBB1__Z1fi:
    .LEH_adjustsp__Z1fi:
    addl $-20,%esp # [0]
    .L_0_1282:
    .loc 1 2 0
    # 2 if (n<2)
    movl 24(%esp),%edx # [0] n
    cmpl $1,%edx # [3]
    jg .Lt_0_770 # [4]
    .LBB3__Z1fi:
    .loc 1 3 0
    # 3 return n;
    movl %edx,%eax # [0]
    addl $20,%esp # [0]
    ret # [0]
    .p2align 4,,15
    .Lt_0_770:
    .loc 1 5 0
    # 4 else
    # 5 return f(n-1)+f(n-2);
    movl 24(%esp),%eax # [0] n
    addl $-1,%eax # [3]
    movl %eax,0(%esp) # [4] id:8
    call _Z1fi # [4] _Z1fi
    .LBB5__Z1fi:
    movl %eax,8(%esp) # [0] _temp_gra_spill1
    movl 24(%esp),%eax # [0] n
    addl $-2,%eax # [3]
    movl %eax,0(%esp) # [4] id:9
    call _Z1fi # [4] _Z1fi
    .LBB6__Z1fi:
    movl %eax,%edx # [0]
    movl 8(%esp),%eax # [0] _temp_gra_spill1
    addl %edx,%eax # [3]
    addl $20,%esp # [3]
    ret # [3]
    .L_0_1538:
    .LDWend__Z1fi:
    .size _Z1fi, .LDWend__Z1fi-_Z1fi
    .section .except_table
    .align 0
    .type .range_table._Z1fi, @object
    .range_table._Z1fi: # 0x0
    # offset 0
    .byte 255
    # offset 1
    .byte 0
    .uleb128 .LSDATTYPEB1-.LSDATTYPED1
    .LSDATTYPED1:
    # offset 6
    .byte 1
    .uleb128 .LSDACSE1-.LSDACSB1
    .LSDACSB1:
    .uleb128 .L_0_1282-_Z1fi
    .uleb128 .L_0_1538-.L_0_1282
    # offset 17
    .uleb128 0
    # offset 21
    .uleb128 0
    .LSDACSE1:
    # offset 25
    .sleb128 0
    # offset 29
    .sleb128 0
    .LSDATTYPEB1:
    # end of initialization for .range_table._Z1fi
    .section .text
    .align 4
    .section .except_table_supp
    .align 4
    .section .except_table
    .align 4
    .section .eh_frame, "a",@progbits
    .LEHCIE:
    .4byte .LEHCIE_end - .LEHCIE_begin
    .LEHCIE_begin:
    .4byte 0x0
    .byte 0x01, 0x7a, 0x50, 0x4c, 0x00, 0x01, 0x7c, 0x08
    .byte 0x06, 0x00
    .4byte __gxx_personality_v0
    .byte 0x00, 0x0c, 0x04, 0x04, 0x88, 0x01
    .align 4
    .LEHCIE_end:
    .4byte .LFDE1_end - .LFDE1_begin
    .LFDE1_begin:
    .4byte .LFDE1_begin - .LEHCIE
    .4byte .LBB1__Z1fi
    .4byte .LDWend__Z1fi - .LBB1__Z1fi
    .byte 0x04
    .4byte .range_table._Z1fi
    .byte 0x04
    .4byte .LEH_adjustsp__Z1fi - .LBB1__Z1fi
    .byte 0x0e, 0x18
    .align 4
    .LFDE1_end:
    .section .debug_line, ""
    .section .note.GNU-stack,"",@progbits
    .ident "#Open64 Compiler Version 5.0 : test.c compiled with : -g0 -O3 -march=wolfdale -msse2 -msse3 -mno-3dnow -mno-sse4a -mssse3 -mno-sse41 -mno-sse42 -mno-aes -mno-pclmul -mno-avx -mno-xop -mno-fma4 -m32"

All three look better than gcc at -O3...
_________________
Sources? Ahahaha! We don't need no stinkin' sources!
cod3b453 30 Nov 2013, 18:24
Well the only way to settle it is to race them
tthsqe 06 Dec 2013, 18:52
cod3b453,
we have already had extensive speed tests (http://board.flatassembler.net/topic.php?t=10158&postdays=0&postorder=asc&start=20) for this academic example. I'm just always interested in what the latest compiler tech is doing. At least for Intel CPUs, I think the fastest human implementation was what I have below. On AMD, you might want to replace the xchg with a push and pop. Though I wouldn't mind if you did run a speed test, to see whether all of that code that gcc wrote actually improves performance.
Code:
    f:  ; argument passed in eax, result returned in eax
        cmp eax,1
        jbe .1              ; n <= 1: return n unchanged
        push ebx
        lea ebx,[eax-2]     ; ebx = n-2
        dec eax             ; eax = n-1
        call f              ; eax = f(n-1)
        xchg eax,ebx        ; eax = n-2, ebx = f(n-1)
        call f              ; eax = f(n-2)
        add eax,ebx         ; eax = f(n-1) + f(n-2)
        pop ebx
    .1: ret
bitRAKE 07 Dec 2013, 07:46
No compiler I know of will generate the simple loop using the XADD instruction.
http://www.asmcommunity.net/forums/topic/?id=14206
_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
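For anyone who has not seen the trick: the linked post is not reproduced in this thread, so the following is only a rough C sketch of what an exchange-and-add loop computes, not bitRAKE's actual code. The names `xadd_step` and `fib_xadd_style` are made up for illustration. The helper models the documented effect of x86 `XADD dst, src` (the destination receives the sum, the source receives the old destination), and it assumes 32-bit unsigned arithmetic, so results wrap modulo 2^32.

```c
#include <stdint.h>

/* Hypothetical helper modelling x86 "XADD dst, src":
 * dst becomes dst + src, src becomes the old dst. */
static void xadd_step(uint32_t *dst, uint32_t *src)
{
    uint32_t sum = *dst + *src; /* wraps modulo 2^32, like the 32-bit instruction */
    *src = *dst;
    *dst = sum;
}

/* Hypothetical name: iterative Fibonacci built from n exchange-and-add steps.
 * Starts from (a, b) = (0, 1); after k steps a holds the k-th Fibonacci number. */
uint32_t fib_xadd_style(uint32_t n)
{
    uint32_t a = 0; /* F(0) */
    uint32_t b = 1; /* plays the role of F(-1) = 1 */
    while (n--)
        xadd_step(&a, &b);
    return a;
}
```

Calling `fib_xadd_style(10)` gives 55. The loop body is a single step per iteration, which is presumably why one instruction can carry it in assembly.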
tthsqe 07 Dec 2013, 11:50
bitRAKE, could you give a precise rule-based deduction of the fact that your xadd loop correctly implements
Code:
    unsigned int f(unsigned int n) {
        if (n<2)
            return n;
        else
            return f(n-1)+f(n-2);
    }
I think compiler technology has not focused on such optimizations. If your bag of tricks is precise enough, you should be able to teach it to the computer.
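One way such a deduction could go, sketched as a loop invariant. This assumes the loop in question repeats a single exchange-and-add step starting from the pair (0, 1), and that `unsigned int` is 32 bits wide, so all arithmetic is modulo 2^32; the symbols a_k and b_k are introduced here for the two registers after k steps and are not from the thread.

```latex
\begin{aligned}
&\text{State: } (a_0, b_0) = (0, 1), \qquad
  (a_{k+1},\, b_{k+1}) = (a_k + b_k,\; a_k) \pmod{2^{32}} \\
&\text{Claim: } a_k \equiv f(k) \ \text{for all } k \ge 0, \qquad
  b_k \equiv f(k-1) \ \text{for all } k \ge 1 \\
&\text{Base: } a_0 = 0 = f(0), \qquad (a_1, b_1) = (1, 0) = (f(1), f(0)) \\
&\text{Step } (k \ge 1):\quad
  a_{k+1} = a_k + b_k \equiv f(k) + f(k-1) = f(k+1), \qquad
  b_{k+1} = a_k \equiv f(k)
\end{aligned}
```

The last equality in the step uses the `else` branch of the C definition, which applies because k+1 is at least 2. Running the step n times therefore leaves f(n) in the first register.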
bitRAKE 07 Dec 2013, 20:19
The first thing for the compiler to understand is the range of numbers involved, and the function utility (how it is used). Next is to understand the sequential nature of the recursion - caching values is beneficial. XADD is a special case of that value caching which the instruction selector should find.
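To make the two ideas in that post concrete, here is a small C sketch under stated assumptions: a 32-bit unsigned result, for which F(47) = 2971215073 is the largest Fibonacci number that fits, and a plain lookup table standing in for "caching values". The names `FIB_MAX_N`, `fib_cache`, `fib_known`, and `fib_memo` are invented for this example; nothing here is claimed to be what a compiler would actually emit.

```c
#include <assert.h>
#include <stdint.h>

/* Range: F(47) = 2971215073 fits in 32 bits, F(48) does not. */
#define FIB_MAX_N 47u

static uint32_t fib_cache[FIB_MAX_N + 1];      /* cached results */
static unsigned char fib_known[FIB_MAX_N + 1]; /* 1 once fib_cache[n] is valid */

/* Same recursion as f() in the thread, but each value is computed once
 * and then served from the cache, so the call tree collapses to a chain. */
uint32_t fib_memo(uint32_t n)
{
    assert(n <= FIB_MAX_N); /* outside this range the result would overflow */
    if (n < 2)
        return n;
    if (!fib_known[n]) {
        fib_cache[n] = fib_memo(n - 1) + fib_memo(n - 2);
        fib_known[n] = 1;
    }
    return fib_cache[n];
}
```

With the cache in place only the two most recent values are ever needed to produce the next one, which is the observation that shrinks the table down to a pair of registers and a single add-and-exchange step.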
Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.