flat assembler
Message board for the users of flat assembler.

How to turn a GCC problem into a FASM problem

l4m2



Joined: 15 Jan 2015
Posts: 648
l4m2
(Title had been "STOSB runs faster than STOSD")
I compiled http://codepad.org/s0KaWtZ2 with MinGW, ran it on my computer, and got the data below:
Code:
  955908  675748  163512 1006760
  228480  524324  222084  694180
  329844  639076  483316  294880
  308200  547960  129488  342044
  321091  583528  123416  272372
  267328  511612  113316  196336
  155084  513520  112216  222304
  132300  504148  164156  240692
  281688 1234776  129384  211608
  257284  553888  129392  256732
  259048  515636  119440  291152
  264096  976044 1248500  194424
  258352  522164  118084  246692
  265976  517412  118320  254252
  918516  565528  131288  262712
  267308  518020  121264  250964
  248052  519928  118416  267208
  247020  521556  123136  253964
  240560  876212  115368  283616
  194844  510512  110968  289000
  121468  509572  108728  275092
  114572  526240  111028  291084
  117020  514260  116040  297548
  224500  523272  137528  308536
  167712  541944  137676  622192
  278844  854868  143220  329372
  372884  820464  133592  330376
  341204  830128  137348  327468
  333304  843636  131416  334080
  445004  591412  134696  322104
  375592  568128  141244  585032
  388712  895360  132556  312944
  371204  880348  130064  346432
  356796  921684  141704  311772
  322804  838336  129560  384448
  373184 1034948  174800  350732
  371460  613836  129612  326932
  388244  867864  170244  333432
  408408  889896  129204  330428
  351924  749552  373476  369396
  461212  719556  133284  285164
  414508  869236  140044  288804
  347796  748564  126536  493376
  377312  867112  130288  293172
  922560  959612  144984  302208
  370352  725684  133856  483252
  420676  880860  248588  293816
  805100  545444  124832  299980
  269012  546328  208860  476216
  467940  942340  160184  318740
  418176 1191468  171672  297560
  426352  899344  130844  329292
  392496  874436  155136  307584
  560556  554556  144636  306324
  399252  775544  143064  404392
  546232  535612  121204  281684
  470624  805764  158640  326556
  531532  543836  122864  265204
  445672  916528  129780  298924
  544020  573492  136240  308136
  474264  705664  131252  304892
  408212  843540  124568  234888
  462060  577952  121880  218840
  225104  572660  117588  226804
  398436  784708  143812  181320
  394520  545872  117728  210828
  256132  800524  133188  236188
  366984  699328  151500  533652
  363464  825604  149024  264604
  358024  807460  141892  279024
  428412  832644  122760  258020
  398780  612964  179376  311084
  467012  809368  155844  269772
  592672  559248  146496  296168
  628472  541928  130764  312896
  528084  531320  147240  520360
  276756  711796  142140  294056
  215668  739000  148524  251556
  462808  631216  153656  233036
  302348  578240  132224  283496
  269296  633524  132696  329540
  355852  628716  753552  324380
  358560  736888  453888  838181
  453700  562900  121560  209980
  332352  615304  127584  249240
  318360  849508  152020  260436
  378360  596136  259007  245788
  362628  619204  144416  234072
  320728  601088  132964  277056
  368120  613524  139376 1137168
  157680  525520  148752  238064
  382560  771800  134496  290396
  499516  638644  121268  255476
  387576  548384  123508  431368
  362220  998864  224227  254328
  327964  593800  134804  282072
  342252 1006096  129632  272992
  396328  594176  124788  218108
  391548  611088  132880  253820
  379640  611764  114336  251412
  356464 1215836  121200  314808
  335460  827824  132172  296016
  319696 1157860  125916  290284
  333408  911848  132168  302288
  499952  676700  114212  312504
  295424  743968  119596  285436
  334664  903756  263364  287512
  452616  591824  183796  442340
  433324  740248  143508  320068
  339756  785592  133484  286076    

The STOSB column is clearly faster than the STOSD and MEMSET columns. How can that happen?


Last edited by l4m2 on 27 Feb 2016, 14:23; edited 1 time in total
Post 26 Feb 2016, 16:07
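Because the posted timings contain large outliers (for example 1248500 in a column that is usually near 130000), comparing a robust statistic per column is safer than eyeballing the rows. The following is a minimal sketch, assuming only the first five rows of the posted data; the helper names `median`, `column_median` and `first_rows` are hypothetical, and the columns are deliberately left unnamed because the post does not label them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_ROWS 128

/* Ordering function for qsort over unsigned long cycle counts. */
static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Median of n samples; sorts the caller's array in place. */
unsigned long median(unsigned long *samples, size_t n)
{
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_ulong);
    if (n % 2 == 1)
        return samples[n / 2];
    /* Even count: midpoint of the two middle values, without overflow. */
    return samples[n / 2 - 1] + (samples[n / 2] - samples[n / 2 - 1]) / 2;
}

/* Median of one column (0..3) over the first `rows` rows of a table. */
unsigned long column_median(const unsigned long table[][4], size_t rows, int col)
{
    unsigned long tmp[MAX_ROWS];
    assert(rows > 0 && rows <= MAX_ROWS && col >= 0 && col < 4);
    for (size_t i = 0; i < rows; i++)
        tmp[i] = table[i][col];
    return median(tmp, rows);
}

/* First five rows of the posted data, copied verbatim. */
const unsigned long first_rows[5][4] = {
    {955908, 675748, 163512, 1006760},
    {228480, 524324, 222084,  694180},
    {329844, 639076, 483316,  294880},
    {308200, 547960, 129488,  342044},
    {321091, 583528, 123416,  272372},
};
```

Calling `column_median(first_rows, 5, c)` for `c` from 0 to 3 gives one representative value per column; the same helper could be fed the full table.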
l4m2
Also MOVSB is faster than MOVSD
Post 26 Feb 2016, 16:11
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17279
Location: In your JS exploiting you and your system
revolution
Is your buffer aligned?
Post 26 Feb 2016, 16:24
l4m2
revolution wrote:
Is your buffer aligned?
I disassembled the executable file and found that &a = 0x405060.
Post 26 Feb 2016, 16:33
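Whether the reported address answers the alignment question can be checked with a little bit arithmetic. This is a minimal sketch; the helper names `alignment_of` and `is_aligned` are hypothetical and not taken from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that divides addr (addr must be non-zero). */
uintptr_t alignment_of(uintptr_t addr)
{
    assert(addr != 0);
    return addr & (~addr + 1);   /* isolates the lowest set bit */
}

/* Non-zero if addr is a multiple of boundary (a power of two). */
int is_aligned(uintptr_t addr, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (addr & (boundary - 1)) == 0;
}
```

For the address 0x405060, `alignment_of` returns 32: the buffer is aligned to 4, 16 and 32 bytes, but not to 64.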
revolution
Post your assembly code so we can test it.
Post 27 Feb 2016, 01:39
l4m2
Edit by revolution: Removed the useless pointless large binary dump
Post 27 Feb 2016, 03:16
l4m2
It seems that a very long post gets cut off, so I put it on http://paste.ubuntu.com/15212249/
Post 27 Feb 2016, 03:18
revolution
l4m2: Your post serves no purpose.
Post 27 Feb 2016, 03:20
l4m2
Why?
Post 27 Feb 2016, 03:20
revolution
How do you think it would be useful?
Post 27 Feb 2016, 03:22
l4m2
revolution wrote:
l4m2: Your post serves no purpose.

I just disassembled the executable file, and the "useless pointless large binary dump" is all I got.
Post 27 Feb 2016, 03:23
revolution
That is not a disassembly file.

But anyhow, post your assembly code (source code) so we can test your theory. The C code you posted earlier won't compile in fasm.


Last edited by revolution on 27 Feb 2016, 15:36; edited 2 times in total
Post 27 Feb 2016, 03:26
l4m2
You can use http://paste.ubuntu.com/15212337/ to build the file with fasm (change the file extension to .exe), and the four functions are at http://paste.ubuntu.com/15212339/
Code:
* Referenced by a CALL at Addresses:
|:00401C74   , :00401C8B   , :00401CA8   , :00401CCD   , :00401CEA   
|:00401D05   , :00401D22   , :00401D38   
|
:00401334 60                      pushad
:00401335 0FA2                    cpuid
:00401337 61                      popad
:00401338 0F31                    rdtsc
:0040133A C3                      ret


:0040133B 90                      nop
:0040133C 57                      push edi
:0040133D BA60504000              mov edx, 00405060
:00401342 B940420F00              mov ecx, 000F4240
:00401347 B003                    mov al, 03
:00401349 89D7                    mov edi, edx
:0040134B F3                      repz
:0040134C AA                      stosb
:0040134D 5F                      pop edi
:0040134E C3                      ret


:0040134F 90                      nop
:00401350 31D2                    xor edx, edx
:00401352 31C0                    xor eax, eax

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00401367(C)
|
:00401354 C704956050400003030303  mov dword ptr [4*edx+00405060], 03030303
:0040135F 40                      inc eax
:00401360 89C2                    mov edx, eax
:00401362 3D90D00300              cmp eax, 0003D090
:00401367 75EB                    jne 00401354
:00401369 C3                      ret


:0040136A 6690                    nop
:0040136C 57                      push edi
:0040136D BF60504000              mov edi, 00405060
:00401372 B803000000              mov eax, 00000003
:00401377 B940420F00              mov ecx, 000F4240
:0040137C FC                      cld
:0040137D F3                      repz
:0040137E AA                      stosb
:0040137F 5F                      pop edi
:00401380 C3                      ret


:00401381 8D7600                  lea esi, dword ptr [esi+00]
:00401384 57                      push edi
:00401385 BF60504000              mov edi, 00405060
:0040138A B803030303              mov eax, 03030303
:0040138F B990D00300              mov ecx, 0003D090
:00401394 FC                      cld
:00401395 F3                      repz
:00401396 AB                      stosd
:00401397 5F                      pop edi
:00401398 C3                      ret    
Post 27 Feb 2016, 03:38
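The constants in the listing show that every routine fills the same region: 0x000F4240 is 1000000 bytes and 0x0003D090 is 250000 dwords, with 0x03030303 being the byte 3 repeated four times. The sketch below restates that in portable C as an illustration of the arithmetic only; it is not the original codepad source, and the names `fill_bytes`, `fill_dwords` and `fills_match` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Counts taken from the disassembly: ecx for stosb and for stosd. */
enum { FILL_BYTES = 0xF4240, FILL_DWORDS = 0x3D090 };

static unsigned char buf_a[FILL_BYTES];
static unsigned char buf_b[FILL_BYTES];

/* Byte-wise fill, the portable counterpart of "rep stosb" with al = 3. */
void fill_bytes(unsigned char *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = 3;
}

/* Dword-wise fill, the counterpart of "rep stosd" with eax = 0x03030303. */
void fill_dwords(unsigned char *dst, size_t count)
{
    const uint32_t pattern = 0x03030303u;
    for (size_t i = 0; i < count; i++)
        memcpy(dst + 4 * i, &pattern, sizeof pattern);
}

/* Returns non-zero if both fills leave identical buffer contents. */
int fills_match(void)
{
    assert(FILL_DWORDS * 4 == FILL_BYTES);
    fill_bytes(buf_a, FILL_BYTES);
    fill_dwords(buf_b, FILL_DWORDS);
    return memcmp(buf_a, buf_b, FILL_BYTES) == 0;
}
```

Since the resulting memory is identical, any timing difference between the routines comes from how the stores are issued, not from the amount of data written.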
revolution
l4m2: Do you know how to write assembly code? Do you need help to write assembly code? We can help you if you ask.

Posting a binary file (no matter how you disguise it as text) doesn't show what you are doing. How do we know it is not malware?
Post 27 Feb 2016, 03:45
l4m2
revolution wrote:
l4m2: Do you know how to write assembly code? Do you need help to write assembly code? We can help you if you ask.

Posting a binary file (no matter how you disguise it as text) doesn't show what you are doing. How do we know it is not malware?

So should I give the whole .alf file (I used W32dasm to unassemble)? What's the difference between it and a binary file?
Post 27 Feb 2016, 03:49
revolution
l4m2 wrote:
So should I give the whole .alf file (I used W32dasm to unassemble)?
No. Please don't unless it can be directly assembled by fasm.
l4m2 wrote:
What's the difference between it and a binary file?
Binary is just data with no meaning. Source code shows the actual opcodes, comments, label names, etc. Disassembly shows just the opcodes (often with data misinterpreted as code, and other silliness), without any readable label names or programmer comments.
Post 27 Feb 2016, 03:58
l4m2
I changed the title to reflect the true subject
Post 27 Feb 2016, 14:24
revolution
A simple assembly version might look like this:
Code:
format pe console
include 'win32ax.inc'

TEST_LENGTH             = 1000000
TEST_REPITITIONS        = 10
TEST_WARM_UPS           = 3

macro rdtsc {
        pushad
        cpuid
        popad
        rdtsc
}

.data
        align 32
        dummy_buffer rb TEST_LENGTH

.code

proc start uses ebx edi
        locals
                stosb_time_low  rd 1
                stosb_time_high rd 1
                stosd_time_low  rd 1
                stosd_time_high rd 1
                print_length    rd 1
                dummy           rd 1
                output_string   rb 1024
        endl
        mov     ebx,TEST_REPITITIONS
    .loop_repitition:
        mov     edi,TEST_WARM_UPS
    .loop_warm_up:
        stdcall test_stosb,dummy_buffer,TEST_LENGTH
        mov     [stosb_time_low],eax
        mov     [stosb_time_high],edx
        stdcall test_stosd,dummy_buffer,TEST_LENGTH
        mov     [stosd_time_low],eax
        mov     [stosd_time_high],edx
        dec     edi
        jnz     .loop_warm_up
        ; wsprintf is a cdecl (varargs) function, so cinvoke is used to clean up the stack
        cinvoke wsprintf,addr output_string,<'Stosb: %I64u - Stosd: %I64u',13,10>,\
                [stosb_time_low],[stosb_time_high],[stosd_time_low],[stosd_time_high]
        mov     [print_length],eax
        invoke  GetStdHandle,STD_OUTPUT_HANDLE
        invoke  WriteFile,eax,addr output_string,[print_length],addr dummy,NULL
        dec     ebx
        jnz     .loop_repitition
        invoke  ExitProcess,0
endp

proc test_stosb uses edi,buffer,length
        ;return edx:eax = clock count
        rdtsc
        push    edx eax
        mov     edi,[buffer]
        mov     ecx,[length]
        mov     eax,3
        rep     stosb
        rdtsc
        pop     ecx edi
        sub     eax,ecx
        sbb     edx,edi
        ret
endp

proc test_stosd uses edi,buffer,length
        ;return edx:eax = clock count
        rdtsc
        push    edx eax
        mov     edi,[buffer]
        mov     ecx,[length]
        mov     eax,0x03030303
        shr     ecx,2
        rep     stosd
        rdtsc
        pop     ecx edi
        sub     eax,ecx
        sbb     edx,edi
        ret
endp

.end start    
Post 28 Feb 2016, 13:20
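Each test procedure ends with `sub eax,ecx` followed by `sbb edx,edi`, which subtracts the starting 64-bit timestamp from the ending one using two 32-bit halves. The sketch below mirrors that borrow handling in C; the function name `tsc_delta` and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* end - start for two 64-bit counters held as 32-bit halves,
   computed the way "sub eax,ecx" / "sbb edx,edi" does it. */
uint64_t tsc_delta(uint32_t start_lo, uint32_t start_hi,
                   uint32_t end_lo, uint32_t end_hi)
{
    uint32_t lo = end_lo - start_lo;           /* sub: wraps modulo 2^32 */
    uint32_t borrow = end_lo < start_lo;       /* carry flag left by sub */
    uint32_t hi = end_hi - start_hi - borrow;  /* sbb: subtract with borrow */
    uint64_t result = ((uint64_t)hi << 32) | lo;

    /* Cross-check against a plain 64-bit subtraction. */
    assert(result == ((((uint64_t)end_hi << 32) | end_lo)
                    - (((uint64_t)start_hi << 32) | start_lo)));
    return result;
}
```

When the low half wraps between the two readings, for example from 0xFFFFFFF0 to 0x00000010 with the high half going from 0 to 1, the borrow keeps the result at 0x20 instead of a value that is off by 2^32.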
l4m2
revolution wrote:
A simple assembly version might look like this:
Code:
format pe console
include 'win32ax.inc'

TEST_LENGTH             = 1000000
TEST_REPITITIONS        = 10
TEST_WARM_UPS           = 3
...
        pop     ecx edi
        sub     eax,ecx
        sbb     edx,edi
        ret
endp

.end start    

It is not a good idea to rewrite the code as this kind of assembly when the rewrite changes the behaviour being observed.
Post 18 Mar 2016, 05:19
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler
In the future, you can use the options "-masm=intel -S -o -" to get GCC to print assembly to the terminal. Remove the "-o -" to save it to a file.

It still likely won't be directly assemblable by FASM, but it'll be close (Intel syntax, at least) and more correct than a disasm.
Post 19 Apr 2016, 06:21
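As a small illustration of that workflow, a source file like the one below could be passed to GCC to see what the compiler generates for a simple fill loop. The file name `fill.c` and the function name `fill3` are hypothetical, and the `-masm=intel` option only applies to x86 targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file name: fill.c
 * A command along the lines of
 *     gcc -masm=intel -S -o - fill.c
 * prints the compiler's Intel-syntax assembly for this function
 * to the terminal instead of producing an object file. */
void fill3(unsigned char *dst, size_t count)
{
    assert(dst != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        dst[i] = 3;
}
```

The output still uses GNU assembler directives, so it would likely need manual adaptation before fasm accepts it.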
Copyright © 1999-2020, Tomasz Grysztar.