flat assembler
Message board for the users of flat assembler.
l4m2 26 Feb 2016, 16:07
(Title had been "STOSB runs faster than STOSD")
I compiled http://codepad.org/s0KaWtZ2 with MinGW, ran it on my computer, and got the data below:
Code:
955908 675748 163512 1006760
228480 524324 222084 694180
329844 639076 483316 294880
308200 547960 129488 342044
321091 583528 123416 272372
267328 511612 113316 196336
155084 513520 112216 222304
132300 504148 164156 240692
281688 1234776 129384 211608
257284 553888 129392 256732
259048 515636 119440 291152
264096 976044 1248500 194424
258352 522164 118084 246692
265976 517412 118320 254252
918516 565528 131288 262712
267308 518020 121264 250964
248052 519928 118416 267208
247020 521556 123136 253964
240560 876212 115368 283616
194844 510512 110968 289000
121468 509572 108728 275092
114572 526240 111028 291084
117020 514260 116040 297548
224500 523272 137528 308536
167712 541944 137676 622192
278844 854868 143220 329372
372884 820464 133592 330376
341204 830128 137348 327468
333304 843636 131416 334080
445004 591412 134696 322104
375592 568128 141244 585032
388712 895360 132556 312944
371204 880348 130064 346432
356796 921684 141704 311772
322804 838336 129560 384448
373184 1034948 174800 350732
371460 613836 129612 326932
388244 867864 170244 333432
408408 889896 129204 330428
351924 749552 373476 369396
461212 719556 133284 285164
414508 869236 140044 288804
347796 748564 126536 493376
377312 867112 130288 293172
922560 959612 144984 302208
370352 725684 133856 483252
420676 880860 248588 293816
805100 545444 124832 299980
269012 546328 208860 476216
467940 942340 160184 318740
418176 1191468 171672 297560
426352 899344 130844 329292
392496 874436 155136 307584
560556 554556 144636 306324
399252 775544 143064 404392
546232 535612 121204 281684
470624 805764 158640 326556
531532 543836 122864 265204
445672 916528 129780 298924
544020 573492 136240 308136
474264 705664 131252 304892
408212 843540 124568 234888
462060 577952 121880 218840
225104 572660 117588 226804
398436 784708 143812 181320
394520 545872 117728 210828
256132 800524 133188 236188
366984 699328 151500 533652
363464 825604 149024 264604
358024 807460 141892 279024
428412 832644 122760 258020
398780 612964 179376 311084
467012 809368 155844 269772
592672 559248 146496 296168
628472 541928 130764 312896
528084 531320 147240 520360
276756 711796 142140 294056
215668 739000 148524 251556
462808 631216 153656 233036
302348 578240 132224 283496
269296 633524 132696 329540
355852 628716 753552 324380
358560 736888 453888 838181
453700 562900 121560 209980
332352 615304 127584 249240
318360 849508 152020 260436
378360 596136 259007 245788
362628 619204 144416 234072
320728 601088 132964 277056
368120 613524 139376 1137168
157680 525520 148752 238064
382560 771800 134496 290396
499516 638644 121268 255476
387576 548384 123508 431368
362220 998864 224227 254328
327964 593800 134804 282072
342252 1006096 129632 272992
396328 594176 124788 218108
391548 611088 132880 253820
379640 611764 114336 251412
356464 1215836 121200 314808
335460 827824 132172 296016
319696 1157860 125916 290284
333408 911848 132168 302288
499952 676700 114212 312504
295424 743968 119596 285436
334664 903756 263364 287512
452616 591824 183796 442340
433324 740248 143508 320068
339756 785592 133484 286076
The STOSB column is clearly faster than the STOSD and MEMSET columns. How can that happen?
Last edited by l4m2 on 27 Feb 2016, 14:23; edited 1 time in total
|||
![]() |
|
l4m2 26 Feb 2016, 16:11
Also, MOVSB is faster than MOVSD.
|
|||
![]() |
|
revolution 26 Feb 2016, 16:24
Is your buffer aligned?
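One way to answer this from the C side is to test the buffer address directly. A minimal sketch, assuming a C99 compiler; the helper name `is_aligned` is hypothetical and does not come from the original program:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of `alignment`,
   0 otherwise. `alignment` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Calling `is_aligned(buffer, 4)` or `is_aligned(buffer, 32)` just before the timed fill would show whether the destination starts on a dword or 32-byte boundary.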
|
|||
![]() |
|
l4m2 26 Feb 2016, 16:33
revolution wrote: Is your buffer aligned? |
|||
![]() |
|
revolution 27 Feb 2016, 01:39
Post your assembly code so we can test it.
|
|||
![]() |
|
l4m2 27 Feb 2016, 03:16
Edit by revolution: Removed the useless pointless large binary dump
|
|||
![]() |
|
l4m2 27 Feb 2016, 03:18
It seems that very long posts get cut off, so I put it on http://paste.ubuntu.com/15212249/
|
|||
![]() |
|
revolution 27 Feb 2016, 03:20
l4m2: Your post serves no purpose.
|
|||
![]() |
|
l4m2 27 Feb 2016, 03:20
Why?
|
|||
![]() |
|
revolution 27 Feb 2016, 03:22
How do you think it would be useful?
|
|||
![]() |
|
l4m2 27 Feb 2016, 03:23
revolution wrote: l4m2: Your post serves no purpose.
I just disassembled the executable file, and the "useless pointless large binary dump" is all I got.
|||
![]() |
|
l4m2 27 Feb 2016, 03:38
You can use http://paste.ubuntu.com/15212337/ to rebuild the file with fasm (rename the output so its extension is .exe). The four functions are at http://paste.ubuntu.com/15212339/
Code:
* Referenced by a CALL at Addresses:
|:00401C74   , :00401C8B   , :00401CA8   , :00401CCD   , :00401CEA
|:00401D05   , :00401D22   , :00401D38
|
:00401334 60                      pushad
:00401335 0FA2                    cpuid
:00401337 61                      popad
:00401338 0F31                    rdtsc
:0040133A C3                      ret
:0040133B 90                      nop
:0040133C 57                      push edi
:0040133D BA60504000              mov edx, 00405060
:00401342 B940420F00              mov ecx, 000F4240
:00401347 B003                    mov al, 03
:00401349 89D7                    mov edi, edx
:0040134B F3                      repz
:0040134C AA                      stosb
:0040134D 5F                      pop edi
:0040134E C3                      ret
:0040134F 90                      nop
:00401350 31D2                    xor edx, edx
:00401352 31C0                    xor eax, eax

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00401367(C)
|
:00401354 C704956050400003030303  mov dword ptr [4*edx+00405060], 03030303
:0040135F 40                      inc eax
:00401360 89C2                    mov edx, eax
:00401362 3D90D00300              cmp eax, 0003D090
:00401367 75EB                    jne 00401354
:00401369 C3                      ret
:0040136A 6690                    nop
:0040136C 57                      push edi
:0040136D BF60504000              mov edi, 00405060
:00401372 B803000000              mov eax, 00000003
:00401377 B940420F00              mov ecx, 000F4240
:0040137C FC                      cld
:0040137D F3                      repz
:0040137E AA                      stosb
:0040137F 5F                      pop edi
:00401380 C3                      ret
:00401381 8D7600                  lea esi, dword ptr [esi+00]
:00401384 57                      push edi
:00401385 BF60504000              mov edi, 00405060
:0040138A B803030303              mov eax, 03030303
:0040138F B990D00300              mov ecx, 0003D090
:00401394 FC                      cld
:00401395 F3                      repz
:00401396 AB                      stosd
:00401397 5F                      pop edi
:00401398 C3                      ret
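The two `rep` routines in the listing are set up to write the same bytes: the `stosb` version uses ECX = 000F4240 (1,000,000 bytes of 03) and the `stosd` version uses ECX = 0003D090 (250,000 dwords of 03030303). A rough C sketch of that equivalence follows. The names `fill_bytes`, `fill_dwords` and `fills_match` are hypothetical, and the functions are plain loops standing in for the string instructions, so this says nothing about their relative speed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for "rep stosb": store `count` copies of one byte. */
static void fill_bytes(unsigned char *dst, unsigned char value, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
}

/* Stand-in for "rep stosd": store `count` copies of one 32-bit pattern.
   memcpy avoids alignment and aliasing assumptions about dst. */
static void fill_dwords(unsigned char *dst, uint32_t pattern, size_t count)
{
    for (size_t i = 0; i < count; i++)
        memcpy(dst + 4 * i, &pattern, sizeof pattern);
}

/* Returns 1 if both fills leave identical buffers of n bytes.
   n must be a multiple of 4 and fit in the scratch buffers. */
static int fills_match(size_t n)
{
    static unsigned char a[4096], b[4096];
    assert(n <= sizeof a && n % 4 == 0);
    fill_bytes(a, 0x03, n);
    fill_dwords(b, 0x03030303u, n / 4);
    return memcmp(a, b, n) == 0;
}
```

The multiple-of-4 restriction matters: a dword fill of `n / 4` elements leaves any remaining 1 to 3 tail bytes unwritten, which is harmless here only because 1,000,000 divides evenly by 4.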
|||
![]() |
|
revolution 27 Feb 2016, 03:45
l4m2: Do you know how to write assembly code? Do you need help to write assembly code? We can help you if you ask.
Posting a binary file (no matter how you disguise it as text) doesn't show what you are doing. How do we know it is not malware? |
|||
![]() |
|
l4m2 27 Feb 2016, 03:49
revolution wrote: l4m2: Do you know how to write assembly code? Do you need help to write assembly code? We can help you if you ask.
So should I post the whole .alf file (I used W32dasm to disassemble it)? What's the difference between that and a binary file?
|||
![]() |
|
revolution 27 Feb 2016, 03:58
l4m2 wrote: So should I give the whole .alf file (I used W32dasm to unassemble)?
l4m2 wrote: What's the difference between it and a binary file?
|||
![]() |
|
l4m2 27 Feb 2016, 14:24
I changed the title to reflect the true subject
|
|||
![]() |
|
revolution 28 Feb 2016, 13:20
A simple assembly version might look like this:
Code:
format pe console
include 'win32ax.inc'

TEST_LENGTH             = 1000000
TEST_REPITITIONS        = 10
TEST_WARM_UPS           = 3

; serialise with cpuid before reading the time stamp counter
macro rdtsc {
        pushad
        cpuid
        popad
        rdtsc
}

.data

        align 32
        dummy_buffer    rb TEST_LENGTH

.code

proc start uses ebx edi
        locals
                stosb_time_low  rd 1
                stosb_time_high rd 1
                stosd_time_low  rd 1
                stosd_time_high rd 1
                print_length    rd 1
                dummy           rd 1
                output_string   rb 1024
        endl
        mov     ebx,TEST_REPITITIONS
    .loop_repitition:
        mov     edi,TEST_WARM_UPS
    .loop_warm_up:
        ; only the timings from the last warm-up pass are printed
        stdcall test_stosb,dummy_buffer,TEST_LENGTH
        mov     [stosb_time_low],eax
        mov     [stosb_time_high],edx
        stdcall test_stosd,dummy_buffer,TEST_LENGTH
        mov     [stosd_time_low],eax
        mov     [stosd_time_high],edx
        dec     edi
        jnz     .loop_warm_up
        invoke  wsprintf,addr output_string,<'Stosb: %I64u - Stosd: %I64u',13,10>,\
                [stosb_time_low],[stosb_time_high],[stosd_time_low],[stosd_time_high]
        mov     [print_length],eax
        invoke  GetStdHandle,STD_OUTPUT_HANDLE
        invoke  WriteFile,eax,addr output_string,[print_length],addr dummy,NULL
        dec     ebx
        jnz     .loop_repitition
        invoke  ExitProcess,0
endp

proc test_stosb uses edi,buffer,length
        ;return edx:eax = clock count
        rdtsc
        push    edx eax                 ; save start time (high, low)
        mov     edi,[buffer]
        mov     ecx,[length]
        mov     eax,3
        rep     stosb
        rdtsc
        pop     ecx edi                 ; ecx = start low, edi = start high
        sub     eax,ecx
        sbb     edx,edi
        ret
endp

proc test_stosd uses edi,buffer,length
        ;return edx:eax = clock count
        rdtsc
        push    edx eax                 ; save start time (high, low)
        mov     edi,[buffer]
        mov     ecx,[length]
        mov     eax,0x03030303
        shr     ecx,2                   ; byte count -> dword count
        rep     stosd
        rdtsc
        pop     ecx edi                 ; ecx = start low, edi = start high
        sub     eax,ecx
        sbb     edx,edi
        ret
endp

.end start
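Each test routine returns the elapsed count in EDX:EAX by subtracting the first `rdtsc` reading from the second with a `sub`/`sbb` pair. A small C sketch of that arithmetic, with hypothetical helper names `join64` and `elapsed_ticks`; it models only the subtraction and does not read the time stamp counter:

```c
#include <assert.h>
#include <stdint.h>

/* Combine the high (EDX) and low (EAX) halves into one 64-bit value. */
static uint64_t join64(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Mirror "sub eax,ecx / sbb edx,edi": subtract the low halves, then the
   high halves minus the borrow produced by the low subtraction. */
static uint64_t elapsed_ticks(uint32_t start_high, uint32_t start_low,
                              uint32_t end_high, uint32_t end_low)
{
    uint32_t low = end_low - start_low;
    uint32_t borrow = end_low < start_low;
    uint32_t high = end_high - start_high - borrow;
    uint64_t result = join64(high, low);
    /* Same answer as a plain 64-bit subtraction (modulo 2^64). */
    assert(result == join64(end_high, end_low) - join64(start_high, start_low));
    return result;
}
```

The two halves are then passed to `wsprintf` low dword first, which is the order `%I64u` expects for a 64-bit argument on the 32-bit stack.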
|||
![]() |
|
l4m2 18 Mar 2016, 05:19
revolution wrote: A simple assembly version might look like this:
It's not a good idea to turn the code into this kind of assembly when the conversion breaks the appearance.
|||
![]() |
|
Tyler 19 Apr 2016, 06:21
In the future, you can use the options "-masm=intel -S -o -" to get GCC to print assembly to the terminal. Remove the "-o -" to save it to a file.
It still likely won't be directly assemblable by FASM, but it'll be close (Intel syntax, at least) and more correct than a disassembly.