flat assembler
Message board for the users of flat assembler.

Index > Windows > Please download my program, run it and post here the results

Inagawa



Joined: 24 Mar 2012
Posts: 153
This test is no longer relevant. I simply wanted to know how many of you are on post-Nehalem architectures. In the end I decided to choose compatibility over comfort (the finished macro here)

A big thank you to all who helped me :)


Last edited by Inagawa on 03 May 2012, 15:12; edited 3 times in total
Post 02 May 2012, 21:27
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
Source code please... ;)

I don't trust people on the net anymore. You get backdoored and your ass is a bot. Not me... please, post the source code.

EDIT: Seems legit but still

KERNEL : ExitProcess, GetCurrentProcess, SetPriorityClass, Sleep
MSVCRT : printf, getchar
Post 02 May 2012, 21:42
Enko



Joined: 03 Apr 2007
Posts: 678
Location: Mar del Plata
CPU not supported. AMD Athlon 1600+
Post 02 May 2012, 21:48
Inagawa



Joined: 24 Mar 2012
Posts: 153
Enko: Anyone with a CPU older than about 3 years (a rough guess) won't be able to run this.

typedef: Don't I seem like a trustworthy guy? :P

Jokes aside, I will release the source code once I have gathered enough information from the program. I swear there is nothing even remotely malicious in the code; it is simply there to help me tweak the final version.

Edit: It seems you need at least a Nehalem architecture to run this.
Post 02 May 2012, 21:55
r22



Joined: 27 Dec 2004
Posts: 805
Code:
-=== 1048575 repetitions x 20 runs ===-

====> Run 001 = 1283 cycles
====> Run 002 = 1283 cycles
====> Run 003 = 1283 cycles
====> Run 004 = 1283 cycles
====> Run 005 = 1283 cycles
====> Run 006 = 1283 cycles
====> Run 007 = 1283 cycles
====> Run 008 = 1283 cycles
====> Run 009 = 1283 cycles
====> Run 010 = 1283 cycles
====> Run 011 = 1283 cycles
====> Run 012 = 1283 cycles
====> Run 013 = 1283 cycles
====> Run 014 = 1283 cycles
====> Run 015 = 1283 cycles
====> Run 016 = 1283 cycles
====> Run 017 = 1283 cycles
====> Run 018 = 1283 cycles
====> Run 019 = 1283 cycles
====> Run 020 = 1283 cycles

====> Average: 1283 cycles
    


AMD Phenom II X6 1055T 2.8 GHz
Post 02 May 2012, 22:30
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Edit: WARNING: The disassembly comes from an older version of the program.
Code:
.data:00401000 ;
.data:00401000 ; +-------------------------------------------------------------------------+
.data:00401000 ; ¦     This file is generated by The Interactive Disassembler (IDA)        ¦
.data:00401000 ; ¦     Copyright (c) 2010 by Hex-Rays SA, <support@hex-rays.com>           ¦
.data:00401000 ; ¦                      Licensed to: Freeware version                      ¦
.data:00401000 ; +-------------------------------------------------------------------------+
.data:00401000 ;
.data:00401000 ; Input MD5   : F48ABD8AB8116CA38179EDF1D28B2F71
.data:00401000
.data:00401000 ; File Name   : C:\Users\Hernan\AppData\Local\Temp\Rar$DR54.080\PC.EXE
.data:00401000 ; Format      : Portable executable for 80386 (PE)
.data:00401000 ; Imagebase   : 400000
.data:00401000 ; Section 1. (virtual address 00001000)
.data:00401000 ; Virtual size                  : 00000020 (     32.)
.data:00401000 ; Section size in file          : 00000200 (    512.)
.data:00401000 ; Offset to raw data for section: 00000400
.data:00401000 ; Flags C0000040: Data Readable Writable
.data:00401000 ; Alignment     : default
.data:00401000
.data:00401000                 Ideal
.data:00401000                 p686
.data:00401000                 pmmx
.data:00401000                 model flat
.data:00401000
.data:00401000 ; ---------------------------------------------------------------------------
.data:00401000
.data:00401000 ; Segment type: Pure data
.data:00401000 ; Segment permissions: Read/Write
.data:00401000 segment         _data para public 'DATA' use32
.data:00401000                 assume cs:_data
.data:00401000                 ;org 401000h
.data:00401000 dword_401000    dd 0                    ; DATA XREF: .text:004030B2w
.data:00401000                                         ; .text:00403187r ...
.data:00401004 dword_401004    dd 0                    ; DATA XREF: .text:004030BCw
.data:00401004                                         ; .text:0040317Fr ...
.data:00401008 dword_401008    dd 0                    ; DATA XREF: .text:004030C6w
.data:00401008                                         ; .text:00403121r ...
.data:0040100C dword_40100C    dd 0                    ; DATA XREF: .text:loc_4030D0w
.data:0040100C                                         ; .text:00403119r ...
.data:00401010                 db    0
.data:00401011                 db    0
.data:00401012                 db    0
.data:00401013                 db    0
.data:00401014 dword_401014    dd 0                    ; DATA XREF: .text:004031B5w
.data:00401014                                         ; .text:004031BBw ...
.data:00401018 dword_401018    dd 0                    ; DATA XREF: .text:loc_403248r
.data:0040101C dword_40101C    dd 0                    ; DATA XREF: .text:00403090w
.data:0040101C                                         ; sub_4031F1+Br
.data:00401020                 align 200h
.data:00401020 ends            _data
.data:00401020
.data:00402000 ; Section 2. (virtual address 00002000)
.data:00402000 ; Virtual size                  : 00000001 (      1.)
.data:00402000 ; Section size in file          : 00000200 (    512.)
.data:00402000 ; Offset to raw data for section: 00000600
.data:00402000 ; Flags C0000040: Data Readable Writable
.data:00402000 ; Alignment     : default
.data:00402000 ; ---------------------------------------------------------------------------
.data:00402000
.data:00402000 ; Segment type: Pure data
.data:00402000 ; Segment permissions: Read/Write
.data:00402000 segment         _data para public 'DATA' use32
.data:00402000                 assume cs:_data
.data:00402000                 ;org 402000h
.data:00402000                 db 90h, 1FFh dup(0)
.data:00402000 ends            _data
.data:00402000
.text:00403000 ; Section 3. (virtual address 00003000)
.text:00403000 ; Virtual size                  : 000002A0 (    672.)
.text:00403000 ; Section size in file          : 00000400 (   1024.)
.text:00403000 ; Offset to raw data for section: 00000800
.text:00403000 ; Flags 60000020: Text Executable Readable
.text:00403000 ; Alignment     : default
.text:00403000 ; ---------------------------------------------------------------------------
.text:00403000
.text:00403000 ; Segment type: Pure code
.text:00403000 ; Segment permissions: Read/Execute
.text:00403000 segment         _text para public 'CODE' use32
.text:00403000                 assume cs:_text
.text:00403000                 ;org 403000h
.text:00403000                 assume es:nothing, ss:nothing, ds:_data, fs:nothing, gs:nothing
.text:00403000
.text:00403000 ; ¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦ S U B R O U T I N E ¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦
.text:00403000
.text:00403000
.text:00403000                 public start
.text:00403000 proc            start near
.text:00403000                 mov     eax, 80000001h
.text:00403005                 cpuid
.text:00403007                 bt      edx, 1Bh
.text:0040300B                 jb      short loc_40304D
.text:0040300D                 call    loc_403036
.text:0040300D ; ---------------------------------------------------------------------------
.text:00403012 aSorryYourCpuIs db 'Sorry, your CPU is not supported.',0Dh,0Ah,0
.text:00403036 ; ---------------------------------------------------------------------------
.text:00403036
.text:00403036 loc_403036:                             ; CODE XREF: start+Dp
.text:00403036                 call    [ds:printf]
.text:00403036 endp            start
.text:00403036
.text:0040303C                 add     esp, 4
.text:0040303F                 call    [ds:getchar]
.text:00403045                 push    0
.text:00403047                 call    [ds:ExitProcess]
.text:0040304D
.text:0040304D loc_40304D:                             ; CODE XREF: start+Bj
.text:0040304D                 push    eax
.text:0040304E                 push    ecx
.text:0040304F                 push    edx
.text:00403050                 push    ebx
.text:00403051                 push    ebp
.text:00403052                 push    esi
.text:00403053                 push    edi
.text:00403054                 push    14h
.text:00403056                 push    0FFFFFh         ; dwPriorityClass
.text:0040305B                 call    loc_403087
.text:0040305B ; ---------------------------------------------------------------------------
.text:00403060 aIRepetitionsXI db '-=== %i repetitions x %i runs ===-',0Dh,0Ah
.text:00403060                 db 0Dh,0Ah,0
.text:00403087 ; ---------------------------------------------------------------------------
.text:00403087
.text:00403087 loc_403087:                             ; CODE XREF: .text:0040305Bp
.text:00403087                 call    [ds:printf]
.text:0040308D                 add     esp, 0Ch
.text:00403090                 mov     [ds:dword_40101C], 14h
.text:0040309A                 call    [ds:GetCurrentProcess]
.text:004030A0                 push    20h
.text:004030A2                 push    eax             ; hProcess
.text:004030A3
.text:004030A3 loc_4030A3:
.text:004030A3                 call    [ds:SetPriorityClass]
.text:004030A9                 xor     edi, edi
.text:004030AB                 push    edi
.text:004030AC                 mov     ebx, 1
.text:004030B1                 push    ebx
.text:004030B2                 mov     [ds:dword_401000], 0FFFFFFFFh
.text:004030BC                 mov     [ds:dword_401004], 0FFFFFFFFh
.text:004030C6                 mov     [ds:dword_401008], 0FFFFFFFFh
.text:004030D0
.text:004030D0 loc_4030D0:
.text:004030D0                 mov     [ds:dword_40100C], 0FFFFFFFFh
.text:004030DA                 nop
.text:004030DB                 nop
.text:004030DC                 nop
.text:004030DD                 nop
.text:004030DE                 nop
.text:004030DF                 nop
.text:004030E0
.text:004030E0 loc_4030E0:                             ; CODE XREF: sub_4031F1+12j
.text:004030E0                 mov     esi, 0FFFFFh
.text:004030E5                 xor     eax, eax
.text:004030E7                 cpuid
.text:004030E9                 xor     eax, eax
.text:004030EB                 cpuid
.text:004030ED                 xor     eax, eax
.text:004030EF                 cpuid
.text:004030F1                 push    0               ; dwMilliseconds
.text:004030F3                 call    [ds:Sleep]
.text:004030F9                 push    esi
.text:004030FA                 nop
.text:004030FB                 nop
.text:004030FC                 nop
.text:004030FD                 nop
.text:004030FE                 nop
.text:004030FF                 nop
.text:00403100
.text:00403100 loc_403100:                             ; CODE XREF: .text:00403135j
.text:00403100                 xor     eax, eax
.text:00403102                 cpuid
.text:00403104                 rdtsc
.text:00403106                 push    edx
.text:00403107                 push    eax
.text:00403108                 xor     eax, eax
.text:0040310A                 cpuid
.text:0040310C                 xor     eax, eax
.text:0040310E                 cpuid
.text:00403110                 invlpg  cl
.text:00403113                 pop     ecx
.text:00403114                 sub     eax, ecx
.text:00403116                 pop     ecx
.text:00403117                 sbb     edx, ecx
.text:00403119                 cmp     edx, [ds:dword_40100C]
.text:0040311F                 jnz     short loc_403129
.text:00403121                 cmp     eax, [ds:dword_401008]
.text:00403127                 jnb     short loc_403134
.text:00403129
.text:00403129 loc_403129:                             ; CODE XREF: .text:0040311Fj
.text:00403129                 mov     [ds:dword_401008], eax
.text:0040312E                 mov     [ds:dword_40100C], edx
.text:00403134
.text:00403134 loc_403134:                             ; CODE XREF: .text:00403127j
.text:00403134                 dec     esi
.text:00403135                 jnz     short loc_403100
.text:00403137                 pop     esi
.text:00403138                 push    esi
.text:00403139                 push    0               ; dwMilliseconds
.text:0040313B                 call    [ds:Sleep]
.text:00403141                 nop
.text:00403142                 nop
.text:00403143                 nop
.text:00403144                 nop
.text:00403145                 nop
.text:00403146                 nop
.text:00403147                 nop
.text:00403148                 nop
.text:00403149                 nop
.text:0040314A                 nop
.text:0040314B                 nop
.text:0040314C                 nop
.text:0040314D                 nop
.text:0040314E                 nop
.text:0040314F                 nop
.text:00403150
.text:00403150 loc_403150:                             ; CODE XREF: .text:0040319Bj
.text:00403150                 push    esi
.text:00403151                 push    edi
.text:00403152                 invlpg  cl
.text:00403155                 push    edx
.text:00403156                 push    eax
.text:00403157                 xor     eax, eax
.text:00403159                 cpuid
.text:0040315B                 mov     ecx, 12Ch
.text:00403160
.text:00403160 loc_403160:                             ; CODE XREF: .text:0040316Ej
.text:00403160                 mov     eax, 0FFFFFFFFh
.text:00403165                 mov     edx, 0FFFFFFFFh
.text:0040316A                 rcr     eax, 1Fh
.text:0040316D                 dec     ecx
.text:0040316E                 jnz     short loc_403160
.text:00403170                 xor     eax, eax
.text:00403172                 cpuid
.text:00403174                 invlpg  cl
.text:00403177                 pop     ecx
.text:00403178                 sub     eax, ecx
.text:0040317A                 pop     ecx
.text:0040317B                 sbb     edx, ecx
.text:0040317D                 pop     edi
.text:0040317E                 pop     esi
.text:0040317F                 cmp     edx, [ds:dword_401004]
.text:00403185                 jnz     short loc_40318F
.text:00403187                 cmp     eax, [ds:dword_401000]
.text:0040318D                 jnb     short loc_40319A
.text:0040318F
.text:0040318F loc_40318F:                             ; CODE XREF: .text:00403185j
.text:0040318F                 mov     [ds:dword_401000], eax
.text:00403194                 mov     [ds:dword_401004], edx
.text:0040319A
.text:0040319A loc_40319A:                             ; CODE XREF: .text:0040318Dj
.text:0040319A                 dec     esi
.text:0040319B                 jnz     short loc_403150
.text:0040319D                 pop     esi
.text:0040319E                 mov     eax, [ds:dword_401000]
.text:004031A3                 sub     eax, [ds:dword_401008]
.text:004031A9                 mov     edx, [ds:dword_401004]
.text:004031AF                 sbb     edx, [ds:dword_40100C]
.text:004031B5                 add     [ds:dword_401014], eax
.text:004031BB                 adc     [ds:dword_401014], edx
.text:004031C1                 pop     ebx
.text:004031C2                 cmp     eax, 0
.text:004031C5                 jle     short loc_4031FB
.text:004031C7                 pop     edi
.text:004031C8                 inc     edi
.text:004031C9                 push    edi
.text:004031CA                 push    eax
.text:004031CB                 push    eax
.text:004031CC                 push    ebx
.text:004031CD                 push    3
.text:004031CF                 call    sub_4031F1
.text:004031CF ; ---------------------------------------------------------------------------
.text:004031D4 aRun0IICycles   db '====> Run %0*i = %i cycles',0Dh,0Ah,0
.text:004031F1
.text:004031F1 ; ¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦ S U B R O U T I N E ¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦
.text:004031F1
.text:004031F1
.text:004031F1 proc            sub_4031F1 near         ; CODE XREF: .text:004031CFp
.text:004031F1                 call    [ds:printf]
.text:004031F7                 add     esp, 10h
.text:004031FA                 pop     eax
.text:004031FB
.text:004031FB loc_4031FB:                             ; CODE XREF: .text:004031C5j
.text:004031FB                 inc     ebx
.text:004031FC                 cmp     ebx, [ds:dword_40101C]
.text:00403202                 push    ebx
.text:00403203                 jle     loc_4030E0
.text:00403209                 pop     ebx
.text:0040320A                 pop     edi
.text:0040320B                 test    edi, edi
.text:0040320D                 jnz     short loc_403243
.text:0040320F                 push    eax             ; char *
.text:00403210                 call    loc_403237
.text:00403210 ; ---------------------------------------------------------------------------
.text:00403215 aTheCodeWasProb db 'The code was probably too short',0Dh,0Ah,0
.text:00403237 ; ---------------------------------------------------------------------------
.text:00403237
.text:00403237 loc_403237:                             ; CODE XREF: sub_4031F1+1Fp
.text:00403237                 call    [ds:printf]
.text:00403237 endp            sub_4031F1
.text:00403237
.text:0040323D                 add     esp, 4
.text:00403240                 pop     eax
.text:00403241                 jmp     short loc_40327C
.text:00403243 ; ---------------------------------------------------------------------------
.text:00403243
.text:00403243 loc_403243:                             ; CODE XREF: sub_4031F1+1Cj
.text:00403243                 mov     eax, [ds:dword_401014]
.text:00403248
.text:00403248 loc_403248:
.text:00403248                 mov     edx, [ds:dword_401018]
.text:0040324E                 idiv    edi
.text:00403250                 push    eax             ; dwPriorityClass
.text:00403251                 call    loc_403273
.text:00403251 ; ---------------------------------------------------------------------------
.text:00403256 aAverageICycles db 0Ah
.text:00403256                 db 0Dh,'====> Average: %i cycles',0Dh,0Ah,0
.text:00403273 ; ---------------------------------------------------------------------------
.text:00403273
.text:00403273 loc_403273:                             ; CODE XREF: .text:00403251p
.text:00403273                 call    [ds:printf]
.text:00403279                 add     esp, 8
.text:0040327C
.text:0040327C loc_40327C:                             ; CODE XREF: .text:00403241j
.text:0040327C                 call    [ds:GetCurrentProcess]
.text:00403282                 push    20h
.text:00403284                 push    eax
.text:00403285                 call    [ds:SetPriorityClass]
.text:0040328B                 pop     edi
.text:0040328C                 pop     esi
.text:0040328D                 pop     ebp
.text:0040328E                 pop     ebx
.text:0040328F                 pop     edx
.text:00403290                 pop     ecx
.text:00403291                 pop     eax
.text:00403292                 call    [ds:getchar]
.text:00403298                 push    0
.text:0040329A                 call    [ds:ExitProcess]
.text:0040329A ; ---------------------------------------------------------------------------
.text:004032A0                 dd 2 dup(0)
.text:004032A8                 dd 6 dup(0)
.text:004032C0                 dd 0
.text:004032C4                 dd 4Fh dup(0)
.text:004032C4 ends            _text
.text:004032C4
.idata:0040406C ;
.idata:0040406C ; Imports from KERNEL32.DLL
.idata:0040406C ;
.idata:0040406C ; Section 4. (virtual address 00004000)
.idata:0040406C ; Virtual size                  : 000000EC (    236.)
.idata:0040406C ; Section size in file          : 00000200 (    512.)
.idata:0040406C ; Offset to raw data for section: 00000C00
.idata:0040406C ; Flags C0000040: Data Readable Writable
.idata:0040406C ; Alignment     : default
.idata:0040406C ; ---------------------------------------------------------------------------
.idata:0040406C
.idata:0040406C ; Segment type: Externs
.idata:0040406C ; _idata
.idata:0040406C ; void __stdcall ExitProcess(UINT uExitCode)
.idata:0040406C                 extrn ExitProcess:dword ; DATA XREF: .text:00403047r
.idata:0040406C                                         ; .text:0040329Ar
.idata:00404070 ; HANDLE GetCurrentProcess(void)
.idata:00404070                 extrn GetCurrentProcess:dword ; DATA XREF: .text:0040309Ar
.idata:00404070                                         ; .text:loc_40327Cr
.idata:00404074 ; BOOL __stdcall SetPriorityClass(HANDLE hProcess,DWORD dwPriorityClass)
.idata:00404074                 extrn SetPriorityClass:dword ; DATA XREF: .text:loc_4030A3r
.idata:00404074                                         ; .text:00403285r
.idata:00404078 ; void __stdcall Sleep(DWORD dwMilliseconds)
.idata:00404078                 extrn Sleep:dword       ; DATA XREF: .text:004030F3r
.idata:00404078                                         ; .text:0040313Br
.idata:0040407C
.idata:00404080
.idata:004040CC ;
.idata:004040CC ; Imports from MSVCRT.DLL
.idata:004040CC ;
.idata:004040CC ; int printf(const char *,...)
.idata:004040CC                 extrn printf:dword      ; DATA XREF: start:loc_403036r
.idata:004040CC                                         ; .text:loc_403087r ...
.idata:004040D0 ; int getchar(void)
.idata:004040D0                 extrn getchar:dword     ; DATA XREF: .text:0040303Fr
.idata:004040D0                                         ; .text:00403292r
.idata:004040D4
.idata:004040D4
.idata:004040D4    


My results:
Code:
-=== 1048575 repetitions x 20 runs ===-

====> Run 001 = 2120 cycles
====> Run 002 = 2120 cycles
====> Run 003 = 2120 cycles
====> Run 004 = 2120 cycles
====> Run 005 = 2120 cycles
====> Run 006 = 2120 cycles
====> Run 007 = 2120 cycles
====> Run 008 = 2120 cycles
====> Run 009 = 2120 cycles
====> Run 010 = 2120 cycles
====> Run 011 = 2120 cycles
====> Run 012 = 2120 cycles
====> Run 013 = 2120 cycles
====> Run 014 = 2120 cycles
====> Run 015 = 2120 cycles
====> Run 016 = 2120 cycles
====> Run 017 = 2120 cycles
====> Run 018 = 2120 cycles
====> Run 019 = 2120 cycles
====> Run 020 = 2120 cycles

====> Average: 2120 cycles    
Core i3-2310M 2.1 GHz


Last edited by LocoDelAssembly on 03 May 2012, 00:09; edited 1 time in total
Post 02 May 2012, 23:27
Inagawa



Joined: 24 Mar 2012
Posts: 153
Thanks a lot for the runs. I have updated the version so it gives me more info and I don't have to bother you with constant tweaks. (It's in the first post.)
Post 02 May 2012, 23:43
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
Code:
-=== 1048575 repetitions x 20 runs ===- 

====> Run 001 = 5 cycles 
====> Run 002 = 5 cycles 
====> Run 003 = 5 cycles 
====> Run 004 = 5 cycles 
====> Run 005 = 5 cycles 
====> Run 006 = 5 cycles 
====> Run 007 = 5 cycles 
====> Run 008 = 5 cycles 
====> Run 009 = 5 cycles 
====> Run 010 = 5 cycles 
====> Run 011 = 5 cycles 
====> Run 012 = 5 cycles 
====> Run 013 = 5 cycles 
====> Run 014 = 5 cycles 
====> Run 015 = 5 cycles 
====> Run 016 = 5 cycles 
====> Run 017 = 5 cycles 
====> Run 018 = 5 cycles 
====> Run 019 = 5 cycles 
====> Run 020 = 5 cycles 

====> Average: 5 cycles 
    

HPE h8z series

AMD FX-8150 eight-core processor [3.6GHz, 8MB L2/8MB L3 Cache]
16GB DDR3-1333MHz SDRAM [4 DIMMs]
120TB SATA SSD RAID 0 (2 x 160GB HDD)
600W Power supply
3GB AMD Radeon HD 7950 [Dual Bracket, DVI, HDMI, 2x mini-DP]
Blu-ray player/writer & SuperMulti DVD burner
Win7[64Bit]
15-in-1 memory card reader, 4 USB 2.0 (front), 2 USB 3.0 (top)
Post 03 May 2012, 01:32
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
Cool
Post 03 May 2012, 01:33
Enko



Joined: 03 Apr 2007
Posts: 678
Location: Mar del Plata
Cpu not supported. This time Intel T4500.
Post 03 May 2012, 02:52
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17477
Location: In your JS exploiting you and your system
LocoDelAssembly wrote:
Code:
.text:00403000                 mov     eax, 80000001h
.text:00403005                 cpuid
.text:00403007                 bt      edx, 1Bh    
Inagawa: Why do you test for the RDTSCP instruction? You are not using it.
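
For reference, here is a minimal standalone sketch of the feature test quoted above. It assumes fasm's extended Win32 headers (win32ax.inc) and a console PE. The labels and messages are made up for illustration and are not taken from Inagawa's program. Unlike the quoted code, it first asks CPUID leaf 0x80000000 whether leaf 0x80000001 exists at all, which is an added precaution rather than something the original does.

```asm
format PE console
entry start

include 'win32ax.inc'

section '.text' code readable executable

start:
        ; Precaution (not in the quoted code): make sure the
        ; extended leaf 0x80000001 exists before querying it.
        mov     eax, 0x80000000
        cpuid
        cmp     eax, 0x80000001
        jb      .unsupported

        ; CPUID.80000001h:EDX bit 27 (0x1B) is the RDTSCP flag.
        mov     eax, 0x80000001
        cpuid
        bt      edx, 27
        jnc     .unsupported

        cinvoke printf, <'RDTSCP is supported.', 13, 10>
        jmp     .done

.unsupported:
        cinvoke printf, <'RDTSCP is not supported.', 13, 10>

.done:
        invoke  ExitProcess, 0

section '.idata' import data readable writeable

  library kernel32, 'KERNEL32.DLL', \
          msvcrt,   'MSVCRT.DLL'

  import kernel32, ExitProcess, 'ExitProcess'
  import msvcrt,   printf,      'printf'
```

A CPU that lacks the flag takes the "not supported" branch, which matches what Enko reported.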
Post 03 May 2012, 03:42
Inagawa



Joined: 24 Mar 2012
Posts: 153
Of course I am using it. I'll post the code

Code:
;=============================================================================
;=== PERFORMANCE COUNTER MACRO ===============================================
;=============================================================================
section '.data' data readable writeable

  ;
  ; Optimal alignment for a QWORD
  ;
  align                        8
  __Count                   dq 0
  __Overhead                dq 0
  __LoopCounter             dd 0
  __Average                 dq 0
  __AverageLoopCount        dd 0

macro StartPerformanceCounter InnerLoopCount = 0xFFFFF, AverageLoopCount = 20, ProcessPriority = 32
{
  local AverageLoop, OverheadLoop, WorkingLoop, Lower, Higher, RDTSCP_Compliant

  ;==============================================
  ;=== INITIALIZE ===============================

  ;
  ; Preserve the registers
  ;
  push        eax ecx edx ebx ebp esi edi

  ;
  ; First, check if the RDTSCP instruction is supported,
  ; exit if it's not.
  ;
  mov          eax, 0x80000001
  cpuid
  bt           edx, 0x1B
  jc           RDTSCP_Compliant
  cinvoke      printf, <'Sorry, your CPU is not supported.', 13, 10>
  cinvoke      getchar
  invoke       ExitProcess, 0



RDTSCP_Compliant:

  ;
  ; Output the info string
  ;
  cinvoke      printf, <'-=== %i repetitions x %i runs ===-', 13, 10, 13, 10>, InnerLoopCount, AverageLoopCount

  ;
  ; Initialize the AverageLoopCount
  ;
  mov         [__AverageLoopCount], AverageLoopCount

  ;
  ; Set the process priority class
  ;
  invoke       GetCurrentProcess
  invoke       SetPriorityClass, eax, ProcessPriority

  ;
  ; Initialize the successful loop counter
  ;
  xor         edi, edi
  push        edi

  ;
  ; Initialize the AverageLoop counter
  ;
  mov         ebx, 1
  push        ebx

  ;
  ; Initialize the working variables
  ;
  mov          DWORD [__Count], -1
  mov          DWORD [__Count+4], -1
  mov          DWORD [__Overhead], -1
  mov          DWORD [__Overhead+4], -1
  mov          DWORD [__Average], -1
  mov          DWORD [__Average+4], 0

  ;===============================================
  ;=== THE AVERAGE LOOP ===========================

  ;
  ; This loop repeats the whole process of calculating
  ; an overhead and getting the cycle results.
  ; It reports the cycle-count after each run and
  ; the Average of all runs at the end.
  ;
  ; Align 16 is recommended for P6+
  ;
align 16
AverageLoop:

  ;
  ; Address visible outside of this macro.
  ;
  __AverageLoop equ AverageLoop

  ;
  ; Initialize ESI (inner loop count)
  ;
  mov          esi, InnerLoopCount

  ;
  ; Intel suggests warming up the CPUID
  ;
  xor          eax, eax
  cpuid
  xor          eax, eax
  cpuid
  xor          eax, eax
  cpuid

  ;
  ; Start a new time slice for the overhead run
  ;
  invoke       Sleep, 0

  ;
  ; Save the inner loop count
  ;
  push         esi

  ;===============================================
  ;=== THE OVERHEAD LOOP =========================

  ;
  ; This loop measures the overhead to be subtracted
  ; from the final cycle count.
  ;
  ; Align 16 is recommended for P6+
  ;
align 16
OverheadLoop:

  ;
  ; Serialize
  ; Read the TimeStampCounter
  ;
  xor          eax, eax
  cpuid
  rdtsc

  ;
  ; Save the HO 32 bits of starting count
  ; Save the LO 32 bits of starting count
  ;
  push         edx
  push         eax

  ;
  ; Force the instructions to finish
  ;
  xor          eax, eax
  cpuid

  ;
  ; Read the ending count with RDTSCP
  ;
  xor          eax, eax
  cpuid
  rdtscp

  ;
  ; Restore the LO 32 bits of starting count
  ; Subtract the LO bits
  ;
  pop          ecx
  sub          eax, ecx

  ;
  ; Restore the HO 32 bits of starting count
  ; Subtract with carry the HO bits
  ;
  pop          ecx
  sbb          edx, ecx

  ;
  ; Check if this loop has lower cycle count
  ; (unsigned 64-bit compare, high dword first)
  ;
  cmp          edx, DWORD [__Overhead+4]
  ja           Higher
  jb           Lower
  cmp          eax, DWORD [__Overhead]
  jnb          Higher

Lower:

  ;
  ; Save the lowest cycle count
  ;
  mov          DWORD [__Overhead], eax
  mov          DWORD [__Overhead+4], edx

Higher:

  ;
  ; Repeat until ESI (inner loop count) is at zero
  ;
  dec          esi
  jnz          OverheadLoop
  ;=== THE OVERHEAD LOOP END =====================
  ;===============================================

  ;
  ; Reinitialize ESI (inner loop count) for the next run
  ;
  pop          esi
  push         esi

  ;
  ; Start a new time slice for the working loop
  ;
  invoke       Sleep, 0

  ;===============================================
  ;=== THE WORKING LOOP ==========================

  ;
  ; This loop counts the cycles between the Start
  ; and End macro
  ;
  ; Align 16 is recommended for P6+
  ;
align 16
WorkingLoop:

  ;
  ; Address visible outside of this macro.
  ;
  __WorkingLoop equ WorkingLoop

  ;
  ; Preserve the registers
  ; They have to be stored before the call
  ; to RDTSCP, to avoid influencing the timing
  ;
  push        esi edi

  ;
  ; Read the starting count with RDTSCP
  ;
  rdtscp

  ;
  ; Save the HO 32 bits of starting count
  ; Save the LO 32 bits of starting count
  ;
  push         edx
  push         eax

  ;
  ; Force the instructions to finish
  ;
  xor          eax, eax
  cpuid
}

macro EndPerformanceCounter
{
  local Lower, Higher, Exit, UnsuccessfulLoop, OutputAverage

  ;
  ; Call RDTSCP again
  ;
  xor          eax, eax
  cpuid
  rdtscp

  ;
  ; Restore the LO 32 bits of starting count
  ; Subtract the LO bits
  ;
  pop          ecx
  sub          eax, ecx

  ;
  ; Restore the HO 32 bits of starting count
  ; Subtract with carry the HO bits
  ;
  pop          ecx
  sbb          edx, ecx

  ;
  ; Return the original registers.
  ; They have to be popped here to avoid
  ; messing up the RDTSCP code
  ;
  pop          edi esi

  ;
  ; Check if this loop has lower cycle count
  ; (unsigned 64-bit compare, high dword first)
  ;
  cmp          edx, DWORD [__Count+4]
  ja           Higher
  jb           Lower
  cmp          eax, DWORD [__Count]
  jnb          Higher

Lower:

  ;
  ; Save the lowest cycle count
  ;
  mov          DWORD [__Count], eax
  mov          DWORD [__Count+4], edx

Higher:

  ;
  ; Repeat until ESI (inner loop count) is at zero
  ;
  dec          esi
  jnz          __WorkingLoop

  ;=== THE WORKING LOOP END ======================
  ;===============================================

  ;
  ; Reinitialize ESI (inner loop count) for the next run
  ;
  pop          esi

  ;
  ; Update the results.
  ;
  mov          eax, DWORD  [__Count]
  sub          eax, DWORD  [__Overhead]
  mov          edx, DWORD  [__Count+4]
  sbb          edx, DWORD  [__Overhead+4]

  ;
  ; Add to the Average
  ;
  add          DWORD [__Average], eax
  adc          DWORD [__Average], edx

  ;
  ; Print the result for this run (EBX is the run number).
  ; Runs whose net cycle count is not positive are skipped.
  ;
  pop          ebx
  cmp          eax, 0
  jle          UnsuccessfulLoop

  pop          edi
  inc          edi
  push         edi
  push         eax
  cinvoke      printf, <'====> Pass %0*i = %i cycles', 13, 10>, 3, ebx, eax
  pop          eax

  ;
  ; Some code sequences are too short to be properly
  ; measured. In that case the loop is skipped entirely.
  ;
UnsuccessfulLoop:

  ;
  ; Repeat the AverageLoop until EBX reaches the max limit
  ;
  inc          ebx
  cmp          ebx, [__AverageLoopCount]
  push         ebx
  jng          __AverageLoop

  ;=== THE AVERAGE LOOP END =======================
  ;===============================================

  ;
  ; Restore EBX and EDI
  ;
  pop          ebx
  pop          edi

  ;
  ; If EDI (successful loop count) is zero, there
  ; is no point in computing an average
  ;
  ; In such case, output an error message
  ;
  test         edi, edi
  jnz          OutputAverage
  push         eax
  cinvoke      printf, <'The code was probably too short', 13, 10>
  pop          eax
  jmp          Exit

  ;
  ; Compute the Average of all runs. (Average is the sum
  ; of all the successful runs divided by the number of runs (EDI))
  ;
OutputAverage:

  ;
  ; Compute the Average and output the result
  ;
  mov          eax, DWORD [__Average]
  mov          edx, DWORD [__Average+4]
  idiv         edi
  cinvoke      printf, <13, 10, '====> Average: %i cycles', 13, 10, 13, 10>, eax

  ;
  ; Exit the macro prematurely. It is probably because the
  ; tested code was too short, or the run just didn't run
  ; "right". Restart the macro several times and you will
  ; get results
  ;
Exit:

  ;
  ; Restore the process priority
  ;
  invoke       GetCurrentProcess
  invoke       SetPriorityClass, eax, NORMAL_PRIORITY_CLASS

  ;
  ; Return the original registers
  ;
  pop          edi esi ebp ebx edx ecx eax
}
    


You simply call it like so

Code:
StartPerformanceCounter ; Optional arguments: the number of reps, the number of passes (runs), and the process priority class used for the test.
;StartPerformanceCounter 155550, 50, REALTIME_PRIORITY_CLASS
;StartPerformanceCounter,,REALTIME_PRIORITY_CLASS ; Watch the commas!

  mov          ecx, 300  ; The code to be timed
@@:
  mov          eax, -1
  mov          edx, -1
  rcr          eax, 31
  dec          ecx
  jnz          @B

  EndPerformanceCounter
    


I have worked long and hard on this macro. The basic structure comes from a MASM32 version of a much simpler PerfCounter. There is no register contamination between the macros, so you shouldn't be able to break it no matter what the timed code does.

I hope this will be useful to someone for timing their code. It would also be sweet if anyone with knowledge of RDTSCP checked the code. :)

Please keep in mind that I'm a rookie, before you try to bash my head in for possibly coding something wrong.
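
For readers who prefer to see the arithmetic outside of assembly: the 64-bit bookkeeping such a counter needs on a 32-bit CPU (SUB/SBB across two 32-bit halves, then keeping the smallest sample) can be modelled in plain C. This is only a sketch of the idea, not a translation of the macro; `u64_pair`, `sub64`, `less64` and `keep_min` are illustrative names that do not appear in the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 64-bit counter held as two 32-bit halves, like EDX:EAX. */
typedef struct { uint32_t lo, hi; } u64_pair;

/* end - start: subtract the low halves (SUB), then the high halves
   minus the borrow out of the low subtraction (SBB). */
static u64_pair sub64(u64_pair end, u64_pair start) {
    u64_pair d;
    uint32_t borrow = end.lo < start.lo;   /* carry flag after SUB */
    d.lo = end.lo - start.lo;
    d.hi = end.hi - start.hi - borrow;
    return d;
}

/* Unsigned 64-bit "a < b": the high halves decide unless they are equal. */
static int less64(u64_pair a, u64_pair b) {
    if (a.hi != b.hi) return a.hi < b.hi;
    return a.lo < b.lo;
}

/* Keep the smallest sample seen so far. */
static void keep_min(u64_pair *best, u64_pair sample) {
    assert(best != NULL);
    if (less64(sample, *best)) *best = sample;
}
```

Note that a sample with a larger high half must never replace the current minimum, whatever its low half is; that is the case `less64` handles with its first test.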


Last edited by Inagawa on 03 May 2012, 06:55; edited 1 time in total
Post 03 May 2012, 05:37
bzdashek



Joined: 15 Feb 2012
Posts: 147
Location: Tolstokvashino, Russia
bzdashek
Thanks for the source, Inagawa. It doesn't run on my Atom.

Did you consider using the QueryPerformanceCounter API, as AsmGuru62 suggested in one of your topics? It also returns a QWORD (in the form of a LARGE_INTEGER structure), and you don't have to write such long macros.
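
For reference, a minimal C sketch of the QueryPerformanceCounter approach might look like the following. The helper name `now_ns` is made up for illustration, and the non-Windows branch (using `clock_gettime`) is an assumption added only so that the sketch also compiles outside Windows; it is not part of the suggestion above.

```c
#ifndef _WIN32
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict -std modes */
#endif
#endif

#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Current value of a monotonic high-resolution timer, in nanoseconds. */
static uint64_t now_ns(void) {
#ifdef _WIN32
    LARGE_INTEGER freq, count;            /* each one is a 64-bit QWORD */
    QueryPerformanceFrequency(&freq);     /* ticks per second */
    QueryPerformanceCounter(&count);      /* current tick count */
    uint64_t f = (uint64_t)freq.QuadPart;
    uint64_t c = (uint64_t)count.QuadPart;
    /* Split the conversion so the multiplication cannot overflow. */
    return (c / f) * 1000000000u + (c % f) * 1000000000u / f;
#else
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

Timing a piece of code is then a matter of calling `now_ns()` before and after it and subtracting. The result is in wall-clock time, not CPU cycles, so it answers a slightly different question than RDTSC does.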
Post 03 May 2012, 06:18
Inagawa



Joined: 24 Mar 2012
Posts: 153
Inagawa
I am planning on updating this macro to automatically use QPC on processors that do not have RDTSCP in their arsenal. I have to figure out the source of the weird results I've been getting from it, though.

You should be able to run it on Intel Atom by deleting the p on the RDTSCP instructions.
Post 03 May 2012, 06:38
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17477
Location: In your JS exploiting you and your system
revolution
Inagawa wrote:
You should be able to run it on Intel Atom by deleting the p on the RDTSCP instructions.
Do you know the difference between RDTSC and RDTSCP? I still say you are not using RDTSCP, you are only using the RDTSC portion of it and wasting the P portion.
Post 03 May 2012, 06:59
Inagawa



Joined: 24 Mar 2012
Posts: 153
Inagawa
But I will gladly listen to anything you can teach me! Simply saying I am using it wrong won't really help me. I barely understand the Intel Software Developer's Manual; all I got out of it was something about RDTSCP calling CPUID before reading the time stamp, but I still have to call CPUID afterwards to force the execution to be serial, no?

If you have a way to improve my code and my understanding, I'm more than willing to listen and learn.

I have tried to understand Intel's code to the best of my abilities.

"The solution to the problem presented in Section 0 is to add a CPUID instruction
just after the RDTSCP and the two mov instructions (to store in memory the
value of edx and eax). The implementation is as follows:"

Code:
asm volatile ("CPUID\n\t"
"RDTSC\n\t"
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low)::
"%rax", "%rbx", "%rcx", "%rdx");
/***********************************/
/*call the function to measure here*/
/***********************************/
asm volatile("RDTSCP\n\t"
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t"
"CPUID\n\t": "=r" (cycles_high1), "=r" (cycles_low1)::
"%rax", "%rbx", "%rcx", "%rdx");
    

"In the code above, the first CPUID call implements a barrier to avoid out-of-order
execution of the instructions above and below the RDTSC instruction.
Nevertheless, this call does not affect the measurement since it comes before the
RDTSC (i.e., before the timestamp register is read)."
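
The ordering described in the quote (CPUID, RDTSC, the code under test, RDTSCP, CPUID) can also be written with compiler intrinsics. This is a rough sketch under several assumptions: GCC or Clang on x86 (for `<cpuid.h>` and `<x86intrin.h>`), and helper names (`rdtscp_available`, `serialize`, `sample_work`, `measure_cycles`) that are invented here and do not come from Intel's paper. On other platforms, or when RDTSCP is missing, it simply returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __get_cpuid (GCC/Clang) */
#include <x86intrin.h>  /* __rdtsc, __rdtscp */
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* 1 when CPUID.80000001H:EDX bit 27 (RDTSCP) is set, else 0. */
static int rdtscp_available(void) {
#if HAVE_X86
    unsigned int a, b, c, d;
    return __get_cpuid(0x80000001u, &a, &b, &c, &d) && ((d >> 27) & 1u);
#else
    return 0;
#endif
}

#if HAVE_X86
/* CPUID used purely as a serializing barrier; its results are discarded. */
static void serialize(void) {
    unsigned int a = 0, b, c = 0, d;
    __asm__ __volatile__("cpuid"
                         : "+a"(a), "=b"(b), "+c"(c), "=d"(d)
                         :
                         : "memory");
}
#endif

/* Hypothetical workload standing in for "the function to measure". */
static void sample_work(void) {
    volatile unsigned int x = 0;
    for (unsigned int i = 0; i < 1000u; ++i) x += i;
}

/* Cycles for one call of fn, or 0 when RDTSCP is unavailable. */
static uint64_t measure_cycles(void (*fn)(void)) {
    assert(fn != NULL);
#if HAVE_X86
    if (!rdtscp_available()) return 0;
    unsigned int aux;               /* receives IA32_TSC_AUX from RDTSCP */
    serialize();                    /* barrier before the first read */
    uint64_t start = __rdtsc();
    fn();
    uint64_t end = __rdtscp(&aux);  /* waits for fn's instructions to execute */
    serialize();                    /* keep later code from moving above the read */
    return end - start;
#else
    (void)fn;
    return 0;
#endif
}
```

Calling `measure_cycles(sample_work)` gives one sample; as with the macro, it makes sense to repeat the call many times and keep the smallest value.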


I deciphered the ugly mess of code, read the text and then tried to apply that.

So - could you please help me improve the code? I'd be grateful.
Post 03 May 2012, 07:11
bzdashek



Joined: 15 Feb 2012
Posts: 147
Location: Tolstokvashino, Russia
bzdashek
Inagawa, Intel has updated their CPUID manual, take a look at this:
http://www.intel.com/Assets/PDF/appnote/241618.pdf

In the appendix part they have examples, which are very useful.
Post 03 May 2012, 08:47
Picnic



Joined: 05 May 2007
Posts: 1288
Location: behind the arc
Picnic
Pentium E6800 3.33GHz not supported.
Post 03 May 2012, 13:43
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17477
Location: In your JS exploiting you and your system
revolution
Inagawa wrote:
So - could you please help me improve the code? I'd be grateful.
If you forgo using the rare 'P' version of RDTSC then you would get considerably better compatibility with different CPUs.
Post 03 May 2012, 13:54
Inagawa



Joined: 24 Mar 2012
Posts: 153
Inagawa
You're probably right on this one. I have modified the code to check for CPUID, and then check for RDTSC.
I'll post the finished code in a new topic in Macroinstructions.
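
As an illustration of that kind of check (not the finished macro itself), here is a small C sketch that tests for CPUID first, then for the TSC feature bit, and only then for RDTSCP. It assumes GCC or Clang on x86 for `<cpuid.h>`; `timer_kind`, `pick_timer` and `timer_name` are hypothetical names. The bit positions used are CPUID.01H:EDX bit 4 (TSC) and CPUID.80000001H:EDX bit 27 (RDTSCP).

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* __get_cpuid_max, __get_cpuid (GCC/Clang) */
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

typedef enum { TIMER_OS = 0, TIMER_RDTSC = 1, TIMER_RDTSCP = 2 } timer_kind;

/* Pick the best available time source, most capable first. */
static timer_kind pick_timer(void) {
#if HAVE_X86
    unsigned int a, b, c, d, sig;
    if (__get_cpuid_max(0, &sig) == 0)
        return TIMER_OS;                         /* no CPUID at all */
    if (!__get_cpuid(1, &a, &b, &c, &d) || !((d >> 4) & 1u))
        return TIMER_OS;                         /* no time stamp counter */
    if (__get_cpuid(0x80000001u, &a, &b, &c, &d) && ((d >> 27) & 1u))
        return TIMER_RDTSCP;
    return TIMER_RDTSC;
#else
    return TIMER_OS;                             /* e.g. fall back to QPC */
#endif
}

/* Human-readable name, handy for a "CPU not supported" style message. */
static const char *timer_name(timer_kind k) {
    switch (k) {
    case TIMER_RDTSCP: return "RDTSCP";
    case TIMER_RDTSC:  return "RDTSC";
    case TIMER_OS:     return "OS timer";
    }
    assert(!"unknown timer_kind");
    return "";
}
```

Falling back to plain RDTSC when the P variant is missing is what keeps older CPUs such as the Athlon, Atom and Pentium mentioned in this thread usable.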

bzdashek: Thanks, I had a look at it; it's definitely useful for CPUID-related stuff.
Post 03 May 2012, 14:32
Copyright © 1999-2020, Tomasz Grysztar.