flat assembler
Message board for the users of flat assembler.

 Index > Projects and Ideas > Statistical code profiling
Author
Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8039
Location: Kraków, Poland
Tomasz Grysztar 05 Sep 2010, 17:23
Here's my proof-of-concept of very simple statistical code profiler for 16-bit DOS. I came up with this idea as I wanted some easy and non-intrusive way of finding out what parts of the program are the slowest ones (and of course I found out on Wikipedia that this method is known and used already; and I was not surprised, because it is so plain and simple).

The method is to handle the clock interrupt and store the interrupt's return address each time it is called. Statistically we should get more frequently the addresses from the parts of code, where the program spend the most of its time, so if we gather high enough quantity of data and draw the points on the map of our program, the areas with high density of interrupt occurrences would be the ones, where program most likely spends more time than in the others. And because clock interrupts are not so frequently happening, the execution of program is almost undisturbed, so it proceeds just like it would proceed in normal environment.

So, here's the code of this incredibly simple profiler (I save it as STATPROF.INC):
Code:
```PROFILER_BUFFER_SIZE = 1024

start_profiler: ; installs profiler and starts gathering data
mov     [cs:profiler_data_count],0
push    ds 0
pop     ds
cli
mov     ax,irq_0
xchg    ax,[ds:8*4]
mov     word [cs:original_irq_0],ax
mov     ax,cs
xchg    ax,[ds:8*4+2]
mov     word [cs:original_irq_0+2],ax
sti
pop     ds
retf

stop_profiler:  ; stops profiler, returns pointer to gathered data in DS:DX,
; and the count of data in CX
push    0
pop     ds
cli
mov     ax,word [cs:original_irq_0]
mov     [ds:8*4],ax
mov     ax,word [cs:original_irq_0+2]
mov     [ds:8*4+2],ax
sti
push    cs
pop     ds
mov     dx,profiler_buffer
mov     cx,[ds:profiler_data_count]
retf

irq_0:
push    ax bx bp
mov     bx,[cs:profiler_data_count]
cmp     bx,PROFILER_BUFFER_SIZE
jae     pointer_stored
shl     bx,2
inc     [cs:profiler_data_count]
mov     bp,sp
mov     ax,[ss:bp+3*2]
mov     word [cs:profiler_buffer+bx],ax
mov     ax,[ss:bp+3*2+2]
mov     word [cs:profiler_buffer+bx+2],ax
pointer_stored:
pop     bp bx ax
jmp     far [cs:original_irq_0]

original_irq_0 dd ?

profiler_data_count dw ?
profiler_buffer dd PROFILER_BUFFER_SIZE dup (?)    ```

And here the sample test program. After it starts the profiler, it run some fake code in loop until a key is pressed (a few seconds should be enough to gather some interesting data in this case), and the it stops profiling and save all gathered pointers to STATPROF.DAT file. If you make a listing of this test program, you can then look up the addresses that appears the most frequently in that data - in my tests it is the address of "rep stosb" instruction and the address just after the "mul".
Code:
```format MZ
entry main:start

segment main

start:

call    profiler:start_profiler

main_loop:
mov     ah,1                    ; exit on any key
int     16h
jnz     stop

push    cs                      ; this is just some garbage code
pop     es                      ;  solely for testing purposes
mov     di,test_buffer
mov     cx,100h
mov     al,11h
rep     stosb
mov     cx,40
@@:
mov     ax,120
mov     bx,12
mul     bx
loop    @b

jmp     main_loop               ; keep doing it so data may get gathered

stop:

call    profiler:stop_profiler

shl     cx,2
push    cx dx ds
push    cs
pop     ds
mov     ah,3Ch
xor     cx,cx
mov     dx,output_file
int     21h
jc      exit
mov     bx,ax
pop     ds dx cx
mov     ah,40h
int     21h
mov     ah,3Eh
int     21h

exit:
mov     ax,4C00h
int     21h

test_buffer db 100h dup ?

output_file db 'statprof.dat',0

segment profiler

include 'statprof.inc'    ```

Do you have an idea whether it would be possible/easy to implement this idea for other operating systems?
05 Sep 2010, 17:23

Joined: 24 Dec 2008
Posts: 1081
Location: Belgium
ouadji 05 Sep 2010, 18:15

No problem to implement this idea on Windows (XP).
I have a driver that hooks all interrupts (256)
(windows xp pro, Q6600, 4xIDTs).
it would be simply necessary to add some lines of code and configure it.
There are no particular problems to do that.

_________________
I am not young enough to know everything (Oscar Wilde)-
05 Sep 2010, 18:15
LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 05 Sep 2010, 23:52
A thread could be injected into the process with some timer loop in the threadproc that periodically interrupts some thread under study to take the context structure? On non SMP systems this may even not hurt performance a lot. In multi-core systems you may be disturbing the thread under study by terminating the just granted CPU access, but if constraining the thread to a single core/thread is not a problem then both this thread and the one that takes the measurements could be set to an affinity that makes them always execute under the same core/thread..
05 Sep 2010, 23:52
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 18944
Location: In your JS exploiting you and your system
revolution 06 Sep 2010, 05:31
One thing to consider with statistical profilers is that they won't tell you why your code spends a lot of time in a particular place. It can still be very tricky to determine just what the cause is. And if one does not understand the cause then one ends up with the same problems elsewhere. And making any specific improvement might only work on their test machine environment and nowhere else.
06 Sep 2010, 05:31
ass0

Joined: 31 Dec 2008
Posts: 518
Location: ( . Y . )
ass0 06 Sep 2010, 06:34
example?

_________________

Nombre: Aquiles Castro.
06 Sep 2010, 06:34
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 18944
Location: In your JS exploiting you and your system
revolution 06 Sep 2010, 06:38
Cache thrashing. And not just L1-2-3 cache, all the other caches also. Very tricky to analyse.
06 Sep 2010, 06:38
Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8039
Location: Kraków, Poland
Tomasz Grysztar 06 Sep 2010, 07:33
revolution wrote:
One thing to consider with statistical profilers is that they won't tell you why your code spends a lot of time in a particular place. It can still be very tricky to determine just what the cause is. And if one does not understand the cause then one ends up with the same problems elsewhere. And making any specific improvement might only work on their test machine environment and nowhere else.

Yeah, that is the point. The profiling as I know it is all about finding out what parts execute slowly, while leaving the "why" part to answer on your own.
Still, I more often see a problems with bad algorithm than with a things like cache. Cache problems kick in after you've already quite good.
Well, the reasons may be numerous - so each time you realize one, you get invaluable experience.
06 Sep 2010, 07:33

Joined: 24 Dec 2008
Posts: 1081
Location: Belgium
ouadji 06 Sep 2010, 08:06
Quote:

Still, I more often see a problems with bad algorithm than with a things like cache.
(Cache problems kick in after you've already quite good.)
200% agree with Tomasz.
The main thing is of knowing where that occurs.
the "why" ... is a matter of intelligence, personal experience, and intrinsic knowledge of the processor.

_________________
I am not young enough to know everything (Oscar Wilde)-
06 Sep 2010, 08:06
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 18944
Location: In your JS exploiting you and your system
revolution 06 Sep 2010, 08:23
the "why" ... is a matter of intelligence, personal experience, and intrinsic knowledge of the processor.
Which is why computers will never take over the world. Phew! I'm glad about that.
06 Sep 2010, 08:23
Sean4CC

Joined: 15 Apr 2015
Posts: 14
Sean4CC 16 Apr 2015, 01:18
There are ideas in AI from the 1970's that were never fully investigated. in 1979 a Z80 cpu was \$35, "hey that's cheap man, I can buy one". Then you go to buy 64 kbyte of ram for it, "dude, that'll be \$625".
Even a top university would have problems providing 6 mbyte of ram. Today processor speeds are 1000 times faster and you get about 10,000 times the memory. For that sort of 70's AI you can multiply the 2 together. You can do 10 million times better. My personal view is that you can do human level AI in one PC case. Humans are not the center of the universe, the earth moves around the sun and not vice versa. It is quite simply a coding issue and there is a ton of alchemy out there, and alchemists to boot confusing the matter. Does it matter so much if there was human level AI? Perhaps not, science is a slow incremental crawl where ideas have to be tried out in the real physical world to see if they are correct or not. You can't do science just by reasoning alone. You could find smart things that can answer many questions but which cannot do science any faster than humans can. Hey, google can answer your questions but it couldn't boil an egg (yet). I think it is one reason google are getting into robotics. Another is having a private army while saying they are only a contractor.
16 Apr 2015, 01:18
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum