flat assembler
Message board for the users of flat assembler.
Index
> Projects and Ideas > Statistical code profiling |
Author |
|
Tomasz Grysztar 05 Sep 2010, 17:23
Here's my proof-of-concept of very simple statistical code profiler for 16-bit DOS. I came up with this idea as I wanted some easy and non-intrusive way of finding out what parts of the program are the slowest ones (and of course I found out on Wikipedia that this method is known and used already; and I was not surprised, because it is so plain and simple).
The method is to handle the clock interrupt and store the interrupt's return address each time it is called. Statistically we should get more frequently the addresses from the parts of code, where the program spend the most of its time, so if we gather high enough quantity of data and draw the points on the map of our program, the areas with high density of interrupt occurrences would be the ones, where program most likely spends more time than in the others. And because clock interrupts are not so frequently happening, the execution of program is almost undisturbed, so it proceeds just like it would proceed in normal environment. So, here's the code of this incredibly simple profiler (I save it as STATPROF.INC): Code: PROFILER_BUFFER_SIZE = 1024 start_profiler: ; installs profiler and starts gathering data mov [cs:profiler_data_count],0 push ds 0 pop ds cli mov ax,irq_0 xchg ax,[ds:8*4] mov word [cs:original_irq_0],ax mov ax,cs xchg ax,[ds:8*4+2] mov word [cs:original_irq_0+2],ax sti pop ds retf stop_profiler: ; stops profiler, returns pointer to gathered data in DS:DX, ; and the count of data in CX push 0 pop ds cli mov ax,word [cs:original_irq_0] mov [ds:8*4],ax mov ax,word [cs:original_irq_0+2] mov [ds:8*4+2],ax sti push cs pop ds mov dx,profiler_buffer mov cx,[ds:profiler_data_count] retf irq_0: push ax bx bp mov bx,[cs:profiler_data_count] cmp bx,PROFILER_BUFFER_SIZE jae pointer_stored shl bx,2 inc [cs:profiler_data_count] mov bp,sp mov ax,[ss:bp+3*2] mov word [cs:profiler_buffer+bx],ax mov ax,[ss:bp+3*2+2] mov word [cs:profiler_buffer+bx+2],ax pointer_stored: pop bp bx ax jmp far [cs:original_irq_0] original_irq_0 dd ? profiler_data_count dw ? profiler_buffer dd PROFILER_BUFFER_SIZE dup (?) And here the sample test program. After it starts the profiler, it run some fake code in loop until a key is pressed (a few seconds should be enough to gather some interesting data in this case), and the it stops profiling and save all gathered pointers to STATPROF.DAT file. If you make a listing of this test program, you can then look up the addresses that appears the most frequently in that data - in my tests it is the address of "rep stosb" instruction and the address just after the "mul". Code: format MZ entry main:start segment main start: call profiler:start_profiler main_loop: mov ah,1 ; exit on any key int 16h jnz stop push cs ; this is just some garbage code pop es ; solely for testing purposes mov di,test_buffer mov cx,100h mov al,11h rep stosb mov cx,40 @@: mov ax,120 mov bx,12 mul bx loop @b jmp main_loop ; keep doing it so data may get gathered stop: call profiler:stop_profiler shl cx,2 push cx dx ds push cs pop ds mov ah,3Ch xor cx,cx mov dx,output_file int 21h jc exit mov bx,ax pop ds dx cx mov ah,40h int 21h mov ah,3Eh int 21h exit: mov ax,4C00h int 21h test_buffer db 100h dup ? output_file db 'statprof.dat',0 segment profiler include 'statprof.inc' Do you have an idea whether it would be possible/easy to implement this idea for other operating systems? |
|||
05 Sep 2010, 17:23 |
|
LocoDelAssembly 05 Sep 2010, 23:52
A thread could be injected into the process with some timer loop in the threadproc that periodically interrupts some thread under study to take the context structure? On non SMP systems this may even not hurt performance a lot. In multi-core systems you may be disturbing the thread under study by terminating the just granted CPU access, but if constraining the thread to a single core/thread is not a problem then both this thread and the one that takes the measurements could be set to an affinity that makes them always execute under the same core/thread..
|
|||
05 Sep 2010, 23:52 |
|
revolution 06 Sep 2010, 05:31
One thing to consider with statistical profilers is that they won't tell you why your code spends a lot of time in a particular place. It can still be very tricky to determine just what the cause is. And if one does not understand the cause then one ends up with the same problems elsewhere. And making any specific improvement might only work on their test machine environment and nowhere else.
|
|||
06 Sep 2010, 05:31 |
|
ass0 06 Sep 2010, 06:34
example?
_________________ Nombre: Aquiles Castro. Location2: about:robots |
|||
06 Sep 2010, 06:34 |
|
revolution 06 Sep 2010, 06:38
Cache thrashing. And not just L1-2-3 cache, all the other caches also. Very tricky to analyse.
|
|||
06 Sep 2010, 06:38 |
|
Tomasz Grysztar 06 Sep 2010, 07:33
revolution wrote: One thing to consider with statistical profilers is that they won't tell you why your code spends a lot of time in a particular place. It can still be very tricky to determine just what the cause is. And if one does not understand the cause then one ends up with the same problems elsewhere. And making any specific improvement might only work on their test machine environment and nowhere else. Yeah, that is the point. The profiling as I know it is all about finding out what parts execute slowly, while leaving the "why" part to answer on your own. Still, I more often see a problems with bad algorithm than with a things like cache. Cache problems kick in after you've already quite good. Well, the reasons may be numerous - so each time you realize one, you get invaluable experience. |
|||
06 Sep 2010, 07:33 |
|
ouadji 06 Sep 2010, 08:06
Quote:
The main thing is of knowing where that occurs. the "why" ... is a matter of intelligence, personal experience, and intrinsic knowledge of the processor. |
|||
06 Sep 2010, 08:06 |
|
revolution 06 Sep 2010, 08:23
ouadji wrote: the "why" ... is a matter of intelligence, personal experience, and intrinsic knowledge of the processor. |
|||
06 Sep 2010, 08:23 |
|
Sean4CC 16 Apr 2015, 01:18
There are ideas in AI from the 1970's that were never fully investigated. in 1979 a Z80 cpu was $35, "hey that's cheap man, I can buy one". Then you go to buy 64 kbyte of ram for it, "dude, that'll be $625".
Even a top university would have problems providing 6 mbyte of ram. Today processor speeds are 1000 times faster and you get about 10,000 times the memory. For that sort of 70's AI you can multiply the 2 together. You can do 10 million times better. My personal view is that you can do human level AI in one PC case. Humans are not the center of the universe, the earth moves around the sun and not vice versa. It is quite simply a coding issue and there is a ton of alchemy out there, and alchemists to boot confusing the matter. Does it matter so much if there was human level AI? Perhaps not, science is a slow incremental crawl where ideas have to be tried out in the real physical world to see if they are correct or not. You can't do science just by reasoning alone. You could find smart things that can answer many questions but which cannot do science any faster than humans can. Hey, google can answer your questions but it couldn't boil an egg (yet). I think it is one reason google are getting into robotics. Another is having a private army while saying they are only a contractor. |
|||
16 Apr 2015, 01:18 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.