flat assembler
Message board for the users of flat assembler.

flat assembler > Projects and Ideas > Statistical code profiling

Author
Thread Post new topic Reply to topic
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6867
Location: Kraków, Poland
Here's my proof-of-concept of very simple statistical code profiler for 16-bit DOS. I came up with this idea as I wanted some easy and non-intrusive way of finding out what parts of the program are the slowest ones (and of course I found out on Wikipedia that this method is known and used already; and I was not surprised, because it is so plain and simple).

The method is to handle the clock interrupt and store the interrupt's return address each time it is called. Statistically we should get more frequently the addresses from the parts of code, where the program spend the most of its time, so if we gather high enough quantity of data and draw the points on the map of our program, the areas with high density of interrupt occurrences would be the ones, where program most likely spends more time than in the others. And because clock interrupts are not so frequently happening, the execution of program is almost undisturbed, so it proceeds just like it would proceed in normal environment.

So, here's the code of this incredibly simple profiler (I save it as STATPROF.INC):
Code:
PROFILER_BUFFER_SIZE = 1024 start_profiler: ; installs profiler and starts gathering data mov [cs:profiler_data_count],0 push ds 0 pop ds cli mov ax,irq_0 xchg ax,[ds:8*4] mov word [cs:original_irq_0],ax mov ax,cs xchg ax,[ds:8*4+2] mov word [cs:original_irq_0+2],ax sti pop ds retf stop_profiler: ; stops profiler, returns pointer to gathered data in DS:DX, ; and the count of data in CX push 0 pop ds cli mov ax,word [cs:original_irq_0] mov [ds:8*4],ax mov ax,word [cs:original_irq_0+2] mov [ds:8*4+2],ax sti push cs pop ds mov dx,profiler_buffer mov cx,[ds:profiler_data_count] retf irq_0: push ax bx bp mov bx,[cs:profiler_data_count] cmp bx,PROFILER_BUFFER_SIZE jae pointer_stored shl bx,2 inc [cs:profiler_data_count] mov bp,sp mov ax,[ss:bp+3*2] mov word [cs:profiler_buffer+bx],ax mov ax,[ss:bp+3*2+2] mov word [cs:profiler_buffer+bx+2],ax pointer_stored: pop bp bx ax jmp far [cs:original_irq_0] original_irq_0 dd ? profiler_data_count dw ? profiler_buffer dd PROFILER_BUFFER_SIZE dup (?)


And here the sample test program. After it starts the profiler, it run some fake code in loop until a key is pressed (a few seconds should be enough to gather some interesting data in this case), and the it stops profiling and save all gathered pointers to STATPROF.DAT file. If you make a listing of this test program, you can then look up the addresses that appears the most frequently in that data - in my tests it is the address of "rep stosb" instruction and the address just after the "mul".
Code:
format MZ entry main:start segment main start: call profiler:start_profiler main_loop: mov ah,1 ; exit on any key int 16h jnz stop push cs ; this is just some garbage code pop es ; solely for testing purposes mov di,test_buffer mov cx,100h mov al,11h rep stosb mov cx,40 @@: mov ax,120 mov bx,12 mul bx loop @b jmp main_loop ; keep doing it so data may get gathered stop: call profiler:stop_profiler shl cx,2 push cx dx ds push cs pop ds mov ah,3Ch xor cx,cx mov dx,output_file int 21h jc exit mov bx,ax pop ds dx cx mov ah,40h int 21h mov ah,3Eh int 21h exit: mov ax,4C00h int 21h test_buffer db 100h dup ? output_file db 'statprof.dat',0 segment profiler include 'statprof.inc'


Do you have an idea whether it would be possible/easy to implement this idea for other operating systems?
Post 05 Sep 2010, 17:23
View user's profile Send private message Visit poster's website Reply with quote
ouadji



Joined: 24 Dec 2008
Posts: 1081
Location: Belgium

No problem to implement this idea on Windows (XP).
I have a driver that hooks all interrupts (256)
(windows xp pro, Q6600, 4xIDTs).
it would be simply necessary to add some lines of code and configure it.
There are no particular problems to do that.

_________________
I am not young enough to know everything (Oscar Wilde)- Image
Post 05 Sep 2010, 18:15
View user's profile Send private message Send e-mail Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4634
Location: Argentina
A thread could be injected into the process with some timer loop in the threadproc that periodically interrupts some thread under study to take the context structure? On non SMP systems this may even not hurt performance a lot. In multi-core systems you may be disturbing the thread under study by terminating the just granted CPU access, but if constraining the thread to a single core/thread is not a problem then both this thread and the one that takes the measurements could be set to an affinity that makes them always execute under the same core/thread..
Post 05 Sep 2010, 23:52
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15870
Location: 162173 Ryugu
One thing to consider with statistical profilers is that they won't tell you why your code spends a lot of time in a particular place. It can still be very tricky to determine just what the cause is. And if one does not understand the cause then one ends up with the same problems elsewhere. And making any specific improvement might only work on their test machine environment and nowhere else.
Post 06 Sep 2010, 05:31
View user's profile Send private message Visit poster's website Reply with quote
ass0



Joined: 31 Dec 2008
Posts: 521
Location: ( . Y . )
example?

_________________
Image
Nombre: Aquiles Castro.
Location2: about:robots
Post 06 Sep 2010, 06:34
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15870
Location: 162173 Ryugu
Cache thrashing. And not just L1-2-3 cache, all the other caches also. Very tricky to analyse.
Post 06 Sep 2010, 06:38
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6867
Location: Kraków, Poland
revolution wrote:
One thing to consider with statistical profilers is that they won't tell you why your code spends a lot of time in a particular place. It can still be very tricky to determine just what the cause is. And if one does not understand the cause then one ends up with the same problems elsewhere. And making any specific improvement might only work on their test machine environment and nowhere else.

Yeah, that is the point. The profiling as I know it is all about finding out what parts execute slowly, while leaving the "why" part to answer on your own.
Still, I more often see a problems with bad algorithm than with a things like cache. Cache problems kick in after you've already quite good.
Well, the reasons may be numerous - so each time you realize one, you get invaluable experience.
Post 06 Sep 2010, 07:33
View user's profile Send private message Visit poster's website Reply with quote
ouadji



Joined: 24 Dec 2008
Posts: 1081
Location: Belgium
Quote:

Still, I more often see a problems with bad algorithm than with a things like cache.
(Cache problems kick in after you've already quite good.)
200% agree with Tomasz.
The main thing is of knowing where that occurs.
the "why" ... is a matter of intelligence, personal experience, and intrinsic knowledge of the processor.

_________________
I am not young enough to know everything (Oscar Wilde)- Image
Post 06 Sep 2010, 08:06
View user's profile Send private message Send e-mail Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15870
Location: 162173 Ryugu
ouadji wrote:
the "why" ... is a matter of intelligence, personal experience, and intrinsic knowledge of the processor.
Which is why computers will never take over the world. Phew! I'm glad about that.
Post 06 Sep 2010, 08:23
View user's profile Send private message Visit poster's website Reply with quote
Sean4CC



Joined: 15 Apr 2015
Posts: 14
There are ideas in AI from the 1970's that were never fully investigated. in 1979 a Z80 cpu was $35, "hey that's cheap man, I can buy one". Then you go to buy 64 kbyte of ram for it, "dude, that'll be $625".
Even a top university would have problems providing 6 mbyte of ram. Today processor speeds are 1000 times faster and you get about 10,000 times the memory. For that sort of 70's AI you can multiply the 2 together. You can do 10 million times better. My personal view is that you can do human level AI in one PC case. Humans are not the center of the universe, the earth moves around the sun and not vice versa. It is quite simply a coding issue and there is a ton of alchemy out there, and alchemists to boot confusing the matter. Does it matter so much if there was human level AI? Perhaps not, science is a slow incremental crawl where ideas have to be tried out in the real physical world to see if they are correct or not. You can't do science just by reasoning alone. You could find smart things that can answer many questions but which cannot do science any faster than humans can. Hey, google can answer your questions but it couldn't boil an egg (yet). I think it is one reason google are getting into robotics. Another is having a private army while saying they are only a contractor.
Post 16 Apr 2015, 01:18
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >

Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 2004-2018, Tomasz Grysztar.

Powered by rwasa.