flat assembler
Message board for the users of flat assembler.

Index > Projects and Ideas > Statistical code profiling

Author
Thread Post new topic Reply to topic
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 05 Sep 2010, 17:23
Here's my proof-of-concept of very simple statistical code profiler for 16-bit DOS. I came up with this idea as I wanted some easy and non-intrusive way of finding out what parts of the program are the slowest ones (and of course I found out on Wikipedia that this method is known and used already; and I was not surprised, because it is so plain and simple).

The method is to handle the clock interrupt and store the interrupt's return address each time it is called. Statistically we should get more frequently the addresses from the parts of code, where the program spend the most of its time, so if we gather high enough quantity of data and draw the points on the map of our program, the areas with high density of interrupt occurrences would be the ones, where program most likely spends more time than in the others. And because clock interrupts are not so frequently happening, the execution of program is almost undisturbed, so it proceeds just like it would proceed in normal environment.

So, here's the code of this incredibly simple profiler (I save it as STATPROF.INC):
Code:
PROFILER_BUFFER_SIZE = 1024

start_profiler: ; installs profiler and starts gathering data
        mov     [cs:profiler_data_count],0
        push    ds 0
        pop     ds
        cli
        mov     ax,irq_0
        xchg    ax,[ds:8*4]
        mov     word [cs:original_irq_0],ax
        mov     ax,cs
        xchg    ax,[ds:8*4+2]
        mov     word [cs:original_irq_0+2],ax
        sti
        pop     ds
        retf

stop_profiler:  ; stops profiler, returns pointer to gathered data in DS:DX,
                ; and the count of data in CX
        push    0
        pop     ds
        cli
        mov     ax,word [cs:original_irq_0]
        mov     [ds:8*4],ax
        mov     ax,word [cs:original_irq_0+2]
        mov     [ds:8*4+2],ax
        sti
        push    cs
        pop     ds
        mov     dx,profiler_buffer
        mov     cx,[ds:profiler_data_count]
        retf

irq_0:
        push    ax bx bp
        mov     bx,[cs:profiler_data_count]
        cmp     bx,PROFILER_BUFFER_SIZE
        jae     pointer_stored
        shl     bx,2
        inc     [cs:profiler_data_count]
        mov     bp,sp
        mov     ax,[ss:bp+3*2]
        mov     word [cs:profiler_buffer+bx],ax
        mov     ax,[ss:bp+3*2+2]
        mov     word [cs:profiler_buffer+bx+2],ax
      pointer_stored:
        pop     bp bx ax
        jmp     far [cs:original_irq_0]

original_irq_0 dd ?

profiler_data_count dw ?
profiler_buffer dd PROFILER_BUFFER_SIZE dup (?)    


And here the sample test program. After it starts the profiler, it run some fake code in loop until a key is pressed (a few seconds should be enough to gather some interesting data in this case), and the it stops profiling and save all gathered pointers to STATPROF.DAT file. If you make a listing of this test program, you can then look up the addresses that appears the most frequently in that data - in my tests it is the address of "rep stosb" instruction and the address just after the "mul".
Code:
format MZ
entry main:start

segment main

  start:

        call    profiler:start_profiler

  main_loop:
        mov     ah,1                    ; exit on any key
        int     16h
        jnz     stop


        push    cs                      ; this is just some garbage code
        pop     es                      ;  solely for testing purposes
        mov     di,test_buffer
        mov     cx,100h
        mov     al,11h
        rep     stosb
        mov     cx,40
      @@:
        mov     ax,120
        mov     bx,12
        mul     bx
        loop    @b


        jmp     main_loop               ; keep doing it so data may get gathered

  stop:

        call    profiler:stop_profiler

        shl     cx,2
        push    cx dx ds
        push    cs
        pop     ds
        mov     ah,3Ch
        xor     cx,cx
        mov     dx,output_file
        int     21h
        jc      exit
        mov     bx,ax
        pop     ds dx cx
        mov     ah,40h
        int     21h
        mov     ah,3Eh
        int     21h

  exit:
        mov     ax,4C00h
        int     21h

  test_buffer db 100h dup ?

  output_file db 'statprof.dat',0

segment profiler

  include 'statprof.inc'    


Do you have an idea whether it would be possible/easy to implement this idea for other operating systems?
Post 05 Sep 2010, 17:23
View user's profile Send private message Visit poster's website Reply with quote
ouadji



Joined: 24 Dec 2008
Posts: 1081
Location: Belgium
ouadji 05 Sep 2010, 18:15

No problem to implement this idea on Windows (XP).
I have a driver that hooks all interrupts (256)
(windows xp pro, Q6600, 4xIDTs).
it would be simply necessary to add some lines of code and configure it.
There are no particular problems to do that.

_________________
I am not young enough to know everything (Oscar Wilde)- Image
Post 05 Sep 2010, 18:15
View user's profile Send private message Send e-mail Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 05 Sep 2010, 23:52
A thread could be injected into the process with some timer loop in the threadproc that periodically interrupts some thread under study to take the context structure? On non SMP systems this may even not hurt performance a lot. In multi-core systems you may be disturbing the thread under study by terminating the just granted CPU access, but if constraining the thread to a single core/thread is not a problem then both this thread and the one that takes the measurements could be set to an affinity that makes them always execute under the same core/thread..
Post 05 Sep 2010, 23:52
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20454
Location: In your JS exploiting you and your system
revolution 06 Sep 2010, 05:31
One thing to consider with statistical profilers is that they won't tell you why your code spends a lot of time in a particular place. It can still be very tricky to determine just what the cause is. And if one does not understand the cause then one ends up with the same problems elsewhere. And making any specific improvement might only work on their test machine environment and nowhere else.
Post 06 Sep 2010, 05:31
View user's profile Send private message Visit poster's website Reply with quote
ass0



Joined: 31 Dec 2008
Posts: 518
Location: ( . Y . )
ass0 06 Sep 2010, 06:34
example?

_________________
Image
Nombre: Aquiles Castro.
Location2: about:robots
Post 06 Sep 2010, 06:34
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20454
Location: In your JS exploiting you and your system
revolution 06 Sep 2010, 06:38
Cache thrashing. And not just L1-2-3 cache, all the other caches also. Very tricky to analyse.
Post 06 Sep 2010, 06:38
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 06 Sep 2010, 07:33
revolution wrote:
One thing to consider with statistical profilers is that they won't tell you why your code spends a lot of time in a particular place. It can still be very tricky to determine just what the cause is. And if one does not understand the cause then one ends up with the same problems elsewhere. And making any specific improvement might only work on their test machine environment and nowhere else.

Yeah, that is the point. The profiling as I know it is all about finding out what parts execute slowly, while leaving the "why" part to answer on your own.
Still, I more often see a problems with bad algorithm than with a things like cache. Cache problems kick in after you've already quite good.
Well, the reasons may be numerous - so each time you realize one, you get invaluable experience.
Post 06 Sep 2010, 07:33
View user's profile Send private message Visit poster's website Reply with quote
ouadji



Joined: 24 Dec 2008
Posts: 1081
Location: Belgium
ouadji 06 Sep 2010, 08:06
Quote:

Still, I more often see a problems with bad algorithm than with a things like cache.
(Cache problems kick in after you've already quite good.)
200% agree with Tomasz.
The main thing is of knowing where that occurs.
the "why" ... is a matter of intelligence, personal experience, and intrinsic knowledge of the processor.

_________________
I am not young enough to know everything (Oscar Wilde)- Image
Post 06 Sep 2010, 08:06
View user's profile Send private message Send e-mail Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20454
Location: In your JS exploiting you and your system
revolution 06 Sep 2010, 08:23
ouadji wrote:
the "why" ... is a matter of intelligence, personal experience, and intrinsic knowledge of the processor.
Which is why computers will never take over the world. Phew! I'm glad about that.
Post 06 Sep 2010, 08:23
View user's profile Send private message Visit poster's website Reply with quote
Sean4CC



Joined: 15 Apr 2015
Posts: 14
Sean4CC 16 Apr 2015, 01:18
There are ideas in AI from the 1970's that were never fully investigated. in 1979 a Z80 cpu was $35, "hey that's cheap man, I can buy one". Then you go to buy 64 kbyte of ram for it, "dude, that'll be $625".
Even a top university would have problems providing 6 mbyte of ram. Today processor speeds are 1000 times faster and you get about 10,000 times the memory. For that sort of 70's AI you can multiply the 2 together. You can do 10 million times better. My personal view is that you can do human level AI in one PC case. Humans are not the center of the universe, the earth moves around the sun and not vice versa. It is quite simply a coding issue and there is a ton of alchemy out there, and alchemists to boot confusing the matter. Does it matter so much if there was human level AI? Perhaps not, science is a slow incremental crawl where ideas have to be tried out in the real physical world to see if they are correct or not. You can't do science just by reasoning alone. You could find smart things that can answer many questions but which cannot do science any faster than humans can. Hey, google can answer your questions but it couldn't boil an egg (yet). I think it is one reason google are getting into robotics. Another is having a private army while saying they are only a contractor.
Post 16 Apr 2015, 01:18
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.