flat assembler
Message board for the users of flat assembler.

No code profiler for FASM people. We are so poor!

system error



Joined: 01 Sep 2013
Posts: 670
system error 05 Dec 2014, 12:44
I've yet to see anyone come up with a basic code profiling utility for FASM, written in FASM. MASM people have been enjoying theirs for so long. Even Agner Fog provided one for them. I know things like multithreading would make profiling obsolete, but at least we should have a basic utility like this.

There is one provided by AMD for Linux but it takes a PhD to use it and I don't have a PhD.

Well???
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20445
Location: In your JS exploiting you and your system
revolution 05 Dec 2014, 12:53
I've written a few macros to inject code into procedures at compile time. The code is mostly just simple RDTSC instructions and appropriate add/sub/store arithmetic for counters and accumulators. I'm sure many others here have done the same. It is no big thing but, as you say, they are hard to use and set up properly.

Instrumenting code changes its behaviour so the user must be aware of what they are doing else the results can be meaningless. It is also application specific as to how the measurements should be made. Each application will have its own requirements and quirks that need programmer attention.

Executive summary: Profiling is not just a simple plug-and-play thing. You've got to know what you are doing.
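For illustration, the counter-and-accumulator idea can be sketched in C instead of FASM macros. This is a minimal sketch and not the macros described in the post: `probe_t`, `probe_enter`, `probe_leave` and `sum_below` are hypothetical names, `__rdtsc()` assumes GCC or Clang on x86, and a nanosecond clock stands in on other targets. It does nothing about out-of-order execution, interrupts or thread migration, so the tick totals are only a rough indication.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>              /* __rdtsc() on GCC and Clang (assumption) */
#endif

/* Hypothetical probe record: call counter, tick accumulator, scratch start value. */
typedef struct {
    uint64_t calls;
    uint64_t ticks;
    uint64_t start;
} probe_t;

/* Read a tick count: RDTSC on x86, otherwise a nanosecond clock as a stand-in. */
static uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Store the tick count at the start of the measured region. */
static void probe_enter(probe_t *p)
{
    assert(p != NULL);
    p->start = read_ticks();
}

/* Subtract the stored start value, add the difference to the accumulator,
   and bump the call counter. */
static void probe_leave(probe_t *p)
{
    uint64_t end = read_ticks();    /* read first so the bookkeeping is not timed */
    assert(p != NULL);
    p->ticks += end - p->start;
    p->calls += 1;
}

/* Hypothetical code under test: sums the integers 0 .. n-1. */
static uint64_t sum_below(uint64_t n)
{
    volatile uint64_t acc = 0;      /* volatile keeps the loop from being optimised away */
    for (uint64_t i = 0; i < n; i++)
        acc += i;
    return acc;
}
```

Wrapping a call as `probe_enter(&p); sum_below(1000); probe_leave(&p);` leaves the number of calls in `p.calls` and the summed ticks in `p.ticks`; dividing one by the other gives a rough average per call.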
system error 05 Dec 2014, 12:57
Or probably someone could just teach me the easy way to use AMD's CodeXL to measure my code performance, in assembly not in C. I don't understand most of the terms used by CodeXL. It's overkill. I just need a simple elapsed-time test from point A to point B of my code. Nothing fancy. I could use RDTSC at both ends but I think it's more complicated than that.
system error 05 Dec 2014, 13:00
revolution wrote:
I've written a few macros to inject code into procedures at compile time. The code is mostly just simple RDTSC instructions and appropriate add/sub/store arithmetic for counters and accumulators. I'm sure many others here have done the same. It is no big thing but, as you say, they are hard to use and set up properly.

Instrumenting code changes its behaviour so the user must be aware of what they are doing else the results can be meaningless. It is also application specific as to how the measurements should be made. Each application will have its own requirements and quirks that need programmer attention.

Executive summary: Profiling is not just a simple plug-and-play thing. You've got to know what you are doing.

Where can I get that? Do you know how to use CodeXL?
system error 05 Dec 2014, 13:05
It would be nice and fun to have this, even a basic one should do, so that we can bitch-slap each other over whose code is the fastest or the shortest.
revolution 05 Dec 2014, 13:14
system error wrote:
Where can I get that [profiler]?
If you are one of our customers then you already have it. Otherwise I have nothing that is even remotely usable for arbitrary code.
system error wrote:
Do you know how to use CodeXL?
No.
system error wrote:
I could use RDTSC at both ends but I think it's more complicated than that.
It is a lot more complicated than that. So many things can go wrong. And it all depends upon what you are timing and how it functions.
system error wrote:
It would be nice and fun to have this, even a basic one should do, so that we can bitch-slap each other over whose code is the fastest or the shortest.
Speed of execution is not a constant. Different systems do things in different ways, some things are faster and other things are slower, so comparisons are mostly pointless for people with disparate systems.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8358
Location: Kraków, Poland
Tomasz Grysztar 05 Dec 2014, 13:21
For probably the least intrusive measurements, statistical profiling is a nice concept to play with. Though I have only ever used it in a DOS environment.
system error 05 Dec 2014, 13:25
revolution wrote:
Speed of execution is not a constant. Different systems do things in different ways, some things are faster and other things are slower, so comparisons are mostly pointless for people with disparate systems.


But it could be useful for comparing one version of my code against another, like seeing the effect of unrolling my loops, things like that. Without it, I am completely clueless on code performance. Nothing. Nada. I just write code for the sake of completion.
system error 05 Dec 2014, 13:28
Tomasz Grysztar wrote:
For probably the least intrusive measurements, statistical profiling is a nice concept to play with. Though I have only ever used it in a DOS environment.
That's a good start for me. I will play with it tonight. Thanks.
revolution 05 Dec 2014, 13:34
system error wrote:
But it could be useful for comparing one version of my code against another, like seeing the effect of unrolling my loops, things like that. Without it, I am completely clueless on code performance. Nothing. Nada. I just write code for the sake of completion.
Sure, but for "bitch-slapping" it would be useless. But you should also be aware of the pitfalls that a particular method will have.

I actually think that a custom written profiler for each app is really the only sensible and most reliable method. Some form of universal profiler is not going to make the grade if one is really serious about performance. But if you are just dabbling for "bitch-slapping" reasons then I guess any profiler will do, even if it is wrong.
gens



Joined: 18 Feb 2013
Posts: 161
gens 05 Dec 2014, 14:07
sampling profiling

codexl does it
for linux there is the perf framework
https://perf.wiki.kernel.org/index.php/Main_Page
program has to have debug symbols
ofc, you can diy it http://en.wikipedia.org/wiki/Hardware_performance_counter
guess that is what Tomasz did

rdtsc is fine but you should account for it changing the alignment
so rdtsc stuff - align 16 - loop measured
intel cpu's would probably slow down on crossing the page border, so aligning to page length
and ofc make the loop measured run for long enough to increase the precision of the measurement
revolution 05 Dec 2014, 14:18
RDTSC has more problems than just that. Thread migration and interruption are the most damning events that will make the readings meaningless.
system error 05 Dec 2014, 14:23
gens wrote:
sampling profiling

codexl does it
for linux there is the perf framework
https://perf.wiki.kernel.org/index.php/Main_Page
program has to have debug symbols
ofc, you can diy it http://en.wikipedia.org/wiki/Hardware_performance_counter
guess that is what Tomasz did

rdtsc is fine but you should account for it changing the alignment
so rdtsc stuff - align 16 - loop measured
intel cpu's would probably slow down on crossing the page border, so aligning to page length
and ofc make the loop measured run for long enough to increase the precision of the measurement
Ok gens. Thanks for stopping by. I just installed perf as you suggested and I tested it against one of my executables. Here's the result:

Code:
perf stat -B ./low64
0.00324449999999999

 Performance counter stats for './low64':

          0.151585 task-clock (msec)         #    0.231 CPUs utilized          
                 2 context-switches          #    0.013 M/sec                  
                 0 cpu-migrations            #    0.000 K/sec                  
                 3 page-faults               #    0.020 M/sec                  
           196,858 cycles                    #    1.299 GHz                    
            79,876 stalled-cycles-frontend   #   40.58% frontend cycles idle   
     <not counted> stalled-cycles-backend  
     <not counted> instructions            
     <not counted> branches                
     <not counted> branch-misses           

       0.000655774 seconds time elapsed    


It's a statistical program I made some time ago. How does the result look to you? Bad or bad?
revolution 05 Dec 2014, 14:26
It looks very bad. You are not running it long enough to gather enough long term data.

Usually profilers are run on long running tasks that need to run faster to save money or something. For a program that runs within a single tick of 16ms you won't see anything meaningful.
system error 05 Dec 2014, 14:29
revolution wrote:
system error wrote:
But it could be useful for comparing one version of my code against another, like seeing the effect of unrolling my loops, things like that. Without it, I am completely clueless on code performance. Nothing. Nada. I just write code for the sake of completion.
Sure, but for "bitch-slapping" it would be useless. But you should also be aware of the pitfalls that a particular method will have.

I actually think that a custom written profiler for each app is really the only sensible and most reliable method. Some form of universal profiler is not going to make the grade if one is really serious about performance. But if you are just dabbling for "bitch-slapping" reasons then I guess any profiler will do, even if it is wrong.


revo, we can't just skip code performance measures just because it is not easy to implement. It is not healthy for FASM programmers.
system error 05 Dec 2014, 14:33
revolution wrote:
It looks very bad.


HAHAHA! I know it's bad, but I don't know how bad. I can't tell which one is which from the generated information. But it's a large program though. Compiles to 930 bytes in size. Math-intensive.
revolution 05 Dec 2014, 14:38
system error wrote:
revo, we can't just skip code performance measures just because it is not easy to implement. It is not healthy for FASM programmers.
Maybe you misunderstand me. I was suggesting taking the hard route, not the easy route. The easy route might work, might. But to be really sure we need to go deeper and properly do things to make sure we are really getting the right result. I am pleased that someone cares enough to start profiling but please make sure you know you are using the right tool for your situation, else I would be unhappy to learn that you got bad results and didn't realise they were bad.
system error 05 Dec 2014, 14:59
revolution wrote:
system error wrote:
revo, we can't just skip code performance measures just because it is not easy to implement. It is not healthy for FASM programmers.
Maybe you misunderstand me. I was suggesting taking the hard route, not the easy route. The easy route might work, might. But to be really sure we need to go deeper and properly do things to make sure we are really getting the right result. I am pleased that someone cares enough to start profiling but please make sure you know you are using the right tool for your situation, else I would be unhappy to learn that you got bad results and didn't realise they were bad.


I know what you mean, revo. But the hard route needs tools. At least a basic tool. That's why I am asking, because I haven't seen one. I've seen many in MASM circles and they are making good use of them. In the end what you'll see is FASM write-to-completion coders vs MASM performance-aware coders. It is not healthy and we should not make it a habit.
gens 05 Dec 2014, 15:23
run your loop 100 000 times then divide the run length by 100 000
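A minimal C sketch of that repeat-and-divide measurement follows. The names `work`, `avg_ns_per_call` and `best_avg_ns` are hypothetical, `clock_gettime(CLOCK_MONOTONIC)` is assumed to be available (POSIX), and keeping the smallest average over a few trials is an extra step that is not part of the advice above. The cost of the call through the function pointer and of the loop itself is included in the result.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in for the loop body being measured. */
static void work(void)
{
    volatile uint32_t acc = 0;      /* volatile keeps the loop from being optimised away */
    for (uint32_t i = 0; i < 16; i++)
        acc += i;
}

/* Time `iterations` back-to-back calls, then divide the run length
   by the iteration count to get the average cost of one call in ns. */
static double avg_ns_per_call(void (*fn)(void), uint64_t iterations)
{
    assert(fn != 0 && iterations > 0);
    uint64_t start = now_ns();
    for (uint64_t i = 0; i < iterations; i++)
        fn();
    uint64_t end = now_ns();
    return (double)(end - start) / (double)iterations;
}

/* Repeat the whole measurement and keep the smallest average, on the
   assumption that interruptions can only make a run slower. */
static double best_avg_ns(void (*fn)(void), uint64_t iterations, int trials)
{
    assert(trials > 0);
    double best = avg_ns_per_call(fn, iterations);
    for (int t = 1; t < trials; t++) {
        double avg = avg_ns_per_call(fn, iterations);
        if (avg < best)
            best = avg;
    }
    return best;
}
```

A call such as `best_avg_ns(work, 100000, 3)` runs the body 100 000 times per trial and returns the lowest per-call average seen.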

sampling profilers work by reading the instruction pointer every once in a while, amongst other things
so the loop HAS to run for a while if you want usable results
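The sampling idea can be sketched in C. This is only a sketch under assumptions: it needs a POSIX system with `setitimer` and `SIGPROF`, the names (`start_sampling`, `stop_sampling`, `burn_cpu`, `samples`, the region tags) are hypothetical, and instead of reading the interrupted instruction pointer, which is platform-specific, the handler counts a region tag that the program sets itself.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

enum { REGION_A = 0, REGION_B = 1, REGION_COUNT = 2 };

/* Tag for the code region currently executing (stand-in for the instruction pointer). */
static volatile sig_atomic_t current_region = REGION_A;
/* One hit counter per region, incremented by the timer signal handler. */
static volatile unsigned long samples[REGION_COUNT];

/* Runs every time the profiling timer fires: record where the program is. */
static void on_sample(int signo)
{
    (void)signo;
    samples[current_region]++;
}

/* Arm a repeating timer that fires after every `interval_us` of consumed CPU time. */
static int start_sampling(long interval_us)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = interval_us;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the timer. */
static int stop_sampling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}

/* Hypothetical workload: burn roughly `ms` milliseconds of CPU time in `region`. */
static void burn_cpu(int region, long ms)
{
    current_region = region;
    clock_t deadline = clock() + (clock_t)ms * (CLOCKS_PER_SEC / 1000);
    while (clock() < deadline) {
        /* busy wait on consumed CPU time */
    }
}
```

After `start_sampling(1000)`, burning CPU in region A for longer than in region B should leave more hits in `samples[REGION_A]`. If the run lasts only a few timer ticks the counts are too small to mean anything, which is why the loop has to run for a while.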


tools like perf, codexl and such can show you where the program spends its time, the whole program
and only if it runs for a longer time
for a loop you would be better to measure it yourself
you have to know about things like alignment, instruction and data caches and so on or you can get misleading results

oh ye, and put your cpu to performance mode
forgot to say, also symbols under linux
AsmGuru62



Joined: 28 Jan 2004
Posts: 1670
Location: Toronto, Canada
AsmGuru62 05 Dec 2014, 15:53
I am writing a code generation utility for FASM (and it's been a long time).
It will have the profiling ability besides the OOP code generation.
But there will be no CPU clocks - just the counters for every function:

1. The # of times called
2. The total amount of time spent inside (using QueryPerformanceCounter API)

I am aiming this at large projects, so you can build the code with probes
and then get the statistics. Then switch it off with a checkbox and build the
release version.
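Switchable probes of this kind can be sketched in C with a compile-time flag. Everything here is an assumption made for illustration and not the utility described above: `ENABLE_PROBES`, `PROBE_BEGIN`, `PROBE_END`, `func_stats_t` and `checksum` are hypothetical names, and `clock_gettime` stands in for the QueryPerformanceCounter API so that the sketch is not tied to Windows.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#ifndef ENABLE_PROBES
#define ENABLE_PROBES 1             /* set to 0 to build without probes */
#endif

/* Per-function statistics: number of calls and total time spent inside. */
typedef struct {
    const char *name;
    uint64_t calls;
    uint64_t total_ns;
} func_stats_t;

#if ENABLE_PROBES
/* Monotonic time in nanoseconds (stand-in for QueryPerformanceCounter). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#define PROBE_BEGIN()     uint64_t probe_t0 = now_ns()
#define PROBE_END(stats)  do { (stats).calls++; \
                               (stats).total_ns += now_ns() - probe_t0; } while (0)
#else
/* With probes switched off both macros expand to nothing measurable. */
#define PROBE_BEGIN()     ((void)0)
#define PROBE_END(stats)  ((void)(stats))
#endif

static func_stats_t checksum_stats = { "checksum", 0, 0 };

/* Hypothetical instrumented function: byte-wise sum of a buffer. */
static uint32_t checksum(const unsigned char *data, uint32_t len)
{
    PROBE_BEGIN();
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum += data[i];
    PROBE_END(checksum_stats);
    return sum;
}
```

With the flag left at 1, every call to `checksum` adds one to `checksum_stats.calls` and its duration to `checksum_stats.total_ns`; compiling with `-DENABLE_PROBES=0` plays the role of unticking the checkbox.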
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
