flat assembler
Message board for the users of flat assembler.

Index > Main > alpha testing - speed improvements

Goto page Previous  1, 2, 3, 4  Next
Author
Thread Post new topic Reply to topic
scientica
Retired moderator


Joined: 16 Jun 2003
Posts: 689
Location: Linköping, Sweden
scientica 23 Jan 2004, 20:46
W00t?!?!?! Is there something wrong or is just fasm insanely fast all of a sudden??:
Code:
[frekla@ns1 fasm_tmp]$ ./fasm501a3 dwordGEN.ASM dwordGEN.501a3OUT
flat assembler  version 1.51 alpha 3
2 passes, 19.2 seconds, 4744967 bytes.
[frekla@ns1 fasm_tmp]$ ll dwordGEN.501a3OUT 
-rw-r-----    1 frekla   frekla    4744967 23 jan 21.38 dwordGEN.501a3OUT
[frekla@ns1 fasm_tmp]$     

Can any one test this one? (second run gave me 20.5 sec !!)
(I attach the modifyed source, note you must change line 428 from:
Code:
                if (first < 0) __asm__ ("int $3 "); // YUCK GAS syntax!!!    

to this to get it to compile with non gcc compiler
Code:
         if (first < 0) __asm int 3;    

)

as I'm in linux I can't provide a pre-compiled exe (only elf you some wants)

[EDIT]Pre-generated asm file can be downloaded from here:
http://w1.364.telia.com/~u36404088/tmp/dwordGEN.ASM.tgz
[/EDIT]


Description: Optimized generator (calls and jumps forced to dword size)
Download
Filename: GenAsm2.c.asm
Filesize: 20.01 KB
Downloaded: 725 Time(s)


_________________
... a professor saying: "use this proprietary software to learn computer science" is the same as English professor handing you a copy of Shakespeare and saying: "use this book to learn Shakespeare without opening the book itself.
- Bradley Kuhn
Post 23 Jan 2004, 20:46
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 23 Jan 2004, 21:33
decard wrote:
Well this time it wasn't really faster - Fresh compiled about 0,5 sec faster on my machine.

It wasn't expected to be much faster, because in assembler module there aren't many critical places that can largely affect the overall speed. Also the difference would only be visible on big sources assembling in large amount of passes.
Post 23 Jan 2004, 21:33
View user's profile Send private message Visit poster's website Reply with quote
khanh



Joined: 25 Jul 2003
Posts: 27
khanh 24 Jan 2004, 10:38
Rolling Eyes Great work!
I'm hearing forward from your success
Post 24 Jan 2004, 10:38
View user's profile Send private message Reply with quote
Tommy



Joined: 17 Jun 2003
Posts: 489
Location: Norway
Tommy 24 Jan 2004, 18:22
Great Privalov! Wink Every little improvement is worth it! Wink Keep it up!

Cheers,
Tommy
Post 24 Jan 2004, 18:22
View user's profile Send private message Visit poster's website Reply with quote
Randall Hyde



Joined: 03 Dec 2003
Posts: 57
Randall Hyde 30 Jan 2004, 23:37
Hi all,
I've modified my assembly benchmark generator program to produce some FASM compatible output. I'm sure the macros I wrote really stink, so if anyone has some suggestions for the code generation for FASM, I'd appreciate some input.

You can read about the assembler benchmark generator here:
http://www.masmforum.com/topic.php?t=1873&postdays=0&postorder=asc&start=0

And you can download the code with the FASM generator below.
Cheers,
Randy Hyde

BTW, here's a typical code sequence generated by the benchmark generator:
Code:
 format  MS COFF 
_fk_73n equ 0560A7409h
_14udcdksnb00 equ 1896619531
w_ equ 0AABE57BDh
u_t393h equ 02C65D964h
cxfra_p10bp03vs equ 3305213952

 section     '.data' data readable writeable align 16
sv__fk_73n dd _fk_73n-1443525641+cxfra_p10bp03vs-0C5019000h+u_t393h
sv__14udcdksnb00 dd _14udcdksnb00-0710C1E0Bh+u_t393h-744872292+_fk_73n
sv_w_ dd w_-2864601021+w_-2864601021+u_t393h
sv_u_t393h dd u_t393h-744872292+_14udcdksnb00-0710C1E0Bh+u_t393h
sv_cxfra_p10bp03vs dd cxfra_p10bp03vs-0C5019000h+_fk_73n-1443525641+_fk_73n

macro empty
{
}
macro parms a, b, c, d, e
{
  db a, b, c, d, e
}

empty
empty
empty
empty
empty
parms  0,1,2,3,4
parms  0,1,2,3,4
parms  0,1,2,3,4
parms  0,1,2,3,4
parms  0,1,2,3,4
 section   '.text' code readable executable align 16

macro _if r1, r2
{
 cmp r1, r2
 jne @f
}

macro _elseif r1, r2
{
@@: cmp r1, r2
 jne @f
}
macro _else
{
@@:
}
macro _endif
{
@@:
}
macro _while r1, r2
{
@@:
 cmp r1, r2
 jnb @f
}
macro _endw
{
 jmp @b
@@:
}

_if eax, ebx
_elseif ecx, 1
_else
_endif
_while eax, 10
 inc eax
_endw

_if eax, ebx
_elseif ecx, 1
_else
_endif
_while eax, 10
 inc eax
_endw

_if eax, ebx
_elseif ecx, 1
_else
_endif
_while eax, 10
 inc eax
_endw

_if eax, ebx
_elseif ecx, 1
_else
_endif
_while eax, 10
 inc eax
_endw

_if eax, ebx
_elseif ecx, 1
_else
_endif
_while eax, 10
 inc eax
_endw
    


Description:
Download
Filename: bm.zip
Filesize: 15.3 KB
Downloaded: 717 Time(s)

Post 30 Jan 2004, 23:37
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 30 Jan 2004, 23:42
You shouldn't use "equ" for equates, as in fasm "equ" is used for defining symbolic constants - a kind of macros. Numerical constants are defined using = symbol in fasm.
Post 30 Jan 2004, 23:42
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 31 Jan 2004, 00:48
I'm attaching the corrected version of benchmark, which defines the equates fasm's way.

And here are some result of the benchmark with parameter n only (time in seconds, tested on Duron 600). As you can see, the improvement of fasm 1.51 over 1.50 is really large for bigger files. I have put also results for MASM 6.14 for comparision, though it refused to assemble files with n larger than 3000, due to problems with such a number of .if-like structures.
Code:
n       fasm 1.50       fasm 1.51       MASM 6.14

1000    0.25            0.15            0.35
2000    0.45            0.35            0.65
3000    0.75            0.45            1.1
4000    1.1             0.65            ?
5000    1.5             0.75            ?
10000   7.9             1.5             ?
20000   71.2            3.0             ?    

So I have made another test, attaching the h1000 parameter after n setting to keep the amount of .if-structures small. The results are:
Code:
n       fasm 1.50       fasm 1.51       MASM 6.14

1000    0.25            0.15            0.25
2000    0.35            0.25            0.35
5000    1.2             0.45            0.65
10000   7.3             0.75            1.3
20000   69.4            1.4             3.4
50000   487.9           3.6             15.6
100000  ?               7.9             59.5    

I wasn't just patient enough to wait for fasm 1.50 to assemble the "n100000 h1000" benchmark. Also, for higher n values I tried the benchmark generated some duplicated labels, so it became unusable.


Description:
Download
Filename: bm.zip
Filesize: 15.17 KB
Downloaded: 705 Time(s)

Post 31 Jan 2004, 00:48
View user's profile Send private message Visit poster's website Reply with quote
comrade



Joined: 16 Jun 2003
Posts: 1150
Location: Russian Federation
comrade 31 Jan 2004, 01:23
Unbelievable!

_________________
comrade (comrade64@live.com; http://comrade.ownz.com/)
Post 31 Jan 2004, 01:23
View user's profile Send private message Visit poster's website AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
Randall Hyde



Joined: 03 Dec 2003
Posts: 57
Randall Hyde 31 Jan 2004, 05:52
Privalov wrote:
You shouldn't use "equ" for equates, as in fasm "equ" is used for defining symbolic constants - a kind of macros. Numerical constants are defined using = symbol in fasm.


Okay, that's an easy fix. And a good thing to know (e.g., for HLA code generation which uses EQUs).
Cheers,
Randy Hyde
Post 31 Jan 2004, 05:52
View user's profile Send private message Visit poster's website Reply with quote
Randall Hyde



Joined: 03 Dec 2003
Posts: 57
Randall Hyde 31 Jan 2004, 05:55
Privalov wrote:
Also, for higher n values I tried the benchmark generated some duplicated labels, so it became unusable.


Hmmm...
That's interesting, I'll have to figure out why the duplicate symbol problem is occurring. Amazing speedups on the large files, though.
Cheers,
Randy Hyde
Post 31 Jan 2004, 05:55
View user's profile Send private message Visit poster's website Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 31 Jan 2004, 08:15
Hi all.
Here are the results of above tests in graphical view. You can see, that FASM 1.51 graphic is very close to straight line, that IMHO shows that it uses almost ideal algorithms for building label lists. Even MASM's graphic becomes warped at big values. Unfortunately that means that future improvements in the speed (of parser) are almost imposible.

Regards.


Description:
Filesize: 4.7 KB
Viewed: 20057 Time(s)

chart.png




Last edited by JohnFound on 03 Feb 2004, 22:20; edited 1 time in total
Post 31 Jan 2004, 08:15
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
Randall Hyde



Joined: 03 Dec 2003
Posts: 57
Randall Hyde 02 Feb 2004, 16:20
JohnFound wrote:
Hi all.
Here are the results of above tests in graphical view. You can see, that FASM 1.51 graphic is very close to straight line, that IMHO shows that it uses almost ideal algorithms for building label lists. Even MASM's graphic becomes warped at big values. Unfortunately that means that future improvements in the speed (of parser) are almost imposible.

Regards.


I'm assuming the Y-axis was supposed to be some unit involving time Smile.

As for parser improvement, that depends mainly on whether the symbol table routines are still consuming the most time or whether the bottleneck has shifted elsewhere. OTOH, assembly times have probably gotten to the point where, even for large project, we don't need to worry about them anymore. What's the current assembly time for FRESH?
Cheers,
Randy Hyde
Post 02 Feb 2004, 16:20
View user's profile Send private message Visit poster's website Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 02 Feb 2004, 16:36
Randall Hyde wrote:
I'm assuming the Y-axis was supposed to be some unit involving time Smile.


Well, the units are: X - n parameter of your benchmark, and Y - time in seconds. You know, some people still uses "," for decimal separator and "nothing" for thousends. Wink

Quote:
As for parser improvement, that depends mainly on whether the symbol table routines are still consuming the most time or whether the bottleneck has shifted elsewhere.


Your test in it's present stage can significaly load only the parser of FASM, the results indicate only the performance of the parser and maybe a little of the preprocessor.

Regards.
Post 02 Feb 2004, 16:36
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
Randall Hyde



Joined: 03 Dec 2003
Posts: 57
Randall Hyde 03 Feb 2004, 21:19
JohnFound wrote:
Randall Hyde wrote:
I'm assuming the Y-axis was supposed to be some unit involving time Smile.


Well, the units are: X - n parameter of your benchmark, and Y - time in seconds. You know, some people still uses "," for decimal separator and "nothing" for thousends. Wink


I was wondering more about the numbers themselves. 10,000 seconds at the low end seems kind of high Smile
Cheers,
Randy Hyde
Post 03 Feb 2004, 21:19
View user's profile Send private message Visit poster's website Reply with quote
pelaillo
Missing in inaction


Joined: 19 Jun 2003
Posts: 878
Location: Colombia
pelaillo 03 Feb 2004, 21:32
2hr 42min
Post 03 Feb 2004, 21:32
View user's profile Send private message Yahoo Messenger Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 03 Feb 2004, 22:22
It is fixed. Thank you.
Post 03 Feb 2004, 22:22
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
HarryTuttle



Joined: 26 Sep 2003
Posts: 211
Location: Poland
HarryTuttle 04 Feb 2004, 11:02
it seems that Fasm1.47 is faster compared with all tested here.
What do you think about it?

_________________
Microsoft: brings power of yesterday to computers of today.
Post 04 Feb 2004, 11:02
View user's profile Send private message Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 04 Feb 2004, 11:34
HarryTuttle wrote:
it seems that Fasm1.47 is faster compared with all tested here.
What do you think about it?


Hm, what actually you mean??? Confused
Post 04 Feb 2004, 11:34
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
Octavio



Joined: 21 Jun 2003
Posts: 366
Location: Spain
Octavio 21 Mar 2004, 03:32
[quote="Privalov"]Recently I've decided it is the high time to suspend adding new features into fasm and concentrate on speed optimizations of what we've already got -

I have been testing a new algorithm to calculate branch displacements
with a correction value for backwards jumps ,whith this algoritm the numbers of pases seems to be independent of the program size and in most cases only 3-6 passes are required. the correction value is the diference betwen the actual value of a label and the value stored in a previous pass .
Post 21 Mar 2004, 03:32
View user's profile Send private message Visit poster's website Reply with quote
CodeForce



Joined: 22 Jun 2004
Posts: 1
CodeForce 22 Jun 2004, 15:09
Hi,

I came accross this thread when searching for hash algorihms and think I can add some info/suggestions: One of the better sites I seen about hashing is this one. It compares different hasing algorithms and I am sure some of them will fit fasm's workload better then others, its worth taking a look. ANother link is this one.

As for the statiement that hasing problably won't improve the application beyond its current state, I like to add some food for thought. Hasing performance consists of various parameters:

1. Hash table size
2. Key distribution
3. Key collision handling
4. Open/close hash tables

And some more. Try experimeting beyond just replacing one hash function for some other and change collision handling for example. For closed tables considder using the next slot in the table when having a collision, if that does not work, use the one after that. Why? Because it improves locality of reference, somehting which current processors are very sensitive to. For open tables, try to use sorted lists after each table entry. Even better then lists are arrays fo corse (for the same reason, but you need a very good distribution for this). Also considder mixing both methods, eg, each hash table entry its own fixed size array, but when missed, considder the next array in the hash table as an alternative. This can keep the hash table relatively small and cost you only more comparisons (sometimes).

Also try to measure how much time is currenlty spend in the label storage/lookup code. If this is next to nothing, then don't bother optimising for speed, just optimise for memory usage while preserving speed!

Beyond all this rumbling, I am plesantly surprised there are still folks out there that program things in assembler. It used to be a hobby of mine, shuffling data with the limiter number of registers available, exctracting every ounce of performance out of an arcane x86 architecure (did 68K and 6502 as well). All in all some good memories, even abused the parity bit once to do some cheap multiple xor operation in a linear shift feedback random number gererator (or something like that).

Good luck with your project and thats all I have to say now Smile.


CodeForce
Post 22 Jun 2004, 15:09
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page Previous  1, 2, 3, 4  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.