flat assembler
_shura



Joined: 22 May 2015
Posts: 60
[fasmg] performance and optimisation
Ohai,
I have created a huge macroinstruction set that worked just fine with small input but does not really scale up well. The compilation time is still below 10 seconds, but I do not want it to crash when I increase the input further.

So now I have to optimise my macros, and I guess loops would be a good starting point, but where else should I look? Are there any optimisation tips for fasmg you can give? I have already figured out that display has a huge impact on compilation time, and even the definition of this macro:

Code:

Macro display strings&
End Macro



increased the compilation time to a minute (if it did not crash before that).

And how can I measure which parts are really slow and should be optimised first? Or a feature request: some macro that returns the current time in nanoseconds since compilation started. (Even if this would probably increase the compilation time because of more syscalls. And it could be quite tricky with multiple runs.)

If you want to see what I mean: https://github.com/sivizius/sucks execute ./main.sh examples/uf4.sucks
(nomen est omen)
Post 23 Sep 2017, 13:13
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6676
Location: Kraków, Poland
Re: [fasmg] performance and optimisation

_shura wrote:
The compilation time is still below 10 seconds, but I do not want it to crash when I increase the input further.

Does it crash for you at any time? If yes, please report every crash (include replication steps and/or minimal source if possible), because if any happens, this is a serious bug that should be fixed.

_shura wrote:
So now I have to optimise my macros, and I guess loops would be a good starting point, but where else should I look? Are there any optimisation tips for fasmg you can give?

I've been giving small tips here and there, mainly reducing this to a simple heuristic that you should try to keep small the number of source lines that are processed. The loops are, as you guess, a critical part, because any large loop that gets repeated many times results in a large number of lines processed. This is the one fundamental disadvantage of fasm 2 / fasm g architecture that I've been warning about since the beginning: it needs to constantly reinterpret the ever-mutating source text, so it is not possible to pre-compile or even cache the meaning of a line like fasm 1 did.

You are right about DISPLAY, every invocation of this directive causes some system calls, so this may get slow if you use it a lot. It is better to buffer the data in some way, as a growing string or inside a VIRTUAL block, and then display all at once. And redirect to file to avoid console lag. Or perhaps even better would be to use the new feature and store entire VIRTUAL block holding such buffer into an auxiliary file with VIRTUAL AS.

Also, if the data that you output (either regularly or with DISPLAY) does require some post-processing, consider using POSTPONE ? to skip that processing in the intermediate passes.
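
The buffering advice above could be sketched roughly as follows. This is only an illustration, not code from the thread: `Log`, `LogText` and the `logMessage` macro are made-up names, and it assumes a fasmg version that already supports VIRTUAL block continuation. Each `logMessage` call appends to a virtual area instead of invoking DISPLAY, and a single DISPLAY at the end of the source shows the whole buffer.

```fasmg
; Hypothetical names: Log (area label), LogText, logMessage (macro).
virtual at 0
    Log::                       ; label the area so it can be reopened later
end virtual

macro logMessage text&
    virtual Log                 ; continue the labelled area
        db text                 ; append the message bytes
    end virtual
end macro

logMessage 'first message',13,10
logMessage 'second message',13,10

postpone                        ; processed when the end of the source is reached
    virtual Log
        load LogText:$ from 0   ; the area starts at 0, so $ is its current size
    end virtual
    display LogText             ; one DISPLAY for the whole buffer
end postpone
```

If the text is only needed as a file, opening the area with VIRTUAL AS, as mentioned above, might make the final DISPLAY unnecessary.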


_shura wrote:
And how can I measure which parts are really slow and should be optimised first? Or a feature request: some macro that returns the current time in nanoseconds since compilation started. (Even if this would probably increase the compilation time because of more syscalls. And it could be quite tricky with multiple runs.)

You can use the -v2 option to display messages in real time, this way you can perhaps trace what places are really slow. A real-time timestamp variable might possibly be added, too.
Post 23 Sep 2017, 15:43
_shura



Joined: 22 May 2015
Posts: 60
Thank you for the links.
VIRTUAL AS is already in use; it is a really nice feature that I only found in the manual yesterday.
»small the number of source lines« = better to put everything in another macro?
Is there really no way to put e.g. the instruction sets that are used frequently into some kind of preprocessed file? Does display not already put everything in a buffer before displaying?

With fasmg g.hxhsr, ./make.sh examples/uf4.flib:

Quote:

flat assembler version g.hxhsr
3 passes, 1.0 seconds, 1922 bytes.


With definition of

Code:

Macro display strings&
End Macro



at the very beginning:

Quote:

flat assembler version g.hxhsr
3 passes, 9.1 seconds, 1922 bytes.


With definition of

Code:

Macro display strings&
  Display strings
End Macro



at the very beginning:

Quote:

flat assembler version g.hxhsr
3 passes, 9.7 seconds, 1922 bytes.


So the impact of defining display is more relevant than displaying itself.
Then I added:

Code:

Repeat 100
  mov eax, ecx ;my definition of this x86-instruction
End Repeat



I get

Quote:

3 passes, 11.2 seconds, 2122 bytes.


Then I changed it to:

Code:

Repeat 1000 ;factor 10, so I expect 110.0 seconds (about 2 minutes)
  mov eax, ecx ;my definition of this x86-instruction
End Repeat



But now I get:

Quote:

3 passes, 264.4 seconds, 3922 bytes.


3 passes is OK; that is what I expected, because it is always 3 passes. Without any definition of display, I get

Quote:

3 passes, 3.7 seconds, 3922 bytes.


Which is also just fine (well, it could be better, but for ~1200 instructions, this is OK for my crappy PC).
I guess redefining an internal macro is just a very bad idea?
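
As a quick check on the numbers above (only arithmetic on the reported timings, taking the 9.7-second run as the baseline and assuming the cost of the Repeat body grows linearly with the count): 100 repetitions add 1.5 s, so a linear estimate for 1000 repetitions is about 25 s rather than 110 s, and the measured time is roughly ten times that estimate.

```latex
\begin{align*}
t_{100} - t_{0} &= 11.2 - 9.7 = 1.5\ \text{s} \\
t_{1000}^{\text{linear}} &\approx 9.7 + 10 \times 1.5 = 24.7\ \text{s} \\
\frac{t_{1000}^{\text{measured}}}{t_{1000}^{\text{linear}}} &= \frac{264.4}{24.7} \approx 10.7
\end{align*}
```

Without the redefinition, the same 1000 repetitions add only 3.7 - 1.0 = 2.7 s to the 1.0-second baseline.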
Post 23 Sep 2017, 17:03
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6676
Location: Kraków, Poland

_shura wrote:
I guess redefining an internal macro is just a very bad idea?

Well, it is when this macro is then used thousands of times. You must be using DISPLAY extensively in your sources if re-defining has such an impact.
Post 23 Sep 2017, 17:10
_shura



Joined: 22 May 2015
Posts: 60
Well, that's what I do: listing, debugging, displaying some structures, etc. (^.^); not a good idea?
Btw., it is only the listing part that is slow, so I guess it's a good starting point.
Post 23 Sep 2017, 17:16
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6676
Location: Kraków, Poland
I must look deeper at your sources, a factor of 9 is a bit strange for just simply re-defining DISPLAY. If I do this:

Code:
repeat 1000000
        display ''
end repeat

and then precede it with:

Code:
macro display? any&
end macro

it slows down only by a factor of 1.5, and when I switch the definition to:

Code:
macro display? any&
        display any
end macro

the factor is about 2 compared to the initial one. So the numbers are very different from your case, and there is actually a huge difference between re-defining DISPLAY as empty macro or not.

How much RAM do you have? Could the memory usage have an impact on performance in your case? If you look at my fossil repository, I have recently created a branch that has reduced memory usage. Normally it has a worse performance than the official version, but on machines with little RAM it might be advantageous.
Post 23 Sep 2017, 17:30
_shura



Joined: 22 May 2015
Posts: 60
Intel Core i5 CPU M 520 @ 2.40GHz (x86_64), 2×2 CPUs, 8 GB RAM – it is a ThinkPad T410.
Post 23 Sep 2017, 18:37
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6676
Location: Kraków, Poland
Then memory is definitely not an issue. There must be some interaction within your sources that causes this. I will look into it when I have more time.
Post 23 Sep 2017, 19:10
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6676
Location: Kraków, Poland
Today I have fixed a bug that was causing a runaway effect and large slowdowns with some macros. Please check whether it affects this problem too.
Post 18 Oct 2017, 14:19
_shura



Joined: 22 May 2015
Posts: 60
Uhm, actually I tried the old code with g.hxhsr again and there was no difference in speed with or without the definition. *confused*
Post 18 Oct 2017, 16:52
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6676
Location: Kraków, Poland
Perhaps there was some additional factor.
Does it work exactly the same with the latest version then?
Post 18 Oct 2017, 17:56
_shura



Joined: 22 May 2015
Posts: 60
Same speed. I cannot figure out why I do not see any performance difference between the versions with and without the definition of display, neither with my current version nor with the version of the repository I mentioned here (https://github.com/sivizius/sucks/tree/06dd40f76026597bf104b8399c627fc96b158e87). The only change I have made since September is that I switched from Firefox to Chromium as my browser. It would be very weird if this was the reason, but I cannot reproduce the problem.
Post 18 Oct 2017, 19:04
_shura



Joined: 22 May 2015
Posts: 60
I get around 10s with:

Code:

Repeat 2000000
  display `%
End Repeat



around 13s with:

Code:

Macro display string&
End Macro
Repeat 2000000
  display `%
End Repeat



and around 15s with:

Code:

Macro display string&
  Display string
End Macro
Repeat 2000000
  display `%
End Repeat



both with g.hxhsr and g.hxhsr.
So the version with a definition of a macro »display« is slower, but by a factor of around 1.5 and not 10, which is fine.
Post 18 Oct 2017, 19:16
_shura



Joined: 22 May 2015
Posts: 60
The Linux version of fasmg depends on shared libraries, right? Could an update of my system affect the speed?
Post 18 Oct 2017, 19:31
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6676
Location: Kraków, Poland
The only thing it uses from shared libraries is malloc.
Post 18 Oct 2017, 20:11
_shura



Joined: 22 May 2015
Posts: 60
I guess the best way to speed things up is to reduce the number of passes and to avoid loops. In my macroinstruction set I use queues, stacks and other memory structures that do not necessarily end up in the final output, but because these structures should be of dynamic size, I need at least 2 passes to calculate the required size.
One way to optimise this would be to have memory spaces that can be accessed with load/store like data, like this:

Code:

myMemory allocate 12
myMemory reallocate 14
store byte 42 at myMemory:13
load temp byte from myMemory:13



If you do not want new instructions, then this may be a way to implement it:

Code:

virtual
  myMemory::
    rb 12
end virtual
myMemory = myMemory + 2
size = sizeof myMemory ;size = 14
store byte 42 at myMemory:13
load temp byte from myMemory:13



My second suggestion is to add some notation for list-variables. I could do it like this:

Code:

struc addToList entry*, value*, field
  repeat 1, item:entry
    match any, field
      .item#%#.#any = value
    else
      .item#% = value
    end match
  end repeat
end struc
struc getItem list*, entry*, field
  repeat 1, item:entry
    match any, field
      . = myList.item#%#.#any
    else
      . = myList.item#%
    end match
  end repeat
end struc
myList addToList 3, "foo"
myList addToList 3, "bar", bar
temp getItem myList, 3, bar
display temp
temp getItem myList, 3
display temp



But I really prefer something like this instead:

Code:

myList[3] = "foo"
myList[3].bar = "bar"
display myList[3].bar, myList[3]




By the way: is there a way to delete all definitions inside a namespace, e.g.:

Code:

foo.bar = 0
foo.bar.a = 1
foo.bar.b = 2
foo.bar.a.b = 3
foo = 4
delete foo.bar. ;to delete foo.bar.a, foo.bar.b and foo.bar.a.b, but not foo and foo.bar




I can live without that, but this could decrease memory-usage.
Post 19 Oct 2017, 16:37
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6676
Location: Kraków, Poland

_shura wrote:
I guess the best way to speed things up is to reduce the number of passes and to avoid loops. In my macroinstruction set I use queues, stacks and other memory structures that do not necessarily end up in the final output, but because these structures should be of dynamic size, I need at least 2 passes to calculate the required size.
One way to optimise this would be to have memory spaces that can be accessed with load/store like data, like (...)

Please take a look at the recent addition of VIRTUAL block continuation. This is more or less what you described here.
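
As a rough sketch of how the proposal above might map onto VIRTUAL block continuation (assuming a fasmg version that has it; the zero-filled contents and the names `myMemorySize` and `temp` are illustrative only): the area is created once, enlarged later by reopening it through its label, and accessed with LOAD and STORE.

```fasmg
virtual at 0
    myMemory::                  ; area label, used to reopen and to address the block
    db 12 dup 0                 ; initial 12 bytes, zero-filled
end virtual

virtual myMemory                ; continuation: reopen the same area
    db 2 dup 0                  ; append 2 more bytes
    myMemorySize = $ - $$       ; current size of the area (hypothetical helper symbol)
end virtual

store 42:1 at myMemory:13       ; write one byte at offset 13
load temp:1 from myMemory:13    ; read the same byte back into temp
```

The data is initialised with `db ... dup 0` instead of `rb` only so that the sketch does not depend on how STORE treats reserved data.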


_shura wrote:
But I really prefer something like this instead:

Code:

myList[3] = "foo"
myList[3].bar = "bar"
display myList[3].bar, myList[3]



This does not look far from variants that already work:

Code:
myList#3 = "foo"
myList#3.bar = "bar"
display myList#3.bar, myList#3

myList.3 = "foo"
myList.3.bar = "bar"
display myList.3.bar, myList.3

I guess that the real issue you signal here is to have the ability to evaluate an expression to give an index within a single line. For that a macro is the only way.
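
A minimal sketch of such a macro, reusing the `repeat 1, name:value` idiom that appears earlier in the thread. `setItem` and `getItem` are hypothetical names, not fasmg built-ins; the index expression is evaluated by REPEAT and the resulting number is substituted into the symbol name.

```fasmg
; setItem/getItem are made-up helper names for illustration.
macro setItem list*, index*, value*
    repeat 1, i:index           ; i is replaced with the evaluated index
        list.i = value          ; e.g. myList.3 = 'foo'
    end repeat
end macro

struc getItem list*, index*
    repeat 1, i:index
        . = list.i              ; give the labelled symbol the value of the entry
    end repeat
end struc

sp = 1
setItem myList, sp+2, 'foo'     ; the index expression sp+2 selects myList.3
temp getItem myList, sp+2       ; temp receives the value of myList.3
display temp
```

This keeps the expression out of the identifier itself, at the cost of one extra REPEAT per access.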


_shura wrote:
I can live without that, but this could decrease memory-usage.

Even though (as opposed to fasm 1) fasmg could reuse some of the memory released this way, this would not really make that much of a difference. The bulk of memory usage in fasmg comes from maintaining the structures that are related to symbols but cannot be released even when symbols are undefined. And undefining all the symbols within a namespace is slow.
Post 19 Oct 2017, 16:55
_shura



Joined: 22 May 2015
Posts: 60

Tomasz Grysztar wrote:
Please take a look at the recent addition of VIRTUAL block continuation. This is more or less what you described here.


I missed this update of virtual-blocks and this looks good and may improve things. But the manual is a bit confusing, because it explains two separate things in one example:

Code:

virtual as 'log'
  db 'this will be put in the separate file »outputfile.log«'
end virtual



and

Code:

virtual
  foobar::
end virtual
virtual foobar
  db "Hello World!",13,10
end virtual



and I suggest splitting this up.
Anyway: Thank you for that.


Tomasz Grysztar wrote:
I guess that the real issue you signal here is to have the ability to evaluate an expression to give an index within a single line. For that a macro is the only way.


Exactly. My idea was to implement a stack this way with a stack pointer. I suggest allowing expressions like this:

Code:

myList#(expression)





Tomasz Grysztar wrote:
And undefining all the symbols within a namespace is slow.


I had already feared that.
Post 19 Oct 2017, 17:27
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6676
Location: Kraków, Poland

_shura wrote:
But the manual is a bit confusing, because it explains two separate things in one example

The examples are there to illustrate, not explain. The AS feature is described earlier in the manual, but has no dedicated example, this is perhaps what misled you.


_shura wrote:

Tomasz Grysztar wrote:
I guess that the real issue you signal here is to have the ability to evaluate an expression to give an index within a single line. For that a macro is the only way.


Exactly. My idea was to implement a stack this way with a stack pointer. I suggest allowing expressions like this:

Code:

myList#(expression)



Allowing expressions embedded inside symbol identifiers is a can of worms that I am not willing to open. I intentionally kept the identifier syntax not only simple, but in a similar syntactical range as symbol names from fasm 1, because this ensures that any more or less standard assembly language that one would want to implement with fasmg's macros is not going to have problems like unwanted interactions with the identifier syntax. An expression inside an identifier would be especially nasty, as this would entail arbitrary nesting. There is a reason why I design fasm's language to be emergent and based only on simple features.
Post 19 Oct 2017, 20:12
_shura



Joined: 22 May 2015
Posts: 60
Currently »#(« always results in an invalid instruction, argument or expression, so this would not interact with current code. Is the philosophy of fasmg more important than writing readable code?
Well, I could parse expressions so that a read of myList#(1+2) or myList[1+2] is allowed in some macros, but there is no way to add some of my syntactic sugar to assignments (»myList[1+2] = "foobar"«). Of course, I could parse my code with the ?!-macro, but this is not very convenient.
What about allowing the definition of a struc »=«, so that I can do all of my syntactic sugar myself without having it written in the fasmg source:

Code:

struc = expression&
  local temp, expr
  ;=! – my idea to reference the default =
  match list=[ list_expression =], .
    expr parseExpression expression
    temp parseExpression list_expression
    repeat 1, item:temp
      list.#% =! expr
    end repeat
  else
    temp checkSymbol .
    if ( temp = 0 )
       expr parseExpression expression
       . =! expr
    else
       compileExpression expression
       mov dword [ temp.addr ], eax
    end if
  end match
end struc
abc[1+2] = foo("bar",3)



This would allow custom high-level-syntax inside fasmg, but without defining this struc, nothing would be changed.
Post 19 Oct 2017, 21:20
Copyright © 2004-2016, Tomasz Grysztar.