[fasmg] performance and optimisation
_shura 23 Sep 2017, 13:13
Ohai,
I have created a huge macroinstruction set, and it worked just fine with a small input, but it does not really scale well. The compilation time is still below 10 seconds, but I do not want it to crash when I increase the input further.

So now I have to optimise my macros, and I guess loops would be a good starting point, but where else should I look? Are there any optimisation tips for fasmg you can give? I have already figured out that display has a huge impact on compilation time, and even the definition of this macro:
Code:
Macro display strings&
End Macro
    

increased the compilation time to a minute (if it did not crash before that).

And how can I measure which parts are really slow and should be optimised first? Or, as a feature request: some macro that returns the current time in nanoseconds since compilation started. (Even if this would probably increase the compilation time because of more syscalls. And it could be quite tricky with multiple runs.)

If you want to see what I mean: https://github.com/sivizius/sucks execute ./main.sh examples/uf4.sucks
(nomen est omen)
Tomasz Grysztar 23 Sep 2017, 15:43
_shura wrote:
The compilation time is still below 10 seconds, but I do not want it to crash when I increase the input further.
Does it crash for you at any time? If yes, please report every crash (include replication steps and/or minimal source if possible), because if any happens, this is a serious bug that should be fixed.
_shura wrote:
So now I have to optimise my macros, and I guess loops would be a good starting point, but where else should I look? Are there any optimisation tips for fasmg you can give?
I've been giving small tips here and there, mainly reducing this to a simple heuristic: try to keep the number of source lines that are processed small. The loops are, as you guess, a critical part, because any large loop that gets repeated many times results in a large number of lines processed. This is the one fundamental disadvantage of the fasm 2 / fasm g architecture that I've been warning about since the beginning: it needs to constantly reinterpret the ever-mutating source text, so it is not possible to pre-compile or even cache the meaning of a line like fasm 1 did.
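
As a rough illustration of that heuristic (a sketch that is not taken from the sources discussed in this thread; the label names are made up): both fragments below emit the same 1000 zero bytes, but the loop makes the assembler process its body line 1000 times in every pass, while the DUP form is a single processed line.

```fasmg
; Variant A: the loop body is processed 1000 times per pass.
zerosLoop:
repeat 1000
        db 0
end repeat

; Variant B: one processed line produces the same data.
zerosDup:
        db 1000 dup 0
```

How much this helps in a real macro set depends on what the loop body does; the point is only that fewer processed lines usually means less work per pass.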

You are right about DISPLAY, every invocation of this directive causes some system calls, so this may get slow if you use it a lot. It is better to buffer the data in some way, as a growing string or inside a VIRTUAL block, and then display all at once. And redirect to file to avoid console lag. Or perhaps even better would be to use the new feature and store entire VIRTUAL block holding such buffer into an auxiliary file with VIRTUAL AS.

Also, if the data that you output (either regularly or with DISPLAY) does require some post-processing, consider using POSTPONE ? to skip that processing in the intermediate passes.

_shura wrote:
And how can I measure which parts are really slow and should be optimised first? Or, as a feature request: some macro that returns the current time in nanoseconds since compilation started. (Even if this would probably increase the compilation time because of more syscalls. And it could be quite tricky with multiple runs.)
You can use the -v2 option to display messages in real time, this way you can perhaps trace what places are really slow. A real-time timestamp variable might possibly be added, too.
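
The buffering advice above could be sketched roughly as follows. This is only an assumed shape, not code from the project in question: LogBuffer, logmsg and logText are hypothetical names, and the sketch relies on VIRTUAL block continuation together with POSTPONE ? so that DISPLAY is called once instead of once per message.

```fasmg
; LogBuffer, logmsg and logText are hypothetical names chosen for this sketch.
virtual at 0
        LogBuffer::                     ; area label, lets the block be continued later
end virtual

macro logmsg text&
        virtual LogBuffer               ; continue the existing block
                db text                 ; append to the buffer instead of calling DISPLAY
        end virtual
end macro

logmsg 'first message', 13, 10
logmsg 'second message', 13, 10

postpone ?                              ; skipped in the intermediate passes
        virtual LogBuffer
                load logText : $ - $$ from $$   ; read the whole buffer as one string
        end virtual
        display logText                 ; a single DISPLAY for everything
end postpone
```

If the log should go to a file rather than to the console, declaring the block with VIRTUAL AS, as mentioned above, would avoid DISPLAY altogether.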
_shura 23 Sep 2017, 17:03
Thank you for the links.
VIRTUAL AS is already in use; it is a really nice feature that I only found in the manual yesterday.
»keep the number of source lines small« = is it better to put everything in another macro?
Is there really no way to put, e.g., the instruction sets that are used frequently into some kind of preprocessed file? Does display not already put everything in a buffer before displaying?

With fasmg g.hxhsr, ./make.sh examples/uf4.flib:
Quote:

flat assembler version g.hxhsr
3 passes, 1.0 seconds, 1922 bytes.

With definition of
Code:
Macro display strings&
End Macro
    

at the very beginning:
Quote:

flat assembler version g.hxhsr
3 passes, 9.1 seconds, 1922 bytes.

With definition of
Code:
Macro display strings&
  Display strings
End Macro
    

at the very beginning:
Quote:

flat assembler version g.hxhsr
3 passes, 9.7 seconds, 1922 bytes.

So the impact of defining display is more relevant than displaying itself.
Then I added:
Code:
Repeat 100
  mov eax, ecx ;my definition of this x86-instruction
End Repeat
    

I get
Quote:

3 passes, 11.2 seconds, 2122 bytes.

Then I changed it to:
Code:
Repeat 1000 ;factor 10, so I expect 110.0 seconds (about 2 minutes)
  mov eax, ecx ;my definition of this x86-instruction
End Repeat
    

But now I get:
Quote:

3 passes, 264.4 seconds, 3922 bytes.

3 passes is OK; that is what I expected, because it is always 3 passes. Without any definition of display, I get
Quote:

3 passes, 3.7 seconds, 3922 bytes.

This is also just fine (well, it could be better, but for ~1200 instructions this is OK for my crappy PC).
I guess redefining an internal macro is just a very bad idea?
Tomasz Grysztar 23 Sep 2017, 17:10
_shura wrote:
I guess, redefining an internal macro is just a very bad idea?
Well, it is when this macro is then used thousands of times. You must be using DISPLAY extensively in your sources if re-defining has such an impact.
_shura 23 Sep 2017, 17:16
Well, that's what I do: listing, debugging, displaying some structures, etc. (^.^); not a good idea?
Btw, it is only the listing part that is slow, so I guess it's a good starting point.
Tomasz Grysztar 23 Sep 2017, 17:30
I must look deeper at your sources, a factor of 9 is a bit strange for just simply re-defining DISPLAY. If I do this:
Code:
repeat 1000000
        display ''
end repeat    
and then precede it with:
Code:
macro display? any&
end macro    
it slows down only by a factor of 1.5, and when I switch the definition to:
Code:
macro display? any&
        display any
end macro    
the factor is about 2 compared to the initial one. So the numbers are very different from your case, and there is actually a huge difference between re-defining DISPLAY as empty macro or not.

How much RAM do you have? Could the memory usage have an impact on performance in your case? If you look at my fossil repository, I have recently created a branch that has reduced memory usage. Normally it has a worse performance than the official version, but on machines with little RAM it might be advantageous.
_shura 23 Sep 2017, 18:37
Intel Core i5 CPU M 520 @ 2.40GHz (x86_64), 2×2 CPUs, 8 GB RAM – it is a ThinkPad T410.
Tomasz Grysztar 23 Sep 2017, 19:10
Then memory is definitely not an issue. There must be some interaction within your sources that causes this. I will look into it when I have more time.
Tomasz Grysztar 18 Oct 2017, 14:19
Today I have fixed a bug that was causing a runaway effect and large slowdowns with some macros. Please check whether it affects this problem too.
_shura 18 Oct 2017, 16:52
Uhm, actually I tried the old code with g.hxhsr again and there was not any difference in speed with or without definition. *confused*
Tomasz Grysztar 18 Oct 2017, 17:56
Perhaps there was some additional factor.
Does it work exactly the same with the latest version then?
_shura 18 Oct 2017, 19:04
Same speed. I cannot figure out why I no longer see any performance difference between the versions with and without the definition of display, neither with my current version nor with the version of the repository I mentioned here (https://github.com/sivizius/sucks/tree/06dd40f76026597bf104b8399c627fc96b158e87). The only change I have made since September is that I switched from Firefox to Chromium as my browser. It would be very weird if this was the reason, but I cannot reproduce the problem.
_shura 18 Oct 2017, 19:16
I get around 10s with:
Code:
Repeat 2000000
  display `%
End Repeat
    

around 13s with:
Code:
Macro display string&
End Macro
Repeat 2000000
  display `%
End Repeat
    

and around 15s with:
Code:
Macro display string&
  Display string
End Macro
Repeat 2000000
  display `%
End Repeat
    

both with g.hxhsr and g.hxhsr.
So the version with a definition of a macro »display« is slower, but by a factor of around 1.5 and not 10, which is fine.
_shura 18 Oct 2017, 19:31
The Linux version of fasmg depends on shared libraries, right? Could an update of my system affect the speed?
Tomasz Grysztar 18 Oct 2017, 20:11
The only thing it uses from shared libraries is malloc.
_shura 19 Oct 2017, 16:37
I guess the best way to speed things up is to reduce the number of passes and to avoid loops. In my macroinstruction set I use queues, stacks and other memory structures that do not necessarily get into the final output, but because these structures should be of dynamic size, I need at least 2 passes to calculate the required size.
One way to optimise this would be to have memory spaces that can be accessed with load/store like data, like this:
Code:
myMemory allocate 12
myMemory reallocate 14
store byte 42 at myMemory: 13
load temp byte from myMemory: 13
    

If you do not want new instructions, then this may be a way to implement it:
Code:
virtual
  myMemory::
    rb 12
end virtual
myMemory = myMemory + 2
size = sizeof myMemory ;size = 14
store byte 42 at myMemory: 13
load temp byte from myMemory: 13
    

My second suggestion is to add some notation for list-variables. I could do it like this:
Code:
struc addToList entry*, value*, field
  repeat 1, item: entry
    match any, field
      .item#%#.#any = value
    else
      .item#% = value
    end match
  end repeat
end struc
struc getItem list*, entry*, field
  repeat 1, item: entry
    match any, field
      . = list.item#%#.#any
    else
      . = list.item#%
    end match
  end repeat
end struc
myList addToList 3, "foo"
myList addToList 3, "bar", bar
temp getItem myList, 3, bar
display temp
temp getItem myList, 3
display temp
    

But I really prefer something like this instead:
Code:
myList[3] = "foo"
myList[3].bar = "bar"
display myList[3].bar, myList[3]
    


By the way: is there a way to delete all definitions inside a namespace, e.g.:
Code:
foo.bar = 0
foo.bar.a = 1
foo.bar.b = 2
foo.bar.a.b = 3
foo = 4
delete foo.bar.* ;to delete foo.bar.a, foo.bar.b and foo.bar.a.b, but not foo and foo.bar
    


I can live without that, but this could decrease memory-usage.
Tomasz Grysztar 19 Oct 2017, 16:55
_shura wrote:
I guess the best way to speed things up is to reduce the number of passes and to avoid loops. In my macroinstruction set I use queues, stacks and other memory structures that do not necessarily get into the final output, but because these structures should be of dynamic size, I need at least 2 passes to calculate the required size.
One way to optimise this would be to have memory spaces that can be accessed with load/store like data, like (...)
Please take a look at the recent addition of VIRTUAL block continuation. This is more or less what you described here.

_shura wrote:
But I really prefer something like this instead:
Code:
myList[3] = "foo"
myList[3].bar = "bar"
display myList[3].bar, myList[3]
    
This does not look far from variants that already work:
Code:
myList#3 = "foo"
myList#3.bar = "bar"
display myList#3.bar, myList#3

myList.3 = "foo"
myList.3.bar = "bar"
display myList.3.bar, myList.3    
I guess that the real issue you signal here is to have the ability to evaluate an expression to give an index within a single line. For that a macro is the only way.
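
A minimal sketch of such a macro, assuming the `myList#3` naming variant shown above; setItem and getItem are hypothetical helper names, not part of fasmg. The REPEAT 1 with a named counter is used only to evaluate the index expression, so that its numeric value can be concatenated to the list name.

```fasmg
; setItem and getItem are hypothetical helper names for this sketch.
macro setItem list*, index*, value*
        repeat 1, i: index              ; i is replaced by the evaluated index
                list#i = value
        end repeat
end macro

macro getItem target*, list*, index*
        repeat 1, i: index
                target = list#i
        end repeat
end macro

top = 2
setItem myList, top+1, "foo"            ; defines myList3
getItem temp, myList, 1+2               ; reads myList3
display temp, 13, 10
```

The cost is that every access expands to several processed lines, which matters if the list is used inside large loops.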

_shura wrote:
I can live without that, but this could decrease memory-usage.
Even though (as opposed to fasm 1) fasmg could reuse some of the memory released this way, this would not really make that much of a difference. The bulk of memory usage in fasmg comes from maintaining the structures that are related to symbols but cannot be released even when symbols are undefined. And undefining all the symbols within a namespace is slow.
_shura 19 Oct 2017, 17:27
Tomasz Grysztar wrote:
Please take a look at the recent addition of VIRTUAL block continuation. This is more or less what you described here.

I missed this update of virtual blocks; it looks good and may improve things. But the manual is a bit confusing, because it explains two separate things in one example:
Code:
virtual as 'log'
  db 'this will be put in the separate file »outputfile.log«'
end virtual
    

and
Code:
virtual
  foobar::
end virtual
virtual foobar
  db "Hello World!", 13, 10
end virtual
    

and I suggest to split this up.
Anyway: Thank you for that.

Tomasz Grysztar wrote:
I guess that the real issue you signal here is to have the ability to evaluate an expression to give an index within a single line. For that a macro is the only way.

Exactly. My idea was to implement a stack this way, with a stack pointer. I suggest allowing expressions like this:
Code:
myList#(expression)
    


Tomasz Grysztar wrote:
And undefining all the symbols within a namespace is slow.

I had feared as much.
Tomasz Grysztar 19 Oct 2017, 20:12
_shura wrote:
But the manual is a bit confusing, because it explains two separate things in one example
The examples are there to illustrate, not explain. The AS feature is described earlier in the manual, but has no dedicated example, this is perhaps what misled you.

_shura wrote:
Tomasz Grysztar wrote:
I guess that the real issue you signal here is to have the ability to evaluate an expression to give an index within a single line. For that a macro is the only way.

Exactly. My idea was to implement a stack this way, with a stack pointer. I suggest allowing expressions like this:
Code:
myList#(expression)
    
Allowing expressions embedded inside symbol identifiers is a can of worms that I am not willing to open. I intentionally kept the identifier syntax not only simple, but in a similar syntactical range as symbol names from fasm 1, because this ensures that any more or less standard assembly language that one would want to implement with fasmg's macros is not going to have problems like unwanted interactions with the identifier syntax. An expression inside an identifier would be especially nasty, as this would entail arbitrary nesting. There is a reason why I design fasm's language to be emergent and based only on simple features.
_shura 19 Oct 2017, 21:20
Currently »#(« always results in an invalid instruction, argument or expression, so this would not interact with existing code. Is the philosophy of fasmg more important than writing readable code?
Well, I could parse expressions so that a read of myList#(1+2) or myList[1+2] is allowed in some macros, but there is no way to add my syntactic sugar to assignments (»myList[1+2] = "foobar"«). Of course, I could parse my code with the ?! macro, but this is not very convenient.
What about allowing the struc »=« to be defined, so that I can do all of my syntactic sugar myself without having it written in the fasmg source:
Code:
struc = expression&
  local temp, expr
  ;=!  my idea to reference the default =
  match list=[ list_expression =], .
    expr parseExpression expression
    temp parseExpression list_expression
    repeat 1, item: temp
      list.#% =! expr
    end repeat
  else
    temp checkSymbol .
    if ( temp = 0 )
       expr parseExpression expression
       . =! expr
    else
       compileExpression expression
       mov dword [ temp.addr ], eax
    end if
  end match
end struc
abc[1+2] = foo("bar", 3)
    

This would allow custom high-level-syntax inside fasmg, but without defining this struc, nothing would be changed.