flat assembler
Message board for the users of flat assembler.

Index > Macroinstructions > MACROMUL

edfed



Joined: 20 Feb 2006
Posts: 4324
Location: Now
edfed 24 Oct 2007, 20:50
the problem with mul is its execution time

one question:
does a mul macro like this one already exist?
Code:
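macro eaxx2 {shl eax,1}              ; eax*2
macro eaxx3 {lea eax,[eax*3]}        ; eax*3, encoded as [eax+eax*2]
macro eaxx4 {shl eax,2}              ; eax*4
macro eaxx5 {lea eax,[eax+eax*4]}    ; eax*5
macro eaxx6                          ; eax*6 = (eax*3)*2
{eaxx3
eaxx2}
macro eaxx7                          ; eax*7 = eax*8 - eax, clobbers ebx
{mov ebx,eax
eaxx8
sub eax,ebx}
macro eaxx8 {shl eax,3}              ; eax*8
macro eaxx9 {lea eax,[eax+eax*8]}    ; eax*9
macro eaxx10                         ; eax*10 = (eax*5)*2
{eaxx5
eaxx2}
; etc etc...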
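A quick way to check decompositions like these before turning them into macros is to model each instruction as plain 32-bit arithmetic: shl eax,n is a left shift, lea eax,[eax+eax*s] is eax + eax*s, and everything wraps modulo 2^32 just like EAX. The C sketch below does that for the macros above; the function names simply reuse the macro names for illustration, and eaxx3 is modelled as eax+eax*2, the form a single lea actually encodes. Note that eaxx7 needs a scratch register and overwrites EBX.

```c
#include <stdint.h>
#include <assert.h>

/* Models of the eaxx macros: each line mirrors one instruction, using
   uint32_t so results wrap modulo 2^32 exactly like EAX. The names reuse
   the macro names for illustration only. */
static uint32_t eaxx2(uint32_t eax) { return eax << 1; }         /* shl eax,1 */
static uint32_t eaxx3(uint32_t eax) { return eax + eax * 2; }    /* lea eax,[eax+eax*2] */
static uint32_t eaxx4(uint32_t eax) { return eax << 2; }         /* shl eax,2 */
static uint32_t eaxx5(uint32_t eax) { return eax + eax * 4; }    /* lea eax,[eax+eax*4] */
static uint32_t eaxx8(uint32_t eax) { return eax << 3; }         /* shl eax,3 */
static uint32_t eaxx9(uint32_t eax) { return eax + eax * 8; }    /* lea eax,[eax+eax*8] */

static uint32_t eaxx6(uint32_t eax)  { return eaxx2(eaxx3(eax)); } /* eaxx3, eaxx2 */
static uint32_t eaxx10(uint32_t eax) { return eaxx2(eaxx5(eax)); } /* eaxx5, eaxx2 */

static uint32_t eaxx7(uint32_t eax)
{
    uint32_t ebx = eax;   /* mov ebx,eax (the macro overwrites EBX) */
    eax = eaxx8(eax);     /* eaxx8: eax = 8x */
    return eax - ebx;     /* sub eax,ebx: 8x - x = 7x */
}
```

Comparing each model against x*k for a few values (including ones that overflow 32 bits) catches a wrong shift count or scale factor before it ends up in a macro.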

    


it could be useful to have a universal EAXxX macro
to multiply eax by a BYTE constant only
(EAXxXX is possible but needs too many lines to write)
is there a fast way to do that??
replacing MUL & IMUL with a macro?
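One answer to the "universal" question, sketched under my own assumptions: any byte constant k can be handled with the binary shift-and-add method, walking k's bits from the top, shifting EAX for every bit and adding the original value back for every 1 bit. The hypothetical C function emit_mul_by_byte below prints such a fasm sequence (EBX is assumed free to hold a copy of the original EAX, as in eaxx7) and also tracks what EAX would contain, so each generated sequence can be checked against x*k. It is not optimal: for constants such as 3, 5 or 9 the single-lea macros above are shorter.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Append text to out (capacity cap), keeping *len in sync. */
static void append(char *out, size_t cap, size_t *len, const char *text)
{
    int n = snprintf(out + *len, cap - *len, "%s", text);
    if (n > 0 && (size_t)n < cap - *len)
        *len += (size_t)n;
}

/* Hypothetical generator: writes a fasm sequence that multiplies EAX by
   the byte constant k using only mov/shl/add (EBX is clobbered as a copy
   of the original EAX), scanning k's bits from the top. Also returns the
   value EAX would hold for input x. cap should be at least 256 bytes. */
static uint32_t emit_mul_by_byte(uint8_t k, uint32_t x, char *out, size_t cap)
{
    char line[32];
    size_t len = 0;
    uint32_t eax = x, ebx = x;
    int pending = 0;                  /* shift count not yet emitted */

    out[0] = '\0';
    if (k == 0) {
        append(out, cap, &len, "xor eax,eax\n");
        return 0;
    }
    int top = 7;
    while (!((k >> top) & 1))         /* find the highest set bit */
        top--;
    if (k & (k - 1))                  /* more than one bit set: keep a copy */
        append(out, cap, &len, "mov ebx,eax\n");

    for (int b = top - 1; b >= 0; b--) {
        pending++;
        if ((k >> b) & 1) {           /* bit set: shift, then add original */
            snprintf(line, sizeof line, "shl eax,%d\nadd eax,ebx\n", pending);
            append(out, cap, &len, line);
            eax = (eax << pending) + ebx;
            pending = 0;
        }
    }
    if (pending > 0) {                /* trailing zero bits */
        snprintf(line, sizeof line, "shl eax,%d\n", pending);
        append(out, cap, &len, line);
        eax <<= pending;
    }
    return eax;
}
```

For k = 10 it emits mov ebx,eax / shl eax,2 / add eax,ebx / shl eax,1, i.e. ((x*4)+x)*2. A macro version could use fasm's if / else if / end if directives to try the short lea forms first and fall back to this pattern.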


Last edited by edfed on 25 Oct 2007, 21:56; edited 2 times in total
bitRAKE



Joined: 21 Jul 2003
Posts: 3885
Location: vpcmipstrm
bitRAKE 24 Oct 2007, 21:40
Here is a good link...
http://www.azillionmonkeys.com/qed/amultl2.html

On newer processors MUL is fast enough to not need special treatment. DIV, though - just forget it exists and use another method.
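bitRAKE doesn't say which method he means. One common replacement when the divisor is a constant, the one compilers typically use, is to multiply by a scaled reciprocal and keep the high bits of the product. A minimal sketch for unsigned division by 10 (the names div10 and mod10 are illustrative):

```c
#include <stdint.h>
#include <assert.h>

/* Unsigned 32-bit division by 10 without DIV.
   0xCCCCCCCD = ceil(2^35 / 10), so (x * 0xCCCCCCCD) >> 35 == x / 10 for
   every 32-bit x. In x86 terms: mov edx,0CCCCCCCDh / mul edx / shr edx,3
   (the quotient ends up in EDX). */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Remainder from the quotient: one multiply by 10 and a subtraction. */
static uint32_t mod10(uint32_t x)
{
    return x - div10(x) * 10u;
}
```

Other divisors need their own constant and shift; the pair shown here is only valid for 10.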



edfed 24 Oct 2007, 21:51
it is for code running on older x86 processors



bitRAKE 24 Oct 2007, 22:25
Quote:
Please note that the optimization model is for pre-MMX based Pentiums. The MMX style pentiums offer more pairing potential, that I have as yet not investigated.
On newer processors the latency can be hidden. For example, on a K7 it is possible to execute the following in 3 cycles:
Code:
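mul  ebp              ; edx:eax = eax*ebp
add  ebx,eax          ; add the low half to the carry word in ebx
mov  eax,[esi+j]      ; load the next word as early as possible
mov  [esi+j-4],ebx    ; store the finished word
mov  ebx,esp          ; ebx = 0 (esp is 0 here)
adc  ebx,edx          ; next carry word = high half + CF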
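ESP = 0 here to reduce code size: mov ebx,esp zeroes EBX in 2 bytes without touching the carry flag, whereas xor ebx,ebx would clear CF before the adc and mov ebx,0 takes 5 bytes. The ESI offsets all fit in one byte for the same reason. Notice how EAX is loaded as early as possible, which lets the processor overlap successive iterations very efficiently.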
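Read as a whole, the loop appears to multiply a multi-word number stored at ESI by the single word in EBP, in place, with EBX carrying the high word from one step into the next (add/adc fold it in). That is my reading of the fragment; the loop control and the value of j are not shown. The C sketch below models the same arithmetic with a hypothetical function mul_limbs_by_word, least significant word first:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical model of the K7 loop: multiplies the n-word number in a[]
   (least significant word first) by m in place and returns the final
   carry word. The register mapping is my reading of the fragment. */
static uint32_t mul_limbs_by_word(uint32_t *a, size_t n, uint32_t m)
{
    uint32_t carry = 0;                           /* EBX */
    for (size_t i = 0; i < n; i++) {
        /* mul ebp, then add/adc: a[i]*m + carry always fits in 64 bits */
        uint64_t p = (uint64_t)a[i] * m + carry;
        a[i] = (uint32_t)p;                       /* mov [esi+j-4],ebx */
        carry = (uint32_t)(p >> 32);              /* adc ebx,edx */
    }
    return carry;
}
```

The carry can never overflow: (2^32-1)*(2^32-1) + (2^32-1) is still below 2^64, which is why a single adc per word is enough.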
shoorick



Joined: 25 Feb 2005
Posts: 1614
Location: Ukraine
shoorick 25 Oct 2007, 04:53
macro eaxx3
{
lea eax,[eax + eax*2]
}
(32-bit mode only)



edfed 25 Oct 2007, 21:58
macro eaxx3 {lea eax,[eax*3]}
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 25 Oct 2007, 22:11
shoorick: why 32-bit mode only?
edfed: why did you post the same macro again?



edfed 25 Oct 2007, 22:16
sorry
I didn't know that it wasn't the same


Last edited by edfed on 25 Oct 2007, 22:30; edited 1 time in total


vid 25 Oct 2007, 22:28
wow... it really doesn't compile for some reason.
but the macro is still the same as shoorick's, since there is no "+esi" in it, nor any way to put one there



edfed 25 Oct 2007, 22:32
I made a mistake

lea eax,[eax+3+esi]
and not lea eax,[eax*3+esi]

really sorry
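Why [eax*3+esi] is rejected while [eax*3] and [eax+3+esi] are fine: a 32-bit memory operand has room for at most one base register, one index register scaled by 1, 2, 4 or 8, and a displacement. eax*3 only fits by using eax in both register slots (eax+eax*2), which leaves no slot for esi; [eax+3+esi] is just base + index + displacement. The C sketch below states that rule with illustrative names; it models the CPU's addressing form, not fasm's parser, and assumes the scaled register is not ESP, which can never be an index.

```c
#include <stdbool.h>
#include <assert.h>

/* Scale factors a SIB index can use. */
static bool scale_ok(unsigned s)
{
    return s == 1 || s == 2 || s == 4 || s == 8;
}

/* Hypothetical helper: can [reg*mult] (other_base == false) or
   [reg*mult + other] (other_base == true) be encoded as one 32-bit memory
   operand? Assumes mult >= 1 and that reg is not ESP. */
static bool lea_encodable(unsigned mult, bool other_base)
{
    if (other_base)   /* other fills one slot, so reg must be a plain scaled index */
        return scale_ok(mult);
    /* reg alone: plain scaled index, or reg as base + reg*(mult-1) as index */
    return scale_ok(mult) || (mult >= 2 && scale_ok(mult - 1));
}
```

So a single lea can multiply by 2, 3, 4, 5, 8 or 9 on its own, but only by 1, 2, 4 or 8 when another register is added in.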



edfed 26 Oct 2007, 14:19
but one thing is sure, vid:
macromul will demonstrate the many ways to multiply without any imul instruction

after that I'll think about a macrodiv

and with FP numbers, it is possible to approximate the square root by dividing the exponent by two

because the square root is half of the power:
x^(1/2)

it will help with your floating point thread
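Dividing the exponent by two is exact only when the exponent is even and the mantissa is 1.0, i.e. for powers of four. A well-known way to extend the idea to other values is to halve the whole IEEE-754 single-precision bit pattern, so the low bit of an odd exponent spills into the mantissa, and then add back half of the exponent bias. A sketch for positive normal floats (sqrt_approx is an illustrative name); the result is never below the true square root and at most about 6% above it:

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Approximate square root of a positive, normal IEEE-754 single by
   halving its bit pattern. (i >> 1) halves the biased exponent, bias
   included; adding 127 << 22 (half of 127 << 23) puts the bias back. */
static float sqrt_approx(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);      /* reinterpret the float's bits */
    i = (i >> 1) + 0x1FC00000u;    /* 0x1FC00000 == 127 << 22 */
    memcpy(&x, &i, sizeof x);
    return x;
}
```

It is exact for 1, 4, 16, 0.25 and other powers of four; sqrt_approx(2.0f) gives 1.5 instead of about 1.414, which is the worst case.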


vid 26 Oct 2007, 15:37
and what does this have to do with macros? why should this kind of thing be done with a macro?



edfed 26 Oct 2007, 15:53
not should be
could be

a called function is too slow
a macro is like an instruction in this case

it's another way to work
it surely has an application
and if not, it only demonstrates that it is possible

probably there is an algorithmic way to calculate the instructions to execute

is there a possibility of overwriting the internal register that holds the current opcode, without changing the eip register?
and if not, is there a µP that can do it???

like:
mov opcodereg,shr eax,2



edfed 26 Oct 2007, 15:54
or overwriting the internal caches with temporary code?


vid 26 Oct 2007, 16:15
I don't think using a macro in this case is good. It makes your code LESS readable than, for example, this:
Code:
lea eax, [eax + 2*eax]   ;eax = eax*3
    

or
Code:
;eax = min(eax, ebx), unsigned
cmp eax, ebx
cmova eax, ebx 
    


Everyone who is going to read assembly code knows the instructions, but he doesn't know your custom macros. He has to find the macro declaration, and then understand what the macro does - this is not easy, especially with complex macros. If there are many such custom macros, it becomes

And another thing: for the small part of an application that really needs optimization, I don't see any reason to use macros at all. Pure assembly gives better results.
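One detail of the min example: cmova tests the "above" condition, which is an unsigned comparison, so it yields the unsigned minimum; for signed values the matching instruction is cmovg. CMOVcc also first appeared with the P6 family (Pentium Pro), so it is not available on the older x86 processors edfed mentioned. A C model of both variants (function names are illustrative):

```c
#include <stdint.h>
#include <assert.h>

/* cmp eax,ebx / cmova eax,ebx : "above" is an unsigned test */
static uint32_t min_cmova(uint32_t eax, uint32_t ebx)
{
    if (eax > ebx)      /* CF=0 and ZF=0 after cmp */
        eax = ebx;
    return eax;
}

/* cmp eax,ebx / cmovg eax,ebx : "greater" is a signed test */
static int32_t min_cmovg(int32_t eax, int32_t ebx)
{
    if (eax > ebx)      /* ZF=0 and SF=OF after cmp */
        eax = ebx;
    return eax;
}
```

With EAX = -1 (0FFFFFFFFh) and EBX = 1, the cmova version keeps 1 because 0FFFFFFFFh is "above" 1, while the cmovg version keeps -1.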



edfed 26 Oct 2007, 16:21
ok!



shoorick 29 Oct 2007, 06:48
Quote:
shoorick: why 32bit mode only?
seems I was wrong - I thought that since only bx, di, si and bp can be used as address registers in 16-bit mode, lea reg,[reg + reg*2] would only work with them, but I see now it is possible with eax too
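For reference, 16-bit addressing allows at most one of BX/BP plus at most one of SI/DI, with no scale factor at all. lea eax,[eax+eax*2] still works in 16-bit code on a 386 or later because the assembler emits the 67h address-size prefix and uses 32-bit addressing. The rule for the 16-bit forms, as a small C sketch with illustrative names:

```c
#include <stdbool.h>
#include <assert.h>

enum reg16 { R_NONE, R_AX, R_BX, R_CX, R_DX, R_SI, R_DI, R_BP, R_SP };

/* Hypothetical helper: is [base + index*scale (+ disp16)] a valid 16-bit
   addressing form? Only BX/BP as base, only SI/DI as index, no scaling. */
static bool addr16_ok(enum reg16 base, enum reg16 index, unsigned scale)
{
    bool base_ok  = base == R_NONE || base == R_BX || base == R_BP;
    bool index_ok = index == R_NONE || index == R_SI || index == R_DI;
    return base_ok && index_ok && scale == 1;
}
```

So even [bx+bx*2] is impossible with 16-bit addressing; any scaled form needs the 32-bit addressing mode.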


Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.