flat assembler
Message board for the users of flat assembler.

Optimizing - is it true?

f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 02 Mar 2005, 12:43
I thought the branch hints basically execute as no-ops on older processors?
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 02 Mar 2005, 14:12
f0dder wrote:
I thought the branch hints basically execute as no-ops on older processors?
Yes, they do, but adding no-ops that have no effect on most CPUs isn't worth the effort.

S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 02 Mar 2005, 22:30
I believe the 2Eh and 3Eh bytes take no additional time to execute, because they are prefixes (and so are treated as part of the opcode), not NOP equivalents.
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 02 Mar 2005, 22:35
STAS, I meant "nop" in the sense that they shouldn't have any effect on CPUs that don't support them as hints. Since they're just CS: and DS: prefixes, they shouldn't take any additional time to execute. They do add a bit to code size, but that is negligible: you're probably only going to use the hints on time-critical code anyway.
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 02 Mar 2005, 23:14
Oh, we're both talking about the same thing :)
Ralph



Joined: 04 Oct 2003
Posts: 86
Ralph 04 Mar 2005, 19:07
Hey, while we're on optimizing: I've always wondered which is faster, MOV or an ALU instruction. For example, which of these is faster?

Code:
sub eax,eax
sub ebx,ebx
    

Code:
sub eax,eax
mov ebx,eax
    


I'm guessing the first one, because the MOV depends on the SUB before it. If that's the case, what happens if you space them out enough to eliminate the dependency?
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 07 Mar 2005, 17:54
The MOV is theoretically faster because it only copies 32 bits, while ADD/SUB must also take care of up to 32 carries.
Ralph



Joined: 04 Oct 2003
Posts: 86
Ralph 07 Mar 2005, 23:55
That's a good point. I wasn't thinking about flags. Thanks.
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 11 Mar 2005, 09:43
There's another point:
IA-32 Intel® Architecture Optimization Reference Manual wrote:
Use xor and pxor instructions to clear registers and break dependencies for integer operations
AMD Athlon™ Processor x86 Code Optimization Guide wrote:
To clear an integer register to all 0s, use “XOR reg, reg”. The AMD Athlon processor is able to avoid the false read dependency on the XOR instruction.
tom tobias



Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 11 Mar 2005, 11:14
And here's another point: DON'T use Boolean operators to accomplish NON-BOOLEAN operations IF you want others to understand your CODE, as opposed to having others merely able to read your PROGRAM.
The distinction between MOV and XOR is enormous, from a functional point of view. The 30 picoseconds saved by using XOR instead of MOV is IRRELEVANT.
PRIORITY number one is, was, and always will be READABILITY. If your goal is to replace the current contents of a register with zeros, the proper way to do that is to MOV zeros into the register, NOT to add, subtract, or perform even sillier mathematical manipulations, like exclusive or.
Since many view computer science as a branch of MATHEMATICS, this debate will not be easily won by me, and I suppose the vast majority of those perusing this forum have no idea what the perceived issue is! So long as programmers work BY THEMSELVES, it makes no difference what one uses. BUT on any collaborative endeavor, the BOTTLENECK is ALWAYS readability. :)
IronFelix



Joined: 09 Dec 2004
Posts: 141
Location: Russia, Murmansk region
IronFelix 11 Mar 2005, 12:29
I think it is obvious to a GOOD assembly programmer that XOR EAX,EAX means MOV EAX,0. For me, XOR EAX,EAX is clearer to read than MOV EAX,0, because the first instruction is faster and easy to understand.
Readability is important, of course, but you can't write the fastest program without optimization, which makes your code much less readable. Use comments; they really help. And of course optimization must be performed AFTER the readable code is ready and works.
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 11 Mar 2005, 12:41
tom tobias, yes, you're right :D That good old story "How to program Pascal language using C compiler"...
However, does that have anything to do with the art of assembly? ;)

And, by the way, I'm confused to see XOR called a Boolean operator; it's just straightforward arithmetic, isn't it? 8)
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8354
Location: Kraków, Poland
Tomasz Grysztar 11 Mar 2005, 14:12
eXclusive OR is a Boolean operator, just as AND and OR are.
madmatt



Joined: 07 Oct 2003
Posts: 1045
Location: Michigan, USA
madmatt 11 Mar 2005, 14:42
Just my two cents' worth ;)

For AND, OR and XOR, the bits of the source operand determine which bits of the destination operand get changed; NEG and NOT take only a single operand:

The AND operator uses the source bits to determine which destination bits get cleared: 1 = keep this bit the same, 0 = clear this bit.
The OR operator uses the source bits to determine which destination bits get set: 1 = set this bit, 0 = unchanged.
The XOR operator uses the source bits to determine which destination bits get "flipped": 1 = reverse this bit (0 becomes 1 and 1 becomes 0), 0 = unchanged.
The NEG operator simply reverses the sign of its operand (two's complement negation).
The NOT operator inverts every bit of its operand (one's complement); the result is one less than what NEG would produce, i.e. NOT x = -x - 1.
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 11 Mar 2005, 15:16
XOR is a Boolean operator when we speak about Boolean arguments (i.e. bits), not integers.

All CPU instructions like 'add', 'and', 'or', 'sub' and 'xor' are executed by the Arithmetic Logic Unit.
Bitwise operations are computed with the rules of Boolean algebra. But in fact (that is, in hardware) arithmetic addition and subtraction are just combinations of some 'and' and 'or' operations, so why don't we consider them Boolean operators too?

Also, in HLLs there's a difference between bitwise and logical (Boolean) operators:
1 & 2 = 0 (this is 'bitwise and'; it operates on integer data types)
but:
1 && 2 = 1 (there's no direct x86 opcode for this; it's compiled into some Jcc/SETcc sequence)
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 11 Mar 2005, 16:50
I think he meant that XOR uses at least 3 gates to accomplish the manipulation, while MOV uses only 1. Then again, keeping in mind 32-bit processors and the 5-byte MOV instruction, it always takes two reads from memory, so that's too much gate logic for me, eh!?

My theory: when you use "align 4" followed by "xor eax,eax \ xor ebx,ebx", each instruction is two bytes, and the pair issues in 1 clock because each takes 1µop (or one of the U/V pipes) and both fit in one read. "mov eax,0 \ mov ebx,0", on the other hand, cannot be optimized for size, because MOV has no short form with a byte immediate (something like "BYTE mov eax,00h"), so there we have it: 10 bytes :( which means 3 reads (best case) or 4 reads (worst case)... makes you think, doesn't it?

XOR really is the most obvious optimization, and it should be used unless there is a very good reason not to :P

IMUL eax,eax,3 ; can be used, but only if you can hide its 3+ clocks in the next few instructions.
LEA eax,[eax*3] ; is better, even if it costs you a one-clock AGI-stall penalty.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8354
Location: Kraków, Poland
Tomasz Grysztar 11 Mar 2005, 17:12
S.T.A.S. wrote:

XOR is a Boolean operator when we speak about Boolean arguments (i.e. bits), not integers.

All CPU instructions like 'add', 'and', 'or', 'sub' and 'xor' are executed by the Arithmetic Logic Unit.
Bitwise operations are computed with the rules of Boolean algebra. But in fact (that is, in hardware) arithmetic addition and subtraction are just combinations of some 'and' and 'or' operations, so why don't we consider them Boolean operators too?

The AND, OR, XOR and NOT are called "logical instructions" by Intel, and, to quote Intel's manuals, they "perform the standard Boolean operations for which they are named". Bitwise, of course, since Intel processors do not use any way of encoding Boolean values other than single bits. But these instructions really form a different group from binary arithmetic or decimal arithmetic, which operate on different data encodings (x86 binary arithmetic operates on the two's complement encoding of integers, which could be different in other architectures, etc.).
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 12 Mar 2005, 12:53
I've searched these (latest?) Intel docs for the word 'Boolean':
IA-32 Intel® Architecture Optimization Reference Manual (24896611.pdf)
IA-32 Intel® Architecture Software Developer’s Manual
Instruction Set Reference (25366614.pdf, 25366714.pdf)
and found nothing.

Only the System Programming Guide (25366814.pdf, Vol. 3, 7-31) contains the following line:
boolean MONITOR_MWAIT_works = TRUE;

I've read a few old books where XOR was called 'modulo-two congruence addition', with a nice functional diagram showing how to reduce a three-input adder (which executes the ADD instruction) to a modulo-two sum gate (which executes XOR).

Anyway, IMHO this is just a question of philosophy. The name 'ALU' itself symbolizes a very blurred boundary between arithmetic and logic (they are the same for hardware, but humans are stuck with the old school).

Here's another example:
ADD EAX,EAX does the same thing as SHL EAX,1 (SAL EAX,1 is just another mnemonic for the same opcode), yet ADD and SHL are different instructions. BTW, Intel's manuals describe SHL and SAL as 'multiply by 2' (that is, arithmetic), but other CPU manuals call them 'shifts' or even 'logical shifts'.

PS. Sorry for going off-topic here, but this is a rather interesting question, like 'why didn't ancient people know about negative numbers?'. At first glance it's silly, but it helped me understand subtraction :)
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8354
Location: Kraków, Poland
Tomasz Grysztar 12 Mar 2005, 13:06
You forgot to search Intel's Software Developer's Manual Vol. 1: Basic Architecture (25366514.pdf), and that is where my quotation comes from (section 7.2.4, page 7-12).
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 12 Mar 2005, 17:47
Oh yes, my bad :oops:
But I don't think this volume can be considered a good argument.

Have a look at the next section's name: 7.2.5. Shift and Rotate Instructions. It's rather strange: why not "multiply & divide by 2", as in Vol. 2? And where are the *left* and *right* sides of a byte?! ;) BTW, why not top and bottom?

The reason for all that funny stuff is obvious: it's a simplification.
It's easy to say that XOR is a Boolean instruction which deals with the separate bits of a byte, and that's all. But a byte is a single whole, not just a collection of bits. Each bit in a byte has its own position (or weight). So, to get the value of a byte, we have to sum all its bits, each multiplied by the appropriate power of 2.
Huh, everyone knows that, don't they?
NO! This is the great secret. Oh, no, not for you :) but for many people :(

I'd like to give a classic example of such "arithmetic".
How do you check whether a number is a power of 2?
Of course, it's simple: divide by 2 in a loop and check the result!
Ugh! Let's use magic arithmetic instructions instead:
Code:
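; eax = number
        lea     edx, [eax-1]
        and     edx, eax
; edx = 0 means eax is a power of 2, except that eax = 0 also gives edx = 0,
; so a zero input must be rejected separately (e.g. test eax,eax / jz first)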
That's all. Is AND a Boolean operator in this context?
That's why I say that XOR is arithmetic too.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
