flat assembler
Message board for the users of flat assembler.

Just curious about 64-Bit

Hicel



Joined: 09 Sep 2004
Posts: 55
Hi! Today I came up with a question: how long will it take before most people out there use 64-bit processors and OSes? As of now I see no real reason to use it myself. And it's not a "complete" change with new features like the move from 16-bit to 32-bit was. Same PE header, same structure. Sure, it is faster and qwords are the big thing. Or am I wrong? Also, I haven't seen any 64-bit compilers yet (I believe there are some); I only know assemblers (such as the great fasm ;) ). So what do you think? Is it a good thing to switch to 64-bit programming NOW, or should I just stay with 32-bit programming for now?

Also a question: is it really true that if I invoke, for example, MessageBoxA, which has 4 parameters, it will be invoked using 4 registers and no pushes? But CreateWindowEx, for example, has 12 parameters (don't know exactly), and then you have to set 4 registers and push 8 parameters?? If this is true, that's just crazy.

I don't know much about all the 64-bit stuff, so don't criticize me too hard if I wrote some crap :lol:
Post 20 Sep 2005, 18:17
proveren



Joined: 24 Jul 2004
Posts: 68
Location: Bulgaria
IMO: 64-bit for home PCs is going to be a long process, because it has no obvious benefits so far. I don't think you would have to put arguments in registers for MessageBoxA because: 1) this would mean that there would be no backwards compatibility with 32-bit programs (which are not so old as to be left out :) ). 2) Moving into registers rather than pushing would make invoking and executing the function faster by no more than 0.01%; why bother?
But arithmetic operations are OK with 32-bit registers; I don't think they created the need for a 64-bit CPU.
Post 20 Sep 2005, 22:32
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7725
Location: Kraków, Poland
Windows on x86-64 (AMD64/EM64T) architecture uses the fastcall convention. For more details see http://blogs.msdn.com/oldnewthing/archive/2004/01/14/58579.aspx
It really has useful advantages (no, you don't do pushes; you fill and reuse the pre-allocated space on the stack, and there is no need to restore the stack frame until you're done with all the calling, etc.). I plan to post an article about this soon.
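
A minimal sketch of what such a call can look like in fasm, assuming the standard win64a.inc header and its library/import macros are on the include path; the labels _caption and _message are made up for illustration. The four arguments of MessageBoxA go in RCX, RDX, R8 and R9, and the only stack work is one SUB that reserves the 32 bytes the callee may use for them (plus 8 bytes to keep RSP aligned to 16):

Code:
format PE64 GUI
entry start

include 'win64a.inc'

section '.text' code readable executable

  start:
        sub     rsp,8*5                 ; 32 bytes for the callee + 8 to align RSP to 16

        xor     ecx,ecx                 ; 1st argument: hWnd = 0
        lea     rdx,[_message]          ; 2nd argument: lpText
        lea     r8,[_caption]           ; 3rd argument: lpCaption
        xor     r9d,r9d                 ; 4th argument: uType = MB_OK
        call    [MessageBoxA]           ; no PUSH anywhere

        xor     ecx,ecx                 ; exit code 0
        call    [ExitProcess]           ; reuses the same reserved space

section '.data' data readable writeable

  _caption db 'Win64 call',0            ; hypothetical strings
  _message db 'Four arguments, four registers.',0

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL',\
          user32,'USER32.DLL'

  import kernel32,\
         ExitProcess,'ExitProcess'

  import user32,\
         MessageBoxA,'MessageBoxA'
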
Post 20 Sep 2005, 22:46
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
See also Calling Convention Process for x64 64-Bit

I'm looking forward to your article, Tomasz :)
Post 21 Sep 2005, 12:34
bogdanontanu



Joined: 07 Jan 2004
Posts: 403
Location: Sol. Earth. Europe. Romania. Bucuresti
In a general context like an OS API call, putting parameters in registers (the so-called FASTCALL) is much SLOWER than pushing them on the STACK :D

This speed assumption about using registers is one of the typical errors of medium and newbie ASM programmers, and mostly of "advanced" HLL programmers :D

I really can NOT believe that M$ has fallen for it ... it means they have only dreamers left; there are no real programmers with them anymore ...

I have explained clearly countless times why FASTCALL is slower than STDCALL as a general OS API standard...

Besides, FreeBSD versus Linux has also demonstrated it many times...

doh, humans ... never learn... I am expecting a "revolution" in 20 years when they will rediscover STDCALL as the next "new" thing :P
Post 21 Sep 2005, 15:12
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
Huh, why should it be so much slower?
Post 21 Sep 2005, 15:15
bogdanontanu



Joined: 07 Jan 2004
Posts: 403
Location: Sol. Earth. Europe. Romania. Bucuresti
Oh dear, let's explain it one more time...

Well, first of all let me say that IF you know the exact context and IF you are going to call a procedure millions of times in a tight loop... THEN by all means use a form of parameter passing by registers (aka fastcall). The perfect example is plotting pixels inside an inner render loop.

From this normal understanding a great error arises through generalization. And you all know how humans like to generalize. They wrongly assume that it will always be faster to use FASTCALL.

Let us consider the environment: a general OS. This means that we do NOT know the context of the application's algorithm and we cannot fix the context of the work done inside the OS API (because there are so many APIs with so many different functions). Besides, no API will be called millions of times inside a tight loop :D; instead it will rarely be interleaved with your OWN loops doing the real hard work (or at least it should be).

So, how is it going to be slower?

Simple: the algorithm on higher ground (aka your application) needs the registers for its own inner algorithmic operations, which cannot be known by the OS. At the same time the API needs the registers and the parameters passed to it to do its job! Do you see the race here?

In short, you will very quickly end up doing something like this:

In the general application:

Code:
mov ecx, counter
; ... do some work ...
; we have to call an API, ooops
push ecx ; save counter
push ebx ; save context pointer
mov ecx, api_argument_1
mov ebx, api_argument_2
call API_Function_XX

; restore my algorithmic context
pop ebx
pop ecx
    


In the API
Code:

; we will need arg_2 later on
; we cannot tie up one register for it...

push ebx

; ... do some work ...

; oops, I need registers to do my work
; save argument_1

push ecx
mov ecx, inside_api_counter
; ... do some work ...
; what was the value of arg1 again?
pop ecx
cmp ecx, whatever

; notice that because there is no stack frame it is hard to keep track
; of the function arguments anyway

; can we really assume that we will not need the
; registers used as arguments...?
;will this API run faster with more free registers?

    


To summarize:
----------------------

1) Basically, unless the API is trivial, you will not benefit from not having a stack frame and from not having the parameters transferred in a very uniform way via the stack.

2) You need registers to have faster implementations of both the API and the application work... using such an important resource for statically keeping parameters is really a "stupid" idea.

If you do that stupid thing then BOTH the API and the application will suffer and generally become slower at the algorithmic level ... and the algorithmic level matters a lot more than a few apparently saved cycles on trivial API calls.

3) The stack is the ONLY thing that is guaranteed to be in the cache most of the time... come on, after all you have just CALLed an API ... guess what? You did it via the stack.

Besides, in a secure OS each API should be a switch from ring 3 to ring 0... in this context do you really think that a few saved cycles matter anymore, now really?

4) A really intelligent approach would have been to have a hardware stack inside the CPU (let's say 256 dwords), and some CPUs do have such a thing.

5) What about recursive stuff... can you live without that? You know it NEEDS to save arguments on the stack. What about stack machines? What about VARARGS? Do you think compilers will perform better having to keep track of some extra stuff?

6) The stack has many uses and it helps keep things uniform between architectures... some CPUs have a limited number of registers, some have more, but ALL have a STACK.

7) Some APIs might have many parameters, and this is normal in an OS. You will need to transfer the rest of the arguments on the stack anyway (or with pointers to memory that might NOT be in cache). This will only complicate the management of code and arguments... what for?

8) This "solution" assumes that you do a LOT of API calls and no real work in your application, because it is trying to save a few cycles there... :D but even so it hinders the performance of the API by doing this.

A "win a little sometimes, lose a lot most of the time" kind of solution...

ONLY people with little knowledge of OS development, APIs and applications, taking a first "view from the plane", could have chosen such a solution...

Besides, Linux did that and it is SLOWER overall :P

Simply put, there is a very good GENERAL solution that was an evolution: STACK FRAMES and STDCALL or C CALL etc. Do you feel like we need to return to the stone age?

Remember: FASTCALL is only better inside GFX inner loops or tightly controlled environments... otherwise it is a failure and a useless mental complication... use it wisely!
Post 21 Sep 2005, 18:09
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
bogdanontanu: you criticized people who prefer fastcall for generalizing, but you were generalizing too; just reread your post with an eye on the generalizing. you just can't say this is better than that, like you did many times in this post. anyway, you have a point: using registers meant for special operations (counter, base) as data storage _is_ backward-compatible stupidity, but i don't see a problem in using r12-r15, for example
Post 21 Sep 2005, 19:12
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7725
Location: Kraków, Poland
The fastcall was chosen for Win64 architecture mainly because you've got eight more GPRs there, so you don't have to worry so much about free registers as with x86 - which always lacked a bit in this area. Well, it's still far from the 128 registers on IA64, but it changes the situation enough that the arguments from x86 may not apply.

Also, both the fastcall and C conventions have an advantage over stdcall in the area of stack reuse: you can use MOV instructions instead of PUSHes and stack frame restoring; for parameters of smaller size, smaller transfers may be used (while PUSH always goes with a fixed size, 64 bits in this case); and you can even carry some common parameters over unchanged between the calls.
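
A minimal sketch of this stack reuse in fasm, assuming win64a.inc and its library/import macros are available on the include path; all labels are made up for illustration. The space is reserved once with SUB, the fifth argument of WriteFile is stored with MOV into the slot just above the 32 bytes kept for the register arguments, and the same space then serves every following call:

Code:
format PE64 console
entry start

include 'win64a.inc'

section '.text' code readable executable

  start:
        sub     rsp,8*5                 ; reserved once: 4 register-argument slots + 1 stack argument

        mov     ecx,-11                 ; STD_OUTPUT_HANDLE
        call    [GetStdHandle]
        mov     rbx,rax                 ; keep the handle in a nonvolatile register

        mov     rcx,rbx                 ; hFile
        lea     rdx,[_first]            ; lpBuffer
        mov     r8d,_first_len          ; nNumberOfBytesToWrite
        lea     r9,[_written]           ; lpNumberOfBytesWritten
        mov     qword [rsp+8*4],0       ; 5th argument (lpOverlapped) stored with MOV
        call    [WriteFile]

        mov     rcx,rbx
        lea     rdx,[_second]
        mov     r8d,_second_len
        lea     r9,[_written]
        mov     qword [rsp+8*4],0       ; same slot, rewritten rather than pushed again
        call    [WriteFile]

        xor     ecx,ecx                 ; exit code 0
        call    [ExitProcess]           ; RSP was never adjusted between the calls

section '.data' data readable writeable

  _first db 'first call',13,10          ; hypothetical strings
  _first_len = $ - _first
  _second db 'second call',13,10
  _second_len = $ - _second
  _written dd ?

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL'

  import kernel32,\
         ExitProcess,'ExitProcess',\
         GetStdHandle,'GetStdHandle',\
         WriteFile,'WriteFile'

The sketch stores the fifth argument again before the second call, to stay on the safe side, instead of relying on the callee having left that slot intact.
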
Post 21 Sep 2005, 19:18
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
But Bogdan, we consider Win64 OS in this topic, not some general OS, as Tomasz said. AFAIK Win64 depends on EM64T architecture.
According to the link given in my post above, first four integer arguments are passed in RCX, RDX, R8, and R9 registers. Registers RAX, RCX, RDX, R8, R9, R10, and R11 are volatile.
It means that I can use RBX, RBP, RSI, RDI, R12, R13, R14, and R15 as general registers without the need of saving them between API calls and RAX, R10, and R11 as temporary registers. I think that's quite enough for me and I can omit those four fastcall registers.
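
A minimal sketch of that register split in fasm, assuming win64a.inc and its library/import macros are on the include path; the labels are made up for illustration. The loop counter lives in RBX and the handle in RSI, both nonvolatile, so neither has to be saved around the API call, while RCX, RDX, R8 and R9 are simply reloaded as arguments each time:

Code:
format PE64 console
entry start

include 'win64a.inc'

section '.text' code readable executable

  start:
        sub     rsp,8*5                 ; 4 register-argument slots + 1 stack argument
        mov     ecx,-11                 ; STD_OUTPUT_HANDLE
        call    [GetStdHandle]
        mov     rsi,rax                 ; RSI is nonvolatile: the handle survives the calls below
        mov     ebx,3                   ; RBX is nonvolatile: loop counter

  .again:
        mov     rcx,rsi                 ; hFile
        lea     rdx,[_text]             ; lpBuffer
        mov     r8d,_text_len           ; nNumberOfBytesToWrite
        lea     r9,[_written]           ; lpNumberOfBytesWritten
        mov     qword [rsp+8*4],0       ; lpOverlapped
        call    [WriteFile]             ; may change RAX, RCX, RDX, R8-R11, not RBX or RSI
        dec     ebx                     ; counter is still valid, no push/pop was needed
        jnz     .again

        xor     ecx,ecx                 ; exit code 0
        call    [ExitProcess]

section '.data' data readable writeable

  _text db 'one API call per iteration',13,10   ; hypothetical string
  _text_len = $ - _text
  _written dd ?

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL'

  import kernel32,\
         ExitProcess,'ExitProcess',\
         GetStdHandle,'GetStdHandle',\
         WriteFile,'WriteFile'
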
Post 22 Sep 2005, 13:01
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
but you need ecx for the rep prefix. i think using ecx for an argument is just a drawback from the past; it'd be better to use r12-r15 for args
Post 22 Sep 2005, 14:15
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
You're right, but I use REP rarely, and LOOP/LOOPcc never, so it wouldn't be an issue for me.
Post 22 Sep 2005, 14:45
bogdanontanu



Joined: 07 Jan 2004
Posts: 403
Location: Sol. Earth. Europe. Romania. Bucuresti
Quote:

But Bogdan, we consider Win64 OS in this topic, not some general OS, as Tomasz said.


Yes, but I would like to consider Win64 a general-purpose OS ;) and like all developers I do not like a separate set of rules for every aspect of every OS out there; reality is enough.

They say: "The nice thing about human race standards is this: there are so many standards to choose from on every single issue ... "

Quote:

AFAIK Win64 depends on EM64T architecture.
According to the link given in my post above, first four integer arguments are passed in RCX, RDX, R8, and R9 registers. Registers RAX, RCX, RDX, R8, R9, R10, and R11 are volatile.
It means that I can use RBX, RBP, RSI, RDI, R12, R13, R14, and R15 as general registers without the need of saving them between API calls and RAX, R10, and R11 as temporary registers. I think that's quite enough for me and I can omit those four fastcall registers.


Yes, I agree that they chose to do this FASTCALL thing only when the CPU got more registers, as that looks logical. But this is an old idea, and there have been very old RISC-like CPUs around with 16, 32 or more registers available all this time... and there were attempts to pass parameters via registers before, but nothing very viable as a general rule.

So you have more registers and suddenly you consider that you can waste some of them? I see...

I would really prefer to use all of those registers for better and faster implementation of algorithms.

Well, have you also considered that an API will have to call another API, and even deeper this API might also call an API?

In fact your own application procedures might as well have a depth of 8... add another depth of eight for the APIs... where are you going to keep the parameters while you go deeper in your call graph? Of course: on the STACK....

You see... I have been around long enough to see the hard transition from passing parameters via registers to the evolution represented by STACK frames and their advantages in uniformity, versatility and overall speed...

Probably for you FASTCALL is a "new interesting thing" ... and your hopes are high... that is OK.

But for the experienced OS developers at Microsoft, making such mistakes is unacceptable... at least without making dark fun of it :D

Besides, I like the STACK concept and recursion and stack machines... and unlimited CALL depths ;)

Maybe I ask too much...

_________________
"Any intelligent fool can make things bigger,
more complex, and more violent.
It takes a touch of genius -- and a lot of courage --
to move in the opposite direction."
Post 22 Sep 2005, 18:02
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
Well, Bogdan, I have to agree with you. Microsoft's FASTCALL becomes really uneconomical on nested API calls or recursive functions.
In fact, I use FASTCALL in my projects too, but only when the function uses one parameter and one output value (eAX for both values). At the beginning, I used only registers for parameters and it was horrible. I gave up that calling convention soon.
Post 26 Sep 2005, 12:34
weiss



Joined: 03 Jan 2006
Posts: 25
Is fastcall down to the fact that push/pop instructions only work in one pipe on CPUs that support pipelining, so a mov is much faster?

At least in my own research on speed-optimised code, eliminating push/pops made code execute faster on all processors that support pipelining.
Post 02 Aug 2006, 09:35

Copyright © 1999-2020, Tomasz Grysztar.