I have a rather curious question, which I have decided to write up and post here.
I have tried several programs (under MS Windows) on 3 different notebook PCs (all of them from the post-Pentium III era, which means all of them have a fan that is activated depending on the detected CPU and mainboard temperature).
NOTE: I am talking about 2D or 3D graphics and motion, under DirectX, OpenGL, or anything else.
For example, many hardware platform emulators running windowed or full screen (MAME (with most ROMs being emulated), ZSNES for Windows, BlueMSX, VMware (with MS-DOS 6.22 as the virtual machine), MagicENGINE (all versions), etc.) don't seem to make the CPU fan speed up.
Virtual PC was not tested.
But others, like NLMSX, ParaMSX, etc., do make the CPU fan speed up.
NOTE: all tests were performed in a 60 m² room at 25 °C and about 50% humidity.
Most compiled programs (downloaded from this forum, from sourceforge.net, etc.) always make the CPU fan speed up when running windowed or full screen.
It seems there is no way to tell the final executable NOT to work whenever there is nothing to do...
I've noticed that ALL the time the programs spend WAITING FOR VSYNC before actually swapping the screen buffers seems to be wasted CPU time, which is unnecessary.
I've been comparing, and the winners are MPEG players, which use less than 10% of the CPU when playing full-screen video at 60 Hz. Hardware emulators like blueMSX, ZSNES, MAME/MESS32 and some others do it well, consuming only 10% to 40% of the CPU (I am talking about an i686 machine at 1200 MHz), but not perfectly.
When using the wait functionality of DirectX, CPU usage is 100% and the program blocks on that call. Surely there is some other (clever) way to do the same thing while keeping perfect synchronization, but I haven't found it yet.
Some approaches consist of creating multimedia timers and using 2 or more threads, but I don't find them convincing.
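For comparison, a minimal sketch of the non-busy-waiting idea: instead of spinning until the next frame is due, the thread sleeps until an absolute deadline. On Windows, timers such as the multimedia timers mentioned above would play this role; this sketch assumes a POSIX system instead and uses clock_nanosleep with CLOCK_MONOTONIC and TIMER_ABSTIME. The function names are hypothetical, and the sketch does not synchronize with the real vsync; it only paces frames on a clock.

```c
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L   /* exposes clock_nanosleep() */
#endif
#include <errno.h>
#include <time.h>

/* Add ns nanoseconds to *t, keeping tv_nsec in [0, 1e9). */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec++;
    }
}

/* Runs `frames` frames, one every `period_ns` nanoseconds. Between frames
   the thread is blocked in the kernel instead of spinning. Returns the
   number of frames run. */
static int paced_loop(int frames, long period_ns)
{
    struct timespec deadline;
    int done = 0;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    while (done < frames) {
        timespec_add_ns(&deadline, period_ns);
        /* Absolute deadline: if a signal interrupts the sleep, sleeping
           again to the same deadline is still correct. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                               &deadline, NULL) == EINTR)
            ;
        /* emulate and draw one frame here (hypothetical work) */
        done++;
    }
    return done;
}
```

For 60 Hz, period_ns would be about 16666667. Sleeping to an absolute deadline instead of for a relative interval keeps small wake-up delays from accumulating over many frames.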
This is an answer from Daniel Vik (main author of the blueMSX emulator):
The reason some DirectX apps use 100% CPU is actually not DirectX
itself. In normal Windows apps, the message loop looks something
like:
while (GetMessage(&msg, NULL, 0, 0) > 0) {  /* 0 = WM_QUIT, -1 = error */
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}
With this method you will not get the good timing required for DirectX
apps to run smoothly.
A common practice in DirectX apps is to busy wait in the main message loop
in order to get more accurate timing. This is recommended in most DirectX
getting-started books, and for a game that runs full screen it is OK to
use 100% CPU since no other apps need to be very responsive. They do it
like this:
for ( ; ; ) {
    if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
        if (msg.message == WM_QUIT) break;
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    time = getTime();
    if (time >= frameTime) {
        drawFrame();
        frameTime += framePeriod;  /* next deadline (framePeriod: illustrative name) */
    }
}
The getTime() function (which you have to implement yourself) uses the
high-performance counter to get a high-resolution time stamp. If it is
time to draw a frame, the loop draws it; otherwise, it just continues. The
PeekMessage function returns immediately if no Windows message has arrived,
so most of the CPU time is spent spinning in this loop.
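As a sketch of what getTime() might look like (not blueMSX code; the names get_time_us and next_frame_time are hypothetical): on Windows the high-performance counter is read with QueryPerformanceCounter and QueryPerformanceFrequency, while the non-Windows branch is a POSIX stand-in using clock_gettime(CLOCK_MONOTONIC) so the example also builds elsewhere. Times are in microseconds. frameTime also has to move forward after each drawn frame; next_frame_time shows one possible policy, which is an assumption rather than anything from the original answer.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L   /* exposes clock_gettime() */
#endif
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>

/* Microseconds from the Windows high-performance counter. */
static int64_t get_time_us(void)
{
    static LARGE_INTEGER freq;           /* ticks per second, read once */
    LARGE_INTEGER count;
    if (freq.QuadPart == 0)
        QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    /* Split the conversion so count * 1000000 cannot overflow. */
    return (count.QuadPart / freq.QuadPart) * 1000000
         + (count.QuadPart % freq.QuadPart) * 1000000 / freq.QuadPart;
}
#else
#include <time.h>

/* POSIX analogue: microseconds from the monotonic clock. */
static int64_t get_time_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}
#endif

/* After drawing a frame, move the deadline one period forward. If the loop
   is still more than one period behind, restart the schedule from now
   instead of drawing a burst of catch-up frames. */
static int64_t next_frame_time(int64_t frame_time, int64_t now, int64_t period)
{
    frame_time += period;
    if (now - frame_time > period)
        frame_time = now + period;
    return frame_time;
}
```

In the busy-wait loop this would read time = get_time_us(); if (time >= frameTime) { drawFrame(); frameTime = next_frame_time(frameTime, time, period); }, with period = 16667 µs for 60 Hz.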
In blueMSX I changed this loop so that it doesn't busy wait with PeekMessage.
To get accurate timing I have a 1 ms timer that sets an event, which wakes
up the blocking call MsgWaitForMultipleObjects. The loop is something
like:
while (!doExit) {
    /* Sleep until a window message arrives or the 1 ms timer sets the event */
    DWORD rv = MsgWaitForMultipleObjects(1, &Prt.ddrawEvent, FALSE,
                                         INFINITE, QS_ALLINPUT);
    while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
        if (msg.message == WM_QUIT) {
            doExit = 1;
            break;
        }
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    if (rv == WAIT_OBJECT_0) { // The 1ms timer expired...
        time = getTime();
        if (time >= frameTime) {
            drawFrame();
            frameTime += framePeriod;  /* next deadline (framePeriod: illustrative name) */
        }
    }
}
So the blocking message wait is interrupted every 1 ms, and I then check
whether the DirectX frame needs to be redrawn. This adds only a little
overhead to the ordinary message loop used in regular Windows apps, but it
gets pretty much the same accuracy as the busy-wait one.
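The same wake-on-tick structure can be sketched portably with POSIX threads (an assumption; this is not blueMSX code and it omits all Windows message handling). A hypothetical ticker thread stands in for the 1 ms timer and the event it sets, and pthread_cond_wait stands in for MsgWaitForMultipleObjects: the main loop sleeps until a tick arrives, then checks whether a frame is due, instead of spinning.

```c
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for the 1 ms timer and the event it sets. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  tick;     /* signalled once per millisecond */
    unsigned        pending;  /* ticks not yet seen by the main loop */
    int             stop;     /* asks the ticker thread to exit */
} Ticker;

static int64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

static void *ticker_thread(void *arg)
{
    Ticker *t = arg;
    const struct timespec one_ms = { 0, 1000000L };
    for (;;) {
        nanosleep(&one_ms, NULL);
        pthread_mutex_lock(&t->lock);
        if (t->stop) {
            pthread_mutex_unlock(&t->lock);
            return NULL;
        }
        t->pending++;
        pthread_cond_signal(&t->tick);
        pthread_mutex_unlock(&t->lock);
    }
}

/* Main loop: blocks until the next tick, then checks whether a frame is
   due. Returns the number of frames "drawn". */
static int run_frames(int wanted, int64_t period_us)
{
    Ticker t = { .pending = 0, .stop = 0 };
    pthread_t th;
    int frames = 0;
    int64_t frame_time = now_us() + period_us;

    pthread_mutex_init(&t.lock, NULL);
    pthread_cond_init(&t.tick, NULL);
    pthread_create(&th, NULL, ticker_thread, &t);

    while (frames < wanted) {
        pthread_mutex_lock(&t.lock);
        while (t.pending == 0)           /* sleeps here, using no CPU */
            pthread_cond_wait(&t.tick, &t.lock);
        t.pending = 0;
        pthread_mutex_unlock(&t.lock);

        if (now_us() >= frame_time) {
            frames++;                    /* drawFrame() would go here */
            frame_time += period_us;
        }
    }

    pthread_mutex_lock(&t.lock);
    t.stop = 1;
    pthread_mutex_unlock(&t.lock);
    pthread_join(th, NULL);
    pthread_cond_destroy(&t.tick);
    pthread_mutex_destroy(&t.lock);
    return frames;
}
```

The main thread only becomes runnable about once per tick, so CPU use stays low while frame timing stays accurate to roughly one tick; the achievable tick length depends on the OS timer resolution.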
On top of this I use two threads: one that does the DirectX drawing and
one that does all the emulation. This actually gives some performance
gain since, as you said, the flip and other DirectX commands take some
time. The time spent in DirectX waits is not that long, though, so
running everything in one thread also works OK.
I hope this explanation was not too confusing.
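A minimal sketch of that two-thread split, again with POSIX threads and hypothetical names rather than blueMSX's actual code: the emulation thread hands each finished frame to the drawing thread through a one-slot mailbox, and the drawing side does its (stand-in) drawing outside the lock, so emulating the next frame overlaps with drawing and flipping the current one.

```c
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>

/* One-slot mailbox between the emulation thread and the drawing thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;  /* broadcast whenever full or done changes */
    int total;                /* frames the emulator will produce */
    int frame;                /* number of the frame waiting to be drawn */
    int full;                 /* 1 while a frame waits for the drawer */
    int done;                 /* 1 once the emulator has finished */
} FrameSlot;

/* Emulation thread: produces frames 0..total-1. */
static void *emulate_thread(void *arg)
{
    FrameSlot *s = arg;
    for (int f = 0; f < s->total; f++) {
        /* emulate one frame here (hypothetical work) */
        pthread_mutex_lock(&s->lock);
        while (s->full)               /* drawer has not taken the last one */
            pthread_cond_wait(&s->changed, &s->lock);
        s->frame = f;
        s->full = 1;
        pthread_cond_broadcast(&s->changed);
        pthread_mutex_unlock(&s->lock);
    }
    pthread_mutex_lock(&s->lock);
    s->done = 1;
    pthread_cond_broadcast(&s->changed);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Drawing side, run on the calling thread. Returns the number of frames
   drawn, or -1 if a frame was skipped or arrived out of order. */
static int run_emulator_and_drawer(int total)
{
    FrameSlot s = { .total = total };
    pthread_t emu;
    int drawn = 0, in_order = 1;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.changed, NULL);
    pthread_create(&emu, NULL, emulate_thread, &s);

    for (;;) {
        pthread_mutex_lock(&s.lock);
        while (!s.full && !s.done)
            pthread_cond_wait(&s.changed, &s.lock);
        if (!s.full) {                /* emulator finished, nothing pending */
            pthread_mutex_unlock(&s.lock);
            break;
        }
        int f = s.frame;
        s.full = 0;
        pthread_cond_broadcast(&s.changed);
        pthread_mutex_unlock(&s.lock);

        /* drawFrame() and the flip would go here, outside the lock, so the
           emulator can already work on the next frame */
        if (f != drawn)
            in_order = 0;
        drawn++;
    }
    pthread_join(emu, NULL);
    pthread_cond_destroy(&s.changed);
    pthread_mutex_destroy(&s.lock);
    return in_order ? drawn : -1;
}
```

With a single slot the emulator runs at most one frame ahead of the display, and each side blocks (rather than spins) while waiting for the other.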
And he added:
I think it would be possible to do an even better job. I think there is
some waiting in the DirectX calls that could probably be avoided by using
the No Delay option. That requires some more knowledge about DirectX,
though. In blueMSX at least, some commands are issued one after another,
so only the last one (which is the flip) can be done with the No Delay
option. I'm not sure how this can be made more efficient.
In my opinion, without any doubt, the best way to do this would be to patch the vsync display Interrupt Service Routine (ISR) in Windows.
Is it possible to patch the vsync display ISR in Windows?
I mean: if the default vsync ISR in Windows is:
Windows_VSYNC_ISR:
    code a
    code b
    ...code c...
    RETurn from ISR
then patch it so that it becomes:
Windows_VSYNC_ISR:
    FLIP the screen buffers
    CALL [our drawing code]
    code a
    code b
    ...code c...
    RETurn from ISR
Does anyone know anything about this?
Thanks
