flat assembler
Message board for the users of flat assembler.

Code that (h)eats CPU more than other code doing 'the same'

psychophanta



Joined: 12 Oct 2004
Posts: 2
I have a very curious question, which I have decided to write up and show here.

I have tried several programs (in MS Windows) on 3 different notebook PCs (all of them newer than the Pentium III era, which means all of them have a fan that is activated depending on the CPU and mainboard heat detected).

NOTE: I am talking about 2D or 3D graphics and movement, under DirectX, OpenGL, or whatever else.

For example, many hardware platform emulators running windowed or full screen (MAME (with most ROMs being emulated), ZSNES for Windows, BlueMSX, VMware (with MS-DOS 6.22 as the virtual machine), MagicEngine (all versions), etc.) don't seem to make the CPU fan speed up.
VirtualPC not tested.

But others, like NLMSX, ParaMSX, etc., do make the CPU fan speed up.

NOTE: all tests performed in a 60 m^2 room at 25 degrees, about 50% humidity.

Most compiled programs (downloaded from this forum, from sourceforge.net, etc.), whether windowed or full screen, always make the CPU fan speed up.
It seems there is no way to tell the final executable NOT to work whenever there is nothing to do...

I've noticed that ALL the time the programs spend WAITING FOR VSYNC before actually swapping the screen buffers seems to waste CPU resources, which is unnecessary.

I've been comparing, and the winners are MPEG players, which use less than 10% of the CPU when playing video at full screen at 60 Hz. Hardware emulators like blueMSX, ZSNES, MAME/MESS32, and some others do it well, consuming only 10% to 40% of the CPU (I am talking about an i686 machine at 1200 MHz), but not perfectly.

When using the wait functionality of DirectX, CPU usage is 100% and the program blocks on this command. Surely there is some other (clever) way to do the same while keeping perfect synchronization, but I haven't found it so far.
Some approaches consist of creating multimedia timers and using 2 or more threads, but I don't find that convincing.
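
To make the cost of polling concrete, here is a minimal sketch in portable C. It is not Win32 code: it assumes a POSIX system and uses clock_gettime and nanosleep as stand-ins, and the names now_ns, wait_spinning and wait_sleeping are made up for this illustration. Both functions wait for the same interval; one polls the clock continuously, the other sleeps about 1 ms between checks and so leaves the CPU idle most of the time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds (POSIX clock_gettime; an assumption,
 * a Windows program would use its own high-resolution counter). */
int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Busy wait: poll the clock until the interval has elapsed.
 * Returns the number of polls. One core stays fully loaded throughout. */
long wait_spinning(int64_t wait_ns)
{
    assert(wait_ns >= 0);
    int64_t deadline = now_ns() + wait_ns;
    long polls = 0;
    while (now_ns() < deadline)
        polls++;
    return polls;
}

/* Sleeping wait: give the CPU back for about 1 ms between checks.
 * Returns the number of wake-ups. */
long wait_sleeping(int64_t wait_ns)
{
    assert(wait_ns >= 0);
    const struct timespec one_ms = { 0, 1000000L };
    int64_t deadline = now_ns() + wait_ns;
    long wakeups = 0;
    while (now_ns() < deadline) {
        nanosleep(&one_ms, NULL);
        wakeups++;
    }
    return wakeups;
}
```

Waiting 20 ms with wait_spinning typically reports many thousands of polls, while wait_sleeping wakes up at most about 20 times. The price is that the sleeping version can overshoot the deadline by up to one sleep interval plus scheduler latency.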

This is an answer from Daniel Vik (main author of blueMSX emulator):
Quote:
The reason why some DirectX apps use 100% cpu is actually not because of
DirectX itself. In normal windows apps, the message loop looks something
like:
while (GetMessage(&msg, NULL, 0, 0) > 0) {  /* 0 means WM_QUIT, -1 means error */
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}

With this method you will not get the good timing required for DirectX
apps to run smoothly.
A common practice in DirectX apps is to busy wait in the main message loop
in order to get more accurate timing. This is recommended in most DirectX
getting started books and for a game that runs in fullscreen it is ok to
use 100% cpu since no other app needs to get much response. They do it
like this:

for ( ; ; ) {
    if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
        if (msg.message == WM_QUIT) break;
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    time = getTime();
    if (time >= frameTime) {
        drawFrame();
    }
}

The getTime() method (has to be implemented) uses the high performance
counters to get a high resolution time stamp. If it is time to draw a
frame it does so; otherwise the loop continues. The PeekMessage function
returns immediately if no windows message has arrived so most of the cpu
time is spent spinning in this loop.

In blueMSX I changed this loop to not busy wait using PeekMessage. To get
the accurate timing I have a 1 ms timer that sets an event which breaks
the blocking message call MsgWaitForMultipleObjects and the loop is
something like:

while (!doExit) {
    DWORD rv = MsgWaitForMultipleObjects(1, &Prt.ddrawEvent, FALSE,
                                         INFINITE, QS_ALLINPUT);
    while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
        if (msg.message == WM_QUIT) {
            doExit = 1;
            break;
        }
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    if (rv == WAIT_OBJECT_0) { // The 1ms timer expired...
        time = getTime();
        if (time >= frameTime) {
            drawFrame();
        }
    }
}

So what happens is that the Message receive method is intercepted every 1
ms and then I do the checks if the DirectX frame needs to be redrawn. This
adds only a little overhead to the common message loop used in regular
windows apps but it gets pretty much the same good accuracy as the busy
wait one.


On top of this I use two threads. One that does the directx drawing and
one that does all the emulation. This actually gives some performance
gains since as you said, the flip and other directx commands take some
time. But the time spent in DirectX waits is not that long, so
running everything in one thread also works ok.

I hope this explanation was not too confusing
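
The quoted loops leave getTime() and the update of frameTime to the reader. The following is only a sketch of how those two pieces might look, not blueMSX's actual code: ticks_to_us, FrameClock and frame_due are hypothetical names, and the tick values are assumed to come from a counter such as QueryPerformanceCounter, with QueryPerformanceFrequency giving the ticks per second. The logic itself is plain C, so it can be tried outside Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert raw counter ticks to microseconds. `freq` is ticks per second
 * (assumed to be what QueryPerformanceFrequency reports on Windows). */
int64_t ticks_to_us(int64_t ticks, int64_t freq)
{
    assert(freq > 0);
    /* Split into whole seconds and remainder so that
     * ticks * 1000000 cannot overflow for large tick counts. */
    return (ticks / freq) * 1000000 + (ticks % freq) * 1000000 / freq;
}

typedef struct {
    int64_t next_us;    /* plays the role of "frameTime" in the quoted loops */
    int64_t period_us;  /* e.g. 16667 for roughly 60 Hz */
} FrameClock;

/* Returns 1 if a frame is due at now_us and moves the deadline forward,
 * otherwise returns 0 and leaves the clock untouched. */
int frame_due(FrameClock *fc, int64_t now_us)
{
    assert(fc != NULL && fc->period_us > 0);
    if (now_us < fc->next_us)
        return 0;
    /* Step from the old deadline, not from "now", so small
     * wake-up delays do not accumulate into drift. */
    fc->next_us += fc->period_us;
    /* If a whole frame or more was missed, drop the backlog and resync. */
    if (fc->next_us <= now_us)
        fc->next_us = now_us + fc->period_us;
    return 1;
}
```

In either loop, `if (time >= frameTime) drawFrame();` would then become `if (frame_due(&fc, now)) drawFrame();`. Stepping the deadline from the previous deadline rather than from the current time keeps the average frame rate steady even when individual wake-ups are a little late.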

And he added:
Quote:
I think it would be possible to do an even better job. I think there is
some waiting in the DirectX calls that could probably be avoided by using
the No Delay option. That requires some more knowledge about DirectX
though. In blueMSX at least there are some commands that are issued after
each other so only the last one (which is the flip) can be done with the
No Delay option. I'm not sure how this can be made more effective.


My opinion is that, without any doubt, the best way to do this would be to patch the vsync display Interrupt Service Routine in Windows.

Is it possible to patch the vsync display Interrupt Service Routine in Windows?
I mean, if the default ISR for vsync in Windows is:


Code:
Windows_VSYNC_ISR: code a 
                   code b 
                   ...code c... 
                   RETurn from ISR     



then patch it so that it becomes:


Code:
Windows_VSYNC_ISR: FLIP the Screen Buffers 
                   CALL [our drawing code] 
                   code a 
                   code b 
                   ...code c... 
                   RETurn from ISR     


Does anyone know anything about this?

Thanks :)
Post 12 Oct 2004, 20:15
comrade



Joined: 16 Jun 2003
Posts: 1137
Location: Russian Federation
Windows code is available...
Post 12 Oct 2004, 21:54
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
Forget about patching the ISR unless you're doing _very_ specialized stuff - people wouldn't want to install a driver to be able to run a game or multimedia app. Besides, do all video cards support the vsync irq?
Post 13 Oct 2004, 10:28
psychophanta



Joined: 12 Oct 2004
Posts: 2
Quote:

Forget about patching the ISR unless you're doing _very_ specialized stuff - people wouldn't want to install a driver to be able to run a game or multimedia app. Besides, do all video cards support the vsync irq?

Yes, you are right, but it would still be the best way to do it.

I have found a good text on this subject: http://www.compuphase.com/vretrace.htm

Thanks for the answer.
Post 13 Oct 2004, 20:26
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Hi. First of all, I'd recommend using FPU instructions in parallel with some complex CPU opcodes.

Then you can think about which instructions heat the processor more.
Of course it depends on the architecture: an AMD processor may heat up more on different operations than an Intel one.
Post 17 Oct 2004, 03:28
Copyright © 1999-2020, Tomasz Grysztar.