flat assembler
Message board for the users of flat assembler.
> Windows > Mandelbrot Benchmark FPU/SSE2 released
Kuemmel 05 Nov 2007, 19:20
Xorpd! wrote: I agree that the Xeon results look suspect -- I would guess that he doesn't have one of the cores enabled on the x64 tests. It seems this is a common problem. If you could get DaveB to fix that somehow and run all 10 or 11 benchmarks again, his system should end up seriously smoking everything we have seen to date...
Hm, he claims that everything is okay; also the fact that KMB V0.53 scaled normally on his machine indicates this, and Cinebench, according to him, was running normally too... hard to say what's wrong... some instruction cache limit issue!? The inner loop of your versions is still too short for that problem, isn't it?
Kuemmel 08 Dec 2007, 14:16
First AMD Phenom Results !!!
It looks like AMD can regain some speed with SSE2, but Intel still takes the lead at the same clock speed, and Intel's chips can simply clock higher... I updated the table on my webpage and also made one for Xorpd!'s version, to give an overview of the speed per MHz and per core, on the same page below. The E5310 result still isn't confirmed, as there were problems with enabled cores, but it fits the other Core 2 Duo results. Here are the pure AMD Phenom results I got:

KMB V0.53 MT:
AMD Phenom 9500, 2500 MHz, 4 cores: FPU: 635.082  SSE2: 1260.500

Xorpd!'s x64 version:
AMD Phenom 9500, 2500 MHz, 4 cores:
KMB0.57 2T_X1 - 623.935
KMB0.57 2T_X2 - 1142.545
KMB0.57 2T_X3 - 627.709
KMB0.57 2T_X4 - 1541.304
KMB0.57 MT_X1 - 1177.102
KMB0.57 MT_X2 - 2072.169
KMB0.57 MT_X3 - 1181.569
KMB0.57 MT_X4 - 2725.107

AMD Phenom 9500, 1666 MHz, 4 cores:
KMB0.57 2T_X1 - 416.235
KMB0.57 2T_X2 - 761.231
KMB0.57 2T_X3 - 418.472
KMB0.57 2T_X4 - 1029.234
KMB0.57 MT_X1 - 789.210
KMB0.57 MT_X2 - 1393.035
KMB0.57 MT_X3 - 799.341
KMB0.57 MT_X4 - 1834.129

Everything is in line, but what is really strange are the results for the X3 version. Any clue, Xorpd!?
FrozenKnight 08 Dec 2007, 21:56
AMD Athlon 64 X2 5400+, 2.8 GHz, some background apps running
FPU - 344.698   SSE2 - 472.987
Binary used from the first post, downloaded today. Formatted according to the included XLS file:
AMD Athlon 64 X2 5400+   2800   2   344.698   61.6   472.987   84.5   1.37
f0dder 08 Dec 2007, 23:29
From what I've seen so far, AMD is heading for disaster with their latest chips :/
FrozenKnight 09 Dec 2007, 11:18
I should probably remind you that that was in 32-bit mode, on a 64-bit chip (I don't have a 64-bit version of Windows yet).
And Intel may be getting better, but they still cost twice as much for about the same performance level.
f0dder 09 Dec 2007, 11:41
Oh really?
Phenom Quad 9600 (2.3GHz): DKK 1.890
Core 2 Quad Q6600 (2.4GHz): DKK 1.728
...and from the results I've seen, the recent AMD chips are a bit slower per MHz, and AMD seems to have trouble reaching high enough frequencies to even compete with Intel's top line... If they had lower power consumption, that would be something, but it does look pretty bleak for AMD right now. Which is sad, really; I prefer some competition in the CPU market.
asmfan 09 Dec 2007, 11:55
The thread seems to be trashed, and I'll also add something) Until AMD fixes the TLB problem in hardware (not just by turning off logic in their firmware), they will be only 2nd on the market.
Xorpd! 11 Dec 2007, 21:22
The Phenom results are very interesting. There is one obvious difference in the X3 code that I can recall. In this archive I have a test to see whether the problem lies with the extra memory moves in that version. old_X3.exe is the original and does 1645.144 million iterations per second on my PC, whereas test_X3 is modified to use register moves instead and only does 1602.798 million iterations per second. The latter version may run at normal rates on the Phenom, however, and I would very much like to see the results of these two tests on that processor if possible.
Madis731 12 Dec 2007, 08:14
I cannot wait for Agner to update the Penryn/Phenom part of his instruction tables at http://agner.org/optimize ...
Does anyone know where he gets his CPUs from - donations or some other way?
rugxulo 12 Dec 2007, 13:06
f0dder wrote: Oh really?
(further reading here)
f0dder 12 Dec 2007, 13:15
C7 might have low power consumption, but it also has pretty low performance :/ - and from sporadic googling, it sounds like there's all sorts of chipset bugs/deficiencies as well. Too bad, the Padlock (AES/Rijndael hardware) sounds cute.
|
Kuemmel 12 Dec 2007, 19:18
@Xorpd! : I passed the link to the guy with the Phenom machine, hoping to get some results soon !
@Madis731: I think if you are a 'relevant' member of the overclocking scene, or of some online magazine, then you'll get engineering samples after a while; there are also lots of engineering samples on eBay in Asia, as I read in some overclocker forums... Intel or AMD might even have an interest, if the CPUs are good, in spreading them around to create some 'hype' on the web...
Kuemmel 15 Dec 2007, 22:18
Xorpd! wrote: The phenom results are very interesting. There is one obvious difference with the X3 code that I can recall. [...] I would very much like to see the results of these two tests on that processor if possible.
Hi Xorpd!, I got the results:

Phenom 2497 MHz:
KMB0.57 MT_X3 - 1172.668
KMB0.58 MT_X3 - 2418.201

So it seems you solved the problem with the new version! Can you go into a little more detail about what the limit of the old one was, and why a Core 2 Duo probably doesn't care while a Phenom does!?
Madis731 15 Dec 2007, 23:54
@Kuemmel: I think 'relevant' was in order, because I haven't actually overclocked a CPU in my life (or anything else, for that matter)...
...BUT (there's a big BUT) I am a very eager person to get the newest (especially the fastest) stuff on the market. That's for a few reasons:
- Why do you think optimisations are so important?
- Why do you think new technologies (SSEx) are so important?
- I am watching this thread, aren't I? So when I get a Penryn, I will tell you the results
Xorpd! 16 Dec 2007, 03:52
@Kümmel: For reference, let me post the inner loop:
Code:
.iteration_entry1:

macro single_step
{
if TESTME = 1
        movaps  xmm3, xmm1        ; xmm3: iz | iz+dz
        mulpd   xmm1, xmm8        ; xmm1: 2*iz | 2*(iz+dz)
        mulpd   xmm3, xmm3        ; xmm3: iz^2 | (iz+dz)^2
        mulpd   xmm1, xmm0        ; xmm1: 2*iz*rz | 2*(iz+dz)*(rz+dz)
        mulpd   xmm0, xmm0        ; xmm0: rz^2 | (rz+dz)^2
        movaps  [r11], xmm0       ; save rz^2 | (rz+dz)^2 to memory
        subpd   xmm0, xmm3        ; xmm0: rz^2-iz^2 | (rz+dz)^2-(iz+dz)^2
        addpd   xmm1, xmm5        ; xmm1: 2*iz*rz+iz0 | 2*(iz+dz)*(rz+dz)+iz0+dz
        addpd   xmm3, [r11]       ; xmm3: rz^2+iz^2 | (rz+dz)^2+(iz+dz)^2
        cmplepd xmm3, xmm7        ; xmm3 <= 4.0 ? True -> QW = FFFFFFFFFFFFFFFFh else 0000000000000000h
        addpd   xmm0, xmm4        ; xmm0: rz^2-iz^2+rz0 | (rz+dz)^2-(iz+dz)^2+rz0+dz
        psubd   xmm6, xmm3        ; add iteration counts

        movaps  xmm3, xmm10       ; xmm3: iz | iz+dz
        mulpd   xmm10, xmm8       ; xmm10: 2*iz | 2*(iz+dz)
        mulpd   xmm3, xmm3        ; xmm3: iz^2 | (iz+dz)^2
        mulpd   xmm10, xmm9       ; xmm10: 2*iz*rz | 2*(iz+dz)*(rz+dz)
        mulpd   xmm9, xmm9        ; xmm9: rz^2 | (rz+dz)^2
        movaps  [r11], xmm9       ; save rz^2 | (rz+dz)^2 to memory
        subpd   xmm9, xmm3        ; xmm9: rz^2-iz^2 | (rz+dz)^2-(iz+dz)^2
        addpd   xmm10, xmm5       ; xmm10: 2*iz*rz+iz0 | 2*(iz+dz)*(rz+dz)+iz0+dz
        addpd   xmm3, [r11]       ; xmm3: rz^2+iz^2 | (rz+dz)^2+(iz+dz)^2
        cmplepd xmm3, xmm7        ; xmm3 <= 4.0 ? True -> QW = FFFFFFFFFFFFFFFFh else 0000000000000000h
        addpd   xmm9, xmm13       ; xmm9: rz^2-iz^2+rz0 | (rz+dz)^2-(iz+dz)^2+rz0+dz
        psubd   xmm15, xmm3       ; add iteration counts

        movaps  xmm3, xmm12       ; xmm3: iz | iz+dz
        mulpd   xmm12, xmm8       ; xmm12: 2*iz | 2*(iz+dz)
        mulpd   xmm3, xmm3        ; xmm3: iz^2 | (iz+dz)^2
        mulpd   xmm12, xmm11      ; xmm12: 2*iz*rz | 2*(iz+dz)*(rz+dz)
        mulpd   xmm11, xmm11      ; xmm11: rz^2 | (rz+dz)^2
        movaps  [r11], xmm11      ; save rz^2 | (rz+dz)^2 to memory
        subpd   xmm11, xmm3       ; xmm11: rz^2-iz^2 | (rz+dz)^2-(iz+dz)^2
        addpd   xmm12, xmm5       ; xmm12: 2*iz*rz+iz0 | 2*(iz+dz)*(rz+dz)+iz0+dz
        addpd   xmm3, [r11]       ; xmm3: rz^2+iz^2 | (rz+dz)^2+(iz+dz)^2
        cmplepd xmm3, xmm7        ; xmm3 <= 4.0 ? True -> QW = FFFFFFFFFFFFFFFFh else 0000000000000000h
        addpd   xmm11, [rsp+288]  ; xmm11: rz^2-iz^2+rz0 | (rz+dz)^2-(iz+dz)^2+rz0+dz
        psubd   xmm14, xmm3       ; add iteration counts
else if TESTME = 2
        movaps  xmm3, xmm1        ; xmm3: iz | iz+dz
        mulpd   xmm1, xmm8        ; xmm1: 2*iz | 2*(iz+dz)
        mulpd   xmm3, xmm3        ; xmm3: iz^2 | (iz+dz)^2
        mulpd   xmm1, xmm0        ; xmm1: 2*iz*rz | 2*(iz+dz)*(rz+dz)
        mulpd   xmm0, xmm0        ; xmm0: rz^2 | (rz+dz)^2
        movaps  xmm2, xmm0        ; save rz^2 | (rz+dz)^2 in a register
        subpd   xmm0, xmm3        ; xmm0: rz^2-iz^2 | (rz+dz)^2-(iz+dz)^2
        addpd   xmm1, xmm5        ; xmm1: 2*iz*rz+iz0 | 2*(iz+dz)*(rz+dz)+iz0+dz
        addpd   xmm3, xmm2        ; xmm3: rz^2+iz^2 | (rz+dz)^2+(iz+dz)^2
        cmplepd xmm3, xmm7        ; xmm3 <= 4.0 ? True -> QW = FFFFFFFFFFFFFFFFh else 0000000000000000h
        addpd   xmm0, xmm4        ; xmm0: rz^2-iz^2+rz0 | (rz+dz)^2-(iz+dz)^2+rz0+dz
        psubd   xmm6, xmm3        ; add iteration counts

        movaps  xmm3, xmm10       ; xmm3: iz | iz+dz
        mulpd   xmm10, xmm8       ; xmm10: 2*iz | 2*(iz+dz)
        mulpd   xmm3, xmm3        ; xmm3: iz^2 | (iz+dz)^2
        mulpd   xmm10, xmm9       ; xmm10: 2*iz*rz | 2*(iz+dz)*(rz+dz)
        mulpd   xmm9, xmm9        ; xmm9: rz^2 | (rz+dz)^2
        movaps  xmm2, xmm9        ; save rz^2 | (rz+dz)^2 in a register
        subpd   xmm9, xmm3        ; xmm9: rz^2-iz^2 | (rz+dz)^2-(iz+dz)^2
        addpd   xmm10, xmm5       ; xmm10: 2*iz*rz+iz0 | 2*(iz+dz)*(rz+dz)+iz0+dz
        addpd   xmm3, xmm2        ; xmm3: rz^2+iz^2 | (rz+dz)^2+(iz+dz)^2
        cmplepd xmm3, xmm7        ; xmm3 <= 4.0 ? True -> QW = FFFFFFFFFFFFFFFFh else 0000000000000000h
        addpd   xmm9, xmm13       ; xmm9: rz^2-iz^2+rz0 | (rz+dz)^2-(iz+dz)^2+rz0+dz
        psubd   xmm15, xmm3       ; add iteration counts

        movaps  xmm3, xmm12       ; xmm3: iz | iz+dz
        mulpd   xmm12, xmm8       ; xmm12: 2*iz | 2*(iz+dz)
        mulpd   xmm3, xmm3        ; xmm3: iz^2 | (iz+dz)^2
        mulpd   xmm12, xmm11      ; xmm12: 2*iz*rz | 2*(iz+dz)*(rz+dz)
        mulpd   xmm11, xmm11      ; xmm11: rz^2 | (rz+dz)^2
        movaps  xmm2, xmm11       ; save rz^2 | (rz+dz)^2 in a register
        subpd   xmm11, xmm3       ; xmm11: rz^2-iz^2 | (rz+dz)^2-(iz+dz)^2
        addpd   xmm12, xmm5       ; xmm12: 2*iz*rz+iz0 | 2*(iz+dz)*(rz+dz)+iz0+dz
        addpd   xmm3, xmm2        ; xmm3: rz^2+iz^2 | (rz+dz)^2+(iz+dz)^2
        cmplepd xmm3, xmm7        ; xmm3 <= 4.0 ? True -> QW = FFFFFFFFFFFFFFFFh else 0000000000000000h
        addpd   xmm11, [rsp+288]  ; xmm11: rz^2-iz^2+rz0 | (rz+dz)^2-(iz+dz)^2+rz0+dz
        psubd   xmm14, xmm3       ; add iteration counts
else
        movaps  xmm3, xmm1        ; xmm3: iz | iz+dz
        mulpd   xmm1, xmm8        ; xmm1: 2*iz | 2*(iz+dz)
        mulpd   xmm3, xmm3        ; xmm3: iz^2 | (iz+dz)^2
        mulpd   xmm1, xmm0        ; xmm1: 2*iz*rz | 2*(iz+dz)*(rz+dz)
        mulpd   xmm0, xmm0        ; xmm0: rz^2 | (rz+dz)^2
        movaps  [r11], xmm0       ; save rz^2 | (rz+dz)^2 to memory (per-stream slot)
        subpd   xmm0, xmm3        ; xmm0: rz^2-iz^2 | (rz+dz)^2-(iz+dz)^2
        addpd   xmm1, xmm5        ; xmm1: 2*iz*rz+iz0 | 2*(iz+dz)*(rz+dz)+iz0+dz
        addpd   xmm3, [r11]       ; xmm3: rz^2+iz^2 | (rz+dz)^2+(iz+dz)^2
        cmplepd xmm3, xmm7        ; xmm3 <= 4.0 ? True -> QW = FFFFFFFFFFFFFFFFh else 0000000000000000h
        addpd   xmm0, xmm4        ; xmm0: rz^2-iz^2+rz0 | (rz+dz)^2-(iz+dz)^2+rz0+dz
        psubd   xmm6, xmm3        ; add iteration counts

        movaps  xmm3, xmm10       ; xmm3: iz | iz+dz
        mulpd   xmm10, xmm8       ; xmm10: 2*iz | 2*(iz+dz)
        mulpd   xmm3, xmm3        ; xmm3: iz^2 | (iz+dz)^2
        mulpd   xmm10, xmm9       ; xmm10: 2*iz*rz | 2*(iz+dz)*(rz+dz)
        mulpd   xmm9, xmm9        ; xmm9: rz^2 | (rz+dz)^2
        movaps  [r11+16], xmm9    ; save rz^2 | (rz+dz)^2 to memory (per-stream slot)
        subpd   xmm9, xmm3        ; xmm9: rz^2-iz^2 | (rz+dz)^2-(iz+dz)^2
        addpd   xmm10, xmm5       ; xmm10: 2*iz*rz+iz0 | 2*(iz+dz)*(rz+dz)+iz0+dz
        addpd   xmm3, [r11+16]    ; xmm3: rz^2+iz^2 | (rz+dz)^2+(iz+dz)^2
        cmplepd xmm3, xmm7        ; xmm3 <= 4.0 ? True -> QW = FFFFFFFFFFFFFFFFh else 0000000000000000h
        addpd   xmm9, xmm13       ; xmm9: rz^2-iz^2+rz0 | (rz+dz)^2-(iz+dz)^2+rz0+dz
        psubd   xmm15, xmm3       ; add iteration counts

        movaps  xmm3, xmm12       ; xmm3: iz | iz+dz
        mulpd   xmm12, xmm8       ; xmm12: 2*iz | 2*(iz+dz)
        mulpd   xmm3, xmm3        ; xmm3: iz^2 | (iz+dz)^2
        mulpd   xmm12, xmm11      ; xmm12: 2*iz*rz | 2*(iz+dz)*(rz+dz)
        mulpd   xmm11, xmm11      ; xmm11: rz^2 | (rz+dz)^2
        movaps  [r11+32], xmm11   ; save rz^2 | (rz+dz)^2 to memory (per-stream slot)
        subpd   xmm11, xmm3       ; xmm11: rz^2-iz^2 | (rz+dz)^2-(iz+dz)^2
        addpd   xmm12, xmm5       ; xmm12: 2*iz*rz+iz0 | 2*(iz+dz)*(rz+dz)+iz0+dz
        addpd   xmm3, [r11+32]    ; xmm3: rz^2+iz^2 | (rz+dz)^2+(iz+dz)^2
        cmplepd xmm3, xmm7        ; xmm3 <= 4.0 ? True -> QW = FFFFFFFFFFFFFFFFh else 0000000000000000h
        addpd   xmm11, [rsp+288]  ; xmm11: rz^2-iz^2+rz0 | (rz+dz)^2-(iz+dz)^2+rz0+dz
        psubd   xmm14, xmm3       ; add iteration counts
end if
}

repeat 3
        single_step
end repeat

The second possibility, TESTME = 2, is the most obvious choice. (Re z)² has been calculated and we need both (Re z)²-(Im z)² to get the next Re z and also (Re z)²+(Im z)² so that we may test |z|² to see whether we've diverged yet. The first thing you would think of is saving it, that is, (Re z)², in another register (xmm2 in the actual code) and calculating the next Re z, destroying the current register. Later, we restore the value of (Re z)² from the register we saved it in and use it to compute |z|². The problem with this approach on Core 2 Duo is that movaps (and its buddies movapd and movdqa), when used to move values between two registers, can execute in ports 0, 1, or 5. If it issues to port 0, it can take up a slot that could be used by a floating point multiply. If it goes to port 1, it could get in the way of a floating point add. The slotting logic is not smart enough to see this coming, and sometimes it does seem to interfere with the computational ports. The solution to this problem on Core 2 Duo is to store (Re z)² in memory (at [r11] in the actual code) instead of a register (TESTME = 1). Of course this increases the latency of the save+restore sequence significantly, from 3 to IIRC 5 clocks. This isn't a problem because the restored value of (Re z)² is used in the out-of-band test for divergence rather than in the sequential operation of computing the next value of z. As a consequence, the save and restore operations never get in the way of computational progress and the algorithm runs > 2% faster on Core 2 Duo. This minor optimization wasn't undertaken for the other (i.e. X1, X2, and X4) tests, so there wasn't any problem with them on Phenom.
But it seems that Phenom can't use a memory location the way it can a register: the load from the previous loop iteration must complete, or at least be well under way, before the next store can be issued to it. Considering the results we have gathered, I would say that Core 2 Duo, Pentium D Presler, and even Athlon 64 class processors can handle operations to a memory location out of order, but Phenom can only do so for a register, not memory. As usual, an optimizer is never happy to leave things the way they are, and even though going back to a register temporary is twice as fast on Phenom, it's a little slower on Core 2 Duo. The solution may be to store to a different memory location for each instruction stream (TESTME = 3). This is just as fast on Core 2 Duo as reusing the same memory location.

Code:
temporary   TESTME   executable     M iter./s
[r11]       1        testa_X3.exe   1645.144
xmm2        2        testb_X3.exe   1611.091
[r11+x]     3        testc_X3.exe   1645.144

This ZIP archive contains the two old versions and the one new version of the code. My hope is that Phenom will be happy with testc_X3.exe because it uses different memory locations for the different instruction streams. If your Phenom owner is as curious as I am about performance issues with this new processor, or at least has patience with us, you may be able to prevail on him to perform the tests with the new programs.

@Madis731: It's not necessary to have a Penryn to test KMB_V0.57_MT: all you need is x64 Windows. I am always looking for results from processors that aren't in my table.
Madis731 16 Dec 2007, 09:51
Okay, so here are my results:
(EDIT: the final update, with all the tests.) I conducted two tests on the same CPU type. The first was an HP nc6320 laptop with 1 GB RAM and an integrated video card. The other was an Intel BTO laptop with 2 GB of (slow) RAM and a GeForce 7600 video card. The Intel tests were run over a terminal connection. It seems that only the CPU matters, so use whichever test you want for reference:

Code:
;The HP lap with a T7200 (Server 2003 Enterprise x64 Ed.)
;-----------------------------------------------------------------
2T:  609,581   ;This is @2GHz x 2 then, okay
     900,487
    1260,500
    1470,335
MT:  603,964
     883,224
    1231,824
    1429,821
Kümmel:  FPU: 250.482   SSE: 547.899
QuickMAN (as instructed): MIters/s: 532.4

Code:
;The Intel lap with a T7200 (Server 2003 Standard x64 Ed.)
;-----------------------------------------------------------------
2T:  609,581
     903,099
    1255,417
    1470,335
MT:  601,630
     885,757
    1231,824
    1429,821
Kümmel:  FPU: 251.849   SSE: 550.807
QuickMAN (as instructed): MIters/s: 532.4

There's also a quad-core PC in the pack, and it looks like this:

Code:
;With a Q6600 (Server 2003 Standard x64 Ed.)
2T:  710,424   ;This is @2.4GHz x 2 ...
    1049,178
    1463,424
    1715,391
MT: 1374,585   ; and this is @2.4GHz x 4
    1945,896
    2798,593
    3129,080
Kümmel:  FPU: 591.637   SSE: 1281.249
QuickMAN (as instructed): MIters/s: 643.1

Last edited by Madis731 on 16 Dec 2007, 12:18; edited 4 times in total
Kuemmel 16 Dec 2007, 11:02
Madis731 wrote:
Hi Madis731, ...I think optimisations like the ones seen here are very important because they could let people buy the slowest Core 2 Duo instead of the Extreme Edition, which would be an 800 Euro difference, if only coders would optimise a little more ...SSE is very important: when you look at the difference achieved here compared to the FPU version KMB 0.53, you'll see that using the FPU becomes totally obsolete if you don't need extended precision. So even if you have a calculation that can't be vectorized, you are probably still faster using SSE. And with SSE4 (Penryn) and SSE5 (AMD, somewhere in 2009) things get even better. Some examples I can directly think of being beneficial for the algorithm here: SSE4: PTEST (like 'TEST' for SSE instructions); SSE5: FMADD (multiply-add, finally!)

@Xorpd!: I passed the link to the guy with the Phenom I'm in contact with; let's see what he'll get. Thanks for the explanations... really interesting... as I'm more used to coding for ARM CPUs I wouldn't even have thought of using memory to store intermediates, since memory access is so slow... but then the ARM already has 16 registers to use

@EDIT: Got the results:
KMB0.58a MT_X3 - 1177.102
KMB0.58b MT_X3 - 2363.138
KMB0.58c MT_X3 - 2310.526
Any more conclusions from that, Xorpd!?

By the way, to get people testing the latest stuff I always try to convince some guys from http://www.xtremesystems.org/forums/ ... crazy people with liquid nitrogen pushing Core 2 Duos beyond 6 GHz. Sometimes quite interesting to read... just total hardware 'optimisations' instead of software like here.
kohlrak 16 Dec 2007, 11:59
Quote: ...I think optimisations like seen here are very important because it could make people buy the slowest Core 2 Duo instead of the Extreme Edition [...]
What about smoother running for other applications in a multi-tasking environment? Also, what about more enemies in a game? Take away all the bottlenecks and you've got software that can raise its own requirements. If I could ever sort out this rotation issue in OpenGL, I could probably make a small game that uses this to its advantage.
Madis731 16 Dec 2007, 12:26
Kuemmel wrote:
Sorry if I didn't sound rhetorical enough; I didn't expect you to actually answer those questions. Anyway, thank you for doing so, and I totally agree!
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.