flat assembler
Message board for the users of flat assembler.

Index > Heap > Architecture bottlenecks?

DustWolf



Joined: 26 Jan 2006
Posts: 373
Location: Ljubljana, Slovenia
Hello,

I've relatively recently gotten up to speed on the brand-new processor / motherboard architectures and the differences between them. As I learned more, I wanted to understand the true mechanisms behind it all, not myths or just numbers.

I found that most people who are into PC hardware don't really understand or care about the things that actually make processors fast or slow. It was rather disappointing... so I've given up on trying to change the world by convincing people to buy the really good stuff as opposed to the stuff with the blue LEDs on it.

However, I am still very curious about what the true bottlenecks are, and when and by which companies they will be overcome (so that, for my own peace of mind, I know when and what to buy). A friend and I were looking at memory bandwidth while I was trying to explain to him why AMD feels more responsive than Intel, and he pointed out that even with the very high bandwidth it has, the memory is still way too slow!

My benchmarks with memtest+ gave numbers like this: AMD 3800+, core speed 2.5 GHz, L1 cache: 20 GB/s, L2 cache: 5 GB/s, system RAM: 2.5 GB/s. 20 GB/s flowing into a processor running at 2.5 GHz is 8 bytes per cycle. Take the old (but I think still fairly accurate) statistic that about 50% of all CPU instructions are memory accesses. An average memory-access opcode takes up, say, 6 (+/- 3) bytes for the instruction alone, depending on whether the operands are immediate values or registers, and a typical memory access transfers 1 dword of data, so 4 bytes. So, roughly speaking, even when executing perfectly cached code and data, it still takes at least two cycles just to move everything you need to and from the CPU. It doesn't even matter whether you use REP instructions or not. And consider the speed at which the CPU executes non-cached code, for example in a heavily multi-tasked environment or in large-volume data processing that doesn't cache well (e.g. when editing very-high-resolution images).
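For reference, here are a few typical 32-bit memory-access encodings and their sizes (my own picks, the real mix obviously depends on the code), which is where the 6 (+/- 3) bytes estimate comes from:

Code:
; encoded sizes of a few typical 32-bit memory-access instructions
mov     eax,[esi]                    ; 8B 06                   - 2 bytes
mov     eax,[esi+4]                  ; 8B 46 04                - 3 bytes
mov     eax,[0040A000h]              ; A1 00 A0 40 00          - 5 bytes
mov     dword [edi+8],12345678h      ; C7 47 08 78 56 34 12    - 7 bytes
mov     eax,[ebx+ecx*4+0040A000h]    ; 8B 84 8B 00 A0 40 00    - 7 bytes

Add the 4 bytes of data a dword load or store actually moves and the "8 bytes per cycle at best" budget gets used up very quickly.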

I put the numbers together for a 1.2 GHz Intel Pentium 3 laptop I had to fix the other day... its benchmarked system memory bandwidth was 533 MB/s (according to memtest+ again). Taking its actual processing speed when pushing a large volume of code/data to/from unpredictable memory locations, the number came out to 76 MHz. So what's the point of having a 1.2 GHz CPU when there are situations in which it acts like a 76 MHz oldie? Why do processors evolve toward multi-core heavy processing, elaborate instruction sets, etc., etc., when the memory architectures they stick to can't come close to supporting what the processors were offering 5 years ago?
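(Rough arithmetic behind that 76 MHz figure, assuming roughly 7 bytes of code-plus-data traffic per instruction as estimated above: 533 MB/s / 7 bytes per instruction is about 76 million instructions per second, which is roughly the pace a 76 MHz core could keep up with.)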

Is there something big I'm missing about this?
Post 26 Dec 2006, 02:56
donkey7



Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska
basically you're right. memory is the main bottleneck. that's why processor designers invented caches.

btw, you've missed something big ;)

first, you forgot about memory delays - when you're accessing everything from main memory, the delay is usually about 20 cycles, but at memory speed - so with 500 MHz RAM and a 2.5 GHz core it takes 2500 MHz / 500 MHz * 20 memory cycles = 100 processor cycles before you can start transferring anything from memory (when you're using rep or something like that, the delay is counted only once).
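you can see that delay directly with a pointer chase (just a sketch - 'chain' is assumed to be a pre-built buffer where every dword holds the address of the next element, in random order):

Code:
; pointer-chasing latency probe: every load depends on the previous one,
; so prefetching can't hide the delay and you pay it in full each time
        mov     esi,chain           ; chain = pre-built list of pointers
        mov     ecx,1000000
        rdtsc
        mov     ebp,eax             ; low dword of the start timestamp
walk:
        mov     esi,[esi]           ; stall on memory, then follow the pointer
        dec     ecx
        jnz     walk
        rdtsc
        sub     eax,ebp             ; elapsed cycles / 1000000 = cycles per access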

second, the processor doesn't access single bytes of memory. instead it fetches entire cache lines. if a cache line is 32 bytes, then the processor treats memory as an array of 32-byte records, so if you want to access one non-cached variable, the processor transfers the whole cache line. it's even worse if the variable spans those records (i.e. lies on the boundary between two records) - then the processor must fetch two cache lines.
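for example (assuming 32-byte lines - the real line size depends on the processor):

Code:
; assuming 32-byte cache lines and that 'block' starts on a line boundary
block:
        rb      30
split   dd      ?                   ; bytes 30..33 - crosses into the next
                                    ; line, so two lines must be fetched

align 32
packed  dd      ?                   ; whole dword inside a single line, one fetch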

but, on the other hand, there is the cache with its memory controller! what do they do? the cache holds copies of the most frequently and recently accessed memory locations. additionally, the memory controller looks ahead - i.e. if you access a variable from record x, it automatically starts fetching record x+1, so raw linear speed is maximized.
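so a plain linear walk like this runs near the raw bandwidth figures, because the next record is already on its way when you need it ('buffer' and BUFFER_DWORDS are just placeholders):

Code:
; summing a buffer front to back - exactly the pattern the look-ahead likes
        mov     esi,buffer
        mov     ecx,BUFFER_DWORDS
        xor     eax,eax
sum_next:
        add     eax,[esi]           ; by now the next line is already being fetched
        add     esi,4
        dec     ecx
        jnz     sum_next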

cache is a special type of memory that's much faster and has much shorter delays than main memory, but it's small because it's expensive.

with object oriented code, where objects are allocated randomly on the heap and nothing is clustered (by clustering i mean storing together variables that are used together), even 99% of processor time can be spent waiting for memory accesses.

the key secret is optimal use of the cache. if you want maximum performance, forget about (independent) objects and think like the memory controller ;]
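a made-up layout comparison to show what i mean by clustering (the field names are invented):

Code:
; scattered: hot and cold fields share each object (and its cache line),
; so a pass over just x and y still drags the cold bytes through the cache
struc particle
 {
   .x    dd ?
   .y    dd ?
   .name rb 24                      ; cold data filling up the rest of the line
 }

; clustered: the hot fields get their own arrays - on a linear pass every
; 32-byte line fetched delivers 8 useful dwords and nothing else
xs      rd 1024
ys      rd 1024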
Post 26 Dec 2006, 13:39
Octavio



Joined: 21 Jun 2003
Posts: 366
Location: Spain
Bloatware is the main bottleneck. An old computer with DOS booted in 10 seconds; a modern computer with Windows or Linux needs 1 or 2 minutes.
Yes, the CPU can process data much more quickly than the memory chips can deliver it (if the code is optimized), but manufacturers make what customers want, and customers want a car that runs at 200 km/h even if the speed limit is 120 km/h. Most people still worry about computing power; for me there are more important issues, like hardware compatibility, open source drivers, price and battery autonomy (which mobile computers do not have).
If you want a computer for playing, better buy a Nintendo Wii; otherwise I doubt you need a powerful computer.
Post 26 Dec 2006, 15:12
DustWolf



Joined: 26 Jan 2006
Posts: 373
Location: Ljubljana, Slovenia
donkey7 wrote:
with object oriented code, where objects are allocated randomly on the heap and nothing is clustered (by clustering i mean storing together variables that are used together), even 99% of processor time can be spent waiting for memory accesses.

the key secret is optimal use of the cache. if you want maximum performance, forget about (independent) objects and think like the memory controller ;]


Yes, caching is what gave me my problem. But realistically, it wasn't object-oriented programming that made me lose my temper with my computer most of the time. Last time, I tried scanning a full page in 32-bit color at 1200 dpi, and the moment I used a processing tool on the image, the computer locked up at 0% to 1% CPU usage for 30 minutes. A classic example of a memory bottleneck.

The truth is, whenever real, usable work is being done, the data the computer is dealing with becomes truly random, and at that point caching ahead via any smart mechanism just doesn't work well.

As for the discussion about actual memory speed, it got me thinking about what the truly correct way to benchmark memory bandwidth is, and whether or not whatever memtest+ uses produces real numbers. The whole point of providing benchmarked data was that there are no extra tricks to consider, since the number you get IS the ACTUAL memory bandwidth (take the L1 cache speed example I gave... how well caching works is totally irrelevant if the L1 cache itself is too slow for the CPU... the instructions take up bytes themselves).
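The crudest way I can imagine measuring it would be something like this (just a sketch - src_buffer, dst_buffer and BUF_BYTES are placeholders, and the buffers would have to be much larger than the caches for the result to reflect main memory rather than cache):

Code:
; copy a big buffer and count the cycles it takes
        cld
        mov     esi,src_buffer
        mov     edi,dst_buffer
        mov     ecx,BUF_BYTES/4
        rdtsc
        mov     ebp,eax             ; low dword of the start timestamp
        rep     movsd               ; stream the whole buffer through the bus
        rdtsc
        sub     eax,ebp             ; elapsed cycles; traffic moved = ~2*BUF_BYTES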
Post 26 Dec 2006, 18:03
DustWolf



Joined: 26 Jan 2006
Posts: 373
Location: Ljubljana, Slovenia
Octavio wrote:
Bloatware is the main bottleneck. An old computer with DOS booted in 10 seconds; a modern computer with Windows or Linux needs 1 or 2 minutes.
Yes, the CPU can process data much more quickly than the memory chips can deliver it (if the code is optimized), but manufacturers make what customers want, and customers want a car that runs at 200 km/h even if the speed limit is 120 km/h. Most people still worry about computing power; for me there are more important issues, like hardware compatibility, open source drivers, price and battery autonomy (which mobile computers do not have).
If you want a computer for playing, better buy a Nintendo Wii; otherwise I doubt you need a powerful computer.


You can have a modern computer come back in an instant if you just make it suspend to RAM instead of going through the shutdown & boot-up procedure.

I think people who buy hardware like to think they are actually buying the best there is. Now if you can get a car with an engine that accelerates super fast, but can't get tires that could transfer that torque into motion, then you can't do anything about it even if you are the Holy Customer who pays for it all. It's true most people still go for the big numbers, but as I said, there's no point in arguing with people who go for the blue LEDs.

I understand that most people use sheer crunching power to suit the appetites of new games. I have built a few systems for people with those very intentions, and sticking to a balanced system as opposed to big numbers and famous names made very good, cost-efficient gaming machines. However, for my personal usage, I don't ever play new games (is it just me, or are modern game trends falling into the abyss of a total lack of imagination?). I do, however, run a great many independent, individually rather undemanding applications and services to suit my varied, equally multi-task-styled interests... it is this that eventually forced me to upgrade from 98SE to XP, to upgrade my hardware, and to consider upgrading it again... my computer just can't keep up with me; I do too many things at once. ;)
Post 26 Dec 2006, 18:24
kohlrak



Joined: 21 Jul 2006
Posts: 1421
Location: Uncle Sam's Pad
I didn't read everything here, but I noticed the memory thing too. I could overclock my processor all I wanted and it still wouldn't make things faster; it's the memory that counts. The assembly tutorial I'm reading always makes a big point of keeping your instructions small so they can be fetched from memory quickly, not so the processor can digest them quickly. Perhaps if you could clock your memory as fast as your processor without melting everything, you could eliminate most of the need for registers. Check your overclocking software: the memory is always clocked lower than the processor itself.
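A couple of examples of what "keeping instructions small" means in practice (sizes for plain 32-bit code):

Code:
mov     eax,0                       ; B8 00 00 00 00  - 5 bytes
xor     eax,eax                     ; 31 C0           - 2 bytes, same result

add     eax,1                       ; 83 C0 01        - 3 bytes
inc     eax                         ; 40              - 1 byte, same result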

Quote:
my computer just can't keep up with me; I do too many things at once.


Halo PC and a few other PC games I have are pretty creative, but I must agree that most aren't. But I agree with you on this last line more than anything else, and I feel for you. I have resource-hungry applications like Trillian and such, and I sometimes have them running along with my PC games like Halo PC, and my PC just cannot handle it. I have little RAM, so I end up with a lot of paging going on, and I'm sadly getting used to it, as the school computers are more capable than my workstation.
Post 27 Dec 2006, 06:17
Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.