flat assembler
Message board for the users of flat assembler.
Does paging slow down memory access?
baldr 24 Oct 2010, 15:47
vid,
For the first access, it's probable. Once the PDE/PTE are cached in the TLB, I doubt that a significant difference can be detected. For most memory accesses, what matters is the cache architecture, not paging.
edfed 24 Oct 2010, 16:02
I think paging induces a speed loss, but gives better memory management.
baldr 24 Oct 2010, 16:14
edfed,
Better memory management is of no value regarding the original question. There is a fine article, "What Every Programmer Should Know About Memory", available as a PDF; I recommend it.
vid 24 Oct 2010, 16:31
I'll take a look at it. So, do you think that for very random memory access, doing something like this might have merit?
edfed 24 Oct 2010, 17:01
For very random accesses, the best is a "non-paged / flat linear segment" model, maybe in flat real mode or unreal mode.
All selectors set to base = 0, limit = 4 GB (or whatever the maximum is: IA-32e is 64 GB, x86-64 is 1 TB, physical RAM = XXXX MB). Paging will always slow memory accesses because of the access to the translation buffer, even if it only slows them down by one cycle per 3000 accesses.
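A minimal sketch of such a flat GDT in fasm syntax (the descriptor layout is the standard one; the labels and exact placement are made up for illustration):
Code:
gdt:
    dq 0                          ; mandatory null descriptor
    ; code: base = 0, limit = 4 GB, 32-bit, ring 0
    dw 0FFFFh, 0
    db 0, 10011010b, 11001111b, 0
    ; data: same base and limit, writable
    dw 0FFFFh, 0
    db 0, 10010010b, 11001111b, 0
gdt_end:

gdtr:
    dw gdt_end - gdt - 1          ; GDT limit
    dd gdt                        ; GDT linear base address

    lgdt [gdtr]                   ; load it; CR0.PG stays clear, so no paging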
vid 24 Oct 2010, 17:32
edfed: You think that, or you know that?
ouadji 24 Oct 2010, 18:31
paging is hardware and does not slow down memory access
Tyler 24 Oct 2010, 18:50
ouadji, how could it not? It requires extra rights checking before every memory access, and in cases where the page isn't in the TLB, it requires a memory access before the memory access in question. vid specifically mentioned that he's interested in how performance degrades in relation to the randomness of memory accesses. The more random the accesses, the more pages need to be touched; the more pages touched, the more TLB misses; the more TLB misses, the more extra memory accesses... etc.
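To make that extra work concrete, here is what a 32-bit non-PAE page walk does on a TLB miss, written out as fasm code purely for illustration (the hardware walker does this, not your program; lin and cr3_base are hypothetical variables, and the final load pretends physical memory is identity-mapped):
Code:
    mov     eax, [lin]            ; linear address to translate
    mov     ebx, eax
    shr     ebx, 22               ; top 10 bits select the PDE
    mov     ecx, [cr3_base]       ; page-directory base, normally from CR3
    mov     ecx, [ecx+ebx*4]      ; read the PDE  -> extra memory access #1
    and     ecx, 0FFFFF000h       ; page-table base
    mov     ebx, eax
    shr     ebx, 12
    and     ebx, 3FFh             ; middle 10 bits select the PTE
    mov     ecx, [ecx+ebx*4]      ; read the PTE  -> extra memory access #2
    and     ecx, 0FFFFF000h       ; page-frame base
    and     eax, 0FFFh            ; offset within the page
    add     ecx, eax              ; physical address
    mov     eax, [ecx]            ; the access the program actually asked for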
baldr 24 Oct 2010, 18:56
ouadji,
If it's so, why do we need invlpg?
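For instance, after modifying a page-table entry you have to flush the stale TLB entry yourself; a hypothetical fragment (pte and the linear address are made up):
Code:
    mov     eax, [pte]            ; pte holds the address of some PTE
    or      dword [eax], 2        ; set the R/W bit in the entry
    invlpg  [400000h]             ; invalidate the TLB entry for the linear
                                  ; page that this PTE maps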
edfed 24 Oct 2010, 19:38
vid wrote: edfed: You think that, or you know that?
I think that, because I've read the part of the manual about paging and segmentation approximately 50 times (the only part I've printed). Everything in that doc (the P4 system programming manual) says that the fastest model is the flat segmented one. Paging uses extra circuitry inside the CPU that requires caching and extra instructions to manage. The interrupts generated by paging are the ones that take the longest to execute, inducing many latencies. And the evidence is there: Win98 always crashes because of paging (page fault), and sometimes because of segmentation (out of memory). So, to get the fastest result, you should avoid the sources of problems: flat linear segmentation and no paging.
f0dder 24 Oct 2010, 19:42
edfed: I think you're reading too much into a few things, and misinterpreting others... and ignoring yet other factors.
edfed 24 Oct 2010, 19:51
Other factors are:
- the OS
- multitasking
- IRQs
- RAM latency
- FSB speed
- CPU model
But globally, it is the same for every model. If the goal is to do really random accesses, even the data cache has to be avoided, because it will induce extra latency. I presume vid wants to do a very specific program, something bootable and stand-alone, connected to a network. The only things needed for that would be:
- the PIT, to keep time
- the keyboard, to control the local machine
- a network driver, to communicate
- the program itself, to compute
- some text-mode console, to show status
ouadji 24 Oct 2010, 20:32
More clock cycles to check the additional access rights? Yes, indeed, I agree. (I was not talking about the case where the page isn't in the TLB... in that case, yes, it's obvious.)
LocoDelAssembly 24 Oct 2010, 21:02
Quote:
Tyler 24 Oct 2010, 21:07
Are page tables permanently kept in cache? Or is that up to the OS? If it's possible to do that, it would be a good "investment" if you plan to make use of many pages in short amounts of time. It may limit even the worst case to negligible effects.
ouadji 24 Oct 2010, 21:28
Quote:
f0dder 24 Oct 2010, 21:29
Tyler: CPU caches pagetable lookups in TLBs, otherwise everything would crawl even on our multi-gigahertz machines. You don't really have control over these, except for the invlpg instruction, which you call when you've modified a pagetable entry (it invalidates the cached translation, if any, for that entry).
Different CPUs have different-size caches, different caching strategies, and various other implementation differences... it's really hard to say anything about paging's speed impact in general, especially if you're not limiting yourself to a single CPU. Given that one shouldn't be modifying page permissions/mappings all the time, especially not in speed-critical code, the question should be split into two parts:
1) How much is "normal" code affected simply by running with or without paging (not comparing bare-metal to a full multitasking OS, but a bare-metal OS with or without paging enabled)? "Normal" code is defined as something that does random-access work within a reasonable (aka TLB-friendly) working set, or does linear work over a large working set (letting the prefetcher and prediction units do their work).
2) How much is "crazy" code affected - something that does random access over a large working set? This could further be split into two items:
2.A) really-random-but-real-world code, where you're dealing with huge (non-TLB-friendly) working sets, but probably do process several "clustered" items before working on data in a completely different place.
2.B) really crazy code that goes out of its way to avoid reusing TLB entries... I wonder if any normal code causes this kind of access pattern.
But of course, in order to say anything about the stuff above, you'd need a little "kernel" capable of running with or without paging, a bunch of different tests that don't have any OS dependencies, and a wide range of hardware to test on. This kind of synthetic testing of course totally ignores a pretty important factor: multithreading, especially with threads from different processes (since those have different page tables). I don't know how interesting testing the effect of paging-versus-not would be there, though. I definitely wouldn't want a general-purpose multitasking OS without page protection, and for a specialized bare-metal OS (if you want multiple running processes and process separation) you probably want tight control of your threads, and won't be re-scheduling your compute-intensive threads to switch between cores all the time.
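A minimal sketch of what one of those "crazy" tests could look like in fasm (assuming a flat 32-bit environment; t0_lo, t0_hi and buffer are assumed dword variables, with buffer holding the head of a pre-shuffled pointer chain that has one node per 4 KB page and ends in a zero pointer, approximating case 2.B):
Code:
    rdtsc                         ; start timestamp
    mov     [t0_lo], eax
    mov     [t0_hi], edx
    mov     esi, [buffer]         ; head of the pointer chain
walk:
    mov     esi, [esi]            ; one dependent load per page
    test    esi, esi              ; zero pointer terminates the chain
    jnz     walk
    rdtsc
    sub     eax, [t0_lo]
    sbb     edx, [t0_hi]          ; edx:eax = elapsed cycles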
Tyler 24 Oct 2010, 21:53
I meant the memory caches. Even the L1 is much bigger than the TLB (I presume). If it were possible to control the memory caches, you could force the paging structures to stay resident, and improve the best and average time of TLB misses.
LocoDelAssembly 24 Oct 2010, 22:06
I can volunteer (but not implement) for testing this on my PCs. However, a little warning in case no one has considered it: 64-bit mode requires paging. Still, the test could include measuring the speed with different page sizes, and testing with 1 GB pages on processors supporting them (mine can't), which would probably work almost as well as no paging (not sure of this; maybe the TLBs don't handle 1 GB pages efficiently enough).
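Detecting that support is simple; a small sketch (the Page1GB flag is CPUID leaf 80000001h, EDX bit 26; no_1gb_pages is a hypothetical fall-back label):
Code:
    mov     eax, 80000001h        ; extended processor features leaf
    cpuid
    bt      edx, 26               ; CF = Page1GB support bit
    jnc     no_1gb_pages          ; fall back to 2 MB or 4 KB pages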