flat assembler
Message board for the users of flat assembler.

Does paging slow down memory access?

vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 24 Oct 2010, 15:41
I wonder: is memory access without paging faster than with paging? For example, consider machines in a cluster performing highly specialized computation with a lot of memory accesses. Would you gain something by writing your software to work in a non-paged environment?
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr 24 Oct 2010, 15:47
vid,

For the first access, probably. Once the PDE/PTE are cached in the TLB, I doubt that any significant difference can be detected.

With lots of memory accesses, the thing to worry about is cache architecture, not paging.
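
A minimal sketch of what "PDE/PTE" refers to here, assuming classic 32-bit two-level paging with 4 KiB pages (no PAE, no large pages). The function name `split_linear_address` and the sample addresses are made up for illustration. The point is that two addresses in the same page share one PDE/PTE pair, so only the first access can need a page walk.

```python
def split_linear_address(addr):
    """Split a 32-bit linear address for two-level x86 paging with 4 KiB pages."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit linear address")
    pde_index = (addr >> 22) & 0x3FF  # bits 31..22 select the page-directory entry
    pte_index = (addr >> 12) & 0x3FF  # bits 21..12 select the page-table entry
    offset = addr & 0xFFF             # bits 11..0 are the byte offset inside the page
    return pde_index, pte_index, offset


# Hypothetical addresses that fall into the same 4 KiB page.
first = split_linear_address(0x08049F10)
second = split_linear_address(0x08049FFC)

# Same PDE and PTE index means the same translation is reused.
print(first, second, first[:2] == second[:2])
```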
edfed



Joined: 20 Feb 2006
Posts: 4334
Location: Now
edfed 24 Oct 2010, 16:02
i think paging induces a speed loss, but gives better memory management.
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr 24 Oct 2010, 16:14
edfed,

Better memory management is of no value regarding the original question.

There is a fine article, "What every programmer should know about memory", available as a PDF. I recommend it.
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 24 Oct 2010, 16:31
I'll take a look at it. So, do you think that for very random memory access, doing something like this might have merit?
edfed



Joined: 20 Feb 2006
Posts: 4334
Location: Now
edfed 24 Oct 2010, 17:01
for very random accesses, the best is to use a "non paged / flat linear segment" model, maybe in flat real mode or unreal mode.

all selectors set to base = 0, limit = 4G (or any maximal value, IA32e is 64GB, X86-64 is 1Tbyte, physical ram = XXXX MB).

paging will always slow down memory accesses because of the access to the translation buffer, even if it only slows down by one cycle per 3000 accesses.
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 24 Oct 2010, 17:32
edfed: You think that, or you know that?
ouadji



Joined: 24 Dec 2008
Posts: 1081
Location: Belgium
ouadji 24 Oct 2010, 18:31

paging is hardware and does not slow down memory access

_________________
I am not young enough to know everything (Oscar Wilde)
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler 24 Oct 2010, 18:50
ouadji, how could it not? It requires extra rights checking before every memory access, and in cases where the page isn't in the TLB, it requires a memory access before the memory access in question. vid specifically mentioned that he's interested in how performance degrades in relation to the randomness of memory accesses. The more random the accesses, the more pages need to be accessed; the more pages accessed, the more TLB misses; the more TLB misses, the more memory accesses... etc.
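
A toy model of that chain, not a measurement. It assumes a fully associative TLB with 64 entries, LRU replacement and 4 KiB pages; all three are assumptions for illustration and do not describe any particular CPU. It only counts misses, it says nothing about how many cycles a miss costs.

```python
import random
from collections import OrderedDict

PAGE_SIZE = 4096    # assumed 4 KiB pages
TLB_ENTRIES = 64    # assumed size of the toy TLB


def tlb_miss_rate(addresses, entries=TLB_ENTRIES, page_size=PAGE_SIZE):
    """Fraction of accesses that miss a fully associative LRU TLB."""
    tlb = OrderedDict()  # page number -> True, oldest entry first
    misses = 0
    total = 0
    for addr in addresses:
        total += 1
        page = addr // page_size
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1                  # miss: a page walk would be needed
            tlb[page] = True
            if len(tlb) > entries:
                tlb.popitem(last=False)  # evict the least recently used page
    return misses / total if total else 0.0


def sequential(working_set, stride=64):
    """Linear walk over the working set, one access per 64 bytes."""
    return range(0, working_set, stride)


def uniform_random(working_set, count, seed=1):
    """Uniformly random byte addresses; seeded so runs are repeatable."""
    rng = random.Random(seed)
    return [rng.randrange(working_set) for _ in range(count)]


small = TLB_ENTRIES * PAGE_SIZE  # 256 KiB: every page fits in the toy TLB
large = 4096 * PAGE_SIZE         # 16 MiB: 64 times more pages than entries

print("sequential, large:", tlb_miss_rate(sequential(large)))
print("random, small:    ", tlb_miss_rate(uniform_random(small, 20000)))
print("random, large:    ", tlb_miss_rate(uniform_random(large, 20000)))
```

In this model a linear walk misses once per page no matter how big the working set is, random access inside a TLB-sized working set only misses while warming up, and random access over a much larger working set misses on almost every access.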
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr 24 Oct 2010, 18:56
ouadji,

If so, why do we need invlpg? ;)
edfed



Joined: 20 Feb 2006
Posts: 4334
Location: Now
edfed 24 Oct 2010, 19:38
vid wrote:
edfed: You think that, or you know that?


i think that because i've read the manual about paging and segmentation approximately 50 times (the only part i've printed).

everything in this doc (PIV system programming manual) says that the fastest is the flat segmented model.

paging uses extra circuitry inside the CPU that requires caching and extra instructions to manage.

the interrupts generated by paging are the ones that take the longest time to execute, inducing a lot of latency.

and there is evidence.
win98 always crashes because of paging (page fault), and sometimes because of segmentation (out of memory)

then, to be the fastest, you should avoid sources of problems.

flat linear segmentation and no paging.
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 24 Oct 2010, 19:42
edfed: I think you're reading too much into a few things, and misinterpreting others... and ignoring yet other factors.
edfed



Joined: 20 Feb 2006
Posts: 4334
Location: Now
edfed 24 Oct 2010, 19:51
other factors are:

the os.
multitasking
irq
ram latency
fsb speed
CPU model

but globally, it is the same for every model.

if the goal is to do real random accesses, even the data cache has to be avoided because it will induce extra latency.

i presume vid wants to do a very specific program, something bootable and stand alone, connected to a network.

the only things needed for that will be:

PIT to compute time
keyboard to control the local machine
network driver to communicate
the program itself to compute
some text mode console to show status
ouadji



Joined: 24 Dec 2008
Posts: 1081
Location: Belgium
ouadji 24 Oct 2010, 20:32

More clock cycles to check the additional access rights ?
yes, indeed, i agree.

(I was not talking about the case where the page isn't in the TLB ... In that case, yes, it's obvious)

_________________
I am not young enough to know everything (Oscar Wilde)
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 24 Oct 2010, 21:02
Quote:

More clock cycles to check the additional access rights ?
yes, indeed, i agree.
Well, is this still true nowadays? Maybe the latency is hidden by some other task(s) that must always be done anyway (in parallel)?
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler 24 Oct 2010, 21:07
Are page tables permanently kept in cache? Or is that up to the OS? If it's possible to do that, it would be a good "investment" if you plan to make use of many pages in short amounts of time. It may limit even the worst case to negligible effects.
ouadji



Joined: 24 Dec 2008
Posts: 1081
Location: Belgium
ouadji 24 Oct 2010, 21:28
Quote:

Well, is this still true nowadays?
Maybe the latency is hidden by some other task(s) that must always be made (in parallel)?
200% agree too.

_________________
I am not young enough to know everything (Oscar Wilde)


Last edited by ouadji on 24 Oct 2010, 21:29; edited 1 time in total
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 24 Oct 2010, 21:29
Tyler: the CPU caches pagetable lookups in TLBs, otherwise everything would crawl even on our multi-gigahertz machines :). You don't really have control over these, except through the invlpg instruction, which you execute when you've modified a pagetable entry (it invalidates the cached translation, if any, for that entry).

Different CPUs have different-size caches, different caching strategies, and various other implementation differences... it's really hard to say anything general about the speed impact of paging, especially if you're not limiting yourself to a single CPU.

Given that one shouldn't be modifying page permission/mapping all the time, especially not in speed-critical code, the question should be split into two parts:

1) how much is "normal" code affected simply by running with or without paging (not comparing bare-metal to full-multitasking OS, but bare-metal OS with or without paging enabled). "Normal" code defined as something that does random-access work within a reasonable (aka TLB-friendly) working set, or does linear work over a large working set (letting prefetcher and prediction units do their work).

2) how much is "crazy" code affected - something that does random access over a large working set. This could further be split into two items:
2.A) really-random-but-realworld code where you're dealing with huge (non-TLB friendly) working sets, but probably do process several "clustered" items before working on data in a completely different place.
2.B) really crazy code that goes out of its way to avoid reusing TLB entries... I wonder if any normal code causes this kind of access pattern :)

But of course in order to say anything about the stuff above, you'd need a little "kernel" capable of running with or without paging, a bunch of different tests that don't have any OS dependencies, and a wide range of hardware to test on.

This kind of synthetic testing of course totally ignores a pretty important factor: multithreading, especially with threads from different processes (since those have different page tables). I don't know how interesting it is to test the effect of paging-versus-not for this, though. I definitely wouldn't want a general-purpose multitasking OS without page protection, and for a specialized bare-metal OS (if you want multiple running processes, and process separation) you probably want tight control of your threads, and won't be re-scheduling your compute-intensive threads to switch between cores all the time.
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler 24 Oct 2010, 21:53
I meant the memory caches. Even the L1 is much bigger than the TLB (I presume). If it's possible to control the memory caches, you could force the paging structures to stay resident, and improve the best and average time of TLB misses.
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 24 Oct 2010, 22:06
I can volunteer (but not implement :P) for testing this on my PCs. However, a little warning in case no one considered this: 64-bit mode always requires paging. Still, the test could include testing the speed with different page sizes, and with the 1 GByte page size on processors supporting it (mine can't), which would probably work almost as well as no paging (not sure of this, maybe the TLBs don't handle 1 GByte pages efficiently enough).
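
A back-of-the-envelope sketch of why page size matters here: it counts how many distinct translations a working set needs at each page size. The 4 GiB working set is an arbitrary example, and 2 MiB is the large-page size in PAE and 64-bit mode (plain 32-bit paging uses 4 MiB instead). It does not model how many TLB entries a given CPU actually has for each page size, which is exactly the open question above.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30

PAGE_SIZES = {
    "4 KiB": 4 * KIB,
    "2 MiB": 2 * MIB,
    "1 GiB": 1 * GIB,
}


def pages_needed(working_set, page_size):
    """Number of pages (distinct translations) covering working_set bytes."""
    return -(-working_set // page_size)  # ceiling division


working_set = 4 * GIB  # arbitrary example size
for name, size in PAGE_SIZES.items():
    print(f"{name} pages: {pages_needed(working_set, size)} translations")
```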
