flat assembler
Message board for the users of flat assembler.

Flush sticky global page

MaoKo
Hello. To flush global pages from the TLB, I wrote this code:
Code:
_flush_tlb:
 ; preserves: ebx, edi, esi, ebp
 ; note: this function flushes the whole TLB, including global page entries, except for the page holding the currently executing code
    test byte [_cpuid.mtrr], 1H
    jz _flush_tlb_mannual
    mov ecx, _MTRR_CAP_MSR ; XXX use of global variable would be better here
    rdmsr
    test eax, _MTRR_CAP_FIX
    jz _flush_tlb_mannual
    mov ecx, _MTRR_DEF_TYPE
    rdmsr
    test eax, _MTRR_DEF_TYPE_FE
    jnz _flush_tlb_mtrr
    or eax, _MTRR_DEF_TYPE_FE
    wrmsr
_flush_tlb_mtrr:
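    mov ecx, _MTRR_FIX_64K_00000
    rdmsr
    wrmsr ; write the value back unchanged; the MTRR write itself is what flushes the TLB, global entries included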
    ret
_flush_tlb_mannual:
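    mov eax, cr3
    mov cr3, eax ; reloading CR3 flushes all non-global TLB entries
    mov ecx, _KERNEL_VIRTUAL_COUNT ; number of kernel pages to invalidate
    assert _KERNEL_VIRTUAL_COUNT > 0 ; loop with ecx = 0 would iterate 2^32 times
    mov eax, _KERNEL_VIRTUAL ; first kernel virtual address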
_flush_tlb_mannual_loop:
    invlpg [eax]
    add eax, _PAGE_FRAME_SIZE
    loop _flush_tlb_mannual_loop
    ret
    

I want to flush global pages mainly for benchmarking.
I use first-generation 32-bit paging (two levels). I don't know whether invlpg in a loop is efficient.
Does anyone know another way to accomplish this?
Of course, I could disable PGE, but that is not my goal here.
Post 08 Feb 2021, 03:03
Feryno
Most OSes flush all global pages by toggling the PGE bit in CR4; it is very fast, especially if the OS uses a lot of global pages.
If you have only a few global pages, then invlpg in a loop is fast too.
You can often improve the performance of a loop by aligning its first instruction to 4 bytes (32-bit OS) or 16 bytes (64-bit OS).
Code:
align 4 ; I assume you are using 32 bit OS as you reference virtual memory by 32 bit register EAX
_flush_tlb_mannual_loop:
invlpg [eax]
add eax, _PAGE_FRAME_SIZE
loop _flush_tlb_mannual_loop
ret
    


You can also perform 2 invalidations (or even 4, 8, etc.) per loop iteration and, if the page count is odd, perform the one remaining invalidation at the end.
Code:
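mov edx,ecx ; keep the original page count to test its parity later
shr ecx,1 ; number of page pairs
jz _flush_tlb_mannual_tail ; fewer than 2 pages: skip the loop (loop with ecx = 0 would iterate 2^32 times)
align 4
@@:
invlpg [eax]
invlpg [eax+_PAGE_FRAME_SIZE]
add eax,2*_PAGE_FRAME_SIZE
loop @b
_flush_tlb_mannual_tail:
test dl,1 ; odd count: one page is left
jz @f
invlpg [eax]
@@:
ret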


You can easily measure the performance by reading the timestamp counter (TSC) with the RDTSC instruction before and after the flush and comparing the two values. Ideally, do this with interrupts disabled so nothing interrupts you, and not just once: repeat the measurement and calculate an average over several iterations.
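A minimal sketch of such a measurement, assuming ring 0, a CPU that supports the TSC, and the _flush_tlb routine from the first post (which preserves ebx, esi and edi). The names _bench_flush_tlb and _BENCH_ITERATIONS are hypothetical and not part of the original code:
Code:
_BENCH_ITERATIONS = 1000 ; hypothetical iteration count

_bench_flush_tlb:
 ; returns: eax = average number of TSC ticks per _flush_tlb call
 ; clobbers: ecx, edx
    push ebx
    push esi
    push edi
    pushfd ; save EFLAGS, including the interrupt flag
    cli ; nothing should interrupt the measurement
    mov ebx, _BENCH_ITERATIONS
    rdtsc ; edx:eax = TSC before
    mov esi, eax
    mov edi, edx
.again:
    call _flush_tlb
    dec ebx
    jnz .again
    rdtsc ; edx:eax = TSC after
    sub eax, esi
    sbb edx, edi ; edx:eax = elapsed ticks (64-bit difference)
    mov ecx, _BENCH_ITERATIONS
    div ecx ; eax = average; assumes the average fits in 32 bits
    popfd ; restore the previous interrupt state
    pop edi
    pop esi
    pop ebx
    ret

Note that RDTSC is not a serializing instruction and that the result includes the call and loop overhead, so treat the number as a rough comparison between variants rather than an exact cost.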
Post 09 Feb 2021, 20:51
MaoKo
Thank you, Feryno, for the tips. I've done some tests, and disabling PGE in cr4 seems faster than updating an MTRR.
I hadn't thought about the loop unrolling technique, thanks.
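For reference, the PGE toggle discussed above could look roughly like this. It is only a sketch, assuming ring 0 and that CR4.PGE is set on entry; _flush_tlb_pge and _CR4_PGE are hypothetical names, not taken from my actual code:
Code:
_CR4_PGE = 1 shl 7 ; CR4.PGE is bit 7

_flush_tlb_pge:
 ; clobbers: eax, edx
    pushfd ; save the interrupt flag
    cli ; do not get interrupted while PGE is cleared
    mov eax, cr4
    mov edx, eax ; keep the original CR4 value
    and eax, not _CR4_PGE
    mov cr4, eax ; changing PGE invalidates all TLB entries, global ones included
    mov cr4, edx ; restore the original CR4 (PGE set again)
    popfd
    ret

If PGE is already clear on entry, neither write changes the bit, so this sketch does not flush anything in that case.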
Post 10 Feb 2021, 14:12