flat assembler
Message board for the users of flat assembler.
LocoDelAssembly 14 May 2007, 02:03
I think it is the branch predictor's job.
tigujo 14 May 2007, 13:03
Quote:
As far as I know, branch prediction does even more: it fetches code (page table walk plus caching) and performs some actions ahead of time. But only if it can. There are cases where prediction does not work at all. For those cases I'd like to know how to speed things up, if that is possible, tricky as it might be.
Hayden 15 May 2007, 03:32
New User.. Hayden McKay.
tigujo 17 May 2007, 13:58
thanks, Hayden,
but I'm rather looking for a way to influence the loading of the instruction cache. Or, if it is not possible to do that directly, a la 'prefetch' with the data cache, then I want to understand at least the built in automatisms when the instruction cache is loaded (or preloaded).
MazeGen 17 May 2007, 15:26
tigujo wrote:
I simply wonder how to prefetch code (when you know you will need it shortly).

I've never played with it, but what about the "branch taken" hint prefix? Silly example:

Code:
<your code>
stc
db 3eh
jc prefetched_code

Just prepare different test pieces and measure them.

Quote:
I want to understand at least the built in automatisms when the instruction cache is loaded (or preloaded).

Grab Agner Fog's manuals.
tigujo 17 May 2007, 21:08
thanks MazeGen,
gonna try, grab the manuals ... the weekend is a good time for that.
Hayden 21 May 2007, 03:59
Probably not much help, but...
one way to load code into the instruction cache would be to align some code on a paragraph boundary, then call it using a far call. A far call will clear the instruction queue, then the code at that location will be loaded into the cache. I would choose 16-byte alignment because the cache is usually read one paragraph at a time.
New User.. Hayden McKay.
ATV 21 May 2007, 12:07
Here is a little test program to measure the code prefetch queue.
On my 286 it gives 8 bytes and on a 486 it gives 30 bytes (that's why I have problems with my self-modifying code on the 486). Current Pentium/AMD processors should give 0, because newer processors detect when code has changed.

Code:
count = 64

        org 100h
start:
        mov dx,txtpiq
        mov ah,09h              ;Display string at (ds:dx). Dos 1+
        int 21h
        call getpiq
        call shownumber
        int 20h

shownumber:
        mov bx,000Ah            ;Divisor
        xor cx,cx               ;Clear counter
push1:
        xor dx,dx               ;Extend high word
        div bx                  ;dx=digit (0-9), ax=quotient
        push dx                 ;Save digit (remainder) on stack
        inc cx                  ;Update counter
        or ax,ax                ;Is quotient zero?
        jnz push1               ;No, divide again
pop1:
        pop dx                  ;Get most significant digit
        add dl,'0'              ;Convert to ASCII digit
        mov ah,02h              ;Display character in (dl). Dos 1+
        int 21h
        loop pop1               ;Loop until counter is 0
        mov dx,txtcrlf
        mov ah,09h              ;Display string at (ds:dx). Dos 1+
        int 21h
        ret

getpiq:
        mov cx,count
        push cs
        pop es
        mov di,piq_01 - 1
        std
        mov al,90h              ;nop opcode
        xor dx,dx
        cli
        jmp $+2
        rep stosb
repeat count
        inc dx                  ;start overwriting from here on up
end repeat
piq_01:
        xchg ax,dx
        sti
        ret

txtpiq  db 'Prefetch queue size of the current processor is $'
txtcrlf db 0Dh,0Ah,'$'
MazeGen 23 May 2007, 07:46
tigujo, I've just realized that a simple JMP should be enough to preload the cache.

Code:
<your code>
jmp prefetched_code

This is how I imagine the pipeline works. The decoder finds the JMP, which is always taken, and preloads the target code into the cache.
Copyright © 1999-2023, Tomasz Grysztar.