flat assembler
Message board for the users of flat assembler.
Index
> Main > How to flood a CPUs branch prediction buffers with garbage |
Author |
|
bitRAKE 11 Jan 2008, 18:15
Isn't having three branches in eight bytes also bad(tm)? Something to do with the way the data is stored in the chip - just rattling off the top of my head - I don't recall exactly.
|
|||
11 Jan 2008, 18:15 |
|
Tomasz Grysztar 11 Jan 2008, 22:00
Writing those things like this would be perhaps more flexible (and easier to modify):
Code: ;initialization part repeat 31 .current_cell = $ rdtsc bt eax,%-1 jc near .current_cell + (%-1) * .code_cell_size end repeat .code_cell_size = $-.current_cell No DD instruction making, no hand-calculated sizes, etc. |
|||
11 Jan 2008, 22:00 |
|
MCD 12 Jan 2008, 14:39
Tomasz Grysztar wrote: Writing those things like this would be perhaps more flexible (and easier to modify): this is what I initially tried, but it didn't worked for the following reason: The relative jump offsets are different and difficult to predict even at assembly time. Unfortunately, some of them fit into the short range and others don't (near range). Thus each JC instruction as you supposed it would have a different size and so the entire offset calculation as we both suggested it wouldn't work at all. A correct and working offset calculation would be incredibly complicated. _________________ MCD - the inevitable return of the Mad Computer Doggy -||__/ .|+-~ .|| || |
|||
12 Jan 2008, 14:39 |
|
Tomasz Grysztar 12 Jan 2008, 16:47
MCD wrote: The relative jump offsets are different and difficult to predict even at assembly time. Unfortunately, some of them fit into the short range and others don't (near range). Thus each JC instruction as you supposed it would have a different size and so the entire offset calculation as we both suggested it wouldn't work at all. That's why I put the "near" keyword there, to ensure the instruction always has the same length (I checked the code before posting, so that it generates exactly the same bytes). |
|||
12 Jan 2008, 16:47 |
|
Madis731 12 Jan 2008, 20:23
Have you consulted with Agner - I think when calculating the fused numbers he needs to have some code to deal with "zeroing" (or "trashing") the BTB.
You'll need to have a really clever algorithm to trash it all, because there might be some conditions under which BTB might be fushed and filling starts over again :S or ... I don't know much about the internals. |
|||
12 Jan 2008, 20:23 |
|
MCD 14 Jan 2008, 19:58
Tomasz Grysztar wrote:
okay. So how does it come that near jumps always have the same size while dword jumps haven't? I really expected for "jc dword some_where" to always have be relative dword jump? how does this come? |
|||
14 Jan 2008, 19:58 |
|
Tomasz Grysztar 14 Jan 2008, 21:53
See this thread: http://board.flatassembler.net/topic.php?t=5162
In short, you can have "jc near dword" or "jc near word" to generate full-size jumps using either the EIP or IP, and "jc short dword" or "jc short word" to generate short jumps, also either EIP or IP. |
|||
14 Jan 2008, 21:53 |
|
MCD 14 Jan 2008, 22:12
Tomasz Grysztar wrote: See this thread: http://board.flatassembler.net/topic.php?t=5162 But another problem I see with your suggested code is that it generates jumps out of the code into nowhere (if I'm right), this is mainly because you placed this line: Code: .code_cell_size = $-.current_cell consider the last jump in the initialization part: % = 31 .code_cell_size = 12 * 31 = 372 so the CPU may jump to: .current_cell + (31-1) * 372 = .current_cell + 11160 which is definitly out of memory |
|||
14 Jan 2008, 22:12 |
|
MCD 14 Jan 2008, 22:20
This is an updated version.
I removed the hand-crafted conditional jumps. I also rearranged the code to be more easy to maintain by automatically calculating the code_cell_size. I finally added another conditional jump in the code cells which depends on the same bit test. Code: use32 flood_BP_BTB: ;this should be the size of all instructions of 1 round of the repeat loop ;which must also be equal for the initialization, main and finitialization part ;we can't know their exact values at this point, so we preset them to 0 init_code_cell_size = 0 code_cell_size = 0 finit_code_cell_size = 0 ;the actual cell code macro init_code_cell i { rdtsc bt eax,i jc near $+6 + (i -1) * init_code_cell_size jnc near $+6 + (61-i) * init_code_cell_size } macro code_cell i { rdtsc bt eax,i and 1Fh jc near $+6 + (i shr 8 and 3Fh - 20h) * code_cell_size jnc near $+6 + (i shr 16 and 3Fh - 20h) * code_cell_size } macro finit_code_cell i { rdtsc bt eax,i jc near $+6 - (i + 1) * finit_code_cell_size jnc near $+6 - (63-(i*2)) * finit_code_cell_size } ;define actual code cell sizes now virtual at 0 init_code_cell 0 init_code_cell_size= $ end virtual virtual at 0 code_cell 0 code_cell_size= $ end virtual virtual at 0 finit_code_cell 0 finit_code_cell_size= $ end virtual ;check if all 3 code cell sizes are equal, else code will produce ;unreliable results if ( init_code_cell_size <> code_cell_size) | \ (finit_code_cell_size <> code_cell_size) display "ERROR: code cell sizes do not match." "ERROR: code cell sizes do not match." end if ;initialization part repeat 31 init_code_cell %-1 end repeat ;random seed: Uncomment the %t if you're feeling lucky and this will give you ;a different code each time you compile it d= 03191BCF4h ;03197BCF1h ;%t h= d ;main part repeat 080h h= h + 9E3779B97F4A7C15h ;~ MAX_INT * (sqrt(5)/2 - 1/2) d= ((d shl 43 + d shr 21) xor h) + d;some random random-number function code_cell d end repeat ;finitialization part repeat 32 finit_code_cell 32-% end repeat ret
_________________ MCD - the inevitable return of the Mad Computer Doggy -||__/ .|+-~ .|| || |
|||||||||||
14 Jan 2008, 22:20 |
|
Tomasz Grysztar 14 Jan 2008, 22:35
MCD wrote: But another problem I see with your suggested code is that it generates jumps out of the code into nowhere (if I'm right), this is mainly because you placed this line: The ".current_cell" is a variable updated on each repetition, so when I do "$-.current_cell" I calculate the size of the last cell, not the whole code. As I said above - I really checked that the code I gave assembles to exactly the same sequence of bytes. |
|||
14 Jan 2008, 22:35 |
|
Madis731 15 Jan 2008, 07:09
Okay, but how accurate theoretically is it now? How much can a benchmark with 0 instructions vary between CPUs & runs? I mean one RDTSC /push eax / RDTSC / sub eax,[esp] could potentially have many problems...
|
|||
15 Jan 2008, 07:09 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.