flat assembler
Message board for the users of flat assembler.

PIE friendly indexed jump table

sylware



Joined: 23 Oct 2020
Posts: 461
Location: Marseille/France
sylware 21 Aug 2021, 17:47
Since I stumbled upon assembly code that is not PIE friendly, I started to port it. As I am still learning x86_64 assembly programming patterns, I was wondering how to do this one as "elegantly" as possible:
A RIP-relative (PIE) jump table, with an index selecting the jump target. The jump happens in only one place in the code, so the RIP-relative offsets of the jump targets can be computed at assembly time.
I am thinking of patching a jmp instruction with a pre-computed RIP-relative offset taken from an indexed table.
This means my code would become self-modifying. But in general, how would you code this in an elegant way?



Joined: 26 Jan 2021
Posts: 1041
Location: Russia
macomics 21 Aug 2021, 19:27
What is the problem with writing a macro that replaces the necessary jump instructions, collects a list of labels to build a table, and inserts code that applies these changes using that table?



sylware 21 Aug 2021, 19:35
There is no 'problem' at all; I am only looking for the most elegant way to code such a jump table.

In this case, the RIP-relative offsets of the jump targets can be pre-computed at assembly time and stored in such a table. I could then patch a jmp instruction with the RIP-relative offset of the selected target.
But maybe this is a bad approach for very good reasons that I lack the perspective and experience to see, or there are "better" ways to code this, or it is, bluntly, not "worth it", etc.



macomics 21 Aug 2021, 20:02
see the Windows fixups implementation



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 21 Aug 2021, 20:07
I would probably do it like this (avoiding self-modifying code):
Code:
        lea     rbx,[table]
        mov     ecx,1

        lea     rax,[origin]
  origin:
        add     rax,[rbx+rcx*8]
        jmp     rax

  table:
        dq      target0 - origin
        dq      target1 - origin    
There is in fact a similar scheme used in the core of fasm 1 (with only 16-bit offsets stored in table to save space).

It could also be simplified by using the address of table as the origin point for the stored offsets:
Code:
        mov     ecx,1

        lea     rax,[table]
        add     rax,[rax+rcx*8]
        jmp     rax

  table:
        dq      target0 - table
        dq      target1 - table    
But you need the table to reside in the same section then.



Tomasz Grysztar 21 Aug 2021, 20:09
macomics wrote:
see the Windows fixups implementation
Getting proper fixups to work with PIE is a little tricky, as you need to make the executable section at least temporarily writable. See the research done by revolution.



macomics 21 Aug 2021, 20:46
So does he have a problem coding a simple Switch-Case, or does he want to somehow elegantly rack his brain over self-modifying code?
Code:
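SwitchCase:
        pop     rax                     ; rax = return address = address of the table
        add     rax,[rax+rcx*8]         ; add the selected offset to the table address
        jmp     rax
        . . .
        mov     rcx,1                   ; index of the jump target
        call    SwitchCase              ; pushes the address of the table that follows
@@:
        dq      target0 - @b
        dq      target1 - @b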

    



sylware 22 Aug 2021, 14:13
Tomasz Grysztar wrote:

It could also be simplified by using the address of table as the origin point for the stored offsets:
Code:
        mov     ecx,1

        lea     rax,[table]
        add     rax,[rax+rcx*8]
        jmp     rax

  table:
        dq      target0 - table
        dq      target1 - table    
But you need the table to reside in the same section then.

I thought of this one. So what would be "less bad"? Having the static table of offsets there (this table is way bigger than a 16-byte paragraph), or in ".data" and aligned on a cache line?

Another issue: I don't know this code, so I may have to "free" some registers to do the work (basically restoring them in the prolog of each jump target :'( ). Any tips? I was thinking of an assembly writing assistant (= reverse-engineering) that would append a comment to each line encoding a snapshot of which registers are currently in use and for what.
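One possible way around the register question, given only as a sketch under assumptions that are not part of the thread: save rax on the stack, compute the target in rax as in the quoted code, then swap it back onto the stack and leave with ret. Only the index register (rcx here) has to be known; the labels are hypothetical.

```asm
use64
        ; rcx = index of the jump target (assumed to be in range)
        push    rax                     ; reserve a stack slot and save rax
        lea     rax,[table]             ; RIP-relative address of the table
        add     rax,[rax+rcx*8]         ; rax = address of the selected target
        xchg    rax,[rsp]               ; restore rax, leave the target on the stack
        ret                             ; pop the target address and jump to it

  table:
        dq      target0 - table
        dq      target1 - table

  target0:                              ; hypothetical jump targets
        nop
  target1:
        nop
```

This keeps every general-purpose register intact but still changes the flags (add), and it has a cost: xchg with a memory operand is implicitly locked, a ret that does not match a call upsets the return address predictor, and it will not work where CET shadow stacks are enforced.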


Last edited by sylware on 22 Aug 2021, 18:02; edited 1 time in total



macomics 22 Aug 2021, 16:34
Quote:

Another issue: I don't know this code, so I may have to "free" some registers to do the work (basically restoring them in the prolog of each jump target :'( ). Any tips? I was thinking of an assembly writing assistant (= reverse-engineering) that would append a comment to each line encoding a snapshot of which registers are currently in use and for what.

That's what I'm talking about: racking your brain just for the sake of "I don't want to save and restore values in registers."
Also, the jump table does not need to be stored as 8-byte values. Why store a bunch of zero bytes? Words or double words are enough for the jump offsets.
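A minimal sketch of the smaller table, assuming 32-bit signed offsets relative to the table and rdx as a free scratch register (both are assumptions, as are the target labels):

```asm
use64
        mov     ecx,1                   ; index of the jump target

        lea     rax,[table]             ; RIP-relative address of the table
        movsxd  rdx,dword [rax+rcx*4]   ; load and sign-extend the 32-bit offset
        add     rax,rdx                 ; rax = address of the selected target
        jmp     rax

  table:
        dd      target0 - table         ; 4 bytes per entry instead of 8
        dd      target1 - table

  target0:                              ; hypothetical jump targets
        nop
  target1:
        nop
```

The sign extension matters as soon as a target lies before the table; with dq entries the offset is already 64 bits wide, which is why the earlier versions can add it directly from memory.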



sylware 22 Aug 2021, 18:15
Then, should I put the offset table in the code section, or should I put it in the data section (cache line aligned)?



macomics 22 Aug 2021, 21:38
For general-purpose registers, this does not play a particularly big role. (See: Intel SDM Vol 1 18-7: 18.3 INDIRECT BRANCH TRACKING)
https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html#combined
Take at least a glance at it before you implement your big ideas.



sylware 22 Aug 2021, 23:40
Naively, I would put the RIP-relative offset table in the .text section near the jmp instruction (.text section could be read only?)



macomics 23 Aug 2021, 14:07
sylware wrote:
Naively, I would put the RIP-relative offset table in the .text section near the jmp instruction (.text section could be read only?)

readable & executable ;)



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 23 Aug 2021, 14:18
The table-less method is also possible: space code at aligned intervals.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup



sylware 23 Aug 2021, 15:05
bitRAKE wrote:
The table-less method is also possible: space code at aligned intervals.


What do you mean? Jump targets at pre-computable addresses from a base address?



macomics 23 Aug 2021, 15:46
sylware wrote:

What do you mean? Jump targets at pre-computable addresses from a base address?

Code:
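    mov rax, 1
    lea r8, [label1]            ; RIP-relative base (r8 assumed free); [label1 + rax * 8] would encode an absolute address
    lea rax, [r8 + rax * 8]     ; each case occupies one 8-byte slot
    jmp rax
align 8
label1:
    mov rax, rdx
    jmp quit
align 8
label2:
    mov rax, rcx
    jmp quit
quit:
   . . .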
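The scale factor in an address is limited to 8, so cases longer than 8 bytes need the slot address computed another way. A sketch with 16-byte slots, where the slot size, the labels and the use of r8 as a scratch register are all assumptions made for illustration:

```asm
use64
        mov     eax,1                   ; index of the jump target
        shl     eax,4                   ; index * 16, the slot size assumed here
        lea     r8,[case0]              ; RIP-relative base; r8 assumed to be free
        add     rax,r8                  ; rax = address of the selected slot
        jmp     rax

        align   16
  case0:
        mov     rax,rdx
        jmp     done
        align   16
  case1:
        mov     rax,rcx
        jmp     done

        assert  case1 - case0 = 16      ; stops assembly if case0 outgrows its slot
  done:
        nop
```

The assert line is the point of the sketch: if a case grows beyond its slot, the next label silently moves to the following 16-byte boundary, and without the check the computed jump would land in the middle of the previous case.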
    



sylware 23 Aug 2021, 17:10
So which one is "better":
1 - the jump code fits in a 16-byte paragraph and the RIP-relative offsets fit in a cache line.
2 - the jump code does not fit in a 16-byte paragraph, but includes the RIP-relative jumps.

More data, or more code?



macomics 23 Aug 2021, 17:30
Try different options in your implementation. Optimizing data by worrying about the cache at the user level is a useless hassle. Cache characteristics differ between processors, and the task of the OS is to make this puzzle as invisible as possible to the application programmer. In any case, the memory of a user program is loaded in pages, and it is worth worrying about cache lines only when using SSE/AVX-type technologies (instructions that work with long/packed data), not for code using general-purpose registers.
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20448
Location: In your JS exploiting you and your system
revolution 23 Aug 2021, 17:41
How do you define "better"?

Better for ease of programming.
Better for understanding.
Better for less typing.
Better for harder to make bugs.
Better for some other reason.

If it is important enough to spend time to make it "better" then I think it should be important enough to test it in different implementations to measure which one is better in your metric.

The simplest "better" I know of is generated code size. It is easy to measure with fasm because it shows how many bytes are in the final file. It is also the least subjective measure. And it doesn't depend upon which particular system it is run in.



macomics 23 Aug 2021, 17:46
revolution wrote:

How do you define "better"?

sylware wrote:

This means my code would become self-modifying. But in general, how would you code this in an elegant way?

"better" :)
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
