flat assembler
Message board for the users of flat assembler.

easy and fast call mechanism for kernel routines

Mat



Joined: 30 Sep 2008
Posts: 11
I'm currently developing a small kernel for a VM engine and am looking for a simple way to call my kernel routines. Because the kernel is split into a hardware-dependent layer and a higher-level abstract layer, with message passing between kernel objects, I need a call mechanism with minimal overhead.

My first idea was simply to use a call list and use registers for parameters but that would sacrifice flexibility.

Another idea is not to call a function but to jump to it, loading the return address into a register, e.g.:

Code:
CALLN:     mov RDX,RETN          ; RDX = address to resume at
           jmp KERNEL_ROUTINE
           db FIRST_PARAMETER    ; at RETN-6
           db SECOND_PARAMETER   ; at RETN-5
           dd THIRD_PARAMETER    ; at RETN-4
RETN:       ...
    


This way, there is no need to pass parameters on the stack, but I'm not sure whether this would be better than a conventional call .. ret sequence?!
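
For illustration, here is a minimal fasm sketch of how the called routine could read such inline parameters relative to the return address in RDX. The label names, the sample values and the scratch registers (EAX, ECX, R8D) are assumptions made for the example, not part of the design above.

```asm
; Sketch only: all names and values below are hypothetical.
use64

KERNEL_ROUTINE:
        movzx   eax, byte [rdx-6]       ; first parameter  (db)
        movzx   ecx, byte [rdx-5]       ; second parameter (db)
        mov     r8d, dword [rdx-4]      ; third parameter  (dd)
        ; the routine's real work would go here
        jmp     rdx                     ; "return" without touching the stack

caller:
        lea     rdx, [return_here]      ; rdx = address to resume at
        jmp     KERNEL_ROUTINE
        db      1                       ; first parameter
        db      2                       ; second parameter
        dd      12345678h               ; third parameter
return_here:
        jmp     $                       ; placeholder for the rest of the caller
```

The parameters sit at fixed negative offsets from the return address, so the routine needs no extra register to find them, but it does read data out of the code stream.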

_________________
make it yourself or you screwed !
Post 01 Oct 2008, 10:19
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17663
Location: In your JS exploiting you and your system
You mention minimal overhead but you don't mention how you judge that. Do you mean minimal memory usage? Minimal stack usage? Minimal register usage? Minimal clock ticks? Minimal code size? Minimal BTB pollution? Minimal return stack pollution? Minimal cache pollution? You can't get all of those at the same time; many of them are mutually exclusive.

It is really up to you what you want to do with calling conventions. I doubt there is any "best" or "optimal" convention. They all have their own different strengths and weaknesses.

The 64-bit Windows OSes use a variation of the fastcall convention. Perhaps, just for ease of porting code, you can consider using that.
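
As a rough sketch of that convention (the Microsoft x64 one): the first four integer arguments go in RCX, RDX, R8 and R9, the result comes back in RAX, and the caller reserves 32 bytes of shadow space on a 16-byte aligned stack. The routine name kernel_add4 and the argument values are hypothetical.

```asm
; Sketch only: kernel_add4 is a made-up routine.
use64

caller:
        sub     rsp, 40                 ; 32 bytes shadow space + 8 to realign rsp to 16
        mov     ecx, 1                  ; 1st argument (writing ecx zero-extends into rcx)
        mov     edx, 2                  ; 2nd argument
        mov     r8d, 3                  ; 3rd argument
        mov     r9d, 4                  ; 4th argument
        call    kernel_add4             ; result returned in rax
        add     rsp, 40
        ret

kernel_add4:
        lea     rax, [rcx+rdx]          ; rax = arg1 + arg2
        add     rax, r8                 ; + arg3
        add     rax, r9                 ; + arg4
        ret
```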
Post 01 Oct 2008, 12:04
bitRAKE



Joined: 21 Jul 2003
Posts: 3043
Location: vpcmipstrm
Look at the SYSENTER instruction if going from ring 3 to ring 0. Otherwise just use call within the same ring. A fast and flexible convention is to pass all parameters in registers; in the rare event that there are more parameters than registers, just pass a structure pointer. Organize register usage to ease management between kernel calls on the application end, and to speed up usage on the kernel side.
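
A minimal sketch of the structure-pointer variant, assuming a made-up request layout and a made-up routine kernel_copy; the choice of RBX as the pointer register is also just an assumption:

```asm
; Sketch only: the request layout and all names are hypothetical.
use64

REQ_SRC = 0                             ; offset of the source pointer
REQ_DST = 8                             ; offset of the destination pointer
REQ_LEN = 16                            ; offset of the byte count

caller:
        lea     rbx, [request]          ; one register carries every parameter
        call    kernel_copy
        ret

kernel_copy:
        mov     rsi, [rbx+REQ_SRC]
        mov     rdi, [rbx+REQ_DST]
        mov     rcx, [rbx+REQ_LEN]
        cld                             ; copy forwards
        rep movsb
        ret

request:
        dq      source_buf
        dq      dest_buf
        dq      16

source_buf:
        times 16 db 0AAh
dest_buf:
        rb      16
```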

_________________
¯\(°_o)/¯ unlicense.org
Post 01 Oct 2008, 15:47
Mat



I'm searching for a method which doesn't utilize the stack (because the kernel should form the base for a stack-based VM), isn't dependent on register allocation (most of the 16 registers are reserved for other things like VM register caching) and doesn't consume too many clock ticks per call.

uh, sounds easy ;D
Post 01 Oct 2008, 15:51
bitRAKE



Code:
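  call KERNEL_001        ; direct call: pushes the address of param0
  dq param0
  dq param1
return_here:
...

KERNEL_001:
  mov rax,[rsp]          ; rax = address of the inline parameters
  add qword [rsp],16     ; advance the return address past both qwords
...
  retn                   ; returns to return_here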
SYSENTER doesn't use the stack or registers.

Post 01 Oct 2008, 16:18
revolution
Mat: do you have any other requirements, such as the oldest CPU you need to support? SYSENTER is only available on newer CPUs, so if that is a problem you might need to look at other methods as well.
Post 01 Oct 2008, 16:48
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
The stack is usually the fastest memory to access, quite unlike the code area, which is usually the slowest.
Post 01 Oct 2008, 18:05
Mat



bitRAKE wrote:

SYSENTER doesn't use the stack or registers.

Sadly, the AMD64 ISA lacks support for SYSENTER and SYSEXIT in both long-mode submodes (compatibility and 64-bit).

Post 02 Oct 2008, 21:10
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
But it does support SYSCALL
Quote:
Long Mode. When long mode is activated, the behavior of the SYSCALL instruction
depends on whether the calling software is in 64-bit mode or compatibility mode. In
64-bit mode, SYSCALL saves the RIP of the instruction following the SYSCALL into
RCX and loads the new RIP from LSTAR bits 63–0. (The LSTAR register is model-specific
register C000_0082h.) In compatibility mode, SYSCALL saves the RIP of the
instruction following the SYSCALL into RCX and loads the new RIP from CSTAR bits
63–0. (The CSTAR register is model-specific register C000_0083h.)
Post 02 Oct 2008, 21:32
Mat



revolution: Yes, that's a bit of a problem, and after reading this: http://en.wikipedia.org/wiki/X86-64#Differences_between_AMD64_and_Intel_64 it seems it's not the only one...

I have chosen the following compromise:

- Parameters are passed following the call
- the r15 register is reserved for return addresses

Code:
mov r15,$             ; $ = address of this mov instruction
jmp KERNEL_HANDLER    ; the handler must skip the mov and jmp to reach the data
dq FUNCTION_ID
dq PARAMETER_A
dq PARAMETER_B
...
    


Not the fastest, but a simple, generic approach: it doesn't change the stack and sacrifices only one register.
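
One way the handler side of this compromise might look, as a sketch only. It loads r15 with the address of the parameter block through a label (here `params`, hypothetical) rather than `$`, so the handler does not need to know the encoded length of the mov and jmp. The dispatch table, the routine fn_add, the use of RAX and RCX as scratch registers and the missing range check on the function id are all assumptions of the example.

```asm
; Sketch only: dispatch, fn_add and params are made-up names.
use64

caller:
        lea     r15, [params]           ; r15 -> inline parameter block
        jmp     KERNEL_HANDLER
params:
        dq      0                       ; FUNCTION_ID (index into dispatch)
        dq      5                       ; PARAMETER_A
        dq      7                       ; PARAMETER_B
after_call:
        jmp     $                       ; placeholder: caller continues here, result in rax

KERNEL_HANDLER:
        mov     rax, [r15]              ; fetch FUNCTION_ID (not range-checked here)
        lea     rcx, [dispatch]
        jmp     qword [rcx+rax*8]       ; jump to the selected routine

fn_add:                                 ; function 0: takes two qword parameters
        mov     rax, [r15+8]            ; PARAMETER_A
        add     rax, [r15+16]           ; + PARAMETER_B
        add     r15, 24                 ; skip the id and both parameters
        jmp     r15                     ; resume at after_call

dispatch:
        dq      fn_add
```

Each routine has to know how many parameters follow its id, since it is the routine that steps r15 past them before jumping back.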

Thanks to all for the answers.

Post 02 Oct 2008, 21:43
Mat



LocoDelAssembly wrote:
But it does support SYSCALL
Quote:
Long Mode. When long mode is activated, the behavior of the SYSCALL instruction
depends on whether the calling software is in 64-bit mode or compatibility mode. In
64-bit mode, SYSCALL saves the RIP of the instruction following the SYSCALL into
RCX and loads the new RIP from LSTAR bits 63–0. (The LSTAR register is model-specific
register C000_0082h.) In compatibility mode, SYSCALL saves the RIP of the
instruction following the SYSCALL into RCX and loads the new RIP from CSTAR bits
63–0. (The CSTAR register is model-specific register C000_0083h.)


OK, but what about Intel EM64T CPUs (which use another, system-specific register)? I don't want to handle two SYSCALL behaviors just to implement function calls.

Post 02 Oct 2008, 22:22
LocoDelAssembly
Intel Manuals wrote:
SYSCALL—Fast System Call
Description
SYSCALL saves the RIP of the instruction following SYSCALL to RCX and loads a new
RIP from the IA32_LSTAR (64-bit mode). Upon return, SYSRET copies the value
saved in RCX to the RIP.
SYSCALL saves RFLAGS (lower 32 bit only) in R11. It then masks RFLAGS with an
OS-defined value using the IA32_FMASK (MSR C000_0084). The actual mask value
used by the OS is the complement of the value written to the IA32_FMASK MSR.
None of the bits in RFLAGS are automatically cleared (except for RF). SYSRET
restores RFLAGS from R11 (the lower 32 bits only).
Software should not alter the CS or SS descriptors in a manner that violates the
following assumptions made by SYSCALL/SYSRET:
• The CS and SS base and limit remain the same for all processes, including the
operating system (the base is 0H and the limit is 0FFFFFFFFH).
• The CS of the SYSCALL target has a privilege level of 0.
• The CS of the SYSRET target has a privilege level of 3.
SYSCALL/SYSRET do not check for violations of these assumptions.

Operation
IF (CS.L ≠ 1 ) or (IA32_EFER.LMA ≠ 1) or (IA32_EFER.SCE ≠ 1)
(* Not in 64-Bit Mode or SYSCALL/SYSRET not enabled in IA32_EFER *)
THEN #UD; FI;
RCX ← RIP;
RIP ← LSTAR_MSR;
R11 ← EFLAGS;
EFLAGS ← (EFLAGS MASKED BY IA32_FMASK);
CPL ← 0;
CS(SEL) ← IA32_STAR_MSR[47:32];
CS(DPL) ← 0;
CS(BASE) ← 0;
CS(LIMIT) ← 0xFFFFF;
CS(GRANULAR) ← 1;
SS(SEL) ← IA32_STAR_MSR[47:32] + 8;
SS(DPL) ← 0;
SS(BASE) ← 0;
SS(LIMIT) ← 0xFFFFF;
SS(GRANULAR) ← 1;


However, it seems that Intel CPUs don't support SYSCALL/SYSRET in legacy mode or in compatibility mode (32-bit code under long mode), while AMD supports them in both. On the other hand, AMD supports SYSENTER/SYSEXIT in 32-bit legacy mode only.

Anyway, unless I missed something, both architectures behave the same when they are running in 64-bit long mode.
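
To make the quoted description concrete, here is a hedged sketch of the ring-0 setup that SYSCALL relies on in 64-bit mode. The MSR numbers are the architectural ones (LSTAR and FMASK appear in the quotes above); the selector values KERNEL_CS and SYSRET_SEL and the label syscall_entry are assumptions that depend on the kernel's own GDT and layout.

```asm
; Sketch only: selector values and syscall_entry are hypothetical.
use64

MSR_EFER   = 0C0000080h                 ; bit 0 = SCE (SYSCALL enable)
MSR_STAR   = 0C0000081h                 ; selector bases for SYSCALL/SYSRET
MSR_LSTAR  = 0C0000082h                 ; SYSCALL target RIP in 64-bit mode
MSR_FMASK  = 0C0000084h                 ; RFLAGS bits cleared on SYSCALL

KERNEL_CS  = 08h                        ; assumed ring-0 code selector (SS = KERNEL_CS+8)
SYSRET_SEL = 10h                        ; assumed base selector used by SYSRET

enable_syscall:                         ; must run at CPL 0
        mov     ecx, MSR_EFER
        rdmsr
        or      eax, 1                  ; set EFER.SCE
        wrmsr

        mov     ecx, MSR_STAR
        xor     eax, eax                ; STAR[31:0] is not used in 64-bit mode
        mov     edx, (SYSRET_SEL shl 16) or KERNEL_CS  ; STAR[63:48] and STAR[47:32]
        wrmsr

        mov     ecx, MSR_LSTAR
        lea     rax, [syscall_entry]
        mov     rdx, rax
        shr     rdx, 32                 ; wrmsr takes the 64-bit value in edx:eax
        wrmsr

        mov     ecx, MSR_FMASK
        mov     eax, 200h               ; clear IF on entry
        xor     edx, edx
        wrmsr
        ret

syscall_entry:
        ; on entry: rcx = RIP after SYSCALL, r11 = caller's RFLAGS, rsp unchanged
        ; kernel work would go here
        db      48h, 0Fh, 07h           ; REX.W + SYSRET: 64-bit return to ring 3
```

On the calling side, a request could then be as short as loading a function number into a free register and executing `syscall`, keeping in mind that RCX and R11 are overwritten.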
Post 02 Oct 2008, 22:42
Mat



After reading both specifications:

the behavior on Intel and AMD architectures is the same :D
Thanks for reminding me of this instruction! I think the dependence on two registers (RCX and R11) is not so critical (though I now use all registers in one way or another).
Post 03 Oct 2008, 09:15