flat assembler
Message board for the users of flat assembler.

sysenter/syscall

pjd



Joined: 15 Jul 2007
Posts: 47
Hi,
How do I use the fast system call instructions in my asm apps (sysenter and syscall)?
I have an AMD Athlon 64 and 32-bit Linux (so that's syscall, right?). I haven't been able to find anything on Google about this, and my attempts have produced "illegal hardware instruction" from syscall and a segfault from sysenter.

_________________
"For God so loved the world"
Post 26 May 2008, 15:28
Dex4u



Joined: 08 Feb 2005
Posts: 1601
Location: web
Quote:

4.6 Sysenter and the vsyscall page


It has been observed that a 2 GHz Pentium 4 was much slower than an 850 MHz Pentium III on certain tasks, and that this slowness is caused by the very large overhead of the traditional int 0x80 interrupt on a Pentium 4.


Some models of the i386 family do have faster ways to enter the kernel. On Pentium II there is the sysenter instruction. Also AMD has a syscall instruction. It would be good if these could be used.


Something else is that in some applications gettimeofday() is called very often, for example for timestamping all transactions. It would be nice if it could be implemented with very low overhead.


One way of obtaining a fast gettimeofday() is by writing the current time in a fixed place, on a page mapped into the memory of all applications, and updating this location on each clock interrupt. These applications could then read this fixed location with a single instruction - no system call required.


There might be other data that the kernel could make available in a read-only way to the process, like perhaps the current process ID. A vsyscall is a "system" call that avoids crossing the userspace-kernel boundary.


Linux is in the process of implementing such ideas. Since Linux 2.5.53 there is a fixed page, called the vsyscall page, filled by the kernel. At kernel initialization time the routine sysenter_setup() is called. It sets up a non-writable page and writes code for the sysenter instruction if the CPU supports that, and for the classical int 0x80 otherwise. Thus, the C library can use the fastest type of system call by jumping to a fixed address in the vsyscall page.


Concerning gettimeofday(), a vsyscall version for the x86-64 is already part of the vanilla kernel. Patches for i386 exist. (An example of the kind of timing differences: John Stultz reports on an experiment where he measures gettimeofday() and finds 1.67 us for the int 0x80 way, 1.24 us for the sysenter way, and 0.88 us for the vsyscall.)
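
A rough way to repeat that kind of measurement yourself is sketched below. It is only a sketch: the function names avg_ns_libc and avg_ns_raw are made up for this example, and whether libc's gettimeofday() actually avoids entering the kernel depends on your architecture, kernel and libc version, so treat the "libc" number as "whatever path libc picks" rather than as a guaranteed vsyscall figure.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Elapsed nanoseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ns(const struct timespec *t0, const struct timespec *t1)
{
    return (t1->tv_sec - t0->tv_sec) * 1e9 + (t1->tv_nsec - t0->tv_nsec);
}

/* Average cost of one gettimeofday() call made through libc.
   libc may or may not route this through the kernel-provided page. */
double avg_ns_libc(int n)
{
    struct timespec t0, t1;
    struct timeval tv;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++)
        gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(&t0, &t1) / n;
}

/* Average cost of one gettimeofday made with syscall(), which
   always enters the kernel. */
double avg_ns_raw(int n)
{
    struct timespec t0, t1;
    struct timeval tv;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++)
        syscall(SYS_gettimeofday, &tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(&t0, &t1) / n;
}
```

Call both with the same positive n (say 100000) from main() and print the two averages. The absolute numbers will differ from the ones quoted above; only the gap between the two is of interest.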



Some details





The kernel maps a page (0xffffe000-0xffffefff) in the memory of every process. (This is the last addressable page but one. The last is not mapped - maybe to avoid bugs related to wraparound.) We can read it:

/* get vsyscall page */
#include <unistd.h>
#include <string.h>

int main() {
    char *p = (char *) 0xffffe000;
    char buf[4096];
#if 0
    write(1, p, 4096);
    /* this gives EFAULT */
#else
    memcpy(buf, p, 4096);
    write(1, buf, 4096);
#endif
    return 0;
}

and if we do, we find an ELF binary.
% ./get_vsyscall_page > syspage
% file syspage
syspage: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), stripped
% objdump -h syspage

syspage: file format elf32-i386

Sections:
Idx Name            Size      VMA       LMA       File off  Algn
  0 .hash           00000050  ffffe094  ffffe094  00000094  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .dynsym         000000f0  ffffe0e4  ffffe0e4  000000e4  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynstr         00000056  ffffe1d4  ffffe1d4  000001d4  2**0
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.version    0000001e  ffffe22a  ffffe22a  0000022a  2**1
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .gnu.version_d  00000038  ffffe248  ffffe248  00000248  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .text           00000047  ffffe400  ffffe400  00000400  2**5
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
  6 .eh_frame_hdr   00000024  ffffe448  ffffe448  00000448  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .eh_frame       0000010c  ffffe46c  ffffe46c  0000046c  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .dynamic        00000078  ffffe578  ffffe578  00000578  2**2
                    CONTENTS, ALLOC, LOAD, DATA
  9 .useless        0000000c  ffffe5f0  ffffe5f0  000005f0  2**2
                    CONTENTS, ALLOC, LOAD, DATA
% objdump -d syspage

syspage: file format elf32-i386

Disassembly of section .text:

ffffe400 <.text>:
ffffe400:  51              push   %ecx
ffffe401:  52              push   %edx
ffffe402:  55              push   %ebp
ffffe403:  89 e5           mov    %esp,%ebp
ffffe405:  0f 34           sysenter
ffffe407:  90              nop
ffffe408:  90              nop
           ... more nops ...
ffffe40d:  90              nop
ffffe40e:  eb f3           jmp    0xffffe403
ffffe410:  5d              pop    %ebp
ffffe411:  5a              pop    %edx
ffffe412:  59              pop    %ecx
ffffe413:  c3              ret
           ... zero bytes ...
ffffe420:  58              pop    %eax
ffffe421:  b8 77 00 00 00  mov    $0x77,%eax
ffffe426:  cd 80           int    $0x80
ffffe428:  90              nop
ffffe429:  90              nop
           ... more nops ...
ffffe43f:  90              nop
ffffe440:  b8 ad 00 00 00  mov    $0xad,%eax
ffffe445:  cd 80           int    $0x80



The interesting addresses here are found via

% grep ffffe System.map
ffffe000 A VSYSCALL_BASE
ffffe400 A __kernel_vsyscall
ffffe410 A SYSENTER_RETURN
ffffe420 A __kernel_sigreturn
ffffe440 A __kernel_rt_sigreturn
%



So __kernel_vsyscall pushes a few registers and does a sysenter instruction. And SYSENTER_RETURN pops the registers again and returns. And __kernel_sigreturn and __kernel_rt_sigreturn do system calls 119 and 173, that is, sigreturn and rt_sigreturn, respectively.


What about the jump just before SYSENTER_RETURN? It is a trick to handle restarting of system calls with 6 parameters. As Linus said: I'm a disgusting pig, and proud of it to boot.


The code involved is most easily seen from a slightly earlier patch.


A tiny demo program.

#include <stdio.h>

int pid;

int main() {
    __asm__(
        "movl $20, %eax \n"
        "call 0xffffe400 \n"
        "movl %eax, pid \n"
    );
    printf("pid is %d\n", pid);
    return 0;
}

This does the getpid() system call (__NR_getpid is 20) using call 0xffffe400 instead of int 0x80.
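
For comparison, here is a sketch of the same getpid() call that does not depend on any fixed address. It is an illustration, not part of the quoted article: raw_getpid is a made-up name, the x86-64 branch uses the syscall instruction as the 64-bit Linux ABI defines it (number in rax, rcx and r11 clobbered), the i386 branch falls back to plain int 0x80, and anything else goes through libc's syscall(). Note that SYS_getpid is 20 only on i386; the header supplies the right number for each architecture.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* getpid() made "by hand", choosing the entry instruction per architecture. */
long raw_getpid(void)
{
#if defined(__x86_64__)
    long ret;
    /* 64-bit ABI: syscall number in rax; the instruction itself
       overwrites rcx and r11, so they are listed as clobbers. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
#elif defined(__i386__)
    long ret;
    /* Classic 32-bit entry: number in eax, then int 0x80. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"((long)SYS_getpid)
                      : "memory");
    return ret;
#else
    /* Unknown architecture: let libc pick the entry method. */
    return syscall(SYS_getpid);
#endif
}
```

Printing raw_getpid() next to getpid() from main() should show the same number twice.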

However, the proper thing to do is not call 0xffffe400 but call *%gs:0x18. If %gs has been set up so that it addresses 0xffffe000, then at location 0xffffe018 we find the value of __kernel_vsyscall, the entry point of the kernel vsyscalls. Such general setup requires the parsing of the ELF headers of this vsyscall page, but then is future-proof.
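
A hedged sketch of the "don't hard-code the address" idea: the kernel passes the location of this page to every process in the ELF auxiliary vector, and glibc 2.16 and later expose that vector through getauxval(). That interface is newer than the text quoted above, so take this as one possible way to do it on a current system, not as what the article had in mind. The function names are invented for the example. AT_SYSINFO_EHDR is the base of the kernel-provided ELF image; AT_SYSINFO holds the __kernel_vsyscall entry point on i386 and is normally absent (getauxval() returns 0) on other architectures.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Base address of the kernel-provided ELF image, or NULL if the
   auxiliary vector has no AT_SYSINFO_EHDR entry. */
const unsigned char *vdso_base(void)
{
    return (const unsigned char *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* Address of __kernel_vsyscall as reported by the kernel (i386),
   or 0 if the entry is missing. */
unsigned long vsyscall_entry(void)
{
    return getauxval(AT_SYSINFO);
}

/* Non-zero if p points at something that starts with the ELF magic. */
int is_elf_image(const unsigned char *p)
{
    return p != NULL && memcmp(p, ELFMAG, SELFMAG) == 0;
}
```

From there, finding individual symbols still means walking the ELF headers and the dynamic symbol table of the image, as the paragraph above says.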
Post 26 May 2008, 17:41
pjd



Joined: 15 Jul 2007
Posts: 47
So I don't execute syscall directly, right?

Doing call 0xffffe400 or call [gs:0x18] both segfault, as does ./get_vsyscall_page.
Do I have to link in a special library or something (I've seen mentions of linux-gate.so)?

Do you have a hello world example in asm for this?
Post 27 May 2008, 15:13
pjd



Joined: 15 Jul 2007
Posts: 47
The vsyscall program in that article keeps segfaulting on the memcpy call no matter what memory page I put in on line 2. Do I have to do anything special to get it to work?
Post 29 May 2008, 17:09
pjd



Joined: 15 Jul 2007
Posts: 47
gs:10 (not gs:18) is only set up in a dynamic executable of some sort, so it works in a program linked to libc, or in a shared library linked in without libc anywhere. Furthermore, the location to call is no longer fixed at ffffe000h in the latest kernels. It is possible to find out which address a given process is using for linux-gate.so.1 (the page in question) via cat /proc/self/maps. However, it also seems that, on my kernel at least, reading the memory pages of the program (via the get_vsyscall_page program above, or by cat or dd on /proc/self/mem) is impossible.
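
For anyone who wants to do the /proc/self/maps lookup from inside a program, a minimal sketch follows. Assumptions: find_mapping is a name invented here, and the page is assumed to be labelled [vdso] in the maps file, which is how current kernels show it (linux-gate.so.1 is the name ldd prints). It only reports the address range; it does not try to read the page.

```c
#include <stdio.h>
#include <string.h>

/* Scan /proc/self/maps for the first line containing `name`.
   On success store the mapping's address range and return 0,
   otherwise return -1. */
int find_mapping(const char *name, unsigned long *start, unsigned long *end)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int found = -1;

    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;

        /* Each line begins with "start-end" in hexadecimal. */
        if (strstr(line, name) != NULL &&
            sscanf(line, "%lx-%lx", &lo, &hi) == 2) {
            *start = lo;
            *end = hi;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling find_mapping("[vdso]", &start, &end) and printing the two values in hex gives the range to use in place of the hard-coded ffffe000h.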
Post 05 Jun 2008, 13:34

Copyright © 1999-2018, Tomasz Grysztar.
