flat assembler
Message board for the users of flat assembler.

fasm-metaprogramming

ander-skirnir
Joined: 18 Apr 2010
Posts: 4
I'm an extreme newbie to asm, so my question may sound pretty lame:
how do I use fasm to write fasm instructions into memory and then execute them?
For example, I have the bytes db 30 dup (0) and want to write these instructions into them:
Code:
mov [some_other_memory], 1
mov [some_other_memory + 1], 2
jmp label_back_to_static_code    


I've googled hard but found no solution :(
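To make the question concrete: the three instructions above come to only 19 bytes of machine code, which fit in that 30-byte buffer. The C sketch below shows one way to encode them. It assumes 32-bit code (like the PE examples later in this thread), that mov [some_other_memory], 1 means a byte store, and that every address is known when the bytes are generated. build_snippet and its helpers are hypothetical names. In 64-bit mode the C6 05 form is RIP-relative, so the bytes would differ there.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append a little-endian 32-bit value at buf[pos]. */
static size_t put32(uint8_t *buf, size_t pos, uint32_t v)
{
    buf[pos]     = (uint8_t)(v);
    buf[pos + 1] = (uint8_t)(v >> 8);
    buf[pos + 2] = (uint8_t)(v >> 16);
    buf[pos + 3] = (uint8_t)(v >> 24);
    return pos + 4;
}

/* mov byte [abs32], imm8  ->  C6 05 <disp32> <imm8>  (32-bit addressing assumed) */
static size_t emit_mov_byte_abs(uint8_t *buf, size_t pos, uint32_t addr, uint8_t imm)
{
    buf[pos++] = 0xC6;              /* MOV r/m8, imm8 (C6 /0) */
    buf[pos++] = 0x05;              /* ModRM: mod=00, reg=0, rm=101 -> [disp32] */
    pos = put32(buf, pos, addr);
    buf[pos++] = imm;
    return pos;
}

/* jmp rel32 -> E9 <rel32>; rel32 is measured from the end of the 5-byte jmp,
   so the runtime address of the buffer (buf_addr) must be known. */
static size_t emit_jmp(uint8_t *buf, size_t pos, uint32_t buf_addr, uint32_t target)
{
    uint32_t next = buf_addr + (uint32_t)pos + 5;
    buf[pos++] = 0xE9;
    return put32(buf, pos, target - next);
}

/* Hypothetical: encodes the three instructions from the question into buf
   and returns the number of bytes written. */
size_t build_snippet(uint8_t *buf, uint32_t buf_addr,
                     uint32_t some_other_memory, uint32_t label_back)
{
    size_t pos = 0;
    pos = emit_mov_byte_abs(buf, pos, some_other_memory, 1);
    pos = emit_mov_byte_abs(buf, pos, some_other_memory + 1, 2);
    pos = emit_jmp(buf, pos, buf_addr, label_back);
    return pos;
}
```

Actually running those bytes additionally requires the buffer to live in executable memory; on systems that enforce DEP/NX, an ordinary data section is not executable.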
Post 18 Apr 2010, 10:19
Tyler
Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Windows won't allow it to work, and you also have to consider the cache: if a part of memory is already in the instruction cache, I don't think changes in memory are reflected there. At least that's what I read somewhere around here.
Post 18 Apr 2010, 10:39
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17272
Location: In your JS exploiting you and your system
Tyler: Any change to memory will be reflected back into the CPU at all cache levels. This is by design, so that self-modifying code will work. It might not be fast or efficient, but it will work.

ander-skirnir: Perhaps you are looking for fasm.dll; it exists on here somewhere. It has a few limitations but basically allows on-the-fly assembly.
Post 18 Apr 2010, 10:46
Tyler
It was the prefetch queue that I remembered reading about, in the thread where I was trying to find a virus. There's also an example of fake SMC (self-modifying code) in that thread.
Post 18 Apr 2010, 10:57
revolution
Tyler: No one runs 8086/286 systems anymore. And anyone who does can't run fasm anyway, so the point is moot.
Post 18 Apr 2010, 11:17
ander-skirnir
> fasm.dll

Okay, thanks, I'll try.

Btw, I need it to implement a toy Common Lisp compiler (for a very small subset) that will allow defun (defining functions) at runtime without any layering or bytecode. I'm wondering: is generating machine code on the fly the best and fastest way to compile functions in dynamic compilers?
Post 18 Apr 2010, 11:30
revolution
LISP ---> assembly ---> binary ---> execute.

That is a common path for almost all native apps written today: just replace LISP with C, C++, or whatever, and the rest stays the same.

But on-the-fly compilation is not the usual case. Most often for on-the-fly code you would have this:

Java/C# ---> bytecode ---> JIT compiler ---> execute.

Or this:

PERL/JS ---> interpreter.
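The final "execute" step of a JIT boils down to putting generated bytes into executable memory and calling them. Below is a minimal C sketch of that step, assuming x86-64 Linux and the POSIX mmap/mprotect calls; jit_return_constant is a hypothetical helper name. On Windows the analogous calls would be VirtualAlloc/VirtualProtect plus FlushInstructionCache. Elsewhere the sketch simply reports failure.

```c
/* Sketch only: assumes x86-64 Linux. jit_return_constant is hypothetical. */
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__linux__)
#include <sys/mman.h>
#include <unistd.h>
#endif

typedef int (*const_fn)(void);

/* Generates "mov eax, value ; ret", runs it and stores the result in *out.
   Returns 0 on success, -1 if the platform is unsupported or a call fails. */
int jit_return_constant(int32_t value, int *out)
{
#if defined(__x86_64__) && defined(__linux__)
    unsigned char code[6];
    code[0] = 0xB8;                     /* mov eax, imm32 */
    memcpy(&code[1], &value, 4);        /* x86 is little-endian, so this is the imm32 */
    code[5] = 0xC3;                     /* ret */

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* Map the page writable first, then switch it to executable (W^X). */
    void *mem = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);
    if (mprotect(mem, page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, page);
        return -1;
    }
    /* A no-op on x86; needed on CPUs whose instruction caches are not coherent. */
    __builtin___clear_cache((char *)mem, (char *)mem + sizeof code);

    const_fn fn;
    memcpy(&fn, &mem, sizeof fn);       /* data-to-function pointer: POSIX, not ISO C */
    *out = fn();
    munmap(mem, page);
    return 0;
#else
    (void)value;
    (void)out;
    return -1;
#endif
}
```

A compiler like the one ander-skirnir describes would emit a whole function body instead of six bytes, but the allocate, write, protect, call sequence stays the same.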
Post 18 Apr 2010, 11:37
ander-skirnir
But Common Lisp has no compile-time/runtime distinction. It must provide dynamic compilation to machine code in a running system. I can define a function that defines functions, and all of them must be genuinely native-compiled at runtime. Can I do that without on-the-fly compilation?
Post 18 Apr 2010, 11:54
revolution
Post 18 Apr 2010, 12:08
cthug
Joined: 03 Apr 2009
Posts: 36
Location: /home/Australia
You could cheat and use tcc, with its on-the-fly C compilation and assembly.
Post 18 Apr 2010, 12:33
ander-skirnir
> http://board.flatassembler.net/topic.php?t=6239
Thanks again.

> tcc
Yeah, I like it a lot, it's an awesome compiler, but clever people say that the way C gets compiled to native code is very different from what a good Common Lisp compiler could do. CL has both static and dynamic scoping, lambdas that are not emulated with structs/classes, closures and so on: things that cannot be done efficiently by translating to C.
Post 18 Apr 2010, 12:56
cthug
tcc has a built-in assembler (GAS-compatible), so you can pass your dynamically generated code to libtcc to assemble and execute it. The only problem is bloat: it adds about 400 KB to your executable, so you might have to edit the tcc source :(
Post 18 Apr 2010, 13:14
MazeGen
Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
revolution wrote:
Tyler: Any change to memory will be reflected back into the CPU at all cache levels. This is by design so that self modifying code will work. It might not be fast or efficient but it will work.

It is not reflected back in all cases. The following test returns "cache not updated" on my PC:
Code:
format PE GUI
entry start

section '.text' code readable writeable executable

start:
        mov     al, 90h                 ; NOP instruction
        mov     ecx, last_eip - get_eip
        call    get_eip
get_eip:
        mov     edi, [esp]
        rep     stosb                   ; rewrite bytes from get_eip to last_eip by NOPs
is_40:
        db      40h                     ; dummy INC EAX
last_eip:
        pop     edi

        ; if byte 40h was rewritten, REP STOSB didn't rewrite itself
        ; - code cache was not updated

        cmp     byte [edi+(is_40-get_eip)], 40h
        jne     cache_not_updated

cache_updated:
        push    0
        push    caption
        push    updated
        push    0
        call    [MessageBoxA]

        push    1
        call    [ExitProcess]

cache_not_updated:
        push    0
        push    caption
        push    not_updated
        push    0
        call    [MessageBoxA]

        push    0
        call    [ExitProcess]

section '.data' data readable writeable

 caption     db 'x',0
 not_updated db 'cache not updated',0
 updated     db 'cache updated',0

section '.idata' import data readable writeable

  dd 0,0,0,RVA kernel_name,RVA kernel_table
  dd 0,0,0,RVA user_name,RVA user_table
  dd 0,0,0,0,0

  kernel_table:
    ExitProcess dd RVA _ExitProcess
    dd 0
  user_table:
    MessageBoxA dd RVA _MessageBoxA
    dd 0

  kernel_name db 'KERNEL32.DLL',0
  user_name db 'USER32.DLL',0

  _ExitProcess dw 0
    db 'ExitProcess',0
  _MessageBoxA dw 0
    db 'MessageBoxA',0
    
Post 20 Apr 2010, 09:18
revolution
That is a neat trick. It goes against what the Intel manual states:
18.29.1 Self-Modifying Code with Cache Enabled wrote:
On the Intel486 processor, a write to an instruction in the cache will modify it in both the cache and memory. If the instruction was prefetched before the write, however, the old version of the instruction could be the one executed. To prevent this problem, it is necessary to flush the instruction prefetch unit of the Intel486 processor by coding a jump instruction immediately after any write that modifies an instruction. The P6 family and Pentium processors, however, check whether a write may modify an instruction that has been prefetched for execution. This check is based on the linear address of the instruction. If the linear address of an instruction is found to be present in the prefetch queue, the P6 family and Pentium processors flush the prefetch queue, eliminating the need to code a jump instruction after any writes that modify an instruction.
Is it enough to check whether ecx is zero? Single-stepping leaves ecx=2, and direct execution leaves ecx=0.

I guess this is a consequence of the special circuitry for rep stosx speed-ups.
Post 20 Apr 2010, 09:54
MazeGen
revolution wrote:
Is it enough to check whether ecx is zero? Single-stepping leaves ecx=2, and direct execution leaves ecx=0.

I think so.
revolution wrote:
I guess this is a consequence of the special circuitry for rep stosx speed-ups.

It seems so. I would need more processors to test it.
Post 20 Apr 2010, 10:48
revolution
Note that eax is not incremented during direct execution, so at least that instruction is properly flushed.
Post 20 Apr 2010, 11:29
LocoDelAssembly
Your code has a bug
Joined: 06 May 2005
Posts: 4633
Location: Argentina
Same behaviour on Athlon (K7) and Phenom II.

Perhaps the manual explains the complete spec in more detail somewhere else? It wouldn't be the first time that the Intel manuals give an overly general description of something and a contradiction appears later in the manual.

PS: Or perhaps "REP STOSB" is defined as a single instruction rather than just a prefixed instruction equivalent to the code below?
Code:
rep_stosb:
test ecx, ecx
jz .out
.stosb:
stosb
loop .stosb
.out:    
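The two readings (REP STOSB as one instruction that runs to completion, or as a prefixed instruction fetched again for every iteration) can be compared with a toy interpreter. The C sketch below is a hypothetical model, not a description of real hardware. It holds only the six bytes of MazeGen's test from get_eip to last_eip and understands just NOP, STOSB, REP STOSB and INC EAX. REP STOSB is either decoded once and run to completion, or decoded again from memory after every iteration, which is what happens after a single-step trap. simulate_rep_stosb_test and rep_result are made-up names.

```c
#include <stdint.h>

/* Hypothetical toy model of the bytes from get_eip to last_eip in MazeGen's test:
   offset 0..2  mov edi,[esp]  8B 3C 24
   offset 3..4  rep stosb      F3 AA
   offset 5     inc eax        40      (is_40)
   offset 6     last_eip: the simulation stops here */
enum { CODE_LEN = 6, IS_40_OFFSET = 5 };

struct rep_result {
    uint32_t ecx;    /* ecx when execution reaches last_eip */
    uint32_t eax;    /* number of INC EAX instructions executed */
    uint8_t  is_40;  /* final value of the byte at is_40 */
};

/* run_to_completion != 0: REP STOSB is decoded once and finishes every iteration.
   run_to_completion == 0: REP STOSB is decoded from memory again after each iteration. */
struct rep_result simulate_rep_stosb_test(int run_to_completion)
{
    uint8_t mem[CODE_LEN] = {0x8B, 0x3C, 0x24, 0xF3, 0xAA, 0x40};
    uint32_t eip = 3;         /* mov edi,[esp] has already run: start at REP STOSB */
    uint32_t edi = 0;         /* edi = get_eip */
    uint32_t ecx = CODE_LEN;  /* last_eip - get_eip */
    uint32_t eax = 0;
    const uint8_t al = 0x90;  /* NOP fill byte */

    while (eip < CODE_LEN) {
        uint8_t op = mem[eip];
        if (op == 0xF3 && eip + 1 < CODE_LEN && mem[eip + 1] == 0xAA) {  /* REP STOSB */
            if (run_to_completion) {
                for (; ecx != 0; ecx--)
                    mem[edi++] = al;
                eip += 2;
            } else {
                if (ecx != 0) {
                    mem[edi++] = al;
                    ecx--;
                }
                if (ecx == 0)
                    eip += 2;  /* otherwise fetch again from the same address */
            }
        } else if (op == 0xAA) {  /* STOSB */
            mem[edi++] = al;
            eip += 1;
        } else if (op == 0x40) {  /* INC EAX */
            eax++;
            eip += 1;
        } else if (op == 0x90) {  /* NOP */
            eip += 1;
        } else {
            break;                /* opcode outside this toy model */
        }
    }

    struct rep_result r = { ecx, eax, mem[IS_40_OFFSET] };
    return r;
}
```

Under these assumptions the run-to-completion model ends with ecx = 0, eax unchanged and the 40h byte overwritten ("cache not updated"). The re-decoding model ends with ecx = 2, INC EAX executed once and the 40h byte intact, because once the F3 prefix becomes a NOP, the leftover AA byte runs as a single STOSB.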
Post 20 Apr 2010, 13:04
revolution
LocoDelAssembly wrote:
Or perhaps "REP STOSB" is defined as an instruction rather than just a prefixed instruction ...
Interesting idea. But then interrupts might screw you if you tried to rely upon that idea. If an interrupt happens during execution of rep stosx, then I would expect the effect to be different upon return from the interrupt. It would be really hard, though, to arrange for an interrupt to occur just when you want it to for testing purposes!
Post 20 Apr 2010, 14:46
LocoDelAssembly
revolution wrote:
But then interrupts might screw you if you tried to rely upon that idea.
Yes, but note that this may still not break Intel's description; it is only the fact that REP STOSB is interruptible that causes this effect, and for that reason you should not rely on 100% reproducibility across all of your program runs.
Post 20 Apr 2010, 17:03
revolution
We can see the interrupt in action with this code:
Code:
MAXIMUM_LENGTH = 1 shl 26

        format  pe console
        include 'win32ax.inc'

.code

        virtual
                inc     eax
                load    instr_inc_eax byte from $$
        end virtual

        virtual
                ret
                load    instr_ret byte from $$
        end virtual

        virtual
                nop
                load    instr_nop byte from $$
        end virtual

        virtual
                nop
                nop
                rep     stosb
                load    instr_rep_stosb dword from $$
        end virtual

        virtual
                nop
                rep     stosw
                load    instr_rep_stosw dword from $$
        end virtual

        virtual
                nop
                nop
                rep     stosd
                load    instr_rep_stosd dword from $$
        end virtual

proc begin uses ebx
        invoke  GetStdHandle,STD_OUTPUT_HANDLE
        mov     ebx,eax
        stdcall test_lengths,instr_rep_stosb,'STOSB',ebx,0
        stdcall print_string,ebx,<13,10>
        stdcall test_lengths,instr_rep_stosw,'STOSW',ebx,1
        stdcall print_string,ebx,<13,10>
        stdcall test_lengths,instr_rep_stosd,'STOSD',ebx,2
        stdcall print_string,ebx,<13,10>
        invoke  ExitProcess,0
endp

proc test_lengths uses ebx,rep_instruction,name,handle,shift
        mov     ebx,4
    .loop:
        mov     ecx,[shift]
        lea     eax,[ebx+4]
        shr     eax,cl
        stdcall make_code_section,eax,[rep_instruction]
        ccall   cprint_formatted_string,[handle],<'%s length = 0x%08x, bytes written before interrupt: 0x%08x',13,10>,[name],ebx,eax
        add     ebx,ebx
        cmp     ebx,MAXIMUM_LENGTH
        jbe     .loop
        ret
endp

proc make_code_section uses edi,run_length,rep_instruction
        mov     eax,[rep_instruction]
        mov     edi,rep_stos_test
        stosd
        mov     eax,instr_inc_eax * 0x01010101
        mov     ecx,MAXIMUM_LENGTH shr 2
        rep     stosd
        mov     eax,instr_ret * 0x01010101
        stosd
        mov     edi,rep_stos_test
        mov     ecx,[run_length]
        mov     eax,instr_nop * 0x01010101
        call    rep_stos_test
        sub     eax,instr_nop * 0x01010101
        sub     eax,MAXIMUM_LENGTH
        neg     eax
        ret
endp

proc cprint_formatted_string c handle,format,parameters
        stdcall print_formatted_string,[handle],[format],addr parameters
        ret
endp

proc print_formatted_string handle,format,parameters
        locals
                string rb 1024
        endl
        invoke  wvsprintf,addr string,[format],[parameters]
        stdcall print_string,[handle],addr string
        ret
endp

proc print_string handle,string
        locals
                written dd ?
        endl
        invoke  lstrlen,[string]
        invoke  WriteFile,[handle],[string],eax,addr written,NULL
        ret
endp

section 'rep_stos' code readable writeable executable

rep_stos_test: rb MAXIMUM_LENGTH + 1 shl 12

.end begin
For shorter lengths the code usually manages to get in all the writes before the interrupt comes. But for the longer runs it becomes almost impossible to complete the run before the interrupt screws up the output.
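The "bytes written before interrupt" figure is derived from eax alone. STOSx reads eax but does not modify it, so eax starts at 90909090h, and each INC EAX byte still intact when execution runs through the buffer adds one. The C below replays the last three instructions of make_code_section (sub, sub, neg) under that assumption. bytes_written and eax_after_run_for are hypothetical helpers, and the constants are copied from the program.

```c
#include <stdint.h>

/* Values taken from revolution's program. */
#define MAXIMUM_LENGTH (UINT32_C(1) << 26)    /* MAXIMUM_LENGTH = 1 shl 26 */
#define NOP_FILL       UINT32_C(0x90909090)   /* instr_nop * 0x01010101, start value of eax */

/* Replays the tail of make_code_section:
       sub eax, instr_nop * 0x01010101   -> number of INC EAX bytes that survived
       sub eax, MAXIMUM_LENGTH
       neg eax                           -> MAXIMUM_LENGTH - survivors = bytes overwritten */
uint32_t bytes_written(uint32_t eax_after_run)
{
    uint32_t survivors = eax_after_run - NOP_FILL;
    return MAXIMUM_LENGTH - survivors;
}

/* Hypothetical inverse, for checking: the eax a run would end with if `written`
   bytes of the INC EAX area had been overwritten before execution reached them. */
uint32_t eax_after_run_for(uint32_t written)
{
    return NOP_FILL + (MAXIMUM_LENGTH - written);
}
```

For example, a completed 1M STOSB run leaves eax = 90909090h + 4000000h - 100000h = 94809090h, which maps back to 0x00100000. A run in which the REP STOSx only manages to overwrite itself leaves all 4000000h INC EAX bytes intact and reports zero.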

A small section of the screen dump:
Code:
...
STOSB length = 0x00100000, bytes written before interrupt: 0x00100000
STOSB length = 0x00200000, bytes written before interrupt: 0x001bda5c
STOSB length = 0x00400000, bytes written before interrupt: 0x00400000
STOSB length = 0x00800000, bytes written before interrupt: 0x004558dc
...    
The 1M length completes fine, the longer 2M run gets interrupted at 0x001bda5c, the 4M run completes, and then the 8M run gets interrupted. These results are inconsistent; no two runs are likely to give the same results.

If you were to single-step every test, then all the "bytes written" figures would be zero, meaning that the rep stosx overwrites itself and can never get even one byte written past its own location.

So basically the CPU tries to run the rep stosx entirely from its internal state and won't re-read the instruction from memory unless it gets interrupted and has to restart on return.
Post 21 Apr 2010, 08:41


Copyright © 1999-2020, Tomasz Grysztar.
