flat assembler
Message board for the users of flat assembler.
Jessé 01 Jun 2025, 15:13
Today I woke up with a very fun idea in mind, and it led me to write this simple application, which processes the standard escape sequences:
Code:
```asm
format ELF64 executable 3
entry Start

include 'anon_label.inc'
include 'fastcall_BETA2.inc'
include 'stdio.inc'

_data

; Table for the character that follows '\', starting at 20h:
; 255 = not an escape (kept as is), 254 = octal \ooo, 253 = hex \xNN,
; any other value = the byte the escape stands for.
;       08   19   2A   3B   4C   5D   6E   7F
escapetable:
        db 255, 255, '"', 255, 255, 255, 255, "'"   ; 20 - 27
        db 255, 255, 255, 255, 255, 255, 255, 255   ; 28 - 2F
        db 254, 254, 254, 254, 254, 254, 254, 254   ; 30 - 37
        db 255, 255, 255, 255, 255, 255, 255, '?'   ; 38 - 3F
        db 255, 255, 255, 255, 255, 255, 255, 255   ; 40 - 47
        db 255, 255, 255, 255, 255, 255, 255, 255   ; 48 - 4F
        db 255, 255, 255, 255, 255, 255, 255, 255   ; 50 - 57
        db 255, 255, 255, 255, '\', 255, 255, 255   ; 58 - 5F
        db 255, 007, 008, 255, 255, 027, 012, 255   ; 60 - 67
        db 255, 255, 255, 255, 255, 255, 010, 255   ; 68 - 6F
        db 255, 255, 013, 255, 009, 255, 011, 255   ; 70 - 77
        db 253, 255, 255, 255, 255, 255, 255, 255   ; 78 - 7F

_code

ParseEscapedString:
        endbr64                         ; rdi = destination buffer; rsi = source string
        push    rbx
        lea     rbx, [escapetable-20h]  ; the table starts at character 20h
.nextchar:
        lodsb
        cmp     al, '\'
        je      .escape
        stosb                           ; also copies the terminating zero
        test    al, al
        jnz     .nextchar
        clc
        pop     rbx
        ret
.escape:
        lodsb
        test    al, al
        jz      .trailingbackslash      ; '\' was the last character
        js      .ignoreescape           ; bytes 80h-FFh are not in the table
        cmp     al, 20h
        jb      .ignoreescape           ; neither are control characters
        xlatb
        test    al, -1
        jns     .store
        cmp     al, 254
        je      .octal
        cmp     al, 253
        je      .hex
.ignoreescape:
        mov     ax, [rsi-2]             ; keep '\' and the character unchanged
        stosw
        jmp     .nextchar
.trailingbackslash:
        dec     rsi                     ; back onto the zero, so .nextchar ends the string
        mov     al, '\'
.store:
        stosb
        jmp     .nextchar
.octal:
        mov     eax, [rsi-1]
        mov     ch, 1                   ; invalid octal flag before processing
        cmp     al, '0'
        jb      .endoctal
        cmp     al, '7'
        ja      .endoctal
        xor     ecx, ecx                ; valid octal; cl = number of octal chars
        sub     al, '0'
        movzx   edx, al
        inc     cl
        shr     eax, 8
        cmp     al, '0'
        jb      .endoctal
        cmp     al, '7'
        ja      .endoctal
        sub     al, '0'
        shl     edx, 3
        inc     cl
        or      dl, al
        shr     eax, 8
        cmp     al, '0'
        jb      .endoctal
        cmp     al, '7'
        ja      .endoctal
        sub     al, '0'
        shl     edx, 3
        or      dl, al
        inc     cl
.endoctal:
        test    ch, ch
        jnz     .ignoreescape
        lea     rsi, [rsi+rcx-1]        ; skip the digits after the first one
        mov     al, dl
        stosb
        jmp     .nextchar
.hex:
        mov     dx, [rsi]               ; supporting 2 char hex \xNN
        mov     ch, 1                   ; set invalid flag before processing
        cmp     dx, '00'
        jb      .endhex
        cmp     dx, 'ff'
        ja      .endhex
        sub     dx, '00'
        cmp     dl, 9
        jbe     @f
        sub     dl, 7
        cmp     dl, 0Ah                 ; rejects ':' to '@'
        jb      .endhex
        cmp     dl, 0Fh
        jbe     @f
        sub     dl, 20h
        cmp     dl, 0Fh
        ja      .endhex
        cmp     dl, 0Ah
        jb      .endhex
@@
        cmp     dh, 9
        jbe     @f
        sub     dh, 7
        cmp     dh, 0Ah                 ; rejects ':' to '@'
        jb      .endhex
        cmp     dh, 0Fh
        jbe     @f
        sub     dh, 20h
        cmp     dh, 0Fh
        ja      .endhex
        cmp     dh, 0Ah
        jb      .endhex
@@
        xor     ch, ch                  ; valid hex escape
.endhex:
        test    ch, ch
        jnz     .ignoreescape
        shl     dl, 4
        or      dl, dh
        mov     al, dl
        stosb
        add     rsi, 2
        jmp     .nextchar

Start:
        endbr64
        cmp     [rsp], dword 2          ; argc must be 2
        jne     .err0
        mov     rdx, [stdout]
        mov     rcx, [stderr]
        mov     rdx, [rdx]
        mov     rcx, [rcx]
        mov     [stdout], rdx
        mov     [stderr], rcx
        mov     rbp, [rsp+16]           ; argv[1]
        mov     rdi, [rsp+16]           ; argv[1]
        xor     al, al
        mov     ecx, -1                 ; 4 GB string limit
        repne   scasb
        not     ecx                     ; string size with \0 char (output is never longer)
        malloc(ecx);
        test    rax, rax
        jz      .err1
        mov     rsi, rbp
        mov     rdi, rax
        mov     rbp, rax
        call    ParseEscapedString
        jc      .err2
        puts(rbp);
        free(rbp);
        exit(0);
.err2:
        free(rbp);
        exit(3);
.err1:
        perror("malloc");
.err0:
        exit(1);
```

The results are very interesting, because this gives the option of escaping any character inside a string token without leaving the string, which is a practical way of defining strings. Single-quoted strings, in this case, must not be parsed: they stay the same, with no escape sequence support.

I made the function 'ParseEscapedString' as portable as I could. The only exception is that it is 64-bit, but exchanging its few 64-bit registers for 32-bit ones (to make it fit fasmg code) is a safe approach. For the same reason, I also avoided the registers that exist only in 64-bit mode (r8 to r15) inside the function.

The guide for what I made available came from this link and also from other material I read online. Everything there is supported except Unicode escapes (in my example code), because Unicode characters can also be escaped as hex or octal byte sequences, or typed in directly, as I show in the image with the emoji example.

I also made sure it does not modify the string when an invalid escape sequence is found: if the string contains '\w', '\w' comes out. Likewise, a character that only has a special meaning inside an escape sequence (e.g., '?' instead of '\?') is processed as normal text and goes to the output.
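For comparison, here is a C sketch of the same rules, written for illustration and not part of the original program (the names `parse_escaped_string`, `hex_value`, `simple_from` and `simple_to` are made up). It follows the assembly version rather than ISO C in one respect: `\x` takes exactly two hex digits, whereas C keeps consuming hex digits. Like the assembly table, it keeps unknown or malformed escapes unchanged and also accepts `\e` (ESC, a GNU extension).

```c
#include <string.h>

/* Character after '\' and the byte it stands for ('e' = ESC, GNU extension). */
static const char simple_from[] = "abefnrtv\"'?\\";
static const char simple_to[]   = "\a\b\033\f\n\r\t\v\"'?\\";

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* dst must hold at least strlen(src) + 1 bytes: the output never grows. */
void parse_escaped_string(char *dst, const char *src)
{
    while (*src != '\0') {
        if (*src != '\\') {                     /* ordinary byte */
            *dst++ = *src++;
            continue;
        }
        char next = src[1];
        const char *hit = next ? strchr(simple_from, next) : NULL;

        if (hit != NULL) {                      /* \n, \t, \\, \?, ... */
            *dst++ = simple_to[hit - simple_from];
            src += 2;
        } else if (next >= '0' && next <= '7') {    /* \o, \oo or \ooo */
            unsigned value = 0;
            int n = 0;
            while (n < 3 && src[1 + n] >= '0' && src[1 + n] <= '7') {
                value = value * 8 + (unsigned)(src[1 + n] - '0');
                n++;
            }
            *dst++ = (char)value;               /* \400-\777 keep the low 8 bits */
            src += 1 + n;
        } else if (next == 'x' && hex_value(src[2]) >= 0
                               && hex_value(src[3]) >= 0) {   /* \xNN only */
            *dst++ = (char)(hex_value(src[2]) * 16 + hex_value(src[3]));
            src += 4;
        } else if (next == '\0') {              /* lone trailing backslash */
            *dst++ = *src++;
        } else {                                /* unknown or malformed: keep "\c" */
            *dst++ = *src++;
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Given the argument `tab:\t|oct:\101|hex:\x42|bad:\w`, it produces a tab, `A`, `B` and an unchanged `\w`.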
This is of course only a suggestion with a working example. I don't know whether all of this fits the philosophy of fasmg (for Tomasz and the other users), but, in my opinion, it would be a good addition to an already versatile assembler. I also don't know how much of a challenge adding this functionality would be, because I still haven't figured out the internals of fasmg by reading its source code. Despite being assembly, it is such a sophisticated "machine"...
_________________
jesse6
Jessé 01 Jun 2025, 16:42
Indeed, you're right.
This idea is taken straight from the C standard; in other words, it is completely apart from pure assembly. I also fully agree with making it a switchable option (off by default). When I posted it here, I thought that if it were a permanent feature, it might break existing code from people who already use fasmg, forcing them to adapt it. That's not great, for sure.
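A rough C sketch of how such a switch could behave, purely hypothetical and not how fasmg works internally (`string_token_value` and its parameters are invented for illustration): escapes are decoded only when the option is on and the token is double-quoted, so with the default (off) every existing source keeps its meaning. Only the single-character escapes are handled, to keep it short.

```c
#include <stdbool.h>
#include <string.h>

/* Single-character escapes only, to keep the sketch short. */
static const char esc_from[] = "abefnrtv\"'?\\";
static const char esc_to[]   = "\a\b\033\f\n\r\t\v\"'?\\";

/* Hypothetical helper, not fasmg code.
   body  = text between the quotes, already extracted by the tokenizer
   quote = the quote character that delimited the token
   out   = buffer of at least strlen(body) + 1 bytes */
void string_token_value(bool escapes_enabled, char quote,
                        const char *body, char *out)
{
    /* Single-quoted tokens, and every token while the option is off,
       are copied byte for byte. */
    bool decode = escapes_enabled && quote == '"';

    while (*body != '\0') {
        const char *hit = NULL;
        if (decode && body[0] == '\\' && body[1] != '\0')
            hit = strchr(esc_from, body[1]);
        if (hit != NULL) {
            *out++ = esc_to[hit - esc_from];
            body += 2;
        } else {
            *out++ = *body++;   /* ordinary byte, or '\' of an unknown escape */
        }
    }
    *out = '\0';
}
```

With the option off, the token "C:\new" keeps its backslash; with it on, the same double-quoted token gets a newline in place of `\n`, which is the kind of silent change to existing sources that a default-off switch avoids.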
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.