flat assembler
Message board for the users of flat assembler.
  
Skip BOM in sources
Jin X 10 Feb 2024, 15:45
Hello Tomasz.
Please let fasm 1 skip the Unicode BOM signature in source files. I often use Unicode in comments, and I would like to add BOM signatures.
 | 
revolution 11 Feb 2024, 06:55
You can also combine the colon with the first instruction/directive.

Code:
:format elf executable
mov eax, 1
int 0x80

Code:
~ hd BOM.asm
00000000  ef bb bf 3a 66 6f 72 6d  61 74 20 65 6c 66 20 65  |...:format elf e|
00000010  78 65 63 75 74 61 62 6c  65 0a 6d 6f 76 20 65 61  |xecutable.mov ea|
00000020  78 2c 20 31 0a 69 6e 74  20 30 78 38 30           |x, 1.int 0x80|
0000002d
~ fasm BOM.asm && ./BOM
flat assembler version 1.73.31 (16384 kilobytes memory)
1 passes, 91 bytes.
~
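To reproduce the hex dump above, here is a minimal Python sketch that builds the same 45 bytes: the UTF-8 BOM (EF BB BF), the colon, then the three source lines with no trailing newline. The helper name `bom_colon_source` is made up for illustration. The trick presumably works because fasm accepts the BOM bytes as symbol characters, so the BOM plus the colon reads as a label definition.

```python
import codecs


def bom_colon_source() -> bytes:
    """Return the bytes of BOM.asm as shown in the hex dump (hypothetical helper)."""
    lines = [":format elf executable", "mov eax, 1", "int 0x80"]
    # codecs.BOM_UTF8 is b'\xef\xbb\xbf'; the dumped file has no trailing newline.
    return codecs.BOM_UTF8 + "\n".join(lines).encode("ascii")


if __name__ == "__main__":
    data = bom_colon_source()
    # Show the total length and the first four bytes (BOM followed by ':').
    print(len(data), data[:4].hex(" "))
```

Writing the returned bytes to a file opened in `"wb"` mode should give a file equivalent to the one that was dumped with `hd`.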
 | 
macomics 11 Feb 2024, 11:11
And then what about this?

main.asm
Code:
: ; BOM
format ELF64 executable 3
segment executable
entry $
    call secondProc
    mov eax, 60
    xor dil, dil
    syscall
include "second.asm"

second.asm
Code:
: ; BOM
secondProc:
    mov edx, .length
    lea rsi, [.hello]
    push 1
    pop rdi
    mov eax, edi
    syscall
.hello db 'Hello world!'
.length = $ - .hello

Code:
$ fasm -m 102400 main.asm
flat assembler version 1.73.32 (102400 kilobytes memory, x64)
second.asm [1]:
: ; BOM
processed: :
error: symbol already defined.

You can fix it like this:

Code:
=0 ; BOM
format ELF64 executable 3
segment executable
entry $
    call secondProc
    mov eax, 60
    xor dil, dil
    syscall
include "second.asm"

Code:
=0 ; BOM
secondProc:
    mov edx, .length
    lea rsi, [.hello]
    push 1
    pop rdi
    mov eax, edi
    syscall
.hello db 'Hello world!'
.length = $ - .hello

Code:
$ fasm -m 102400 main.asm
flat assembler version 1.73.32 (102400 kilobytes memory, x64)
2 passes, 166 bytes.

But in the absence of a BOM, we get this:

Code:
$ fasm -m 102400 main.asm
flat assembler version 1.73.32 (102400 kilobytes memory, x64)
main.asm [1]:
=0 ; BOM
processed: =0
error: illegal instruction.
 | 
revolution 11 Feb 2024, 11:24
The fix for BOM and BOM-less sources?

Code:
BOM=0 ; works whether invisible BOM is present or not
;...

Last edited by revolution on 11 Feb 2024, 17:09; edited 1 time in total
 | 
Furs 11 Feb 2024, 16:41
revolution wrote: The fix for BOM and BOM-less sources?
 | 
revolution 11 Feb 2024, 16:54
Furs wrote: Some text editors don't display the file in UTF-8 without the BOM and they assume it's ASCII instead. Which to be honest is a sane default because ASCII files definitely don't have any BOM, so it's the most backwards compatible solution.

The editor I use the most detects UTF-8 vs ASCII vs ISO-8859 perfectly well without any BOM. It isn't hard; it only needs a small amount of logic. Requiring a BOM would be worse. No, scratch that, it is worse. Very few apps add a BOM for UTF-8 (because it isn't needed), so it would be an awful experience for users to have to figure out manually what they are looking at.
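As a rough illustration of how small that detection logic can be, here is a Python sketch of one possible heuristic. It is not the algorithm of any particular editor, and the function name `guess_encoding` and the ISO-8859-1 fallback are assumptions made for the example.

```python
def guess_encoding(data: bytes) -> str:
    """Guess 'ascii', 'utf-8' or 'iso-8859-1' for BOM-less data (heuristic sketch)."""
    if all(b < 0x80 for b in data):
        # Pure 7-bit text is valid in all three encodings.
        return "ascii"
    try:
        # Strict decoding raises on malformed or truncated sequences.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: treat anything else as a single-byte encoding.
        return "iso-8859-1"
```

For example, `guess_encoding("; café".encode("iso-8859-1"))` falls through to the fallback because a lone 0xE9 byte is not valid UTF-8. The guess can still be wrong for single-byte text that happens to form valid UTF-8 sequences.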
 | 
Jin X 12 Feb 2024, 10:24
It works, but it is a crutch.
I think it would be quite easy to add BOM support to the compiler.
 | 
revolution 12 Feb 2024, 10:53
Jin X wrote: I think it's quite easy to add BOM support to compiler.
 | 
Furs 12 Feb 2024, 16:46
revolution wrote: Sure, some editors that are annoying.
Last edited by Furs on 12 Feb 2024, 16:46; edited 1 time in total
 | 
Jin X 12 Feb 2024, 16:46
Where to post, here?
 | 
revolution 12 Feb 2024, 17:14
Furs wrote: Heuristics are never perfect.
Jin X wrote: Where to post, here?
 | 
Jin X 18 Feb 2024, 17:12
BOM checker is done.
I made 2 versions: normal, and extended (with support for extra BOMs) in PREPROCE.EXT.INC. All my inserts are marked as "Jin X". The main code from PREPROCE.INC:

Code:
        mov     eax,[esi]
        cmp     ax,0FEFFh               ; UTF-16 (LE) / UTF-32 (LE)
        je      unsuppoted_bom
        cmp     ax,0FFFEh               ; UTF-16 (BE)
        je      unsuppoted_bom
        cmp     eax,0FFFE0000h          ; UTF-32 (BE)
        je      unsuppoted_bom
        cmp     eax,3ABFBBEFh           ; UTF-8 + colon char
        je      bom_no_skip             ; don't skip if colon trick is used (for backward compatibility)
        and     eax,00FFFFFFh
        cmp     eax,00BFBBEFh           ; UTF-8
        jne     bom_no_skip
        add     esi,3                   ; skip BOM
bom_no_skip:
        mov     ebx,esi                 ; moved down by Jin X
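The constants in that excerpt are the BOM byte sequences read as little-endian words and dwords. The following Python sketch mirrors the checks as posted on 18 Feb 2024, 17:12, so the byte order is easier to follow. The function name `bom_action`, its return strings, and the zero padding of short inputs are assumptions made for the example and are not part of the patch.

```python
import struct


def bom_action(source: bytes) -> str:
    """Mirror the posted checks: returns 'unsupported', 'keep' or 'skip3'."""
    # Assumption: pad short inputs with zeros so four bytes can always be read.
    eax = struct.unpack_from("<I", source.ljust(4, b"\0"))[0]  # mov eax,[esi]
    ax = eax & 0xFFFF
    if ax == 0xFEFF:                        # bytes FF FE: UTF-16 (LE) / UTF-32 (LE)
        return "unsupported"
    if ax == 0xFFFE:                        # bytes FE FF: UTF-16 (BE)
        return "unsupported"
    if eax == 0xFFFE0000:                   # bytes 00 00 FE FF: UTF-32 (BE)
        return "unsupported"
    if eax == 0x3ABFBBEF:                   # bytes EF BB BF 3A: UTF-8 BOM + ':'
        return "keep"                       # colon trick is left untouched
    if (eax & 0x00FFFFFF) == 0x00BFBBEF:    # bytes EF BB BF: UTF-8 BOM
        return "skip3"                      # corresponds to add esi,3
    return "keep"                           # no BOM at all
```

For instance, `bom_action(b"\xef\xbb\xbfformat ELF64")` returns `"skip3"`, while the same text with a colon directly after the BOM returns `"keep"`.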
 | 
macomics 18 Feb 2024, 17:34
This trick has already been discussed. It doesn't work for BOM in multiple files.

Code:
        cmp     byte [esi+ecx],':'      ; BOM + colon trick
        je      bom_no_skip             ; don't skip for backward compatibility
        add     esi,ecx                 ; skip BOM
bom_no_skip:
 | 
Jin X 18 Feb 2024, 18:08
OK, fixed and checked (for both versions).
 | 
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.